-
libsais
libsais is a library for linear-time construction of the suffix array, longest common prefix (LCP) array, and Burrows-Wheeler transform (BWT), based on the induced sorting algorithm.
The old engineered state of the art was divsufsort, but now there is libsais, which makes use of prefetching (it would be interesting to see how both react to huge caches). As for datasets, there are many classical ones. In rough order of size: Silesia Corpus, Manzini Corpus, Pizza&Chili Corpus, Large Text Compression Benchmark Corpus, etc.
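For readers unfamiliar with the three structures these libraries build, here is a minimal Python sketch of what a suffix array, LCP array, and BWT look like on a tiny input. It is not libsais or divsufsort code and it is not linear time: the suffix array is built by naive sorting, the LCP array uses Kasai's algorithm, and the function names (`suffix_array`, `lcp_array`, `bwt`) are made up for illustration. It assumes the input ends with a unique sentinel byte (`$` here) that is smaller than every other byte.

```python
def suffix_array(text: bytes) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix itself.
    # Roughly O(n^2 log n); induced sorting (as in libsais) does this in O(n).
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: bytes, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes at sa[k-1] and sa[k]; lcp[0] is defined as 0.
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def bwt(text: bytes, sa: list[int]) -> bytes:
    # The BWT is the byte preceding each suffix in sorted order.
    # text[-1] wraps around to the sentinel for the suffix starting at 0.
    return bytes(text[i - 1] for i in sa)


if __name__ == "__main__":
    t = b"banana$"
    sa = suffix_array(t)
    print(sa)
    print(lcp_array(t, sa))
    print(bwt(t, sa))
```

A sketch like this is only useful for checking the output of a fast library on small inputs; on corpora of the sizes listed above, the naive sort is far too slow.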