distinctelements
A pure PHP implementation of the Distinct Elements in Streams algorithm for estimating the number of distinct elements in a set.
I was involved in implementing the DNF volume counting version of this with the authors. You can see my blog post about it here:
https://www.msoos.org/2023/09/pepin-our-probabilistic-approx...
And the code here: https://github.com/meelgroup/pepin
Often, 30% of the time is spent on IO just reading the file. That's how incredibly fast this algorithm is. Crazy stuff.
BTW, Knuth contributed to the algo. Knuth's notes: https://cs.stanford.edu/~knuth/papers/cvm-note.pdf
He actually took time off (a whole month) from TAOCP to do this. Also, he is exactly as crazy good as you'd imagine. Just mind-blowing.
I took a crack at implementing this in Go. For anyone curious, I settled on algorithm 2, as I can just use a map as the base set structure.
https://github.com/tristanisham/f0
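For readers who want to see the shape of the idea, here is a minimal Python sketch of the basic sampling loop as I understand it from the paper, using a plain set as the buffer. It is not taken from any of the linked repositories, and it is not necessarily the variant called algorithm 2 above. The function name `estimate_distinct` and the parameters `thresh` (buffer size) and `rng` are my own hypothetical choices. The paper derives the buffer size from the desired error and failure probability; here it is simply passed in.

```python
import random


def estimate_distinct(stream, thresh, rng=None):
    """Estimate the number of distinct items in `stream`.

    `thresh` is the maximum buffer size (hypothetical parameter name).
    Larger values use more memory and give a tighter estimate.
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # buffer of sampled items

    for item in stream:
        # Forget any earlier decision about this item, then re-sample it
        # with the current probability p.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)

        # Buffer full: keep each item with probability 1/2 and halve p.
        if len(sample) >= thresh:
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2.0
            if len(sample) >= thresh:
                # Extremely unlikely: every item survived the coin flips.
                raise RuntimeError("buffer still full after halving")

    # Each surviving item stands for roughly 1/p distinct items.
    return len(sample) / p
```

If the buffer never fills up, `p` stays at 1 and the result is the exact count, so `estimate_distinct([1, 2, 2, 3, 1], thresh=100)` returns `3.0`. On longer streams the result is an estimate whose accuracy depends on `thresh`.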
Whipped up a quick PHP version for fun:
https://github.com/jbroadway/distinctelements