Computer Scientists Invent an Efficient New Way to Count

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pepin

    A probabilistic approximate DNF counter

  • I was involved in implementing the DNF volume counting version of this with the authors. You can see my blog post about it here:

    https://www.msoos.org/2023/09/pepin-our-probabilistic-approx...

    And the code here: https://github.com/meelgroup/pepin

    Often, 30% of the runtime is spent on I/O just reading the file; that's how incredibly fast this algorithm is. Crazy stuff.

    BTW, Knuth contributed to the algo. Knuth's notes: https://cs.stanford.edu/~knuth/papers/cvm-note.pdf

    He actually took time off (a whole month) from TAOCP to do this. Also, he is exactly as crazy good as you'd imagine. Just mind-blowing.

  • f0

    An implementation of the CVM algorithm for the distinct elements problem. (by tristanisham)

  • I took a crack at implementing this in Go. For anyone curious, I settled on algorithm 2, as I can just use a map as the base set structure.

    https://github.com/tristanisham/f0

  • distinctelements

    A pure PHP implementation of the Distinct Elements in Streams algorithm for estimating the number of distinct elements in a set.

  • Whipped up a quick PHP version for fun:

    https://github.com/jbroadway/distinctelements
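
For readers who want to see the idea behind the implementations listed above, here is a minimal Python sketch of the basic CVM distinct-elements estimator as it is commonly described: keep a bounded set, admit each incoming item with probability p, and halve both the set and p whenever the set fills up. It is not taken from any of the linked repositories and is not necessarily the variant ("algorithm 2") mentioned above. The function name `cvm_estimate` and the parameters `thresh` and `rng` are assumptions made for illustration; the buffer size is passed in directly rather than derived from accuracy parameters.

```python
import random


def cvm_estimate(stream, thresh, rng=None):
    """Estimate the number of distinct items in `stream`.

    `thresh` is the maximum buffer size (a hypothetical parameter name);
    a larger buffer gives a tighter estimate at the cost of more memory.
    """
    rng = rng or random.Random()
    p = 1.0        # current sampling probability
    buf = set()    # sampled items, never more than `thresh` of them

    for item in stream:
        # Forget any earlier decision about this item, then re-sample it
        # with the current probability p.
        buf.discard(item)
        if rng.random() < p:
            buf.add(item)

        if len(buf) >= thresh:
            # Buffer is full: keep each item with probability 1/2
            # and halve p to match.
            buf = {x for x in buf if rng.random() < 0.5}
            p /= 2
            if len(buf) >= thresh:
                # Extremely unlikely for a reasonable thresh; signal
                # failure instead of returning a bad estimate.
                raise RuntimeError("buffer still full after halving")

    # Each surviving item stands in for 1/p distinct items.
    return len(buf) / p
```

When the buffer is larger than the number of distinct items, p never drops below 1 and the result is exact; for example, `cvm_estimate([1, 2, 2, 3, 1], thresh=10)` returns `3.0`. With a smaller buffer the result is a randomized estimate that varies from run to run unless a seeded `random.Random` is passed as `rng`.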

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

