rolling-quantiles VS t-digest

Compare rolling-quantiles and t-digest and see how they differ.

rolling-quantiles

Blazing fast, composable, Pythonic quantile filters. (by marmarelis)

t-digest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means (by tdunning)
                    rolling-quantiles     t-digest
Mentions            2                     9
Stars               133                   1,924
Growth              -                     -
Activity            2.0                   3.3
Latest commit       12 months ago         5 months ago
Language            C                     Java
License             Apache License 2.0    Apache License 2.0
Mentions - the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative measure of how actively a project is being developed; recent commits carry more weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

rolling-quantiles

Posts with mentions or reviews of rolling-quantiles. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-09-14.

t-digest

Posts with mentions or reviews of t-digest. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-21.
  • Ask HN: How do you deal with information and internet addiction?
    1 project | news.ycombinator.com | 8 Feb 2023
    > I get a lot of benefit from this information but somehow it feels shallow.

    I take a longer view to this. For example, a few years ago I read about an algorithm to calculate percentiles in real time. [0]

    It literally just came up at work today. I haven't used that information but maybe two times since I read it, but it was super relevant today and saved my team potential weeks of development.

    So maybe it's not so shallow.

    But to your actual question, I have a similar problem. The best I can say is that deadlines help. I usually put down the HN and Youtube when I have a deadline coming up. And not just at work. I make sure my hobbies have deadlines too.

    I tell people when I think something will be done, so they start bugging me about it when it doesn't get done, so that I have a "deadline". Also one of my hobbies is pixel light shows for holidays, which come with excellent natural deadlines -- it has to be done by the holiday or it's useless.

    So either find an "accountability buddy" who will hold you to your self-imposed deadlines, or find a hobby that has natural deadlines, like certain calendar dates, or annual conventions or contests that you need to be done by.

    [0] https://github.com/tdunning/t-digest

  • Ask HN: What are some 'cool' but obscure data structures you know about?
    54 projects | news.ycombinator.com | 21 Jul 2022
    I am enamored by data structures in the sketch/summary/probabilistic family: t-digest[1], q-digest[2], count-min sketch[3], matrix-sketch[4], graph-sketch[5][6], Misra-Gries sketch[7], top-k/spacesaving sketch[8], &c.

    What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%. Furthermore, many (if not all) of these can tune accuracy by parameter adjustment anyway. They tend to have favorable combinatorial properties, i.e., they form monoids or semigroups under merge operations. In short, they are a family of data structures that lets me solve problems I couldn't solve before.

    I hope they are as useful or intriguing to you as they are to me.

    1. https://github.com/tdunning/t-digest

    2. https://pdsa.readthedocs.io/en/latest/rank/qdigest.html

    3. https://florian.github.io/count-min-sketch/

    4. https://www.cs.yale.edu/homes/el327/papers/simpleMatrixSketc...

    5. https://www.juanlopes.net/poly18/poly18-juan-lopes.pdf

    6. https://courses.engr.illinois.edu/cs498abd/fa2020/slides/20-...

    7. https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf

    8. https://www.sciencedirect.com/science/article/abs/pii/S00200...

    9. It may better be described as error-speed and error-space, but I've avoided the term error because the term for programming audiences typically evokes the idea of logic errors and what I mean is statistical error.

  • Monarch: Google’s Planet-Scale In-Memory Time Series Database
    4 projects | news.ycombinator.com | 14 May 2022
    Ah, I misunderstood what you meant. If you are reporting static buckets, I get how that is better than what folks typically do, but how do you know the buckets a priori? Others back their histograms with things like https://github.com/tdunning/t-digest. It is pretty powerful, as the buckets are dynamic based on the data and histograms can be added together.
  • [Q] Estimator for pop median
    1 project | /r/statistics | 16 Sep 2021
    Yes, but if you need to estimate median on the fly (e.g., over a stream of data) or in parallel there are better ways.
  • How percentile approximation works (and why it's more useful than averages)
    8 projects | news.ycombinator.com | 14 Sep 2021
    There are some newer data structures that take this to the next level, such as T-Digest[1], which remains extremely accurate even when determining percentiles at the very tail end (like 99.999%).

    [1]: https://arxiv.org/pdf/1902.04023.pdf / https://github.com/tdunning/t-digest

  • Reducing fireflies in path tracing
    1 project | /r/GraphicsProgramming | 3 Aug 2021
    [2] https://github.com/tdunning/t-digest
  • Reliable, Scalable, and Maintainable Applications
    1 project | dev.to | 8 Apr 2021
    T-Digest
  • Show HN: Fast Rolling Quantiles for Python
    2 projects | news.ycombinator.com | 1 Mar 2021
    This is pretty cool. The title would be a bit more descriptive if it were “Fast Rolling Quantile Filters for Python”, since the high-pass/low-pass filter functionality seems to be the focus.

    The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....

    There is an open source Python implementation, although I haven’t used it and can’t vouch for its quality: https://github.com/CamDavidsonPilon/tdigest
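
To make the last post concrete, here is a deliberately naive sketch of what a rolling quantile filter computes: keep a sorted copy of the current window and read the requested quantile off it. This is a generic illustration, not the rolling-quantiles API, and its O(window) cost per sample is exactly what the package's paired binary heaps (or an approximate structure like t-digest) are there to avoid.

    import bisect

    def rolling_quantile(values, window, q):
        """Sliding-window quantile via a sorted buffer (O(window) per sample)."""
        buf, out = [], []                 # buf is a sorted copy of the current window
        for i, x in enumerate(values):
            bisect.insort(buf, x)
            if i >= window:               # evict the sample that just left the window
                buf.pop(bisect.bisect_left(buf, values[i - window]))
            if i >= window - 1:           # window is full: emit the requested quantile
                out.append(buf[int(q * (len(buf) - 1))])
        return out

    # A rolling median (q = 0.5) suppresses the outlier spike at index 3.
    signal = [0, 1, 0, 9, 0, 1, 0, 0, 1, 0]
    print(rolling_quantile(signal, window=3, q=0.5))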
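On the t-digest side, the posts above lean on two properties: digests built independently can be merged, and the merged summary still answers deep-tail percentile queries. The sketch below shows that workflow with the CamDavidsonPilon/tdigest Python package linked in the last post; it assumes that package's documented update/percentile interface and its + operator for merging, and the tail estimate is only as accurate as the digest's compression allows.

    import random
    from tdigest import TDigest   # pip install tdigest

    # Two digests built independently, e.g. on two workers or shards.
    shard_a, shard_b = TDigest(), TDigest()
    shard_a.batch_update([random.gauss(0, 1) for _ in range(50_000)])
    shard_b.batch_update([random.gauss(0, 1) for _ in range(50_000)])

    # Summaries combine after the fact -- the "monoid under merge"
    # property mentioned in the data-structures thread above.
    combined = shard_a + shard_b

    # Query the merged summary, including far-tail percentiles.
    print(combined.percentile(50))    # should land near the true median, 0.0
    print(combined.percentile(99.9))  # tail estimate; roughly 3.09 for N(0, 1)

This add-then-query pattern is also why t-digest-backed histograms come up in the Monarch discussion: bucket boundaries adapt to the data, and summaries from many tasks can be rolled up without agreeing on buckets in advance.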

What are some alternatives?

When comparing rolling-quantiles and t-digest, you can also consider the following projects:

timescale-analytics - Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈

EvoTrees.jl - Boosted trees in Julia

Folly - An open-source C++ library developed and used at Facebook.

node-faststats - Quickly calculate statistics of a running stream of data

tdigest - t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

PSI - Private Set Intersection Cardinality protocol based on ECDH and Bloom Filters

AspNetCoreDiagnosticScenarios - This repository has examples of broken patterns in ASP.NET Core applications

minisketch - Minisketch: an optimized library for BCH-based set reconciliation

tdigest - PostgreSQL extension for estimating percentiles using t-digest

Caffeine - A high performance caching library for Java

swift - the multiparty transport protocol (aka "TCP with swarming" or "BitTorrent at the transport layer")