T-digest Alternatives
Similar projects and alternatives to t-digest
- timescale-analytics: Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈
- tdigest: t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark (by CamDavidsonPilon)
- rolling-quantiles: Blazing fast, composable, Pythonic quantile filters.
- pyroscope: Continuous Profiling Platform. Debug performance issues down to a single line of code [Moved to: https://github.com/grafana/pyroscope]
- tdigest: PostgreSQL extension for estimating percentiles using t-digest (by tvondra)
- plurid-data-structures-typescript: Utility Data Structures Implemented in TypeScript
- entt: Gaming meets modern C++ - a fast and reliable entity component system (ECS) and much more
- AspNetCoreDiagnosticScenarios: This repository has examples of broken patterns in ASP.NET Core applications
- ann-benchmarks: Benchmarks of approximate nearest neighbor libraries in Python
- opentelemetry-specification: Specifications for OpenTelemetry
t-digest reviews and mentions
-
Ask HN: What are some 'cool' but obscure data structures you know about?
I am enamored by data structures in the sketch/summary/probabilistic family: t-digest[1], q-digest[2], count-min sketch[3], matrix-sketch[4], graph-sketch[5][6], Misra-Gries sketch[7], top-k/spacesaving sketch[8], &c.
What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%, and furthermore, many (if not all) of these can tune accuracy via a parameter anyway. They also tend to have favorable combinatorial properties, i.e. they form monoids or semigroups under merge operations. In short, these are data structures that give me the ability to solve problems I couldn't solve before.
I hope they are as useful or intriguing to you as they are to me.
1. https://github.com/tdunning/t-digest
2. https://pdsa.readthedocs.io/en/latest/rank/qdigest.html
3. https://florian.github.io/count-min-sketch/
4. https://www.cs.yale.edu/homes/el327/papers/simpleMatrixSketc...
5. https://www.juanlopes.net/poly18/poly18-juan-lopes.pdf
6. https://courses.engr.illinois.edu/cs498abd/fa2020/slides/20-...
7. https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf
8. https://www.sciencedirect.com/science/article/abs/pii/S00200...
9. These tradeoffs may be better described as error-speed and error-space, but I've avoided the term "error" because, for programming audiences, it typically evokes logic errors, whereas what I mean here is statistical error.
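To make the accuracy-for-space tradeoff above concrete, here is a minimal Python sketch using the CamDavidsonPilon tdigest package mentioned further down this page; the TDigest/update/percentile names assume that package's documented interface, and the data and quantiles are purely illustrative.

```python
# Minimal sketch: streaming quantile estimation with a t-digest.
# Assumes the CamDavidsonPilon `tdigest` package (pip install tdigest)
# and numpy; the data and quantiles are illustrative.
import random

import numpy as np
from tdigest import TDigest

random.seed(42)
data = [random.expovariate(1.0) for _ in range(100_000)]

digest = TDigest()          # fixed-size summary: memory is bounded by the
for x in data:              # digest's compression setting, not by len(data)
    digest.update(x)

for q in (50, 95, 99):
    exact = np.percentile(data, q)     # needs all 100k samples in memory
    approx = digest.percentile(q)      # needs only the digest's centroids
    print(f"p{q}: exact={exact:.4f}  t-digest={approx:.4f}")
```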
On sketches, there is a genre of structure for estimating histogram-like statistics (median, 99th centile, etc.) in fixed space, which I really like. Two examples:
t-digest https://github.com/tdunning/t-digest
-
Monarch: Google’s Planet-Scale In-Memory Time Series Database
Ah, I misunderstood what you meant. If you are reporting static buckets, I get how that is better than what folks typically do, but how do you know the buckets a priori? Others back their histograms with things like https://github.com/tdunning/t-digest. It is pretty powerful, as the buckets are dynamic based on the data and histograms can be added together.
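The "histograms can be added together" point is the mergeability property mentioned earlier on this page. Below is a minimal, hedged sketch of it in Python, assuming the tdigest package's documented batch_update, percentile, and + (merge) operations; the per-shard latency data is made up for illustration.

```python
# Minimal sketch: per-shard digests merged into a global one, assuming the
# `tdigest` package's documented batch_update, percentile and `+` (merge) API.
import random

from tdigest import TDigest

def shard_digest(values):
    """Summarize one shard's samples into a fixed-size digest."""
    d = TDigest()
    d.batch_update(values)
    return d

# Hypothetical per-shard latency samples; no bucket boundaries chosen up front.
random.seed(0)
shards = [[random.gauss(100, 15) for _ in range(10_000)] for _ in range(8)]
digests = [shard_digest(s) for s in shards]

# Digests merge associatively, so shards can be summarized independently
# and combined later; the merged digest still answers global percentile queries.
merged = digests[0]
for d in digests[1:]:
    merged = merged + d

print("global p99 ~", merged.percentile(99))
```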
-
How percentile approximation works (and why it's more useful than averages)
There are some newer data structures that take this to the next level, such as T-Digest[1], which remains extremely accurate even when determining percentiles at the very tail end (like 99.999%).
[1]: https://arxiv.org/pdf/1902.04023.pdf / https://github.com/tdunning/t-digest
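As a rough illustration of that tail behavior, the following sketch compares exact percentiles with t-digest estimates on heavy-tailed data, again assuming the CamDavidsonPilon tdigest package's documented interface; the distribution, sample size, and quantiles are arbitrary.

```python
# Minimal sketch: tail percentiles on heavy-tailed data, exact vs. t-digest.
# Assumes the CamDavidsonPilon `tdigest` package and numpy.
import random

import numpy as np
from tdigest import TDigest

random.seed(7)
data = [random.paretovariate(1.5) for _ in range(200_000)]  # heavy tail

digest = TDigest()
digest.batch_update(data)

for q in (99, 99.9, 99.99, 99.999):
    print(f"p{q}: exact={np.percentile(data, q):.2f}  "
          f"t-digest={digest.percentile(q):.2f}")
```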
-
Show HN: Fast Rolling Quantiles for Python
This is pretty cool. The title would be a bit more descriptive if it were “Fast Rolling Quantile Filters for Python”, since the high-pass/low-pass filter functionality seems to be the focus.
The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....
There is an open source Python implementation, although I haven’t used it and can’t vouch for its quality: https://github.com/CamDavidsonPilon/tdigest
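Since t-digest has no delete operation, a rolling (windowed) quantile cannot expire individual samples; one workaround is to keep a small digest per chunk of the stream and merge the most recent chunks when a quantile is needed. The sketch below illustrates that idea with the tdigest package; the ChunkedRollingQuantile class, chunk size, and window length are hypothetical and not part of any linked library.

```python
# Minimal sketch: an approximately rolling quantile built from per-chunk
# digests. t-digest has no delete operation, so instead of expiring single
# samples we drop whole chunk digests as they age out of the window.
# Assumes the `tdigest` package; the class and its parameters are hypothetical.
import random
from collections import deque

from tdigest import TDigest

class ChunkedRollingQuantile:
    def __init__(self, chunk_size=1_000, window_chunks=10):
        self.chunk_size = chunk_size
        self.window = deque(maxlen=window_chunks)  # oldest chunk falls off
        self.current = TDigest()
        self.current_count = 0

    def update(self, x):
        self.current.update(x)
        self.current_count += 1
        if self.current_count >= self.chunk_size:
            self.window.append(self.current)   # seal the finished chunk
            self.current = TDigest()
            self.current_count = 0

    def percentile(self, q):
        merged = TDigest()
        for d in self.window:
            merged = merged + d                # merge sealed chunks
        merged = merged + self.current         # plus the in-progress chunk
        return merged.percentile(q)

# Usage: the window covers roughly the last chunk_size * window_chunks samples.
rq = ChunkedRollingQuantile()
random.seed(1)
for _ in range(25_000):
    rq.update(random.random())
print("approximate rolling p95:", rq.percentile(95))
```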
-
Stats
tdunning/t-digest is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.