t-digest vs timescale-analytics

Project	t-digest	timescale-analytics
Mentions	9	8
Stars	1,922	336
Growth	-	4.5%
Activity	3.3	6.0
Latest Commit	4 months ago	4 days ago
Language	Java	Rust
License	Apache License 2.0	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

t-digest

Posts with mentions or reviews of t-digest. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-21.

Ask HN: How do you deal with information and internet addiction?
1 project | news.ycombinator.com | 8 Feb 2023

> I get a lot of benefit from this information but somehow it feels shallow.
I take a longer view to this. For example, a few years ago I read about an algorithm to calculate percentiles in real time. [0]
It literally just came up at work today. I haven't used that information but maybe two times since I read it, but it was super relevant today and saved my team potential weeks of development.
So maybe it's not so shallow.
But to your actual question, I have a similar problem. The best I can say is that deadlines help. I usually put down the HN and Youtube when I have a deadline coming up. And not just at work. I make sure my hobbies have deadlines too.
I tell people when I think something will be done, so they start bugging me about it when it doesn't get done, so that I have a "deadline". Also one of my hobbies is pixel light shows for holidays, which come with excellent natural deadlines -- it has to be done by the holiday or it's useless.
So either find an "accountability buddy" who will hold you to your self imposed deadlines, or find a hobby that has natural deadlines, like certain calendar dates, or annual conventions or contests that you need to be done by.
[0] https://github.com/tdunning/t-digest
Ask HN: What are some 'cool' but obscure data structures you know about?
54 projects | news.ycombinator.com | 21 Jul 2022

I am enamored by data structures in the sketch/summary/probabilistic family: t-digest[1], q-digest[2], count-min sketch[3], matrix-sketch[4], graph-sketch[5][6], Misra-Gries sketch[7], top-k/spacesaving sketch[8], &c.
What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%. And furthermore, many (if not all) of these can tune accuracy by parameter adjustment anyway. They tend to have favorable combinatorial properties, i.e., they form monoids or semigroups under merge operations. In short, these are properties of data structures that gave me the ability to solve problems I couldn't before.
I hope they are as useful or intriguing to you as they are to me.
1. https://github.com/tdunning/t-digest
2. https://pdsa.readthedocs.io/en/latest/rank/qdigest.html
3. https://florian.github.io/count-min-sketch/
4. https://www.cs.yale.edu/homes/el327/papers/simpleMatrixSketc...
5. https://www.juanlopes.net/poly18/poly18-juan-lopes.pdf
6. https://courses.engr.illinois.edu/cs498abd/fa2020/slides/20-...
7. https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf
8. https://www.sciencedirect.com/science/article/abs/pii/S00200...
9. It may better be described as error-speed and error-space, but I've avoided the term error because the term for programming audiences typically evokes the idea of logic errors and what I mean is statistical error.
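To make the "monoid under merge" point above concrete, here is a minimal count-min sketch in Python. It is only an illustrative sketch, not code from any of the linked projects: the class name, the default `width` and `depth`, and the use of salted BLAKE2 hashes are all assumptions chosen for brevity. Merging two sketches of the same shape is plain element-wise addition, so the merged sketch is identical to one built from the combined stream.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter (hypothetical minimal version).

    Estimates can overcount because of hash collisions, but never undercount.
    """

    def __init__(self, width=256, depth=4):
        self.width = width    # more columns -> fewer collisions, more memory
        self.depth = depth    # more rows -> lower chance of a bad estimate
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash function per row by salting with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # Element-wise addition: associative and commutative, with the
        # all-zero sketch as the identity element.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share dimensions")
        merged = CountMinSketch(self.width, self.depth)
        merged.rows = [
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.rows, other.rows)
        ]
        return merged
```

The accuracy-space tradeoff shows up directly in the two constructor parameters: a wider or deeper table costs more memory and gives tighter estimates.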
Monarch: Google’s Planet-Scale In-Memory Time Series Database
4 projects | news.ycombinator.com | 14 May 2022

Ah, I misunderstood what you meant. If you are reporting static buckets I get how that is better than what folks typically do but how do you know the buckets a priori? Others back their histograms with things like https://github.com/tdunning/t-digest. It is pretty powerful as the buckets are dynamic based on the data and histograms can be added together.
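As a rough illustration of "dynamic buckets that can be added together", the following Python sketch keeps a bounded list of weighted centroids and merges the two closest neighbours whenever the limit is exceeded. This is a deliberately simplified stand-in, not the t-digest algorithm itself (t-digest bounds cluster sizes by quantile so the tails stay precise); the class name and `max_centroids` parameter are hypothetical.

```python
import bisect


class CentroidHistogram:
    """Bounded-size histogram whose buckets adapt to the data (simplified)."""

    def __init__(self, max_centroids=50):
        self.max_centroids = max_centroids
        self.centroids = []  # sorted list of [mean, weight]

    def add(self, value, weight=1):
        bisect.insort(self.centroids, [float(value), weight])
        self._compress()

    def _compress(self):
        # Repeatedly merge the adjacent pair with the smallest gap until
        # the histogram fits within its size budget.
        while len(self.centroids) > self.max_centroids:
            i = min(
                range(len(self.centroids) - 1),
                key=lambda k: self.centroids[k + 1][0] - self.centroids[k][0],
            )
            (m1, w1), (m2, w2) = self.centroids[i], self.centroids[i + 1]
            merged_mean = (m1 * w1 + m2 * w2) / (w1 + w2)
            # The merged mean lies between m1 and m2, so order is preserved.
            self.centroids[i:i + 2] = [[merged_mean, w1 + w2]]

    def merge(self, other):
        # "Adding" two histograms: pool the centroids, then compress.
        merged = CentroidHistogram(self.max_centroids)
        merged.centroids = sorted(c[:] for c in self.centroids + other.centroids)
        merged._compress()
        return merged

    def total(self):
        return sum(weight for _, weight in self.centroids)
```

No bucket boundaries are fixed a priori: where the data is dense, centroids end up close together, and merging two histograms preserves the total weight and the overall weighted mean.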
[Q] Estimator for pop median
1 project | /r/statistics | 16 Sep 2021

Yes, but if you need to estimate median on the fly (e.g., over a stream of data) or in parallel there are better ways.
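One simple way to track a median over a stream, shown here only as a sketch, is the classic two-heap approach in Python. Note the assumption: this variant is exact and keeps every value in memory, so it is not one of the bounded-memory approximations (such as t-digest) that the comment may have had in mind. The class name `RunningMedian` is hypothetical.

```python
import heapq


class RunningMedian:
    """Exact median of all values seen so far, updated in O(log n) per value."""

    def __init__(self):
        self._low = []   # max-heap (stored negated) holding the smaller half
        self._high = []  # min-heap holding the larger half

    def add(self, x):
        # Push onto the low half, then move its largest value to the high half.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the low half is never smaller than the high half.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no data")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

Unlike a sketch, two such objects cannot be merged cheaply, which is one reason mergeable summaries are attractive for parallel computation.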
How percentile approximation works (and why it's more useful than averages)
8 projects | news.ycombinator.com | 14 Sep 2021

There are some newer data structures that take this to the next level such as T-Digest[1], which remains extremely accurate even when determining percentiles at the very tail end (like 99.999%)
[1]: https://arxiv.org/pdf/1902.04023.pdf / https://github.com/tdunning/t-digest
Reducing fireflies in path tracing
1 project | /r/GraphicsProgramming | 3 Aug 2021

[2] https://github.com/tdunning/t-digest
Reliable, Scalable, and Maintainable Applications
1 project | dev.to | 8 Apr 2021

T-Digest
Show HN: Fast Rolling Quantiles for Python
2 projects | news.ycombinator.com | 1 Mar 2021

This is pretty cool. The title would be a bit more descriptive if it were “Fast Rolling Quantile Filters for Python”, since the high-pass/low-pass filter functionality seems to be the focus.
The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....
There is an open source Python implementation, although I haven’t used it and can’t vouch for its quality: https://github.com/CamDavidsonPilon/tdigest

timescale-analytics

Posts with mentions or reviews of timescale-analytics. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-02-22.

Timescale raises $110M Series C
8 projects | news.ycombinator.com | 22 Feb 2022

Hi! So the team is over 100 at this point, but engineering effort is spread across multiple products.
The core timescaledb repo [0] has 10-15 primary engineers (although we are aggressively hiring for database internal engineers), with a few others working on DB hyperfunctions and our function pipelining [1] in a separate extension [2]. I think generally the set of folks who contribute to low-level database internals in C is just smaller than for other types of projects.
We also have our promscale product [3], which is our observability backend powered by SQL & TimescaleDB.
And then there is Timescale Cloud, which is obviously a large engineering effort (most of which does not happen in public repos).
And we are hiring. Fully remote & global.
https://www.timescale.com/careers
[0] https://github.com/timescale/timescaledb
[1] https://www.timescale.com/blog/function-pipelines-building-f...
[2] https://github.com/timescale/timescaledb-toolkit
[3] https://github.com/timescale/promscale ; https://github.com/timescale/tobs
Function pipelines: Building functional programming into PostgreSQL
3 projects | news.ycombinator.com | 19 Oct 2021

(NB: Post author here)
This is in the TimescaleDB Toolkit extension [1] which is licensed under our community license for now and it's not available on DO. It is available on our cloud service fully managed. You can also install it and run it for free yourself.
[1]: https://github.com/timescale/timescaledb-toolkit
How percentile approximation works (and why it's more useful than averages)
8 projects | news.ycombinator.com | 14 Sep 2021
How PostgreSQL aggregation works and how it inspired our hyperfunctions’ design
2 projects | news.ycombinator.com | 5 Aug 2021

Absolutely! We're actually developing a lot of that: https://github.com/timescale/timescaledb-toolkit/tree/main/d...
A number of the things you're looking for we've done experimentally and we'll be stabilizing over the next few releases. So we'd love some feedback while we're still able to futz with the API without making breaking changes.
But the two you're asking about are, I think, going to be covered by hyperloglog (we just reimplemented the internals with HLL++) and stats_agg family of functions, which have both 1D (which will give you avg, stddev, variance, etc) and 2D (co-variance, slope, intercept, x-intercept etc as well as all the 1D functions).
Would also love issues if you think we're missing other stuff, going to be generalizing this and want to make it useful for folks.
(NB: Post author here.)
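The idea behind a 1D statistical aggregate like the one described above can be sketched in Python: keep a small state (count, mean, sum of squared deviations) that can be both updated per value and combined with another state. This is an illustration using Welford's update and the standard pairwise combine formula, not the Toolkit's actual implementation; `Stats1D` and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats1D:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford's online update: numerically stable, one pass.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def combine(self, other):
        # Merge two partial states without revisiting the raw values.
        if self.n == 0:
            return Stats1D(other.n, other.mean, other.m2)
        if other.n == 0:
            return Stats1D(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Stats1D(n, mean, m2)

    def variance(self):
        # Sample variance; needs at least two values.
        return self.m2 / (self.n - 1)

    def stddev(self):
        return math.sqrt(self.variance())
```

Because the state combines cleanly, partial results computed per chunk or per worker can be rolled up into avg, variance, and stddev for the whole set.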
Postgres downsampling performance
1 project | /r/PostgreSQL | 7 Jun 2021

If you know that you're going to be doing downsampling at the hourly level then a continuous aggregate on the hour is probably a good idea. We're also building some functions to make some of the continuous aggregate stuff for these sorts of cases easier/more accurate in more cases, especially if you need things like exact averages when you don't have the same number of points in an hour and want to re-aggregate on top of the continuous agg. See: https://github.com/timescale/timescale-analytics/pull/141/files
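The re-aggregation problem mentioned above is easy to see with a small Python example: when hours contain different numbers of points, averaging the hourly averages gives the wrong answer, while carrying (sum, count) per hour gives the exact one. The function names and data here are made up for illustration and do not reflect how the linked pull request implements it.

```python
def hourly_partial(values):
    """Summarize one hour of points as a re-aggregatable (sum, count) pair."""
    return (sum(values), len(values))


def exact_average(partials):
    """Combine hourly (sum, count) pairs into the exact overall average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


hour_1 = [10, 20]            # 2 points, hourly average 15
hour_2 = [30, 30, 30, 30]    # 4 points, hourly average 30

partials = [hourly_partial(hour_1), hourly_partial(hour_2)]

naive = (15 + 30) / 2              # average of averages: 22.5
exact = exact_average(partials)    # (30 + 120) / 6 = 25.0
```

The naive figure weights each hour equally regardless of how many points it holds; the (sum, count) form weights each point equally, which is what an average over the raw data would do.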
TimescaleDB Raises $40M
7 projects | news.ycombinator.com | 5 May 2021

Fair point about adaptive chunking. You sound like a long-term user!
There is always a trade-off between getting features to users quickly to experiment and incrementally improve, versus doing it always very conservatively.
When we launched adaptive chunking (introduced in 0.11, deprecated in 1.2), we explicitly marked it as beta and default off, to hopefully reflect that. [1]
The approach we are now taking with Timescale Analytics [2] is to have an explicit distinction between experimental features (which will be part of a distinct "experimental" schema in the database, and must be expressly turned on with appropriate warnings) and stable features. Hopefully this can help find a good balance between stability and velocity, but feedback welcome!
[1] https://github.com/timescale/timescaledb/releases/tag/0.11.0
[2] https://github.com/timescale/timescale-analytics/tree/main/e...

What are some alternatives?

When comparing t-digest and timescale-analytics you can also consider the following projects:

EvoTrees.jl - Boosted trees in Julia

orioledb - OrioleDB – building a modern cloud-native storage engine (... and solving some PostgreSQL wicked problems) 🇺🇦

tdigest - t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

PSI - Private Set Intersection Cardinality protocol based on ECDH and Bloom Filters

Telegraf - The plugin-driven server agent for collecting & reporting metrics.

minisketch - Minisketch: an optimized library for BCH-based set reconciliation

promscale - [DEPRECATED] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.

tdigest - PostgreSQL extension for estimating percentiles using t-digest

pgx - Build Postgres Extensions with Rust! [Moved to: https://github.com/tcdi/pgrx]

AspNetCoreDiagnosticScenarios - This repository has examples of broken patterns in ASP.NET Core applications

tsbs - Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data