How we made data aggregation on PostgreSQL better and faster

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

SaaSHub - Software Alternatives and Reviews

SaaSHub helps you find the best software and product alternatives

www.saashub.com

featured

ClickHouse

208 34,359 10.0 C++

ClickHouse® is a free analytics DBMS for big data

ClickHouse [1] is a "fast open source (almost)SQL analytics database" you are looking for :)
[1] https://clickhouse.com/

materialize

117 5,585 10.0 Rust

The data warehouse for operational workloads. (by MaterializeInc)

https://materialize.com/ is billed to be that. That behavior is not trivial to implement.

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
pg_ivm

19 779 6.3 C

IVM (Incremental View Maintenance) implementation as a PostgreSQL extension

Interesting. I check back on the wiki page every couple of years to see what progress is being made.
The basic idea seems to be to track the primary keys of the base tables in the incremental view and then use triggers to update those rows when source rows are updated.
The meat of the project is over here for anyone that’s interested - in particular this section about the limitations is pretty interesting (and expected).
https://github.com/sraoss/pg_ivm/#supported-view-definitions...

TimescaleDB

82 16,547 9.8 C

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

(NB - blog author/Timescale employee)
One thing we're improving as we move forward in documentation and other areas is explaining why doing joins (and things like window functions) is difficult in continuous aggregates and not the current focus. Honestly, it's part of the reason most databases haven't tackled this problem before.
Once you add in joins or things that might refer to data outside of the refresh window (LAG values for example), things get really complicated. For instance, if you join to a dimension table and a piece of metadata changes, does that change now need to be updated and reflected back in all of this historical aggregate data that's outside of the automatic refresh policy? Same with a window function - if data within a window hasn't changed but data that *might* be hit because of the window function reference does change, continuous aggregates would have to know about that for each query and track those changes too.
I'm not saying it's impossible or that it won't be solved someday, but the functionality with continuous aggregates that keeps the aggregate data updated automatically (without losing any history) *and* being able to perform fast joins on the finalized data is a very useful step that's not available anywhere else within the Postgres ecosystem.
RE: CAGG on top of a CAGG - you're certainly not the only person to request this[1] () and we understand that. Part of this is because of what I discussed above (tracking changes across multiple tables), although having finalized data might make this more possible in the future.
That said (!!!), the cool thing is that we already *have* begun to solve this problem with hyperfunction aggregates and 2-step aggregation, something I showed in the blog post. So, if your dataset can benefit from one of the hyperfunction aggregates that we currently provide, there are lots of cool things you can do with it, including rollups into bigger buckets without creating a second continuous aggregate! If you haven't checked them out, please do! [2][3]
[1]: https://github.com/timescale/timescaledb/issues/1400

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

GlareDB: An analytics DBMS for distributed data

1 project | news.ycombinator.com | 20 Sep 2023
GlareDB – Your Data Pipeline, Simplified

1 project | news.ycombinator.com | 19 Sep 2023
Querying Postgres Tables Directly from DuckDB

6 projects | news.ycombinator.com | 30 Sep 2022
TimescaleDB 2.7 vs. PostgreSQL 14

4 projects | news.ycombinator.com | 22 Sep 2022
I'm building a real-time monitoring system for metrics of servers in Rust !

2 projects | /r/rust | 21 Jun 2021

How we made data aggregation on PostgreSQL better and faster

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Database SQL Analytics time-series-database Rust
Post date: 21 Jun 2022

ClickHouse

materialize

InfluxDB

pg_ivm

TimescaleDB

Related posts

GlareDB: An analytics DBMS for distributed data

GlareDB – Your Data Pipeline, Simplified

Querying Postgres Tables Directly from DuckDB

TimescaleDB 2.7 vs. PostgreSQL 14

I'm building a real-time monitoring system for metrics of servers in Rust !

How we made data aggregation on PostgreSQL better and faster

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Database SQL Analytics time-series-database Rust Post date: 21 Jun 2022

ClickHouse

materialize

InfluxDB

pg_ivm

TimescaleDB

Related posts

GlareDB: An analytics DBMS for distributed data

GlareDB – Your Data Pipeline, Simplified

Querying Postgres Tables Directly from DuckDB

TimescaleDB 2.7 vs. PostgreSQL 14

I'm building a real-time monitoring system for metrics of servers in Rust !

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Database SQL Analytics time-series-database Rust
Post date: 21 Jun 2022