As mentioned in the blog post, clusters allow horizontal scalability and daisy chaining, so you can allocate more memory for your views even if you run up against the limits of how much memory you can fit on a single machine. We've got plans in the works to support out-of-core execution, too.
> Also they do not integrate at all with custom data types in Postgres IME. E.g. an enumeration in your table will mean materialize can't read the table as a source. Lame.
We're aware of this and are working on a fix. There are two tracking issues, if you'd like to follow along:
* #6818 (https://github.com/MaterializeInc/materialize/issues/6818) is specifically about supporting PostgreSQL enum types
We do something similar, but in 2), instead of using the outbox pattern, we make use (in several different settings) of integers that are guaranteed to increment in commit order, then each consumer can track where their cursor is on the feed of changes. This requires some more coordination but it means that publishers of changes don't need one outbox per consumer or similar.
Then you can have "processes" that query for new data in an input table, and update aggregates/derived tables from that simply by "select * ... where ChangeSequenceNumber > @MaxSequenceNumberFromPreviousExecution"...
Here's the idea implemented for Microsoft SQL Server for the OLTP case:
https://github.com/vippsas/mssql-changefeed
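A minimal sketch of that cursor-on-a-changefeed pattern, using Python and SQLite (names and schema are hypothetical). SQLite's single-writer model trivially guarantees commit-order sequence numbers; in Postgres or SQL Server you'd need the extra coordination mentioned above (e.g. what mssql-changefeed provides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A change feed whose seq increases in commit order.
    CREATE TABLE orders_changes (
        seq      INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id INTEGER NOT NULL,
        amount   REAL NOT NULL
    );
    -- Each consumer tracks its own cursor into the feed,
    -- so publishers don't need one outbox per consumer.
    CREATE TABLE consumer_cursor (
        consumer TEXT PRIMARY KEY,
        last_seq INTEGER NOT NULL DEFAULT 0
    );
""")

def publish(order_id, amount):
    conn.execute(
        "INSERT INTO orders_changes (order_id, amount) VALUES (?, ?)",
        (order_id, amount))
    conn.commit()

def consume(consumer):
    """Fetch all changes after this consumer's cursor, then advance it."""
    conn.execute(
        "INSERT OR IGNORE INTO consumer_cursor (consumer) VALUES (?)",
        (consumer,))
    (last_seq,) = conn.execute(
        "SELECT last_seq FROM consumer_cursor WHERE consumer = ?",
        (consumer,)).fetchone()
    rows = conn.execute(
        "SELECT seq, order_id, amount FROM orders_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,)).fetchall()
    if rows:
        conn.execute(
            "UPDATE consumer_cursor SET last_seq = ? WHERE consumer = ?",
            (rows[-1][0], consumer))
        conn.commit()
    return rows

publish(1, 9.99)
publish(2, 25.00)
batch1 = consume("aggregator")   # sees both changes
publish(3, 5.00)
batch2 = consume("aggregator")   # sees only the change after its cursor
```

Each call to `consume` is exactly the "`WHERE seq > @MaxSequenceNumberFromPreviousExecution`" query from the comment above, with the cursor persisted per consumer.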
I use PG with an alternative materialized views implementation[0] that is pure PlPgSQL, exposes real tables that triggers can write to, and lets the views be marked stale too.
This means hand-coding triggers to keep the materializations up to date, or else to mark them as out of date (because maybe some operations would be slow or hard to hand-code triggers for), but this works remarkably well.
As a bonus, I get an update history table that can be used to generate updates to external systems.
In principle one can get the AST for a VIEW's query from the PG catalog and use that to generate triggers on the tables it queries to keep it up to date. In practice that's only trivial for some kinds of queries, and I've not written such a tool yet.
[0] https://github.com/twosigma/postgresql-contrib/blob/master/m...
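A rough sketch of the hand-coded-trigger approach, in Python with SQLite triggers (the linked implementation is PlPgSQL, so syntax differs, but the shape is the same): the "materialized view" is an ordinary table kept current by triggers on the base table. Schema and names here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id     INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        amount REAL NOT NULL
    );
    -- The materialization: per-region totals, a real, writable table.
    CREATE TABLE sales_by_region (
        region TEXT PRIMARY KEY,
        total  REAL NOT NULL
    );
    -- Hand-coded maintenance triggers.
    CREATE TRIGGER sales_ins AFTER INSERT ON sales BEGIN
        INSERT OR IGNORE INTO sales_by_region (region, total)
            VALUES (NEW.region, 0);
        UPDATE sales_by_region SET total = total + NEW.amount
            WHERE region = NEW.region;
    END;
    CREATE TRIGGER sales_del AFTER DELETE ON sales BEGIN
        UPDATE sales_by_region SET total = total - OLD.amount
            WHERE region = OLD.region;
    END;
""")

conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("east", 10.0), ("west", 5.0), ("east", 2.5)])
conn.execute("DELETE FROM sales WHERE id = 2")
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

For aggregates like SUM this incremental maintenance is cheap; for queries where it would be slow or hard to express, the triggers can instead just mark the materialization stale, as described above.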
Please also take a look at https://github.com/risingwavelabs/risingwave if you are looking for advanced streaming databases. It is under the Apache License and also supports on-prem deployment (Docker, Kubernetes) with a full feature set: distributed clustering, compute-storage disaggregation, etc.
Related posts
-
Proton, a fast and lightweight alternative to Apache Flink
-
We Built a Streaming SQL Engine
-
Query Real Time Data in Kafka Using SQL
-
What makes a time series oriented database (ex: QuestDB) more efficient for OLAP on time series than an OLAP "only" oriented database (ex: DuckDB) technically?
-
How to handle partial updates and bulk updates in the source systems