- mz-hack-day-2022 (discontinued): Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
- redpanda: Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
This is state-of-the-art Kappa Architecture: Redpanda as a fast, durable log; Materialize for SQL-based streaming; and dbt for DataOps. This stack combines speed, ease of use, developer productivity, and governance. Best of all, you do not need to invest in setting up a large infrastructure: the entire stack can be packaged to run as a single Docker Compose project on your own laptop or workstation. You can try it out for yourself using this sample project.
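A minimal Docker Compose sketch of that single-machine setup might look like the following. The image tags, startup flags, and port mappings are assumptions drawn from common quickstarts, not the sample project's exact configuration:

```yaml
# docker-compose.yml -- hypothetical sketch, not the repo's actual file
version: "3.9"
services:
  redpanda:
    # Single-node Redpanda acting as the durable, Kafka-compatible log
    image: docker.redpanda.com/vectorized/redpanda:latest
    command: redpanda start --overprovisioned --smp 1 --memory 1G
    ports:
      - "9092:9092"   # Kafka API
  materialized:
    # Materialize, reachable over the Postgres wire protocol
    image: materialize/materialized:latest
    ports:
      - "6875:6875"   # connect with: psql -h localhost -p 6875
```

With both services up, dbt (running on the host or in a third container) connects to Materialize on port 6875 like any Postgres database.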
Materialize is a database purpose-built for streaming analytics. It incrementally updates query results (defined using SQL materialized views) as new data arrives, without requiring manual refreshes. The original reference implementation of the Kappa Architecture used Apache Samza, and since then multiple other streaming frameworks have appeared, most of which require imperative programming skills in languages like Java, Scala, or Python. However, SQL is the lingua franca of batch data processing and analytics, so it makes sense to use it for streaming as well. Through its use of ANSI-standard SQL, Materialize puts the power of stream processing back into the hands of data engineers and analysts. Materialize is also wire-compatible with Postgres, so it works with Postgres's broad ecosystem of tools and integrations.
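Concretely, wiring a Redpanda topic into an incrementally maintained view takes two statements. This sketch uses hypothetical topic and view names, and follows the Materialize 0.x source syntax from the Hack Day era; newer releases use a different `CREATE SOURCE` form:

```sql
-- Ingest a (hypothetical) Redpanda topic as raw bytes.
CREATE SOURCE purchases
FROM KAFKA BROKER 'redpanda:9092' TOPIC 'purchases'
FORMAT BYTES;

-- An incrementally maintained aggregate: Materialize updates the
-- counts as new events land on the topic, with no manual refresh.
CREATE MATERIALIZED VIEW purchase_counts AS
SELECT convert_from(data, 'utf8') AS payload, count(*) AS events
FROM purchases
GROUP BY 1;
```

Querying `purchase_counts` from any Postgres client then always reflects the latest data on the topic.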
For additional information or help with Redpanda specifically, we encourage you to join our community Slack channel or download the binary from GitHub.
Materialize is another supported execution environment for dbt, thanks to its wire compatibility with Postgres. With a standard database backend, dbt performs periodic refreshes through the use of incremental models; effectively, this is like running mini-ETL jobs to update data. With dbt and Materialize, you instead define your logic in dbt and let Materialize execute continuous, real-time updates. No matter how frequently your data arrives, your models stay up to date without manual or scheduled refreshes. As a transformation framework, dbt also provides facilities for packaging, testing, and documentation of models, pipelines, and schemas. Especially when paired with a version control tool like Git, this combination gives data teams a powerful, self-documenting development stack for streaming data pipelines.
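In practice, a dbt model targeting Materialize is an ordinary SQL model whose materialization is a Materialize materialized view. A sketch, assuming a hypothetical `stg_purchases` upstream model and the `dbt-materialize` adapter's `materializedview` materialization:

```sql
-- models/purchase_counts.sql (hypothetical model name)
-- Built once by `dbt run`; thereafter Materialize keeps the view's
-- results continuously up to date as source data arrives.
{{ config(materialized='materializedview') }}

SELECT payload, count(*) AS events
FROM {{ ref('stg_purchases') }}
GROUP BY 1
```

Unlike an incremental model on a batch warehouse, `dbt run` is needed only when the model's logic changes, not on a refresh schedule.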
Related posts
- We Built a Streaming SQL Engine
- Query Real Time Data in Kafka Using SQL
- What makes a time-series-oriented database (e.g., QuestDB) technically more efficient for OLAP on time series than an OLAP-only database (e.g., DuckDB)?
- How to handle partial updates and bulk updates in the source systems
- Headless BI with streaming data