Using Redpanda with Materialize and dbt for a faster, safer Kappa architecture

This page summarizes the projects mentioned and recommended in the original post on dev.to

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • mz-hack-day-2022

    Discontinued Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!

  • This is state-of-the-art Kappa Architecture: Redpanda as a fast, durable log; Materialize for SQL based streaming; and dbt for dataOps. This stack combines speed, ease of use, developer productivity, and governance. Best of all, you do not need to invest in setting up a large infrastructure: this entire stack can be packaged to run as a single Docker Compose project in your own laptop or workstation. You can try it out for yourself using this sample project.

  • materialize

    The data warehouse for operational workloads. (by MaterializeInc)

  • Materialize is a database purpose-built for streaming analytics. It incrementally updates query results (defined using SQL materialized views) as new data arrives, without requiring manual refreshes. The original reference implementation of the Kappa Architecture used Apache Samza, and since then multiple other streaming frameworks have popped up which require imperative programming skills in languages like Java, Scala, or Python. However, SQL is the lingua franca for batch data processing and analytics, so it makes sense to use it for streaming as well. Through its use of ANSI-standard SQL, Materialize puts the power of stream processing back into the hands of data engineers and analysts. Materialize is also wire-compatible with Postgres and can work with its broad ecosystem of tools and integrations.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • redpanda

    Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

  • For additional information or help with Redpanda, specifically, we encourage you to join our community Slack channel or download the binary from GitHub.

  • git

    A fork of Git containing Windows-specific patches. (by git-for-windows)

  • Materialize is another supported execution environment, thanks to its wire compatibility with Postgres. With a standard database backend, dbt can perform periodic refreshes through the use of incremental models. Effectively this is like running mini-ETL jobs to update data. With dbt and Materialize, you can define your logic in dbt for Materialize to execute continuous real-time updates. No matter how frequently your data arrives, your models will stay up-to-date without manual or configured refreshes. As a transformation framework, dbt provides facilities for packaging, testing and documentation of models, pipelines, and schema. Especially when paired with a version control tool like Git, this combination gives data teams a powerful, self-documenting development stack for streaming data pipelines.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • We Built a Streaming SQL Engine

    3 projects | news.ycombinator.com | 21 Oct 2023
  • Query Real Time Data in Kafka Using SQL

    7 projects | dev.to | 23 Mar 2023
  • What makes a time series oriented database (ex: QuestDB) more efficient for OLAP on time series than an OLAP "only" oriented database (ex: DuckDB) technically?

    1 project | /r/dataengineering | 23 Jan 2023
  • How to handle partial updates and bulk updates in the source systems

    1 project | /r/dataengineering | 5 Jan 2023
  • Headless BI with streaming data

    2 projects | dev.to | 22 Sep 2022