- mz-hack-day-2022 (discontinued): Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
- redpanda: Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
This is state-of-the-art Kappa Architecture: Redpanda as a fast, durable log; Materialize for SQL-based streaming; and dbt for DataOps. This stack combines speed, ease of use, developer productivity, and governance. Best of all, you do not need to invest in setting up a large infrastructure: the entire stack can be packaged to run as a single Docker Compose project on your own laptop or workstation. You can try it out for yourself using this sample project.
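A minimal Docker Compose sketch of that single-machine setup might look like the following. The image tags, startup flags, and port mappings are assumptions drawn from common quickstarts, not the sample project's exact configuration:

```yaml
# docker-compose.yml -- hypothetical sketch, not the repo's actual file
version: "3.9"
services:
  redpanda:
    # Single-node Redpanda acting as the durable, Kafka-compatible log
    image: docker.redpanda.com/vectorized/redpanda:latest
    command: redpanda start --overprovisioned --smp 1 --memory 1G
    ports:
      - "9092:9092"   # Kafka API
  materialized:
    # Materialize, reachable over the Postgres wire protocol
    image: materialize/materialized:latest
    ports:
      - "6875:6875"   # connect with: psql -h localhost -p 6875
```

With both services up, dbt (running on the host or in a third container) connects to Materialize on port 6875 like any Postgres database.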
Materialize is a database purpose-built for streaming analytics. It incrementally updates query results (defined using SQL materialized views) as new data arrives, without requiring manual refreshes. The original reference implementation of the Kappa Architecture used Apache Samza, and since then multiple other streaming frameworks have appeared, most of which require imperative programming skills in languages like Java, Scala, or Python. However, SQL is the lingua franca of batch data processing and analytics, so it makes sense to use it for streaming as well. Through its use of ANSI-standard SQL, Materialize puts the power of stream processing back into the hands of data engineers and analysts. Materialize is also wire-compatible with Postgres, so it works with Postgres's broad ecosystem of tools and integrations.
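Concretely, wiring a Redpanda topic into an incrementally maintained view takes two statements. This sketch uses hypothetical topic and view names, and follows the Materialize 0.x source syntax from the Hack Day era; newer releases use a different `CREATE SOURCE` form:

```sql
-- Ingest a (hypothetical) Redpanda topic as raw bytes.
CREATE SOURCE purchases
FROM KAFKA BROKER 'redpanda:9092' TOPIC 'purchases'
FORMAT BYTES;

-- An incrementally maintained aggregate: Materialize updates the
-- counts as new events land on the topic, with no manual refresh.
CREATE MATERIALIZED VIEW purchase_counts AS
SELECT convert_from(data, 'utf8') AS payload, count(*) AS events
FROM purchases
GROUP BY 1;
```

Querying `purchase_counts` from any Postgres client then always reflects the latest data on the topic.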
For additional information or help with Redpanda specifically, we encourage you to join our community Slack channel or download the binary from GitHub.
Materialize is another supported execution environment for dbt, thanks to its wire compatibility with Postgres. With a standard database backend, dbt performs periodic refreshes through the use of incremental models; effectively, this is like running mini-ETL jobs to update data. With dbt and Materialize, you instead define your logic in dbt and let Materialize execute continuous, real-time updates. No matter how frequently your data arrives, your models stay up to date without manual or scheduled refreshes. As a transformation framework, dbt also provides facilities for packaging, testing, and documentation of models, pipelines, and schemas. Especially when paired with a version control tool like Git, this combination gives data teams a powerful, self-documenting development stack for streaming data pipelines.
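In practice, a dbt model targeting Materialize is an ordinary SQL model whose materialization is a Materialize materialized view. A sketch, assuming a hypothetical `stg_purchases` upstream model and the `dbt-materialize` adapter's `materializedview` materialization:

```sql
-- models/purchase_counts.sql (hypothetical model name)
-- Built once by `dbt run`; thereafter Materialize keeps the view's
-- results continuously up to date as source data arrives.
{{ config(materialized='materializedview') }}

SELECT payload, count(*) AS events
FROM {{ ref('stg_purchases') }}
GROUP BY 1
```

Unlike an incremental model on a batch warehouse, `dbt run` is needed only when the model's logic changes, not on a refresh schedule.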
Related posts
- We Built a Streaming SQL Engine
- Query Real Time Data in Kafka Using SQL
- What makes a time-series-oriented database (e.g., QuestDB) technically more efficient for OLAP on time series than an OLAP-only database (e.g., DuckDB)?
- How to handle partial updates and bulk updates in the source systems
- Headless BI with streaming data