materialize
dbt-expectations
| | materialize | dbt-expectations |
|---|---|---|
| Mentions | 117 | 10 |
| Stars | 5,567 | 939 |
| Growth | 0.9% | 3.3% |
| Activity | 10.0 | 6.7 |
| Latest commit | about 2 hours ago | 25 days ago |
| Language | Rust | Shell |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
materialize
-
Ask HN: How Can I Make My Front End React to Database Changes in Real-Time?
[2] https://materialize.com/
-
Choosing Between a Streaming Database and a Stream Processing Framework in Python
To fully leverage the idea that "data is the new oil", companies require databases designed to manage vast amounts of data instantly. This need has led to many database forms, including NoSQL databases, vector databases, time-series databases, graph databases, in-memory databases, and in-memory data grids. Recent years have seen the rise of cloud-based streaming databases such as RisingWave, Materialize, DeltaStream, and TimePlus. While each takes a distinct commercial and technical approach, their overarching goal is the same: to offer users cloud-based streaming database services.
-
Proton, a fast and lightweight alternative to Apache Flink
> Materialize no longer provides the latest code as open-source software that you can download and try. It turned from a single-binary design to a cloud-only microservice architecture
Materialize CTO here. Just wanted to clarify that Materialize has always been source available, not OSS. Since our initial release in 2020, we've been licensed under the Business Source License (BSL), like MariaDB and CockroachDB. Under the BSL, each release does eventually transition to Apache 2.0, four years after its initial release.
Our core codebase is absolutely still publicly available on GitHub [0], and our developer guide for building and running Materialize on your own machine is still public [1].
It is true that we substantially rearchitected Materialize in 2022 to be more "cloud-native". Our new cloud offering provides horizontal scalability and fault tolerance—our two most requested features in the single-binary days. I wouldn't call the new architecture a microservices design, though! There are only 2-3 services, each quite substantial, in the new architecture (loosely: a compute service, an orchestration service, and, soon, a load balancing service).
We do push folks to sign up for a free trial of our hosted cloud offering [2] these days, rather than trying to start off by running things locally, as we generally want folks' first impression of Materialize to be of the version that we support for production use cases. An all-in-one single-machine Docker image does still exist, if you know where to look, but it's very much use-at-your-own-risk and we don't recommend it for anything serious; it's there to support e.g. academic work that wants to evaluate Materialize's capabilities to incrementally maintain recursive SQL queries.
If folks have questions about Materialize, we've got a lively community Slack [3] where you can connect directly with our product and engineering teams.
[0]: https://github.com/MaterializeInc/materialize/tree/main
- What I Talk About When I Talk About Query Optimizer (Part 1): IR Design
-
We Built a Streaming SQL Engine
Some recent solutions to this problem include Differential Dataflow and Materialize. It would be neat if postgres adopted something similar for live-updating materialized views.
https://github.com/timelydataflow/differential-dataflow
https://materialize.com/
-
Ask HN: Who is hiring? (October 2023)
Materialize | Full-Time | NYC Office or Remote | https://materialize.com
Materialize is an Operational Data Warehouse: a cloud data warehouse with streaming internals, built for work that needs action on what's happening right now. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine, and get complex queries that are always up-to-date.
Materialize is the operational data warehouse built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI.
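As a sketch of the model described in the pitch above (the `orders` table and its columns are hypothetical), the core idea is that a materialized view in Materialize is maintained incrementally as new data arrives, rather than recomputed in batch:

```sql
-- Hypothetical source table: a stream of orders already ingested
-- into Materialize.
-- The view below is incrementally maintained: its results update as
-- new rows arrive, instead of being recomputed from scratch.
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT region, sum(amount) AS revenue
FROM orders
GROUP BY region;

-- Query it like any table; results reflect the latest input data.
SELECT * FROM revenue_by_region;
```

This is the same `CREATE MATERIALIZED VIEW` syntax as batch warehouses, which is what "keep the familiar SQL" refers to; the difference is in how the result is kept fresh.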
Senior/Staff Product Manager - https://grnh.se/69754ebf4us
Senior Frontend Engineer - https://grnh.se/7010bdb64us
===
Investors include Redpoint, Lightspeed and Kleiner Perkins.
-
Ask HN: Who is hiring? (June 2023)
Materialize | EM (Compute), Senior PM | New York, New York | https://materialize.com/
You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine, and get complex queries that are always up-to-date.
That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI.
Engineering Manager, Compute - https://grnh.se/4e14099f4us
Senior Product Manager - https://grnh.se/587c36804us
VP of Marketing - https://grnh.se/9caac4b04us
- What are your favorite tools or components in the Kafka ecosystem?
- Ask HN: Who is hiring? (May 2023)
-
Dozer: A scalable Real-Time Data APIs backend written in Rust
How does it compare to https://materialize.com/ ?
dbt-expectations
-
Dbt tests vs Soda SQL
Have not used Soda, but dbt is indeed pretty good, especially when adding dbt-expectations.
-
Data-eng related highlights from the latest Thoughtworks Tech Radar
dbt-expectations
-
Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
I highly, highly recommend the dbt-expectations extension from Calogica for dbt. It's a port of Great Expectations, except you can quickly drop it into your schema.yml files and have it run as part of your dbt test process. Super powerful, and it's prevented us from shipping bad data many times.
-
Managing SQL Tests
I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can define a great number of tests without having to copy code. I can even extend the pre-defined tests using generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat, as long as they know some SQL.
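As a sketch of what this looks like in practice (the model and column names here are made up), a dbt-expectations test is just another list item under a column in schema.yml, and severity and tags go under its `config`:

```yaml
# schema.yml (hypothetical model and column names)
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              config:
                severity: warn        # report failures without failing the run
                tags: ['data_quality']
```

Running `dbt test` then executes these alongside any built-in or custom tests.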
-
What are some Data Quality check related frameworks for datasets ranging from 100GB to 1TB in size?
Use dbt's testing functionality during your transformations with calogica/dbt-expectations (the Great Expectations framework ported to dbt).
-
Great Expectations is annoyingly cumbersome
Check out dbt-expectations https://github.com/calogica/dbt-expectations
-
CI/CD in data engineering - help a noob
There are certain things I would like to add, such as data quality checks. I can use something like dbt-expectations, but I am not sure how much more I should force it before getting an Airflow setup..
- How do you query and quality check data produced in intermediate steps in analytics pipeline?
-
ETL Pipelines with Airflow: The Good, the Bad and the Ugly
[dbt Labs employee here]
Check out the dbt-expectations package[1]. It's a port of the Great Expectations checks to dbt as tests. The advantage of this is that you don't need another tool for these pretty standard tests, and they can be easily incorporated into dbt workflows.
[1] https://github.com/calogica/dbt-expectations
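For reference, installing the package is a short entry in packages.yml followed by `dbt deps` (the version range shown is illustrative; check the package's releases for a current one):

```yaml
# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```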
-
Unit testing SQL in DBT
Also check out dbt-expectations, a port of Great Expectations that greatly expands the configurable (non-assert) tests.
What are some alternatives?
ClickHouse - ClickHouse® is a free analytics DBMS for big data
dbt-utils - Utility functions for dbt projects.
risingwave - Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
dbt-oracle - A dbt adapter for oracle db backend
openpilot - openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
rust-kafka-101 - Getting started with Rust and Kafka
NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
scryer-prolog - A modern Prolog implementation written mostly in Rust.
cuetils - CLI and library for diff, patch, and ETL operations on CUE, JSON, and Yaml
roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.
dbt-fal - do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.