dbt-core
great_expectations
 | dbt-core | great_expectations
---|---|---
Mentions | 86 | 15
Stars | 8,718 | 9,361
Growth | 6.1% | 1.9%
Activity | 9.7 | 9.9
Last commit | 4 days ago | 7 days ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
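The two derived figures can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual formula: the growth function is plain month-over-month arithmetic, while the activity mapping assumes the score is a linear percentile on a 0-10 scale, which is only known to hold for the 9.0 example given here. Both function names and the star counts are hypothetical.

```python
def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Return the month-over-month change in stars as a percentage."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def top_share_from_activity(activity: float) -> float:
    """Map an activity score to a 'top X%' figure.

    ASSUMPTION: the score is a linear percentile on a 0-10 scale.
    The page only confirms the single data point 9.0 -> top 10%.
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity must be between 0 and 10")
    return (10.0 - activity) * 10.0


if __name__ == "__main__":
    # Hypothetical star counts, not the real history of either project.
    print(round(month_over_month_growth(1000, 1061), 1))
    print(round(top_share_from_activity(9.0), 1))
```

Under that linear assumption, a score above 9.0 would place a project in a correspondingly smaller top share of tracked projects.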
dbt-core
-
Relational is more than SQL
dbt integration was one of our major goals early on, but we found that the interaction wasn't as straightforward as we had hoped.
There is an open PR in the dbt repo: https://github.com/dbt-labs/dbt-core/pull/5982#issuecomment-...
I have some ideas about future directions in this space where I believe PRQL could really shine. I will only be able to write those down in a couple of hours. I think this could be a really exciting direction for the project to grow into if anyone would like to collaborate and contribute!
-
Python: Just Write SQL
I really dislike SQL, but recognize its importance for many organizations. I also understand that SQL is definitely testable, particularly if managed by environments such as dbt (https://github.com/dbt-labs/dbt-core). Those who arrived here with a preference for Python will note that dbt is largely implemented in Python, adds Jinja macros and iterative forms to SQL, and adds code testing capabilities.
-
Transform Your Data Like a Pro With dbt (Data Build Tool)
3). Data Build Tool Repository.
- How do I build a docker image based on a Dockerfile on github?
-
DBT core v1.5 released
Here’s the PR, which includes a what/how/why: https://github.com/dbt-labs/dbt-core/issues/7158
- Building Column Level Lineage for dbt
-
Unit testing with dbt
Hey OP! There are packages like dbt-datamocktool or dbt-unit-testing that you can check out. You might want to check out this thread as well.
- SQL and M4 = Composable SQL
-
Interview Prep - Senior Data Integration role
RudderStack, dbt, Kafka, Headless CDP, etc. come to mind off the top of my head
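One of the comments above notes that SQL is testable when it is managed by a tool such as dbt. A minimal sketch of that idea, using only Python's standard library: a test is itself a query that selects offending rows, and it passes when the query returns nothing. This is roughly the convention dbt uses for its data tests, but the code below is not dbt's implementation. The `orders` table, its columns, and the helper functions are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, deliberately flawed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, 11), (3, None)],  # order 3 has no customer
    )
    return conn


def failing_rows(conn: sqlite3.Connection, test_sql: str) -> list:
    """Run a test query; the test passes when no rows come back."""
    return conn.execute(test_sql).fetchall()


# Each test selects the rows that violate the rule.
NOT_NULL_TEST = "SELECT order_id FROM orders WHERE customer_id IS NULL"
UNIQUE_TEST = """
    SELECT order_id
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
"""

if __name__ == "__main__":
    conn = build_demo_db()
    print(failing_rows(conn, NOT_NULL_TEST))  # [(3,)] -> test fails
    print(failing_rows(conn, UNIQUE_TEST))    # [] -> test passes
```

Expressing tests as queries keeps them in the same language as the transformations, so they can run wherever the data lives.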
great_expectations
-
Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR
Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.
- Looking for Unit Testing framework in Database Migration Process
-
Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines?
GE is arguably the best-known OSS alternative to Soda Core. The third option is deequ, originally developed and released as OSS by AWS. Our community has told us that Soda Core is different because it’s easy to get going and embed into data pipelines. It also allows some of the check authoring work to be moved to other members of the data team. I'm sure there are also scenarios where Soda Core is not the best option, for example when you only use Pandas dataframes or develop in Scala.
-
Package for drift detection
great_expectations: https://github.com/great-expectations/great_expectations
-
Data pipeline suggestions
Testing: GreatExpectations
-
Where can I find free data engineering ( big data) projects online?
Ingestion / ETL: Airbyte, Singer, Jitsu
Transformation: dbt
Orchestration: Airflow, Dagster
Testing: GreatExpectations
Observability: Monosi
Reverse ETL: Grouparoo, Castled
Visualization: Lightdash, Superset
- [P] Deepchecks: an open-source tool for high standards validations for ML models and data.
-
great_expectations VS redata - a user suggested alternative
2 projects | 24 Sep 2021
-
Looking for open-source model serving framework with dashboard for test data quality
It should have a dashboard for test data quality monitoring, ideally with alarms from the great_expectations framework: https://github.com/great-expectations/great_expectations
-
[D] What’s the simplest, most lightweight but complete and 100% open source MLOps toolkit? -> MY OWN CONCLUSIONS
I expected the Great Expectations library to be recommended, but nobody mentioned it. Instead, people suggested unit tests and/or smoke tests using pytest, checked with Jenkins. Anyway, if Kedro ends up being our project template, I'll keep an eye on the plugin with Great Expectations.
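Several of the posts above describe Great Expectations as a data validation tool, and the last one contrasts it with plain unit or smoke tests. The difference can be sketched in plain Python: a validation check is declared once, run against a whole dataset, and reports which rows failed instead of stopping at the first assertion. This is only an illustration of the concept and deliberately does not use the Great Expectations API. Every name here (`not_null`, `between`, `validate`, the `age` column) is hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], Dict[str, Any]]


def not_null(column: str) -> Check:
    """Build a check that fails for rows where `column` is missing or None."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [i for i, row in enumerate(rows) if row.get(column) is None]
        return {"check": f"not_null({column})", "success": not bad, "bad_rows": bad}
    return check


def between(column: str, low: float, high: float) -> Check:
    """Build a check that fails for non-null values outside [low, high]."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [
            i for i, row in enumerate(rows)
            if row.get(column) is not None and not low <= row[column] <= high
        ]
        return {
            "check": f"between({column}, {low}, {high})",
            "success": not bad,
            "bad_rows": bad,
        }
    return check


def validate(rows: List[Row], checks: List[Check]) -> List[Dict[str, Any]]:
    """Run every check and collect all results rather than stopping early."""
    return [check(rows) for check in checks]


if __name__ == "__main__":
    data = [{"age": 34}, {"age": None}, {"age": 212}]
    for result in validate(data, [not_null("age"), between("age", 0, 120)]):
        print(result)
```

The same checks could also be wrapped in pytest test functions that assert `result["success"]`, which is closer to the smoke-test approach mentioned in the post.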
What are some alternatives?
evidently - Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
kedro-great - The easiest way to integrate Kedro and Great Expectations
deepchecks - Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
re_data - re_data - fix data issues before your users & CEO would discover them 😊
streamlit - Streamlit — A faster way to build and share data apps.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
metricflow - MetricFlow allows you to define, build, and maintain metrics in code.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
seldon-core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
citus - Distributed PostgreSQL as an extension
dagster - An orchestration platform for the development, production, and observation of data assets.