soda-sql
airflow-docker
Metric | soda-sql | airflow-docker
---|---|---
Mentions | 25 | 1
Stars | 50 | 21
Growth | - | -
Activity | 8.2 | 7.0
Last commit | over 1 year ago | 2 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
soda-sql
-
Data Quality - Great Expectations for Data Engineers
I might be a bit biased, but that was my opinion even before I started contributing to Soda SQL.
You can always give Soda a try, more info on soda.io and https://github.com/sodadata/soda-sql. We've put a lot of focus on making it lightweight and easy to use. Disclaimer: I'm one of the founders :).
-
dbt vs R/Python for transformation
Testing and production monitoring of data is still underrated in many teams. In building and operating software systems this has become the norm. In data, there is still a lot of room for improvement. The mentioned tools are insufficient for a thorough testing and monitoring setup. That is why we created Soda with Soda SQL as our open source tool for testing data in and out of pipeline: https://github.com/sodadata/soda-sql
-
How do you test your pipelines?
You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
- How heavily do you use Great Expectations?
-
What are some exciting new tools/libraries in 2021?
soda-sql: a really cool library for automating data quality checks on SQL tables
-
Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
Certainly! It's not requested that much, but please add an issue on GitHub. I would love to add at least experimental support.
-
Open source contributions for a Data Engineer?
If you are interested in using/learning Python, SQL and data warehouse skills, take a look at https://github.com/sodadata/soda-sql
-
Anyone aware of any Data Validation Framework with custom SQL capability
Soda-sql looks promising. It has some out of the box tests and you can also provide custom SQL: https://github.com/sodadata/soda-sql
airflow-docker
-
Airflow Api tests
Clone the airflow-docker repo.
What are some alternatives?
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
pandera - A light-weight, flexible, and expressive statistical data testing library
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.
re_data - fix data issues before your users & CEO would discover them
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh
spark-fast-tests - Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
piperider - Code review for data in dbt
dagster - An orchestration platform for the development, production, and observation of data assets.
airflow-notebook - This repository is no longer maintained.
wsl-windows-toolbar-launcher - Adds linux GUI application menu to a windows toolbar