soda-sql
piperider
Metric | soda-sql | piperider |
---|---|---|
Mentions | 25 | 6 |
Stars | 50 | 467 |
Growth | - | 0.6% |
Activity | 8.2 | 9.5 |
Last commit | over 1 year ago | about 2 months ago |
Language | Python | Python |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
soda-sql
- Data Quality - Great Expectations for Data Engineers
I might be a bit biased, but that was my opinion even before I started contributing to Soda SQL.
- dbt vs R/Python for transformation
- SodaCL - preview of a new "data reliability as code" language
I'm one of the developers of the open-source soda-sql data quality monitoring library. Over the past year we got some incredible feedback from our users, and based on that we started working on a new DSL for data reliability as code, which we are calling Soda CL.
- How do you test your pipelines?
You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
- Being constantly shut down by more senior team members when I mention adding some QA in our work
As many have said, there might be a business side of things to deliver. Somebody above promised delivery with tight deadlines. Trust me, I am not a fan, but this is how the world works and it sucks. I would say, in your free time, explore tools like Great Expectations (https://greatexpectations.io/) or https://github.com/sodadata/soda-sql, which are modern ways of testing, as part of your learning curve.
- Soda
- How heavily do you use Great Expectations?
- What are some exciting new tools/libraries in 2021?
soda-sql: a really cool library to automate data quality checks on SQL tables.
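To make "automated data quality checks on SQL tables" concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not soda-sql's API or configuration format; the `orders` table, the check names, and the `run_checks` helper are hypothetical, and the only assumption is the common convention that a check passes when its query counts zero violating rows.

```python
import sqlite3


def run_checks(conn, table, checks):
    """Run each named SQL check; a check passes when its query returns 0."""
    results = {}
    for name, sql in checks.items():
        (violations,) = conn.execute(sql.format(table=table)).fetchone()
        results[name] = violations == 0
    return results


# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.5), (2, None), (3, 20.0)],
)

# Each query counts the rows that violate the rule.
checks = {
    "no_missing_amount": "SELECT COUNT(*) FROM {table} WHERE amount IS NULL",
    "no_duplicate_ids": "SELECT COUNT(*) - COUNT(DISTINCT id) FROM {table}",
    "no_negative_amount": "SELECT COUNT(*) FROM {table} WHERE amount < 0",
}

results = run_checks(conn, "orders", checks)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

With the sample rows, only `no_missing_amount` fails, because one order has a NULL amount. A real tool would typically read such checks from configuration and run them on a schedule against the warehouse.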
- How do I incorporate testing after the fact?
Look at SodaSQL. It's more enterprise focused than Great Expectations and you can pipe results to a database for downstream actions and analysis.
- Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ
Certainly! It’s not requested that much 😊 but please add an issue on GitHub. I would love to add at least experimental support.
piperider
- Show HN: PipeRider – open-source Data Impact Analysis for dbt changes
- Open source data observability tools with UI?
If you post a GitHub issue to request these connectors, it might help persuade the product team to add them sooner rather than later.
- Data profiling as part of a data reliability strategy?
PS: I'm a bit biased, since I work for PipeRider; we're building an open-source data reliability toolkit with profiling at the core: https://github.com/InfuseAI/piperider
- Show HN: PipeRider, data reliability automated tool
I rushed my Show HN post, and now I want to tell a bit more.
PipeRider is our take on a data reliability and quality tool for data pipelines. It’s based on data profiling and assertions that test against the data profile.
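The idea of profiling data and then asserting against the profile can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not PipeRider's implementation or API; the `profile_column` and `check_profile` helpers, the sample `ages` column, and the thresholds are all hypothetical.

```python
from statistics import mean


def profile_column(values):
    """Compute a small profile (summary statistics) for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
    }


def check_profile(profile, assertions):
    """Evaluate each named rule against the profile, not the raw rows."""
    return {name: bool(rule(profile)) for name, rule in assertions.items()}


# Hypothetical sample column with one missing value.
ages = [34, 29, None, 41, 29]
profile = profile_column(ages)

# Hypothetical assertions expressed against profile fields.
assertions = {
    "missing_ratio_below_25_percent": lambda p: p["missing"] / p["count"] < 0.25,
    "values_within_0_and_120": lambda p: 0 <= p["min"] and p["max"] <= 120,
    "at_least_4_distinct": lambda p: p["distinct"] >= 4,
}

report = check_profile(profile, assertions)
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Because the assertions read only the profile, the same rules can be re-run against a stored profile from an earlier pipeline run to compare results over time. In this sample, the distinct-count rule fails since the column holds only three distinct non-null values.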
The project is open-source and ready to use on GitHub here: https://github.com/infuseai/piperider
Here is a quick start to get you up and running easily:
What are some alternatives?
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
great_expectations - Always know what to expect from your data.
pandera - A light-weight, flexible, and expressive statistical data testing library
pointblank - Data quality assessment and metadata reporting for data frames and database tables
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
pandas-profiling - Create HTML profiling reports from pandas DataFrame objects [Moved to: https://github.com/ydataai/pandas-profiling]
dbt-sessionization - Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
re_data - fix data issues before your users & CEO would discover them 😊
elementary - The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
trino_data_mesh - Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh
dbt-oracle - dbt (data build tool) adapter for Oracle Autonomous Database