Metric | soda-core | cuallee
---|---|---
Mentions | 5 | 5
Stars | 1,765 | 107
Growth | 2.3% | -
Activity | 8.9 | 9.0
Last commit | 5 days ago | 5 days ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
soda-core
- Looking for Unit Testing framework in Database Migration Process
- Data profiling tools / approaches?
  Tools like Soda Core could be really helpful for this. For example, it allows you to set up a change-over-time threshold, which could take the form of: `change avg last 3 for missing_count(column_name) < 20%`
- Data QC? Great Expectations?
  You can give https://github.com/sodadata/soda-core a try: it is open source and (in my opinion) makes it easy to get a lot of value with minimal effort.
- Show HN: Soda Core is now GA – Test data like you would test your code
- Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines?
  Give Soda Core a try! It's really easy. If you only have 2 minutes, check out our docs or interactive demo (pretty cool, no?). If you have a bit more time, install it and give it a spin! Want to look at it later? Star it on GitHub. Got stuck? Ask in our Slack community.
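
The change-over-time threshold quoted in the data profiling post above can be pictured with a small sketch. This is not Soda Core's implementation. It assumes one plausible reading of the check: the current measurement is compared with the average of the last 3 stored measurements, and the check passes when the relative change stays below 20%. The function names, the use of an absolute difference, and the sample numbers are all hypothetical.

```python
from statistics import mean


def change_vs_recent_average(history, current, window=3):
    """Relative change of `current` against the mean of the last `window` values.

    Assumption: the change is measured as an absolute difference relative to
    the baseline. A real tool may define it differently.
    """
    recent = history[-window:]
    if len(recent) < window:
        raise ValueError("not enough historical measurements")
    baseline = mean(recent)
    if baseline == 0:
        raise ValueError("baseline is zero, relative change is undefined")
    return abs(current - baseline) / baseline


def check_passes(history, current, window=3, threshold=0.20):
    """True when the relative change stays strictly below the threshold."""
    return change_vs_recent_average(history, current, window) < threshold


# Hypothetical missing-value counts from three earlier scans.
past_missing_counts = [100, 110, 90]

print(check_passes(past_missing_counts, 115))  # 15% above the baseline of 100
print(check_passes(past_missing_counts, 130))  # 30% above the baseline of 100
```

With these sample numbers the first call passes and the second fails, because only the first stays under the 20% limit.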
cuallee
- Show HN: Snowflake Data Quality Checks in Python
- data-diff VS cuallee - a user suggested alternative
  2 projects | 30 Nov 2022
  Declarative data quality rules at scale
- deequ VS cuallee - a user suggested alternative
  2 projects | 30 Nov 2022
  Cuallee offers a faster, optimized version of pydeequ's Check API, built on the new Observation API in PySpark. It also supports the Snowpark, Pandas, Polars and DuckDB dataframe abstractions.
- Show HN: Pyspark and Snowpark and Pandas data quality
- Show HN: Cuallee – pyspark data quality framework for v3.3.0
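
To make the idea of "declarative data quality rules" from the posts above more concrete, here is a minimal pure-Python sketch. It is not cuallee's API. The `RuleSet` class, its `not_null` and `unique` methods, and the sample rows are hypothetical names chosen for illustration, and the rules run over a plain list of dictionaries rather than a dataframe.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical container that collects named rules and evaluates them together."""

    name: str
    rules: list = field(default_factory=list)

    def not_null(self, column):
        # Rule: every row has a non-None value in `column`.
        def _check(rows):
            return all(row.get(column) is not None for row in rows)

        self.rules.append((f"not_null({column})", _check))
        return self  # returning self allows rules to be chained

    def unique(self, column):
        # Rule: no value in `column` appears more than once.
        def _check(rows):
            values = [row.get(column) for row in rows]
            return len(values) == len(set(values))

        self.rules.append((f"unique({column})", _check))
        return self

    def validate(self, rows):
        # Evaluate every declared rule and report pass/fail per rule.
        return {label: check(rows) for label, check in self.rules}


# Hypothetical sample data with a duplicated id and a missing email.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]

report = RuleSet("customers").not_null("id").unique("id").not_null("email").validate(rows)
print(report)
```

The rules are declared first and evaluated in a single `validate` call, which is the part of the pattern that lets a library choose how to execute them on a given backend.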
What are some alternatives?
great_expectations - Always know what to expect from your data.
data-diff - Compare tables within or across databases
dbt-data-reliability - dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
dictum - Describe business metrics with YAML, query and visualize in Jupyter with zero SQL
fastexcel - A Python wrapper around calamine
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
dbt-snowflake-monitoring - A dbt package from SELECT to help you monitor Snowflake performance and costs
ibis - the portable Python dataframe library
pointblank - Data quality assessment and metadata reporting for data frames and database tables
polars-xdt - Polars plugin offering eXtra stuff for DateTimes