Python dataquality

Open-source Python projects categorized as dataquality

Top 6 Python dataquality Projects

  • great_expectations

    Always know what to expect from your data.

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happen a lot, there is also: https://github.com/datafold/data-diff

    That is a nice tool to do it cross database as well.

    I think it's based on checksum method.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • soda-core

    :zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

  • Project mention: Show HN: Snowflake Data Quality Checks in Python | news.ycombinator.com | 2024-02-11
  • fastapi-greatexpectations

    Run greatexpectations.io on ANY SQL Engine using REST API. Supported by FastAPI, Pydantic and SQLAlchemy as best data quality tool

  • data_check

    data and pipeline testing with and for SQL

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Python dataquality related posts

Index

What are some of the best open-source dataquality projects in Python? This list will help you:

Project Stars
1 great_expectations 9,440
2 data-diff 2,830
3 soda-core 1,751
4 cuallee 105
5 fastapi-greatexpectations 12
6 data_check 4

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com