Python dataquality

Open-source Python projects categorized as dataquality

Top 7 Python dataquality Projects

  1. great_expectations

    Always know what to expect from your data.

  2. cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: Ask HN: Not a webdev, why are these sites so good? | news.ycombinator.com | 2024-06-18

    https://cleanlab.ai/

  3. soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  4. cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

  5. data-observability-installer

    Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

    Project mention: New: Open Source Data Observability | dev.to | 2024-05-22

  6. fastapi-greatexpectations

    Run greatexpectations.io on ANY SQL engine using a REST API. Supported by FastAPI, Pydantic, and SQLAlchemy as the best data quality tool.

  7. data_check

    Data and pipeline testing with and for SQL.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python dataquality related posts

  • Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR

    2 projects | dev.to | 24 Apr 2023
  • Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines?

    3 projects | /r/dataengineering | 28 Jun 2022
  • Greatexpectations - Always know what to expect from your data.

    1 project | /r/github_trends | 7 May 2022
  • Greatexpectations – Always know what to expect from your data

    1 project | news.ycombinator.com | 7 May 2022
  • Package for drift detection

    2 projects | /r/mlops | 6 Apr 2022
  • [D] Do you use data engineering pipelines for real life projects?

    1 project | /r/MachineLearning | 1 Apr 2022
  • Launch HN: Elementary (YC W22) – Open-source data observability

    7 projects | news.ycombinator.com | 4 Mar 2022

Index

What are some of the best open-source dataquality projects in Python? This list will help you:

#  Project                       Stars
1  great_expectations            10,253
2  cleanlab                      10,227
3  soda-core                      2,036
4  cuallee                          183
5  data-observability-installer     108
6  fastapi-greatexpectations         12
7  data_check                         4
