Python data-quality

Open-source Python projects categorized as data-quality

Top 15 Python data-quality Projects

  • ydata-profiling

    One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.

    Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • great_expectations

    Always know what to expect from your data.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: Ask HN: Not a webdev, why are these sites so good? | news.ycombinator.com | 2024-06-18

    https://cleanlab.ai/

  • feast

    The Open Source Feature Store for Machine Learning

    Project mention: Accelerating into AI: Lessons from AWS | dev.to | 2024-06-12

    Feast is a feature store to help teams track changes during in-house model development.

  • soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • cleanvision

    Automatically find issues in image datasets and practice data-centric computer vision.

  • piperider

    Code review for data in dbt

    Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
  • Encord Active

    Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

    Project mention: Launch HN: Encord (YC W21) – Unit testing for computer vision models | news.ycombinator.com | 2024-01-31

    We base our pricing on your user and consumption scale and would be happy to discuss this with you directly. Please feel free to explore the OS version of Active at https://github.com/encord-team/encord-active. Note that some features, such as natural language search using GPU accelerated APIs, are not included in the cloud version.

  • feathub

    FeatHub - A stream-batch unified feature store for real-time machine learning

  • cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

    Project mention: Show HN: Snowflake Data Quality Checks in Python | news.ycombinator.com | 2024-02-11
  • swiple

    Swiple enables you to easily observe, understand, validate and improve the quality of your data

  • soda-spark

    Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

  • data-observability-installer

    Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

    Project mention: New: Open Source Data Observability | dev.to | 2024-05-22

  • panda_patrol

    Project mention: Show HN: Data monitoring and profiling with 1 function call | news.ycombinator.com | 2023-12-13
  • data_check

    data and pipeline testing with and for SQL

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-quality related posts

  • Ask HN: Not a webdev, why are these sites so good?

    1 project | news.ycombinator.com | 18 Jun 2024
  • Show HN: Snowflake Data Quality Checks in Python

    1 project | news.ycombinator.com | 11 Feb 2024
  • Show HN: Data monitoring and profiling with 1 function call

    1 project | news.ycombinator.com | 13 Dec 2023
  • [Research] Detecting Annotation Errors in Semantic Segmentation Data

    1 project | /r/MachineLearning | 5 Nov 2023
  • [R] Automated Quality Assurance for Object Detection Datasets

    1 project | /r/computervision | 28 Sep 2023
  • Show HN: PipeRider – open-source Data Impact Analysis for dbt changes

    3 projects | news.ycombinator.com | 6 Sep 2023
  • [D] Is accurately estimating image quality even possible?

    3 projects | /r/MachineLearning | 22 Apr 2023

Index

What are some of the best open-source data-quality projects in Python? This list will help you:

#   Project                        Stars
1   ydata-profiling                12,250
2   great_expectations             9,661
3   cleanlab                       9,113
4   feast                          5,373
5   soda-core                      1,821
6   cleanvision                    969
7   piperider                      475
8   Encord Active                  427
9   feathub                        304
10  cuallee                        128
11  swiple                         78
12  soda-spark                     63
13  data-observability-installer   58
14  panda_patrol                   21
15  data_check                     4


Did you know that Python is the most popular programming language based on number of mentions?