Python Data Validation

Open-source Python projects categorized as Data Validation

Top 16 Python Data Validation Projects

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • jsonschema

    An implementation of the JSON Schema specification for Python

  • deepchecks

    Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

  • Project mention: Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks | dev.to | 2024-01-13

    Also, if anything about it is unclear, you can go directly to their discussion section on GitHub.

  • Cerberus

    Lightweight, extensible data validation library for Python (by pyeve)

  • Project mention: Show HN: Config-file-validator – CLI tool to validate all your config files | news.ycombinator.com | 2023-09-29

    I was expecting this to validate that the configuration files are also valid for their use cases, not just valid JSON, TOML, etc.

    If you're looking for that and Python is your jam, the library cerberus[0] is very good at it.

    [0]: https://github.com/pyeve/cerberus

  • pandera

    A light-weight, flexible, and expressive statistical data testing library

  • schema

    Schema validation just got Pythonic

  • Schematics

    Python Data Structures for Humans™.

  • voluptuous

    CONTRIBUTIONS ONLY: Voluptuous, despite the name, is a Python data validation library.

  • soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • cleanvision

    Automatically find issues in image datasets and practice data-centric computer vision.

  • colander

    A serialization/deserialization/validation library for strings, mappings and lists.

  • Encord Active

    Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

  • Project mention: Launch HN: Encord (YC W21) – Unit testing for computer vision models | news.ycombinator.com | 2024-01-31

    We base our pricing on your user and consumption scale and would be happy to discuss this with you directly. Please feel free to explore the OS version of Active at https://github.com/encord-team/encord-active. Note that some features, such as natural language search using GPU accelerated APIs, are not included in the cloud version.

  • valideer

    Lightweight data validation and adaptation Python library.

  • python-codicefiscale

    Italian fiscal code (Codice Fiscale) encoding, decoding and validation.

  • laravel-validation

    A PHP Laravel-like validation library for Python

  • data_check

    Data and pipeline testing with and for SQL

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
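
The Cerberus mention above draws a line between a config file that merely parses (valid JSON, TOML, etc.) and one that is valid for its use case. The sketch below illustrates that distinction using only the Python standard library. It is not the API of Cerberus or of any project on this list; the `RULES` table, the `validate_config` function, and the `host`/`port` fields are hypothetical names chosen for illustration.

```python
import json

# Hypothetical rules for an illustrative config file:
# field name -> (expected type, predicate the value must satisfy)
RULES = {
    "host": (str, lambda v: len(v) > 0),
    "port": (int, lambda v: 1 <= v <= 65535),
}


def validate_config(text):
    """Return a list of error strings; an empty list means the config passed."""
    # Step 1: syntactic check. Is the text valid JSON at all?
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be an object"]

    # Step 2: use-case check. Do the fields have the right types and values?
    errors = []
    for field, (expected_type, check) in RULES.items():
        if field not in config:
            errors.append(f"{field}: missing")
        elif type(config[field]) is not expected_type:
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif not check(config[field]):
            errors.append(f"{field}: value not allowed")
    return errors


if __name__ == "__main__":
    print(validate_config('{"host": "db", "port": 5432}'))
    print(validate_config('{"host": "db", "port": 70000}'))
```

The second call parses as JSON but fails the range rule on `port`, which is the kind of error a syntax-only checker would miss. The libraries listed here express such rules declaratively instead of through hand-written checks.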

Index

What are some of the best open-source Data Validation projects in Python? This list will help you:

#   Project               Stars
1   cleanlab              8,592
2   jsonschema            4,431
3   deepchecks            3,338
4   Cerberus              3,106
5   pandera               2,994
6   schema                2,830
7   Schematics            2,571
8   voluptuous            1,798
9   soda-core             1,745
10  cleanvision             919
11  colander                440
12  Encord Active           420
13  valideer                264
14  python-codicefiscale     66
15  laravel-validation       10
16  data_check                4
