dataquality

Open-source projects categorized as dataquality

Top 11 dataquality Open-Source Projects

  • great_expectations

    Always know what to expect from your data.

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

  • Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • deequ

    Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    It is a nice tool for doing this across databases as well.

    I think it's based on a checksum method.

  • soda-core

    ⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • re_data

    re_data - fix data issues before your users & CEO would discover them 😊

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

  • Project mention: Show HN: Snowflake Data Quality Checks in Python | news.ycombinator.com | 2024-02-11
  • fastapi-greatexpectations

    Run greatexpectations.io on ANY SQL Engine using REST API. Supported by FastAPI, Pydantic and SQLAlchemy as best data quality tool

  • setup-duckdb-action

    🦆 Blazing Fast and highly customizable Github Action to setup a DuckDb runtime

  • Project mention: 🦆 Effortless Data Quality w/duckdb on GitHub ♾️ | dev.to | 2023-07-25


  • data_check

    data and pipeline testing with and for SQL
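
One of the mentions above guesses that data-diff is "based on a checksum method". As a rough illustration of that general idea only (not data-diff's actual implementation), the sketch below hashes every row of a table in key order and compares the resulting digests. The helper names `table_checksum`, `tables_match`, and `make_db`, the `users` table, and the use of in-memory SQLite are all assumptions made for the example.

```python
import hashlib
import sqlite3


def table_checksum(conn, table, key):
    """Hash every row of `table`, ordered by `key`, into one hex digest."""
    digest = hashlib.sha256()
    # Identifiers are interpolated directly: acceptable for a sketch,
    # but only pass trusted table and column names.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def tables_match(conn_a, table_a, conn_b, table_b, key="id"):
    """Return True when both tables produce the same checksum."""
    return table_checksum(conn_a, table_a, key) == table_checksum(conn_b, table_b, key)


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "ada"), (2, "grace")])
replica = make_db([(2, "grace"), (1, "ada")])   # same rows, different insert order
drifted = make_db([(1, "ada"), (2, "Grace")])   # one value differs

print(tables_match(source, "users", replica, "users"))  # True
print(tables_match(source, "users", drifted, "users"))  # False
```

A single digest only says whether two tables differ, not where. Locating the differing rows would need something extra, such as checksumming key ranges separately and drilling into the ranges that disagree.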

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

dataquality related posts

  • How to Dynamically Adjust the Height of a Textarea in ReactJS

    1 project | dev.to | 25 Oct 2023
  • Blog - Project Nessie: A Look in the Depths

    1 project | /r/bigdata | 11 Jul 2023
  • What is your favorite data catalog?

    2 projects | /r/dataengineering | 25 Jun 2023
  • Data Governance Hands On with Amazon DataZone

    1 project | dev.to | 22 May 2023
  • What OSS are you using for data contracts?

    1 project | /r/dataengineering | 3 May 2023
  • Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR

    2 projects | dev.to | 24 Apr 2023
  • Thoughts around decube.io (data observability and catalog platform)

    1 project | /r/dataengineering | 4 Apr 2023

Index

What are some of the best open-source dataquality projects? This list will help you:

Rank  Project                    Stars
1     great_expectations         9,497
2     OpenMetadata               4,227
3     deequ                      3,138
4     data-diff                  2,862
5     soda-core                  1,768
6     re_data                    1,527
7     zingg                        886
8     cuallee                      110
9     fastapi-greatexpectations     12
10    setup-duckdb-action            5
11    data_check                     4
