data-governance

Open-source projects categorized as data-governance

Top 12 data-governance Open-Source Projects

  1. OpenMetadata

    OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

    Project mention: Show HN: OpenMetadata – OSS platform for data discovery observability governance | news.ycombinator.com | 2024-07-17

    * It seems like DataHub has an async Kafka ingestion approach while OpenMetadata is API

    We do not use Kafka by default. If someone needs Kafka, they can add it. However, for metadata APIs, we do not feel that Kafka is needed. A lot of projects are becoming dependent on Kafka and calling it real-time. It's an unnecessary burden on users who are going to operate in production; for 99% of use cases Kafka is not needed, coming from a Kafka committer :)

    2. Yes, all of our APIs and entity definitions are generated using JSON Schema. For us, JSON Schema has been awesome: all of our backend, ingestion, and UI are generated from JSON Schema, and it's easy to extend and add new models when needed.

    3. IMO, we have much more coverage; you can look at the types available here: https://github.com/open-metadata/OpenMetadata/tree/main/open... and we have supported JSON Schema as a type for a long time.

  2. soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  3. elementary

    The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  4. marquez

    Collect, aggregate, and visualize a data ecosystem's metadata

  5. sqllineage

    SQL Lineage Analysis Tool powered by Python

  6. odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

  7. data-drift

    Metrics Observability & Troubleshooting

  8. data-lineage

    Generate and Visualize Data Lineage from query history

  9. opendatadiscovery-specification

    ODD Specification is a universal open standard for collecting metadata.

  10. conduktor-poc-kafka-protocol

    POC to demonstrate how to alter incoming/outgoing records in Kafka. It's a toy, don't use it in production.

    Project mention: The Data Security Duo: Data Encryption and Vulnerability Scans | dev.to | 2024-07-28

    Sidecar/Proxy Approach: Passing sensitive data through a proxy for additional encryption processing is another approach. While this can be effective, deploying sidecars or proxies can be challenging depending on the infrastructure setup. Additionally, data security often needs to be schema-aware, making it difficult for sidecar or proxy layers to handle without additional client-side implementation. Despite these challenges, this approach is framework and client-agnostic, making it easier to implement across diverse data ecosystems. Examples of such offerings include Conduktor.

  11. bufstream-demo

    A demo of Bufstream, a drop-in replacement for Apache Kafka that's 8x less expensive to operate and brings broker-side schema awareness to Kafka

    Project mention: Jepsen: Bufstream 0.1.0 | news.ycombinator.com | 2024-11-12

    I’m looking at the product page [0] and wondering how those two statements are compatible:

    > Bufstream runs fully within your AWS or GCP VPC, giving you complete control over your data, metadata, and uptime. Unlike the alternatives, Bufstream never phones home.

    > Bufstream pricing is simple: just $0.002 per uncompressed GiB written (about $2 per TiB). We don't charge any per-core, per-agent, or per-call fees.

    Surely they wouldn’t run their entire business on the honor system?

    [0] https://buf.build/product/bufstream

  12. AI-Data-Guard

    AI Data Guard

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


data-governance related posts

  • The Data Security Duo: Data Encryption and Vulnerability Scans

    3 projects | dev.to | 28 Jul 2024
  • AI-Data-Guard: Robots.txt Analysis for GPT Bots

    2 projects | news.ycombinator.com | 14 Sep 2023
  • Open source data observability tools with UI?

    4 projects | /r/dataengineering | 18 Mar 2023
  • SQL “Visualization” Website/Resource?

    1 project | /r/SQL | 7 Jul 2022
  • Open source dbt tests monitoring

    1 project | news.ycombinator.com | 23 Jun 2022
  • Suggestions for open source anomaly-detection, linting and metadata solutions?

    1 project | /r/dataengineering | 7 Apr 2022
  • Data lineage info to a table in the DWH

    1 project | /r/dataengineering | 10 Feb 2022

Index

What are some of the best open-source data-governance projects? This list will help you:

# Project Stars
1 OpenMetadata 6,264
2 soda-core 2,036
3 elementary 2,018
4 marquez 1,869
5 sqllineage 1,432
6 odd-platform 1,294
7 data-drift 318
8 data-lineage 316
9 opendatadiscovery-specification 135
10 conduktor-poc-kafka-protocol 63
11 bufstream-demo 38
12 AI-Data-Guard 3
