data-discovery

Open-source projects categorized as data-discovery

Top 12 data-discovery Open-Source Projects

  1. applied-ml

    📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

  3. datahub

    The Metadata Platform for your Data and AI Stack

    Project mention: DataHub: The Data Discovery Platform for the Modern Data Stack | news.ycombinator.com | 2025-02-24
  4. OpenMetadata

    OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

    Project mention: Show HN: OpenMetadata – OSS platform for data discovery observability governance | news.ycombinator.com | 2024-07-17

    * It seems like DataHub has an async Kafka ingestion approach while OpenMetadata is API-based

    We do not use Kafka by default. If someone needs Kafka, they can add it. However, for metadata APIs, we do not feel that Kafka is needed. A lot of projects are becoming dependent on Kafka and calling it real-time. It's an unnecessary burden on users who have to operate it in production; for 99% of use cases Kafka is not needed, and this is coming from a Kafka committer :)

    2. Yes, all of our APIs and entity definitions are generated using JSON Schema. For us, JSON Schema has been awesome: all of our backend, ingestion, and UI are generated from JSON Schema, and it's easy to extend and add new models when needed.

    3. IMO, we have much more coverage; you can look at the types available here https://github.com/open-metadata/OpenMetadata/tree/main/open... and we have supported JSON Schema as a type for a long time.

  5. amundsen

    Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

    Project mention: Amundsen: A data discovery and metadata engine | news.ycombinator.com | 2025-03-20
  6. marquez

    Collect, aggregate, and visualize a data ecosystem's metadata

  7. sqllineage

    SQL Lineage Analysis Tool powered by Python

  8. odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

  10. awesome-data-catalogs

    📙 Awesome Data Catalogs and Observability Platforms.

  11. recap

    Work with your web service, database, and streaming schemas in a single format.

  12. opendatadiscovery-specification

    ODD Specification is a universal open standard for collecting metadata.

  13. dataasee

    DatAasee - A Metadata-Lake for Libraries

    Project mention: One Minute: DatAasee | dev.to | 2024-09-10

    Website: https://ulbmuenster.github.io/dataasee Repository: https://github.com/ulbmuenster/dataasee Companion Paper: https://arxiv.org/abs/2409.05512

  14. analytics_data_where_house

    An analytics engineering sandbox focusing on real estate prices in Cook County, IL

    Project mention: Show HN: OpenTimes – Free travel times between U.S. Census geographies | news.ycombinator.com | 2025-03-17

    Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.

    Thanks!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

data-discovery related posts

  • Amundsen: A data discovery and metadata engine

    1 project | news.ycombinator.com | 20 Mar 2025
  • DataHub: The Data Discovery Platform for the Modern Data Stack

    1 project | news.ycombinator.com | 24 Feb 2025
  • DataHub: Open-Source Metadata Platform

    1 project | news.ycombinator.com | 23 Feb 2025
  • Guided Data Access Patterns: A Deal Breaker for Data Platforms

    1 project | dev.to | 15 May 2024
  • Ask HN: Looking for DB schema management tool

    1 project | news.ycombinator.com | 24 Oct 2023
  • Which open source or commercial tools are used for Data Governance and access management

    1 project | /r/dataengineering | 22 Jun 2023
  • ODD Platform - An open-source data discovery and observability service - v0.12 release

    2 projects | /r/dataengineering | 10 May 2023

Index

What are some of the best open-source data-discovery projects? This list will help you:

#   Project                          Stars
1   applied-ml                       27,831
2   datahub                          10,436
3   OpenMetadata                     6,314
4   amundsen                         4,533
5   marquez                          1,878
6   sqllineage                       1,436
7   odd-platform                     1,295
8   awesome-data-catalogs            809
9   recap                            344
10  opendatadiscovery-specification  135
11  dataasee                         14
12  analytics_data_where_house       9
