Top 11 data-catalog Open-Source Projects
-
Project mention: DataHub: The Data Discovery Platform for the Modern Data Stack | news.ycombinator.com | 2025-02-24
-
OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Project mention: Show HN: OpenMetadata – OSS platform for data discovery observability governance | news.ycombinator.com | 2024-07-17
* It seems like DataHub has an async Kafka ingestion approach while OpenMetadata is API
1. We do not use Kafka by default. If someone needs Kafka, they can add it. However, for metadata APIs, we do not feel that Kafka is needed. A lot of projects are becoming dependent on Kafka and calling it real-time. It's an unnecessary burden on users who are going to operate in production; for 99% of use cases Kafka is not needed, coming from a Kafka committer :)
2. Yes, all of our APIs and entity definitions are generated using JSON Schema. For us, JSON Schema has been awesome: all of our backend, ingestion, and UI code is generated from JSON Schema, and it's easy to extend and add new models when needed.
3. IMO, we have much more coverage; you can look at the types available here https://github.com/open-metadata/OpenMetadata/tree/main/open... and we have supported JSON Schema as a type for a long time.
-
amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
-
gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Project mention: What is Data Agent and how to build it in 15 Minutes | news.ycombinator.com | 2024-08-16
-
odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
-
intake
Intake is a lightweight package for finding, investigating, loading and disseminating data. (by intake)
After the MLOps tooling evaluation, our focus shifted to data engineering. Some teams in the company were already using tools like Dask and xarray to manage and process their datasets. The architect was determined to build a data lake for the organization. The vision was to make xarray datasets accessible via Intake, using a Dask-capable computing platform. For the compute platform, we explored services like SaturnCloud and Coiled.
-
-
-
meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)
-
Website: https://ulbmuenster.github.io/dataasee
Repository: https://github.com/ulbmuenster/dataasee
Companion Paper: https://arxiv.org/abs/2409.05512
-
analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Project mention: Show HN: OpenTimes – Free travel times between U.S. Census geographies | news.ycombinator.com | 2025-03-17
Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.
Thanks!
data-catalog related posts
-
Amundsen: A data discovery and metadata engine
-
DataHub: The Data Discovery Platform for the Modern Data Stack
-
DataHub: Open-Source Metadata Platform
-
Guided Data Access Patterns: A Deal Breaker for Data Platforms
-
Ask HN: Looking for DB schema management tool
-
Which open source or commercial tools are used for Data Governance and access management
-
ODD Platform - An open-source data discovery and observability service - v0.12 release
Index
What are some of the best open-source data-catalog projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | datahub | 10,542 |
2 | OpenMetadata | 6,521 |
3 | amundsen | 4,560 |
4 | gravitino | 1,451 |
5 | odd-platform | 1,311 |
6 | intake | 1,041 |
7 | awesome-data-catalogs | 836 |
8 | recap | 343 |
9 | meteor | 205 |
10 | dataasee | 14 |
11 | analytics_data_where_house | 9 |