awesome-data-catalogs
opendatadiscovery-specification
Metric | awesome-data-catalogs | opendatadiscovery-specification
---|---|---
Mentions | 9 | 2
Stars | 586 | 117
Growth | 4.3% | 2.6%
Activity | 4.2 | 6.2
Last commit | 8 months ago | 22 days ago
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
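As a rough illustration of the metrics described above, the sketch below computes month-over-month star growth and a recency-weighted activity score mapped to a 0-10 scale by percentile rank. The site's actual formula is not given here: the exponential decay, the 30-day half-life, the percentile mapping, and all function names are assumptions made for illustration only.

```python
from datetime import date


def star_growth(prev_stars: int, curr_stars: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if prev_stars <= 0:
        raise ValueError("prev_stars must be positive")
    return (curr_stars - prev_stars) / prev_stars * 100.0


def raw_activity(commit_dates: list[date], today: date,
                 half_life_days: float = 30.0) -> float:
    """Sum of per-commit weights; a commit's weight halves every
    `half_life_days` (assumed decay), so recent commits count more."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw: float, all_raw: list[float]) -> float:
    """Scale 0-10 by the share of tracked projects with a strictly lower
    raw score (assumed mapping): 9.0 means ahead of 90% of projects."""
    below = sum(1 for r in all_raw if r < raw)
    return 10.0 * below / len(all_raw)
```

Under these assumptions, a project whose raw score exceeds that of 90% of tracked projects gets 9.0, which matches the "top 10%" reading in the example above.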
awesome-data-catalogs
-
How to map out data pipeline of 500-person BI Excel team?
Check out this GitHub awesome list of Data Catalogs.
-
Standalone lineage tool
Maybe what you want is a specification from which you can build something? In that case, perhaps this can help you: https://github.com/opendatadiscovery/awesome-data-catalogs. Airflow uses OpenLineage to send its metadata, and Marquez collects that metadata and shows it in its UI (https://openlineage.io/docs/guides/airflow), so I suppose you would want to do something similar? But maybe in that GitHub repository you can find other specifications that suit you better.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
-
Looking for an "offline" data discovery platform
In order to gain an understanding of the tables and their contents in our company, I have implemented one of the existing [data discovery platforms](https://github.com/opendatadiscovery/awesome-data-catalogs) (in my case [Amundsen](https://www.amundsen.io/)). Unfortunately, Amundsen can only display the tables it has access to.
-
Open source data catalog
I have a nice data catalog summary in case anyone is interested: https://github.com/opendatadiscovery/awesome-data-catalogs. It is probably biased, since the author is also the author of one of the data catalogs, but it can still be quite useful :)
- Data Catalog High level feature comparison
- Data Catalog Comparison List
- Awesome-data-catalogs – A curated list of data catalogs
-
Ask HN: Is there any data catalog that targets ML as the first citizen?
Hi, I would like to know whether there are any open-source data catalog systems that treat machine learning applications (datasets, including unstructured ones such as text, image, and video, and models) as first-class citizens.
I have read the awesome-data-catalogs ([1]) list but found that none of them treats ML as a first-class citizen, and their support for datasets and models is not specific enough.
[1]: https://github.com/opendatadiscovery/awesome-data-catalogs
opendatadiscovery-specification
-
Show HN: First open source data discovery and observability platform
Thank you!
Actually, everything in ODD works on a push basis now. ODD Platform implements the ODD Specification (https://github.com/opendatadiscovery/opendatadiscovery-speci...), and all agents, custom scripts and integrations, Airflow/Spark listeners, etc. push metadata to a specific ODD Platform endpoint (https://github.com/opendatadiscovery/opendatadiscovery-speci...). ODD Collectors (agents) push metadata on a configurable schedule.
ODD Specification is a standard for collecting and gathering such metadata, ETL included. We gather metadata for lineage at the entity level now, but we plan to expand this to column-level lineage at the end of 2022 or the start of 2023. The Specification allows us to keep the system open, and it's really easy to write your own integration by looking at the format in which metadata needs to be ingested into the Platform.
ODD Platform has its own OpenAPI specification (https://github.com/opendatadiscovery/odd-platform/tree/main/...) so that the already indexed and layered metadata can be extracted via the platform's API.
Also, thank you for sharing the links with us! I'm thrilled to take a look at how BMW solved the problem of gathering lineage from Spark; that's something we are improving in our product right now.
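The push model described in this post can be sketched roughly as follows: a collector gathers metadata and POSTs it to an ingestion endpoint on a fixed interval. This is only a minimal sketch, not ODD's actual implementation. The URL, the payload fields, and the function names are all hypothetical; real payloads and endpoints are defined by the ODD Specification.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical ingestion URL; the real endpoint is defined by the ODD Specification.
INGEST_URL = "http://localhost:8080/ingestion/example"


def collect_metadata() -> dict:
    """Return a made-up payload; real payloads must follow the ODD Specification."""
    return {
        "source": "example-postgres",
        "entities": [{"name": "public.orders", "type": "table"}],
    }


def http_push(url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def run_collector(push: Callable[[str, dict], int],
                  interval_s: float, runs: int) -> list[int]:
    """Collect and push metadata `runs` times, sleeping `interval_s` between pushes."""
    statuses = []
    for i in range(runs):
        statuses.append(push(INGEST_URL, collect_metadata()))
        if i < runs - 1:
            time.sleep(interval_s)
    return statuses
```

Passing the sender in as a parameter keeps the schedule loop independent of the transport, so `run_collector(http_push, 3600, 24)` would push hourly for a day, while a stub function can stand in for `http_push` during testing.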
What are some alternatives?
odd-platform - First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.
android-analytics-debugger - The Avo Android analytics debugger
spline - Data Lineage Tracking And Visualization Solution
grai-core
opendatadiscovery-speci
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
datahub - The Metadata Platform for your Data Stack
metadata-guardian - Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️
awesome-italian-public-datasets - A selection of interesting Open dataset from the Italian Public Administration and Civic Data use cases
awful-oss-incidents - 🤬 A categorized list of incidents caused by unappreciated OSS maintainers or underfunded OSS projects. Feedback welcome!