opendatadiscovery-specification
awesome-data-catalogs
| | opendatadiscovery-specification | awesome-data-catalogs |
|---|---|---|
| Mentions | 2 | 9 |
| Stars | 135 | 809 |
| Growth | 0.7% | 4.8% |
| Activity | 4.6 | 2.6 |
| Last commit | 5 months ago | 8 months ago |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
opendatadiscovery-specification
- Show HN: First open source data discovery and observability platform
Thank you!
Actually, everything in ODD works on a push basis now. ODD Platform implements the ODD Specification (https://github.com/opendatadiscovery/opendatadiscovery-speci...), and all agents, custom scripts and integrations, Airflow/Spark listeners, etc. push metadata to a specific ODD Platform endpoint (https://github.com/opendatadiscovery/opendatadiscovery-speci...). ODD Collectors (agents) push metadata on a configurable schedule.
The ODD Specification is a standard for collecting and gathering such metadata, ETL included. We currently gather lineage metadata at the entity level, but we plan to expand this to column-level lineage at the end of 2022 or the start of 2023. The Specification lets us keep the system open, and it's really easy to write your own integration by looking at the format in which metadata needs to be injected into the Platform.
ODD Platform has its own OpenAPI specification (https://github.com/opendatadiscovery/odd-platform/tree/main/...), so the already indexed and layered metadata can be extracted via the platform's API.
Also, thank you for sharing the links with us! I'm thrilled to take a look at how BMW solved the problem of gathering lineage from Spark; that's something we are improving in our product right now.
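The push model described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `EntityMetadata` fields, the payload layout, and the `send` callback are hypothetical and are not taken from the ODD Specification, which defines its own schema and endpoint. A real collector would replace `send` with an HTTP call to the platform and run `push_once` on its configured schedule.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class EntityMetadata:
    # Hypothetical record shape; the real specification defines its own schema.
    name: str
    kind: str
    upstream: List[str] = field(default_factory=list)  # entity-level lineage


def build_payload(source: str, entities: List[EntityMetadata]) -> str:
    """Serialize the collected metadata into a single JSON document."""
    body = {"source": source, "items": [asdict(e) for e in entities]}
    return json.dumps(body, sort_keys=True)


def push_once(
    collect: Callable[[], List[EntityMetadata]],
    send: Callable[[str], None],
    source: str,
) -> int:
    """One scheduled run: collect metadata, then push it to the platform.

    `send` stands in for the network call to the ingestion endpoint.
    Returns the number of entities pushed.
    """
    entities = collect()
    send(build_payload(source, entities))
    return len(entities)


def example_collect() -> List[EntityMetadata]:
    # Stand-in for a real collector that would inspect a database or job.
    return [
        EntityMetadata(name="raw.orders", kind="table"),
        EntityMetadata(name="mart.orders", kind="table", upstream=["raw.orders"]),
    ]


if __name__ == "__main__":
    outbox: List[str] = []  # captures payloads instead of sending them
    count = push_once(example_collect, outbox.append, source="warehouse")
    print(count, outbox[0])
```

Passing the sender in as a callback keeps the sketch independent of any particular HTTP client and makes the collect-then-push step easy to test without a running platform.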
awesome-data-catalogs
- How to map out data pipeline of 500-person BI Excel team?
Check out this GitHub awesome list of Data Catalogs.
- Standalone lineage tool
Maybe what you want is some specification from which you can build something? In that case, perhaps this can help you: https://github.com/opendatadiscovery/awesome-data-catalogs. Airflow uses OpenLineage as a way to send its metadata, and Marquez collects it and shows it in its UI (https://openlineage.io/docs/guides/airflow), so I suppose you would want to do something similar? But maybe in that GitHub list you can find other specifications that suit you better.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
- Looking for an "offline" data discovery platform
In order to gain an understanding of the tables and their contents in our company, I have implemented one of the existing [data discovery platforms](https://github.com/opendatadiscovery/awesome-data-catalogs) (in my case [Amundsen](https://www.amundsen.io/)). Unfortunately, Amundsen can only display the tables it has access to.
- Open source data catalog
I have a nice data catalog summary, in case anyone is interested: https://github.com/opendatadiscovery/awesome-data-catalogs. It is probably biased, since its author is also the author of one of the data catalogs, but it can still be quite useful :)
- Data Catalog High level feature comparison
- Data Catalog Comparison List
- Awesome-data-catalogs – A curated list of data catalogs
- Ask HN: Is there any data catalog that targets ML as the first citizen?
Hi, I would like to know whether there are any open-source data catalog systems that treat machine learning assets (unstructured datasets such as text, image, and video, as well as models) as first-class citizens.
I have read the awesome-data-catalogs ([1]) list but found that none of them treats ML as a first-class citizen, and the support for datasets and models is not specific enough.
[1]: https://github.com/opendatadiscovery/awesome-data-catalogs
What are some alternatives?
opendatadiscovery-speci
grai-core
odd-platform - First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.
spline - Data Lineage Tracking And Visualization Solution
osgeo - The Open Source Geospatial Foundation is a not-for-profit organization to empower everyone with open source geospatial. Directly supports projects as an outreach and advocacy organization providing financial, organizational and legal support. Works with our sponsors and partners for open software, standards, data, research and education.