awesome-data-catalogs vs grai-core
| | awesome-data-catalogs | grai-core |
|---|---|---|
| Mentions | 9 | 6 |
| Stars | 829 | 303 |
| Growth | 5.1% | 0.7% |
| Activity | 4.2 | 8.2 |
| Latest commit | 9 days ago | 22 days ago |
| Language | Python | |
| License | MIT License | MIT No Attribution |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-data-catalogs
- How to map out data pipeline of 500-person BI Excel team?
Check out this GitHub awesome list of Data Catalogs.
- Standalone lineage tool
Maybe what you want is a specification from which you can build something? If so, perhaps this can help: https://github.com/opendatadiscovery/awesome-data-catalogs. Airflow uses OpenLineage to send its metadata, and Marquez collects that metadata and shows it in its UI (https://openlineage.io/docs/guides/airflow), so I suppose you would want to do something similar? That repository may also list other specifications that suit you better.
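To make the OpenLineage suggestion concrete, here is a minimal sketch that builds an OpenLineage RunEvent as a plain Python dict, the kind of JSON document a collector such as Marquez ingests. It is an illustration under assumptions, not Airflow's or Marquez's code: the producer URI, namespaces, job name, and dataset names are hypothetical, and the `schemaURL` version is assumed, so check the OpenLineage specification for the current one. Actually sending the event to a collector is left out.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import List, Optional

# Assumed spec version in the schema URL; check the OpenLineage docs for the current one.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"

# Hypothetical URI identifying the tool that emits these events.
PRODUCER = "https://example.com/hypothetical-lineage-exporter"

EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL", "OTHER"}


def dataset(namespace: str, name: str) -> dict:
    """Minimal dataset reference: OpenLineage identifies datasets by namespace + name."""
    return {"namespace": namespace, "name": name}


def run_event(event_type: str, job_namespace: str, job_name: str,
              inputs: List[dict], outputs: List[dict],
              run_id: Optional[str] = None) -> dict:
    """Build a minimal RunEvent; optional facets are omitted entirely."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown eventType: {event_type}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Events describing the same run (e.g. START then COMPLETE) share one runId.
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": inputs,
        "outputs": outputs,
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    # Hypothetical job: a workbook that reads raw orders and writes a revenue report.
    event = run_event(
        "COMPLETE",
        job_namespace="finance-bi",
        job_name="monthly_revenue_workbook",
        inputs=[dataset("sharepoint", "sales/raw_orders.xlsx")],
        outputs=[dataset("sharepoint", "reports/revenue.xlsx")],
    )
    print(json.dumps(event, indent=2))
```

Keeping the event a plain dict means any HTTP client can serialize it; the official OpenLineage client libraries offer typed wrappers if you prefer them.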
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
- Looking for an "offline" data discovery platform
To gain an understanding of the tables and their contents in our company, I implemented one of the existing [data discovery platforms](https://github.com/opendatadiscovery/awesome-data-catalogs) (in my case [Amundsen](https://www.amundsen.io/)). Unfortunately, Amundsen can only display the tables it has access to.
- Open source data catalog
I have a nice data catalog summary in case anyone is interested: https://github.com/opendatadiscovery/awesome-data-catalogs. It is probably biased, since its author is also the author of one of the data catalogs, but it can still be quite useful :)
- Data Catalog High level feature comparison
- Data Catalog Comparison List
- Awesome-data-catalogs: A curated list of data catalogs
- Ask HN: Is there any data catalog that targets ML as the first citizen?
Hi, I would like to know whether there are any open-source data catalog systems that treat machine learning applications (datasets, including unstructured ones such as text, image, and video, and models) as first-class citizens.
I have read the awesome-data-catalogs ([1]) list but found that none of them treats ML as a first-class citizen, and their support for datasets and models is not specific enough.
[1]: https://github.com/opendatadiscovery/awesome-data-catalogs
grai-core
- Launch HN: Grai (YC S22) - Open-Source Data Observability Platform
It is licensed under Elastic License v2, if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE
- Standalone lineage tool
I'm not sure if this is precisely what you're looking for, but Grai might serve your needs. The backend data model lets you push any arbitrary metadata you want or need onto the lineage graph and retrieve it through either the REST or the graph API. I'm one of the authors, so I'm happy to answer any questions you might have.
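As a rough picture of what pushing arbitrary metadata onto a lineage graph means, here is a toy in-memory graph whose nodes and edges carry free-form metadata dicts that can be merged in later and queried back. It is a hypothetical sketch: `LineageGraph`, `upsert_node`, `add_edge`, and `nodes_where` are invented names and do not describe Grai's data model or its REST/graph API.

```python
from typing import Any, Dict, List, Tuple


class LineageGraph:
    """Toy lineage graph whose nodes and edges carry free-form metadata.

    Hypothetical illustration only; this is not Grai's data model or API.
    """

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def upsert_node(self, name: str, **metadata: Any) -> None:
        # Create the node if needed, then merge new keys over existing ones.
        self.nodes.setdefault(name, {}).update(metadata)

    def add_edge(self, source: str, destination: str, **metadata: Any) -> None:
        # Make sure both endpoints exist, then attach metadata to the edge itself.
        self.upsert_node(source)
        self.upsert_node(destination)
        self.edges.setdefault((source, destination), {}).update(metadata)

    def metadata(self, name: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored metadata by accident.
        return dict(self.nodes[name])

    def nodes_where(self, **filters: Any) -> List[str]:
        # Names of nodes whose metadata contains every key=value pair in `filters`.
        return sorted(
            name for name, md in self.nodes.items()
            if all(k in md and md[k] == v for k, v in filters.items())
        )


if __name__ == "__main__":
    g = LineageGraph()
    g.upsert_node("warehouse.orders", owner="data-eng", pii=True)
    g.add_edge("warehouse.orders", "warehouse.revenue", transform="dbt model")
    g.upsert_node("warehouse.orders", sla_hours=24)  # a later push adds a key
    print(g.metadata("warehouse.orders"))
    print(g.nodes_where(pii=True))
```

Merging rather than replacing metadata on each push is a design choice here: separate tools can each contribute their own keys to the same node without overwriting one another.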
- Data Load Diagram
We've been looking at building something like this for Grai specifically to support Airflow but haven't yet prioritized it.
- Grai, a self-hosted data lineage tool. Test downstream impact of data migration changes
We were frustrated because, although we had tests in our data warehouse, they only notified us after an outage had occurred. What we needed was a way to detect changes during CI/CD, so we could fix things before they impacted production. So we developed Grai, an open-source data lineage toolkit with pre-built integrations for the most common data stores, designed to work with CI tools like GitHub Actions.
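Testing downstream impact during CI comes down to a graph traversal: given lineage edges and the set of tables or columns a change touches, find everything reachable downstream and flag the change if that set is non-empty. A minimal sketch, assuming lineage is already available as (upstream, downstream) pairs; `downstream_impact` and `ci_check` are hypothetical helpers and not part of Grai.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def downstream_impact(edges: Iterable[Tuple[str, str]], changed: Set[str]) -> Set[str]:
    """Breadth-first search for every node reachable downstream of `changed`.

    `edges` are (upstream, downstream) pairs. The changed nodes themselves are
    not reported, and the `seen` set keeps cycles from looping forever.
    """
    children: Dict[str, List[str]] = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    seen = set(changed)
    queue = deque(changed)
    impacted: Set[str] = set()
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.add(child)
                queue.append(child)
    return impacted


def ci_check(edges: Iterable[Tuple[str, str]], changed: Set[str]) -> int:
    """Print the impacted nodes and return a process exit code (1 if any)."""
    impacted = downstream_impact(edges, changed)
    for node in sorted(impacted):
        print(f"downstream of change: {node}")
    return 1 if impacted else 0


if __name__ == "__main__":
    # Hypothetical lineage: raw table -> staging model -> two marts.
    lineage = [
        ("raw.orders", "stg.orders"),
        ("stg.orders", "mart.revenue"),
        ("stg.orders", "mart.churn"),
    ]
    print("exit code:", ci_check(lineage, {"raw.orders"}))
```

In a CI job, a script could end with `sys.exit(ci_check(edges, changed))` so that a non-empty impact set fails the step. A real tool also has to work out which nodes a change touches and whether the change is actually breaking, which this sketch takes as given.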
What are some alternatives?
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
django-dbbackup - Management commands to help backup and restore your project database and media files
opendatadiscovery-specification - ODD Specification is a universal open standard for collecting metadata.
dbt-snowflake-monitoring - A dbt package from SELECT to help you monitor Snowflake performance and costs
osgeo - The Open Source Geospatial Foundation is a not-for-profit organization that empowers everyone with open source geospatial. It directly supports projects as an outreach and advocacy organization, providing financial, organizational, and legal support, and works with its sponsors and partners on open software, standards, data, research, and education.
jupysql - Better SQL in Jupyter.