OpenMetadata
datahub
| | OpenMetadata | datahub |
|---|---|---|
| Mentions | 26 | 34 |
| Stars | 3,895 | 9,089 |
| Growth | 9.7% | 2.1% |
| Activity | 10.0 | 9.9 |
| Latest commit | 6 days ago | 6 days ago |
| Language | TypeScript | Java |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
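As a rough illustration of how the Growth and Activity figures can be read, here is a minimal Python sketch. Both helper functions are hypothetical and rest on assumptions that the page does not confirm: that growth is computed as (current - previous) / previous, and that the activity score maps linearly to a "top X%" share, extrapolated from the single 9.0 example above.

```python
def previous_stars(current_stars: int, growth_pct: float) -> float:
    """Estimate last month's star count.

    Assumption: growth_pct = (current - previous) / previous * 100.
    """
    return current_stars / (1 + growth_pct / 100)


def top_share_pct(activity: float) -> float:
    """Translate an activity score into a "top X%" share.

    Assumption: the mapping is linear, so 9.0 corresponds to the top 10%.
    """
    return round((10.0 - activity) * 10, 1)


if __name__ == "__main__":
    # Star counts and growth rates taken from the table above.
    print(f"OpenMetadata, estimated stars a month ago: {previous_stars(3895, 9.7):.0f}")
    print(f"datahub, estimated stars a month ago: {previous_stars(9089, 2.1):.0f}")
    print(f"Activity 9.0 -> top {top_share_pct(9.0)}% of tracked projects")
```

Under these assumptions, OpenMetadata's 9.7% growth corresponds to roughly 340 new stars over the month, and datahub's 2.1% to roughly 190.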
OpenMetadata
- What is your favorite data catalog?
  I recommend considering a centralized metadata platform like OpenMetadata (http://open-metadata.org), although I may be biased, as I am on the open-metadata.org core team.
  u/cmcau try https://open-metadata.org, it is much easier to set up. For details see https://docs.open-metadata.org, and for support https://slack.open-metadata.org
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
  We're looking to PoC https://open-metadata.org/ shortly
- Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
  Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
- Ask HN: Do you use JSON Schema? Help us shape its future stability guarantees
  We at OpenMetadata (https://open-metadata.org) use JSON Schema extensively to define our metadata standards. JSON Schema is one of the reasons we were able to ship and get the project to where it is today so quickly. More about it here: https://www.youtube.com/watch?v=ZrVTZwmTR3k
- Open source data catalog
  OpenMetadata?
- Recommended Data Governance solution for smaller businesses?
  Check out https://datahubproject.io/ or https://open-metadata.org. Both have a free version to try.
- Great Expectations is annoyingly cumbersome
  Hi folks, Pere here, part of the development team behind OpenMetadata.
- Metadata Store - Which one to choose? OpenMetadata vs DataHub?
  We use Kubernetes as our deployment platform. Any feedback on one of these open source data catalogs? - https://atlas.apache.org/#/ - https://opendatadiscovery.org/ - https://open-metadata.org/ - https://marquezproject.github.io/marquez/ - https://datahubproject.io/ - https://www.amundsen.io/ - https://ckan.org/ - https://magda.io/
- Which data lineage tool did you implement at your company
  We use OpenMetadata (https://open-metadata.org/). Great tool and it is open source!
datahub
- ODD Platform - An open-source data discovery and observability service - v0.12 release
- What data governance tool are you folks using?
  I’m a huge fan of DataHub, the open source data catalogue spun out of LinkedIn, but it’s best thought of as an observability layer for data assets that can be shared by data engineers and analyst-types. For data users: it’s a stellar search/discovery interface (what datasets are there on this keyword, which are most broadly used across the organization, what downstream products are made with this data, what’s it usually joined to, are its upstream pipelines reliable). For data engineers, it’s a comprehensive asset cataloger, crawling your warehouse, orchestrator, modeling layers, features, and reports, matching the lineage into a graph where it can.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
- Looking for an "offline" data discovery platform
  What I am looking for is a solution (similar to Amundsen or [Datahub](https://datahubproject.io/)) that also allows tables and their metadata to be added manually.
- Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
  Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
- Recommended Data Governance solution for smaller businesses?
  Check out https://datahubproject.io/ or https://open-metadata.org. Both have a free version to try.
- Metadata Store - Which one to choose? OpenMetadata vs DataHub?
  We use Kubernetes as our deployment platform. Any feedback on one of these open source data catalogs? - https://atlas.apache.org/#/ - https://opendatadiscovery.org/ - https://open-metadata.org/ - https://marquezproject.github.io/marquez/ - https://datahubproject.io/ - https://www.amundsen.io/ - https://ckan.org/ - https://magda.io/
- What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production?
  Something like this? https://datahubproject.io/
- Field Lineage
  There are specialized tools like DataHub (see this for column-level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage. A tool like Dagster does this well if you make full use of the Software-Defined Assets capability, but that is fairly new, so not many people have embraced it yet.
- LinkedDataHub: The Knowledge Graph Notebook
  LinkedDataHub, an "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/UI for your data systems: https://datahubproject.io/.
What are some alternatives?
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
OpenLineage - An Open Standard for lineage metadata collection
atlas - A modern tool for managing database schemas
marquez - Collect, aggregate, and visualize a data ecosystem's metadata
metacat
odd-platform - First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.
Atlas - 🚀 An open and lightweight modification to Windows, designed to optimize performance, privacy and security.
monosi - Open source data observability platform
dbt-synapse - dbt adapter for Azure Synapse Dedicated SQL Pools
CKAN - CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
SchemaCrawler - Free database schema discovery and comprehension tool
metadata-extractor - Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files