datahub vs OpenMetadata
| | datahub | OpenMetadata |
|---|---|---|
| Mentions | 35 | 29 |
| Stars | 9,977 | 5,653 |
| Growth | 1.4% | 3.9% |
| Activity | 10.0 | 10.0 |
| Latest commit | 3 days ago | 3 days ago |
| Language | Java | TypeScript |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
datahub
-
Guided Data Access Patterns: A Deal Breaker for Data Platforms
There are several commercial providers, but I would definitely recommend DataHub Project. DataHub Project is an open-source metadata platform that serves as an extensible data catalog and supports data discovery, data observability, and federated governance to address the complexity of the data ecosystem. The data catalog enables the combination of technical, operational and business metadata to provide a 360-degree view of data entities. DataHub makes it possible to pre-enrich important metadata using shift-left practices and respond to changes in real time.
-
Ask HN: Looking for DB schema management tool
Sounds like you are looking for a data catalog tool instead of a DB schema management tool. You can check out Amundsen (https://www.amundsen.io/) or DataHub (https://datahubproject.io/).
If you are looking for schema change management tool, then you can check out Bytebase (bytebase.com). But it can't answer questions like "which collections contain links to bigmongo.user.id?"
-
Which open source or commercial tools are used for Data Governance and access management
IIUC DataHub (open source project out of LinkedIn) might be relevant here
- ODD Platform - An open-source data discovery and observability service - v0.12 release
-
What data governance tool are you folks using?
I’m a huge fan of DataHub, the open source data catalogue spun out of LinkedIn, but it’s best thought of as an observability layer for data assets that can be shared by data engineers and analyst-types. For data users, it’s a stellar search/discovery interface (what datasets are there on this keyword, which are most broadly used across the organization, what downstream products are made with this data, what’s it usually joined to, are its upstream pipelines reliable). For data engineers, it’s a comprehensive asset cataloger, crawling your warehouse, orchestrator, modeling layers, features, and reports, and matching the lineage into a graph where it can.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
-
What's the best way to build documentation for a data infrastructure? any existing tools
If you are looking for a data cataloguing solution, look at DataHub. Haven't used it, but heard good things about it.
-
Looking for an "offline" data discovery platform
What I am looking for is a solution (similar to Amundsen or [Datahub](https://datahubproject.io/)) that also allows adding tables and their metadata manually.
-
Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
-
How do you document your dashboards?
What about DataHub? Haven't really used it but I'm actively reading about it and about to use it for some light documentation for some small pipelines.
OpenMetadata
-
Show HN: OpenMetadata – OSS platform for data discovery observability governance
* It seems like DataHub has an async Kafka ingestion approach while OpenMetadata is API-based
We do not use Kafka by default. If someone needs Kafka, they can add it. However, for metadata APIs we do not feel Kafka is needed. A lot of projects are becoming dependent on Kafka and calling it real-time. It's an unnecessary burden on users who have to operate it in production; for 99% of use cases Kafka is not needed. (Coming from a Kafka committer :))
2. Yes, all of our APIs and entity definitions are generated using JSON Schema. For us, JSON Schema has been awesome: all of our backend, ingestion, and UI code is generated from JSON Schema, and it's easy to extend and add new models when needed.
3. IMO, we have much more coverage; you can look at the types available here: https://github.com/open-metadata/OpenMetadata/tree/main/open... and we have supported JsonSchema as a type for a long time.
- OpenMetadata: Join the #1 Open Source Data Community
-
How to Dynamically Adjust the Height of a Textarea in ReactJS
In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.
-
Blog - Project Nessie: A Look in the Depths
How does this compare with https://github.com/open-metadata/OpenMetadata?
-
What is your favorite data catalog?
u/cmcau try https://open-metadata.org; it's much easier to set up. For details see https://docs.open-metadata.org, and for support, https://slack.open-metadata.org
-
Data Governance Hands On with Amazon DataZone
Then a range of tools appeared on the market with features that cover some of the challenges cited, especially those related to data cataloging. Informatica's tool is perhaps the best known among the licensed ones. Among the open source tools, I highlight Data Hub (www.datahubproject.io), developed at LinkedIn, Open Metadata (https://open-metadata.org/), and Amundsen (https://www.amundsen.io/), from Lyft. In addition to cataloging and discovering data artifacts, these tools provide a view of data lineage, including technical documentation and business terms, and let you build relationships between data artifacts. It is also possible to register data owners, the people responsible for the data, in those tools. This greatly facilitates the access request and evaluation process (which today is a major bottleneck).
-
What OSS are you using for data contracts?
Probably, in order to have it integrate with tools like OpenLineage and OpenMetadata and such I will have to make open-source contributions.
-
Thoughts around decube.io (data observability and catalog platform)
We are the team behind OpenMetadata. Our mission is to build a centralized metadata platform that offers data discovery, collaboration, governance and quality. We believe that having a separate tool for each of these categories results not only in user frustration but also in metadata silos.
-
Great expectations?
Has anyone ever tried OpenMetadata for data QA testing? Curious about that: https://open-metadata.org/
-
Our data catalog is difficult to manage and not built for the wider org - what can we do?
We're looking to PoC https://open-metadata.org/ shortly
What are some alternatives?
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
marquez - Collect, aggregate, and visualize a data ecosystem's metadata
OpenLineage - An Open Standard for lineage metadata collection
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
atlas - Manage your database schema as code
Hyperactive - An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
metacat
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
Atlas - 🚀 An open and lightweight modification to Windows, designed to optimize performance, privacy and usability.
big-data-pipeline-lambda-arch - A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
monosi - Open source data observability platform
awesome-wardley-maps - Wardley maps community hub. Useful Wardley mapping resources