|4 days ago||4 days ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ODD Platform - An open-source data discovery and observability service - v0.12 release
2 projects | /r/dataengineering | 10 May 2023
What data governance tool are you folks using?
3 projects | /r/dataengineering | 26 Mar 2023
I’m a huge fan of DataHub, the open source data catalogue spun out of LinkedIn, but it’s best thought of as an observability layer for data assets that can be shared by data engineers and analyst-types. For data users: it’s a stellar search/discovery interface (what datasets are there on this keyword, which are most broadly used across the organization, what downstream products are made with this data, what’s it usually joined to, are its upstream pipelines reliable). For data engineers, it’s a comprehensive asset cataloger, crawling your warehouse, orchestrator, modeling layers, features, and reports, matching the lineage into a graph where it can.
Our data catalog is difficult to manage and not built for the wider org - what can we do?
4 projects | /r/dataengineering | 10 Mar 2023
Looking for an "offline" data discovery platform
2 projects | /r/dataengineering | 21 Feb 2023
What I am looking for is a solution (similar to Amundsen or [Datahub](https://datahubproject.io/)) that also allows adding tables and their metadata manually.
Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
3 projects | /r/dataengineering | 5 Feb 2023
Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
Recommended Data Governance solution for smaller businesses?
2 projects | /r/dataengineering | 19 Dec 2022
Check out https://datahubproject.io/ or https://open-metadata.org. Both have a free version to try.
Metadata Store - Which one to Choose ? OpenMetadata vs Datahub ?
5 projects | /r/dataengineering | 17 Nov 2022
We use Kubernetes as our deployment platform. Any feedback on one of these open source data catalogs?
- https://atlas.apache.org/#/
- https://opendatadiscovery.org/
- https://open-metadata.org/
- https://marquezproject.github.io/marquez/
- https://datahubproject.io/
- https://www.amundsen.io/
- https://ckan.org/
- https://magda.io/
What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production?
4 projects | /r/dataengineering | 13 Oct 2022
Something like this? https://datahubproject.io/
4 projects | /r/dataengineering | 2 Aug 2022
There are specialized tools like DataHub (see this for column-level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage. A tool like Dagster does this well if you make full use of the Software Defined Assets capability, but that is fairly new, so not many people have embraced it yet.
LinkedDataHub: The Knowledge Graph Notebook
5 projects | news.ycombinator.com | 23 Jun 2022
LinkedDataHub, an "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/UI for your data systems: https://datahubproject.io/.
What are some alternatives?
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
OpenLineage - An Open Standard for lineage metadata collection
atlas - A modern tool for managing database schemas
monosi - Open source data observability platform
dbt-synapse - dbt adapter for Azure Synapse Dedicated SQL Pools
Atlas - 🚀 An open and transparent modification to Windows, designed to optimize performance and latency.
sharp - High performance Node.js image processing, the fastest module to resize JPEG, PNG, WebP, AVIF and TIFF images. Uses the libvips library.
SchemaCrawler - Free database schema discovery and comprehension tool
exiv2 - Image metadata library and tools
grobid - A machine learning software for extracting information from scholarly documents