datahub vs dbt-synapse
| | datahub | dbt-synapse |
|---|---|---|
| Mentions | 34 | 2 |
| Stars | 9,089 | 62 |
| Growth | 2.1% | - |
| Activity | 9.9 | 9.0 |
| Latest commit | 6 days ago | 11 days ago |
| Language | Java | Python |
| License | Apache License 2.0 | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
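The Growth figure is a month-over-month percentage change in stars. Here is a minimal sketch of that arithmetic. It assumes two star counts sampled a month apart; the site's exact sampling schedule isn't stated here. `month_over_month_growth` is a hypothetical helper, and the star counts are made up.

```python
from typing import Optional


def month_over_month_growth(stars_now: int, stars_month_ago: int) -> Optional[float]:
    """Percentage change in GitHub stars between two snapshots taken a month apart."""
    if stars_month_ago <= 0:
        # No baseline to compare against, so growth is undefined.
        return None
    # Multiply before dividing to keep whole-number percentages exact.
    return (stars_now - stars_month_ago) * 100.0 / stars_month_ago


# Hypothetical snapshots, not real figures for either project.
print(f"{month_over_month_growth(1020, 1000):.1f}%")  # → 2.0%
```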
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
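The exact Activity formula isn't given here. The sketch below is one plausible reading, and every parameter in it is an assumption. Each commit's weight decays exponentially with its age, using a made-up 30-day half-life. The 0-to-10 rating is the share of tracked projects with a lower raw score, so a project that beats 90% of the others gets 9.0, which matches the "top 10%" example. `recency_weighted_score` and `activity_rating` are hypothetical names.

```python
from typing import Sequence

HALF_LIFE_DAYS = 30.0  # hypothetical decay parameter, not the site's real value


def recency_weighted_score(commit_ages_days: Sequence[float]) -> float:
    """Sum of commit weights; a commit's weight halves every HALF_LIFE_DAYS."""
    return sum(0.5 ** (age / HALF_LIFE_DAYS) for age in commit_ages_days)


def activity_rating(score: float, all_scores: Sequence[float]) -> float:
    """Map a raw score onto 0-10 by the fraction of tracked projects scoring lower."""
    below = sum(1 for s in all_scores if s < score)
    return round(10 * below / len(all_scores), 1)


# A commit made today counts twice as much as one made 30 days ago.
print(recency_weighted_score([0.0]), recency_weighted_score([30.0]))  # → 1.0 0.5
# Among ten projects, the highest scorer beats nine of them.
print(activity_rating(10.0, [float(i) for i in range(1, 11)]))  # → 9.0
```

Percentile ranking keeps the number relative, as the description says: the same commit history can earn a different rating as the pool of tracked projects changes.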
datahub
- ODD Platform - An open-source data discovery and observability service - v0.12 release
- What data governance tool are you folks using?
I’m a huge fan of DataHub, the open-source data catalogue spun out of LinkedIn, but it’s best thought of as an observability layer for data assets that can be shared by data engineers and analyst-types. For data users: it’s a stellar search/discovery interface (what datasets are there for this keyword, which are most broadly used across the organization, what downstream products are made with this data, what is it usually joined to, are its upstream pipelines reliable). For data engineers, it’s a comprehensive asset cataloger, crawling your warehouse, orchestrator, modeling layers, features, and reports, matching the lineage into a graph where it can.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
- Looking for an "offline" data discovery platform
What I am looking for is a solution (similar to Amundsen or [Datahub](https://datahubproject.io/)) that also allows adding tables and their metadata manually.
- Looking for an open-source data lineage app where objects and connections can be manually defined (not just automatically ingested)
Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
- Recommended Data Governance solution for smaller businesses?
Check out https://datahubproject.io/ or https://open-metadata.org. Both have a free version to try.
- Metadata Store: Which One to Choose? OpenMetadata vs DataHub?
We use Kubernetes as our deployment platform. Any feedback on any of these open-source data catalogs?
  - https://atlas.apache.org/#/
  - https://opendatadiscovery.org/
  - https://open-metadata.org/
  - https://marquezproject.github.io/marquez/
  - https://datahubproject.io/
  - https://www.amundsen.io/
  - https://ckan.org/
  - https://magda.io/
- What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production?
Something like this? https://datahubproject.io/
- Field Lineage
There are specialized tools like DataHub (see this for column-level reporting: https://feature-requests.datahubproject.io/roadmap/541) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage. A tool like Dagster does this well if you make full use of its Software Defined Assets capability, but that is fairly new, so not many people have embraced it yet.
- LinkedDataHub: The Knowledge Graph Notebook
LinkedDataHub, an "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/UI for your data systems: https://datahubproject.io/.
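Two of the datahub mentions center on lineage. One describes DataHub matching lineage into a graph so users can see which downstream products are built from a dataset. Another concerns column-level ("field") lineage. The sketch below illustrates both ideas in plain Python. It assumes lineage has already been collected as simple edge maps, for example by a crawler or an orchestrator. It is not DataHub's or Dagster's API: the asset names, column names, and the helpers `downstream` and `root_sources` are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical asset-level lineage: each asset maps to the assets built directly from it.
ASSET_EDGES: Dict[str, List[str]] = {
    "warehouse.raw_orders": ["dbt.stg_orders"],
    "dbt.stg_orders": ["dbt.fct_revenue", "features.order_count"],
    "dbt.fct_revenue": ["dashboard.weekly_revenue"],
}


def downstream(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Every asset reachable from `start`, found by breadth-first search."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


Column = Tuple[str, str]  # (table, column)

# Hypothetical column-level lineage: each derived column lists the columns it is computed from.
COLUMN_INPUTS: Dict[Column, List[Column]] = {
    ("fct_revenue", "net_amount"): [("stg_orders", "amount"), ("stg_orders", "discount")],
    ("stg_orders", "amount"): [("raw_orders", "amount_cents")],
    ("stg_orders", "discount"): [("raw_discounts", "pct")],
}


def root_sources(inputs: Dict[Column, List[Column]], column: Column) -> Set[Column]:
    """Columns with no recorded inputs that ultimately feed `column`.

    Assumes the lineage graph is acyclic, as pipeline DAGs normally are.
    """
    parents = inputs.get(column)
    if not parents:
        return {column}
    roots: Set[Column] = set()
    for parent in parents:
        roots |= root_sources(inputs, parent)
    return roots


print(sorted(downstream(ASSET_EDGES, "dbt.stg_orders")))
# → ['dashboard.weekly_revenue', 'dbt.fct_revenue', 'features.order_count']
print(sorted(root_sources(COLUMN_INPUTS, ("fct_revenue", "net_amount"))))
# → [('raw_discounts', 'pct'), ('raw_orders', 'amount_cents')]
```

Breadth-first search answers "what is built from this?" with a set of assets. The column-level walk runs in the other direction and answers "where does this number come from?"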
dbt-synapse
- How to load Parquet files from Azure Data Lake Gen2/Azure Blob Storage into a dedicated pool using dbt?
I'm using dbt-synapse: https://github.com/dbt-msft/dbt-synapse
- Can someone explain the big deal with dbt?
Have you tried dbt-synapse? I'm the maintainer and I'd love to hear what you think is missing. Also, w.r.t. dbt Cloud support for Synapse: it's something we're working on, but we need buy-in from MSFT first!
What are some alternatives?
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
OpenLineage - An Open Standard for lineage metadata collection
atlas - A modern tool for managing database schemas
metacat
Atlas - 🚀 An open and lightweight modification to Windows, designed to optimize performance, privacy and security.
monosi - Open source data observability platform
CKAN - CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
SchemaCrawler - Free database schema discovery and comprehension tool
metadata-extractor - Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
grobid - A machine learning software for extracting information from scholarly documents
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.