Metric | marquez | amundsen
---|---|---
Mentions | 1 | 7
Stars | 1,617 | 4,276
Growth | 0.9% | 0.6%
Activity | 9.2 | 7.8
Last commit | 7 days ago | 20 days ago
Language | Java | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
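The metrics described above can be illustrated with a small sketch. The exact formulas used by the tracker are not published here, so everything below is an assumption: the exponential decay, the 30-day half-life, and the function names are hypothetical and only show one way that "recent commits weigh more" and "9.0 means top 10%" could fit together.

```python
from datetime import date


def month_over_month_growth(stars_last_month, stars_now):
    """Percentage change in stars between two monthly snapshots."""
    return (stars_now - stars_last_month) / stars_last_month * 100


def recency_weighted_activity(commit_dates, today, half_life_days=30.0):
    """Sum of per-commit weights; a commit's weight halves every
    `half_life_days` (assumed decay, not the tracker's real formula)."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw, all_raw):
    """Map a raw activity value to a 0-10 scale by percentile rank, so a
    project ahead of 90% of tracked projects scores 9.0."""
    below = sum(1 for r in all_raw if r < raw)
    return round(10 * below / len(all_raw), 1)


if __name__ == "__main__":
    today = date(2021, 6, 30)
    commits = [date(2021, 6, 30), date(2021, 5, 31)]
    print(month_over_month_growth(1000, 1009))
    print(recency_weighted_activity(commits, today))
    print(activity_score(9, list(range(10))))
```

With this kind of scoring, the number is only meaningful relative to the other tracked projects, which matches the description of Activity as a relative number.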
marquez
amundsen
- Quick Start Guide to Amundsen Demo 🚀
  We'll be using WSL2 for this guide, and we'll start by cloning this repo and its submodules:
- Apache Atlas or OpenMetaData?
  You can use the Amundsen databuilder to send data to Apache Atlas: https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_atlas_search_extractor.py If you don’t have to configure Apache Atlas, then why not, but server-side validation was absent the last time I used it: you couldn’t validate the JSON body sent to the REST API endpoints.
- Searching for Delta Lake Cataloging
  Other than that, maybe you could try Amundsen (https://github.com/amundsen-io/amundsen/issues/608), which now has a connector to extract Delta Lake metadata via Spark.
- Help with Data Discoverability in a Data Lake
- Launch YC S21: Meet the Batch, Thread #6
  How does it differ from something like Amundsen: https://github.com/amundsen-io/amundsen
- Metadata and how to capture it
  Metadata Engine:
  - Datahub https://github.com/linkedin/datahub
  - Amundsen https://github.com/amundsen-io/amundsen/
  - Marquez https://marquezproject.github.io/
  - Egeria - Open Metadata and Governance https://egeria.odpi.org
- The State of Data Engineering in 2021
  A final category worth highlighting is Discovery, where it seems every notable company developed an internal Data Catalogue tool that is now available as an open-source or paid service. Some examples are Amundsen (Lyft), Datahub (LinkedIn), Metacat (Netflix), Databook (Uber), and Dataportal (Airbnb).
What are some alternatives?
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
datahub - The Metadata Platform for your Data Stack
OpenLineage - An Open Standard for lineage metadata collection
metadata-extractor - Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
metacat
SchemaCrawler - Free database schema discovery and comprehension tool
sickbeard_mp4_automator - Automatically convert video files to a standardized format with metadata tagging to create a beautiful and uniform media library
Widoco - Wizard for documenting ontologies. WIDOCO is a step by step generator of HTML templates with the documentation of your ontology. It uses the LODE environment to create part of the template.
Medusa - Building blocks for digital commerce
grobid - A machine learning software for extracting information from scholarly documents
amundsendatabuilder - Data ingestion library for Amundsen to build graph and search index