amundsen
marquez
Metric | amundsen | marquez
---|---|---
Mentions | 7 | 1
Stars | 4,276 | 1,617
Growth | 1.5% | 2.4%
Activity | 7.8 | 9.2
Last commit | 15 days ago | 2 days ago
Language | Python | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amundsen
- Quick Start Guide to Amundsen Demo 🚀
  We'll be using WSL2 for this guide, and we'll start by cloning this repo and its submodules:
- Apache Atlas or OpenMetaData?
  You can use the Amundsen databuilder to send data to Apache Atlas: https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_atlas_search_extractor.py If you don’t have to configure Apache Atlas, then why not; but server-side validation was absent the last time I used it. You couldn’t validate the JSON body sent to the REST API endpoints.
- Searching for Delta Lake Cataloging
  Other than that, maybe you could try Amundsen (https://github.com/amundsen-io/amundsen/issues/608), which now has a connector to extract Delta Lake metadata via Spark.
- Help with Data Discoverability in a Data Lake
- Launch YC S21: Meet the Batch, Thread #6
  How does it differ from something like Amundsen? https://github.com/amundsen-io/amundsen
- Metadata and how to capture it
  Metadata Engine:
  - Datahub https://github.com/linkedin/datahub
  - Amundsen https://github.com/amundsen-io/amundsen/
  - Marquez https://marquezproject.github.io/
  - Egeria - Open Metadata and Governance https://egeria.odpi.org
- The State of Data Engineering in 2021
  A final category worth highlighting is Discovery, where it seems every notable company developed an internal data catalogue tool that is now available as an open-source or paid service. Some examples are Amundsen (Lyft), Datahub (LinkedIn), Metacat (Netflix), Databook (Uber), and Dataportal (Airbnb).
marquez
What are some alternatives?
datahub - The Metadata Platform for your Data Stack
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
OpenLineage - An Open Standard for lineage metadata collection
metacat
metadata-extractor - Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
sickbeard_mp4_automator - Automatically convert video files to a standardized format with metadata tagging to create a beautiful and uniform media library
SchemaCrawler - Free database schema discovery and comprehension tool
Medusa - Building blocks for digital commerce
Widoco - Wizard for documenting ontologies. WIDOCO is a step by step generator of HTML templates with the documentation of your ontology. It uses the LODE environment to create part of the template.
amundsendatabuilder - Data ingestion library for Amundsen to build graph and search index
grobid - A machine learning software for extracting information from scholarly documents