metadata-extractor
datahub
 | metadata-extractor | datahub
---|---|---
Mentions | 1 | 13
Stars | 2,051 | 5,496
Growth | - | 4.7%
Activity | 8.1 | 9.9
Latest commit | 9 days ago | 6 days ago
Language | Java | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
metadata-extractor
-
metta data with java
Alternatively and much more easily, you can use a library like Apache Commons Imaging or Metadata-Extractor, or run a shell command to read it.
datahub
-
Which data lineage tool did you implement at your company
I've been playing around with https://datahubproject.io which is in quite active development.
-
Metadata extraction and management
Relatively mature open source projects for this are https://www.amundsen.io/ and https://datahubproject.io/
-
What is Data Lineage? A thread.
DataHub is really nice.
-
Zero to Deployment and Evolution Data Catalog!
git clone https://github.com/linkedin/datahub.git
-
Can someone explain the big deal with dbt?
Go look up DataHub on GitHub (https://github.com/linkedin/datahub). If they have finished the dbt integration and what their writing says is true, then I think you'd be interested.
-
Two Methods to Scan for PII in Data Warehouses
An important requirement for data privacy and protection is to find and catalog the tables and columns in a data warehouse that contain PII or PHI data. Open source data catalogs like DataHub and Amundsen enable cataloging of information in data warehouses. Moreover, tables and columns can be tagged as PII, along with the type of PII.
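As a rough illustration of what tagging columns with a PII type can look like, here is a minimal Python sketch that matches column names against regular expressions. It is an assumption-laden example, not necessarily either of the two methods the linked post describes, and it does not use the DataHub or Amundsen APIs. The `PII_PATTERNS` table, the `tag_columns` helper, and the sample schema are all hypothetical.

```python
import re

# Hypothetical mapping from a PII type to a column-name pattern.
# A real deployment would need a much broader, reviewed set of patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"e?mail", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "SSN": re.compile(r"ssn|social_security", re.IGNORECASE),
    "BIRTH_DATE": re.compile(r"dob|birth", re.IGNORECASE),
}


def tag_columns(schema):
    """Return {(table, column): [pii_types]} for columns whose names match a pattern."""
    tags = {}
    for table, columns in schema.items():
        for column in columns:
            matched = [pii_type for pii_type, pattern in PII_PATTERNS.items()
                       if pattern.search(column)]
            if matched:
                tags[(table, column)] = matched
    return tags


# Hypothetical warehouse schema: table name -> list of column names.
schema = {
    "customers": ["id", "email_address", "phone_number", "date_of_birth"],
    "orders": ["id", "customer_id", "total"],
}

result = tag_columns(schema)
for (table, column), pii_types in result.items():
    print(f"{table}.{column}: {', '.join(pii_types)}")
```

Name-based matching only catches columns with descriptive names; a column called `field_7` that stores email addresses would be missed, so inspecting sampled values is a common complement.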
-
The Next Big Challenge for Data Is Organizational
I've had the same issue with finding more concrete details about it. I've read an entire book on data mesh, and while I get all the concepts, I sense that it's more like agile (a loose methodology) in the sense that it's not going to be something that can be taken off the shelf and applied to all organizations the same way. Basically, here are all these ways to make data handling successful, but how you do it will be specific to your company. Guidelines to follow and pitfalls to avoid as opposed to a specific technology.
That said, check out https://datahubproject.io/ . It's the closest OSS I've found that is following the spirit of Data Mesh.
-
Launch HN: Secoda (YC S21) – Searchable Company Data
Congrats on launching!
So how do you compare to a Data Catalog like datahub? https://datahubproject.io/
From the video, you look very similar to them as a metadata consumer, and they provide extensive API integrations, so you can add basically any set of metadata you want, including Slack, Jira, etc.
-
Metadata and how to capture it
Metadata Engine:
- Datahub https://github.com/linkedin/datahub
- Amundsen https://github.com/amundsen-io/amundsen/
- Marquez https://marquezproject.github.io/
- Egeria - Open Metadata and Governance https://egeria.odpi.org
-
What do you use for database documentation
Having done that, then if you opt later down the road to use something like Amundsen (https://www.amundsen.io/) or DataHub (https://github.com/linkedin/datahub) or the commercial Alation (www.alation.com), it can read those comments and populate their UI automatically.
What are some alternatives?
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
OpenLineage - An Open Standard for lineage metadata collection
sharp - High performance Node.js image processing, the fastest module to resize JPEG, PNG, WebP, AVIF and TIFF images. Uses the libvips library.
dbt-synapse - dbt adapter for Azure Synapse SQL Dedicated Pools
exiv2 - Image metadata library and tools
monosi - Open source data observability platform
metacat
MetadataExtractor - Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
exifr - 📷 The fastest and most versatile JS EXIF reading library.
TwelveMonkeys - TwelveMonkeys ImageIO: Additional plug-ins and extensions for Java's ImageIO
SchemaCrawler - Free database schema discovery and comprehension tool