Field Lineage

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • hamilton

    Discontinued A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton (by stitchfix)

  • If you're want to do more python https://github.com/stitchfix/hamilton allows you to model dependencies at a columnar (field) level.

  • datahub

    The Metadata Platform for your Data Stack

  • There are specialized tools like DataHub (see this for columnar level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage, A tool like Dagster does this well if you make full use of the Software Defined Assets capability, but that is fairly new so not so many people have embraced it yet.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • There are specialized tools like DataHub (see this for columnar level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage, A tool like Dagster does this well if you make full use of the Software Defined Assets capability, but that is fairly new so not so many people have embraced it yet.

  • OpenLineage

    An Open Standard for lineage metadata collection

  • Column-level lineage in OpenLineage is in its early days. There's support in the spec for it, and the integration with Spark currently emits column-level metadata. You can see the facet definition here.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts