spline vs opendatadiscovery-specification

| | spline | opendatadiscovery-specification |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 611 | - |
| Growth | 1.1% | - |
| Activity | 7.8 | - |
| Latest Commit | 4 days ago | - |
| Language | Scala | - |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spline
Show HN: First open source data discovery and observability platform
We found a way by leveraging the Spline Agent (https://github.com/AbsaOSS/spline) to make use of the Execution Plans, transform them into a data model suited to our set of requirements, and developed a UI to explore these relationships. We also open-sourced our approach in a
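For readers unfamiliar with the Spline Agent mentioned above: it harvests lineage by listening to Spark query executions and shipping the resulting execution plans to a consumer. Below is a minimal sketch of enabling it from Scala; the listener class and config keys follow the Spline agent documentation, the agent bundle is assumed to be on the classpath, and the producer URL is a placeholder for whatever service consumes the plans.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register the Spline query-execution listener so that every Spark
// action's execution plan is harvested and dispatched as lineage.
val spark = SparkSession.builder()
  .appName("lineage-example")
  .config("spark.sql.queryExecutionListeners",
    "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener")
  // Placeholder URL: point this at the service that consumes execution plans.
  .config("spark.spline.lineageDispatcher.http.producer.url",
    "http://localhost:8080/producer")
  .getOrCreate()

// Any subsequent job, e.g. spark.read ... .write.save(...), will have its
// execution plan captured and sent to the configured endpoint.
```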
opendatadiscovery-specification
Show HN: First open source data discovery and observability platform
Thank you!
Actually, everything works on a push basis in ODD now. ODD Platform implements the ODD Specification (https://github.com/opendatadiscovery/opendatadiscovery-speci...), and all agents, custom scripts and integrations, Airflow/Spark listeners, etc. push metadata to a specific ODD Platform endpoint (https://github.com/opendatadiscovery/opendatadiscovery-speci...). ODD Collectors (agents) push metadata on a configurable schedule.
The ODD Specification is a standard for collecting such metadata, ETL included. We gather lineage metadata at the entity level now, but we plan to expand this to column-level lineage at the end of 2022 or the start of 2023. The specification keeps the system open, and it's really easy to write your own integration once you see the format in which metadata needs to be ingested into the Platform.
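To make the push model concrete, here is a minimal sketch of a custom integration posting a metadata payload to an ODD Platform ingestion endpoint. The /ingestion/entities path and the payload field names are assumptions for illustration; the authoritative data model is the ODD Specification linked above.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical payload describing one data entity; field names are
// illustrative, not the exact ODD Specification schema.
val payload =
  """{
    |  "data_source_oddrn": "//custom/host/example",
    |  "items": [
    |    {"oddrn": "//custom/host/example/tables/orders", "name": "orders", "type": "TABLE"}
    |  ]
    |}""".stripMargin

// Assumed platform URL and ingestion path; a collector would send this
// on its configured schedule.
val request = HttpRequest.newBuilder()
  .uri(URI.create("http://localhost:8080/ingestion/entities"))
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString(payload))
  .build()

val response = HttpClient.newHttpClient()
  .send(request, HttpResponse.BodyHandlers.ofString())
println(s"Ingestion endpoint responded with HTTP ${response.statusCode()}")
```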
ODD Platform also has its own OpenAPI specification (https://github.com/opendatadiscovery/odd-platform/tree/main/...), so the already indexed and layered metadata can be extracted via the platform's API.
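The read side is symmetrical: once metadata is indexed, it can be pulled back out through the platform's REST API. A sketch with a purely hypothetical query path, since the real routes are defined in the linked OpenAPI document:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// "/api/search?query=orders" is a placeholder path for illustration;
// the actual routes are defined by the platform's OpenAPI specification.
val searchRequest = HttpRequest.newBuilder()
  .uri(URI.create("http://localhost:8080/api/search?query=orders"))
  .GET()
  .build()

val body = HttpClient.newHttpClient()
  .send(searchRequest, HttpResponse.BodyHandlers.ofString())
  .body()
println(body)
```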
Also, thank you for sharing the links with us! I'm thrilled to take a look at how BMW solved the problem of gathering lineage from Spark; that's something we are improving in our product right now.
What are some alternatives?
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
opendatadiscovery-specification - ODD Specification is a universal open standard for collecting metadata.
seq-datasource-v2 - Sequence Data Source for Apache Spark
parquet4s - Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
tispark - TiSpark is built for running Apache Spark on top of TiDB/TiKV
Clustering4Ever - C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.