spark-extension
A library that provides useful extensions to Apache Spark and PySpark. (by G-Research)
macrobase-diff
Minimal implementation of Macrobase Diff (by PiotrZakrzewski)
Metric | spark-extension | macrobase-diff
---|---|---
Mentions | 1 | 1
Stars | 172 | 6
Growth | 4.7% | -
Activity | 8.3 | 10.0
Last commit | 19 days ago | over 3 years ago
Language | Scala | Python
License | Apache License 2.0 | Apache License 2.0
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-extension
Posts with mentions or reviews of spark-extension.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-07-26.
- Data diffs: Algorithms for explaining what changed in a dataset (2022)
We're doing an env migration and I've been using the Spark diff extension to reconcile data. It's amazing; we've discovered bugs in the data logic so quickly.
Here is the extension if anyone is interested: https://github.com/G-Research/spark-extension/blob/master/DI...
macrobase-diff
Posts with mentions or reviews of macrobase-diff.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-07-26.
- Data diffs: Algorithms for explaining what changed in a dataset (2022)
Some years ago, when I was digging into DIFF and MacroBase (the one from the Bailis lab), I made a simple reproduction of the DIFF algorithm: https://github.com/PiotrZakrzewski/macrobase-diff
What are some alternatives?
When comparing spark-extension and macrobase-diff you can also consider the following projects:
deep-diff2 - Deep diff Clojure data structures and pretty print the result
handy_sql_queries
recidiffist - Diffs for structured data
pyspark-starter - Starter pyspark code with a working combination of all versions
ExplainDaV
Apache Calcite - Apache Calcite
Azure-Databricks-NYC-Taxi-Workshop - An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
thanos-remote-read - Adapter to query Thanos StoreAPI with Prometheus remote read support.