pyspark-starter
spark-extension
Metric | pyspark-starter | spark-extension
---|---|---
Mentions | 1 | 1
Stars | 2 | 172
Growth | - | 4.7%
Activity | 5.5 | 8.3
Last commit | 9 months ago | 16 days ago
Language | Python | Scala
License | MIT License | Apache License 2.0

Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-extension
Data diffs: Algorithms for explaining what changed in a dataset (2022)
We're doing an env migration and I've been using the Spark diff extension to reconcile data. It's amazing: we've discovered bugs in the data logic so quickly.
Here is the extension if anyone is interested: https://github.com/G-Research/spark-extension/blob/master/DI...
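
To make the idea of reconciling data with a diff concrete, here is a minimal pure-Python sketch of a key-based dataset diff. It is not the spark-extension implementation or its API: the `diff_rows` function, the sample rows, and the single-letter labels (`N` unchanged, `C` changed, `D` deleted, `I` inserted) are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# (label, key value, row before, row after); all names here are hypothetical.
DiffEntry = Tuple[str, Any, Optional[Row], Optional[Row]]


def diff_rows(left: List[Row], right: List[Row], key: str) -> List[DiffEntry]:
    """Match rows from two datasets on `key` and label each key's status."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    result: List[DiffEntry] = []
    # Walk the union of keys so rows present on only one side are reported too.
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        old = left_by_key.get(k)
        new = right_by_key.get(k)
        if old is None:
            result.append(("I", k, None, new))   # only in the new dataset
        elif new is None:
            result.append(("D", k, old, None))   # only in the old dataset
        elif old == new:
            result.append(("N", k, old, new))    # identical on both sides
        else:
            result.append(("C", k, old, new))    # same key, different values
    return result


# Made-up sample data standing in for the source and migrated environments.
before = [
    {"id": 1, "value": "one"},
    {"id": 2, "value": "two"},
    {"id": 3, "value": "three"},
]
after = [
    {"id": 1, "value": "one"},
    {"id": 2, "value": "Two"},
    {"id": 4, "value": "four"},
]

changes = diff_rows(before, after, "id")
for label, k, old, new in changes:
    print(label, k, old, new)
```

Filtering out the `N` entries leaves only the rows that differ between the two environments, which is where a logic bug introduced by a migration would show up.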
What are some alternatives?
TypedPyspark - Type-annotate your spark dataframes and validate them
deep-diff2 - Deep diff Clojure data structures and pretty print the result
Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data - Mobile robot data analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
handy_sql_queries
pyspark-on-aws-emr - The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
recidiffist - Diffs for structured data
Hail - Cloud-native genomic dataframes and batch computing
macrobase-diff - Minimal implementation of Macrobase Diff
ExplainDaV
Azure-Databricks-NYC-Taxi-Workshop - An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
thanos-remote-read - Adapter to query Thanos StoreAPI with Prometheus remote read support.
Apache Calcite - Apache Calcite