Apache Spark - A unified analytics engine for large-scale data processing
The authors of Spark itself do a great job; poke around the source for examples of tests: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
PySpark test helper methods with beautiful error messages
- All Spark transformations are tested with pytest + chispa (https://github.com/MrPowers/chispa)
Which language is better for Spark & Why?
2 projects | reddit.com/r/apachespark | 1 Feb 2021
Deequ for generating data quality reports
3 projects | dev.to | 24 Nov 2022
Check if structured streaming dataframe is empty or not
1 project | reddit.com/r/apachespark | 24 Nov 2022
System Design: Twitter
5 projects | dev.to | 21 Sep 2022
How can I reproduce the indeterminacy exception in Spark?
1 project | reddit.com/r/apachespark | 16 Sep 2022