- spark-fast-tests
  Apache Spark testing helpers (dependency-free; works with ScalaTest, uTest, and MUnit)
  https://github.com/MrPowers/spark-fast-tests https://github.com/97arushisharma/Scala_Practice/tree/master/BigData_Analysis_with_Scala_and_Spark/wikipedia
- deequ
  Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Related posts
- Building a data quality solution for devs and business people
- deequ VS cuallee - a user suggested alternative
  2 projects | 30 Nov 2022
- Congrats on hitting the v1 milestone, whylabs! You're r/MLOps OSS tool of the month!
- PySpark - How to get Corrupted Records after Casting
- High level overviews of how to properly publish Spark open source libraries (Scala and PySpark)