Expressive types for Spark.
Valid point! Have you seen the withColumnTupled API? It returns a typed tuple instead, which seems to satisfy your use case: the dataset preserves its type and doesn't require a new case class. This is close to what you're suggesting, but without case class generation. I'm not sure whether the attribute labels (field names) are preserved in this case, though, and it's also unclear whether this scales well to wide tables.
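To make the point concrete, here's a minimal sketch of what I have in mind, assuming frameless's TypedDataset is on the classpath; the case class and field names (Person, name, age) are just illustrative:

```scala
import frameless.TypedDataset
import frameless.functions.lit
import frameless.syntax._
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object WithColumnTupledDemo {
  def main(args: Array[String]): Unit = {
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    implicit val sqlContext = spark.sqlContext
    import spark.implicits._

    val people: TypedDataset[Person] =
      TypedDataset.create(Seq(Person("Ann", 41), Person("Bob", 29)))

    // withColumnTupled appends the new column and widens the element type
    // to a tuple -- here roughly TypedDataset[(String, Int, Int)] -- so no
    // new case class is needed, but the original field names become _1, _2, ...
    val withBirthYear = people.withColumnTupled(lit(2023) - people('age))
    withBirthYear.show().run()
  }
}
```

The upside is that the result stays a TypedDataset without any code generation; the downside, as noted, is that you're left with positional tuple accessors rather than named fields, which gets unwieldy for wide tables.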
Filling in the Spark function gaps across APIs
See here for a more detailed discussion and let me know your thoughts!
Guide for Apache Spark Setup, Job Optimisation, AWS EMR Cluster Configuration, S3, YARN and HDFS Optimisation
1 project | /r/apachespark | 10 Apr 2021
Spark scala v/s pyspark
1 project | /r/dataengineering | 24 Feb 2021
for comprehension and some questions
3 projects | /r/scala | 22 Jan 2023
Grasping the concepts and getting them down to earth
4 projects | /r/scala | 4 Nov 2022
doobie map to PostGIS Point
1 project | /r/scala | 17 Jul 2022