Top 7 Scala spark-sql Projects
-
kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
-
Jupyter Scala
If you're looking for more modern notebooks supporting Scala (and Spark):
- https://almond.sh
- https://polynote.org
Toree is mostly dead but might also get a Scala 2.13 release now that Spark 4.0 is approaching.
-
incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Project mention: Launch HN: ParaQuery (YC X25) – GPU Accelerated Spark/SQL | news.ycombinator.com | 2025-05-12
I was about to comment that Gluten is only targeting CPU vectorization, but then I found this (very cool!): https://github.com/apache/incubator-gluten/issues/9098
I'm not very familiar with Gluten, but I'll still comment on the CPU side, assuming that one of Gluten's goals is to use the full vector-processing (SIMD) potential of the CPU. In that case, we'd still be memory(-bandwidth)-bound, not to mention the significantly lower FLOPs of the CPU itself. If we vectorize Spark (or any MPP engine) for efficient compute, perhaps we should run it on hardware optimized for vectorized, massively parallel, high-throughput compute.
Also, there's nothing which says we can't use Gluten to have even more CPU+GPU utilization!
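The comment's memory-bandwidth argument can be made concrete with the roofline model, a standard back-of-the-envelope performance model that neither the comment nor Gluten itself spells out. Attainable throughput $P$ is capped either by peak compute $\pi$ or by arithmetic intensity $I$ (FLOPs per byte moved from memory) times memory bandwidth $\beta$. The figures below are hypothetical round numbers, not measurements of any real CPU or of Gluten, picked to resemble a scan-heavy SQL operator with low arithmetic intensity:

$$
P = \min(\pi,\ I \cdot \beta)
$$

$$
\pi = 2000\ \text{GFLOP/s}, \qquad \beta = 200\ \text{GB/s}, \qquad I = 0.25\ \text{FLOP/byte}
$$

$$
P = \min(2000,\ 0.25 \times 200) = 50\ \text{GFLOP/s} = 2.5\%\ \text{of}\ \pi
$$

Under these assumptions the operator stays bandwidth-bound until $I$ exceeds the ridge point $\pi / \beta = 10$ FLOP/byte, so wider SIMD (a larger $\pi$) buys nothing, while a larger $\beta$, such as a GPU's high-bandwidth memory, lifts the ceiling directly.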
-
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
-
qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
-
What are some of the best open-source spark-sql projects in Scala? This list will help you:
# | Project | Stars |
---|---|---|
1 | kyuubi | 2,236 |
2 | Jupyter Scala | 1,618 |
3 | incubator-gluten | 1,423 |
4 | LearningSparkV2 | 1,340 |
5 | qbeast-spark | 233 |
6 | opaque-sql | 182 |
7 | Sparkplug | 29 |