Top 7 Scala spark-sql Projects

kyuubi

1 1 2,236 9.4 Scala

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Jupyter Scala

2 7 1,618 8.2 Scala

A Scala kernel for Jupyter

Project mention: Apache Zeppelin | news.ycombinator.com | 2024-09-02

If you're looking for more modern notebooks supporting Scala (and Spark):
- https://almond.sh
- https://polynote.org
Toree is mostly dead but might also get a Scala 2.13 release now that Spark 4.0 is approaching.
incubator-gluten

3 5 1,423 9.9 Scala

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Project mention: Launch HN: ParaQuery (YC X25) – GPU Accelerated Spark/SQL | news.ycombinator.com | 2025-05-12

I was about to comment that Gluten is only targeting CPU vectorization, but then I found this (very cool!): https://github.com/apache/incubator-gluten/issues/9098
I'm not very familiar with Gluten, but I'll still comment on the CPU side though, assuming that one of Gluten's goals is to use the full vector processing (SIMD) potential of the CPU. In that case, we'd still be memory(-bandwidth)-bound, not to mention the significantly lower FLOPs of the CPU itself. If we vectorize Spark (or any MPP) for efficient compute, perhaps we should run it on hardware optimized for vectorized, super-parallel, high-throughput compute.
Also, there's nothing which says we can't use Gluten to have even more CPU+GPU utilization!
LearningSparkV2

4 1 1,340 2.4 Scala

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
qbeast-spark

5 12 233 8.6 Scala

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
opaque-sql

6 2 182 1.1 Scala

An encrypted data analytics platform
Sparkplug

7 0 29 0.0 Scala

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Scala spark-sql related posts

A glimpse into the future of data processing infrastructure.

1 project | dev.to | 2 May 2024

Index

What are some of the best open-source spark-sql projects in Scala? This list will help you:

#	Project	Stars
1	kyuubi	2,236
2	Jupyter Scala	1,618
3	incubator-gluten	1,423
4	LearningSparkV2	1,340
5	qbeast-spark	233
6	opaque-sql	182
7	Sparkplug	29

Did you know that Scala is the 31st most popular programming language based on number of references?
