Scala spark-sql

Open-source Scala projects categorized as spark-sql

Top 7 Scala spark-sql Projects

  1. kyuubi

    Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

  2. Jupyter Scala

    A Scala kernel for Jupyter

    Project mention: Apache Zeppelin | news.ycombinator.com | 2024-09-02

    If you're looking for more modern notebooks supporting Scala (and Spark):

    - https://almond.sh

    - https://polynote.org

    Toree is mostly dead but might also get a Scala 2.13 release now that Spark 4.0 is approaching.

  3. incubator-gluten

    Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

    Project mention: Launch HN: ParaQuery (YC X25) – GPU Accelerated Spark/SQL | news.ycombinator.com | 2025-05-12

    I was about to comment that Gluten is only targeting CPU vectorization, but then I found this (very cool!): https://github.com/apache/incubator-gluten/issues/9098

    I'm not very familiar with Gluten, but I'll still comment on the CPU side, assuming that one of Gluten's goals is to use the full vector-processing (SIMD) potential of the CPU. In that case, we'd still be memory-bandwidth-bound, not to mention the significantly lower FLOPs of the CPU itself. If we vectorize Spark (or any MPP engine) for efficient compute, perhaps we should run it on hardware optimized for vectorized, super-parallel, high-throughput compute.

    Also, nothing says we can't use Gluten to get even more CPU+GPU utilization!

  4. LearningSparkV2

    This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

  5. qbeast-spark

    Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

  6. opaque-sql

    An encrypted data analytics platform

  7. Sparkplug

    Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
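The qbeast-spark entry above mentions multi-dimensional indexing. The sketch below gives a rough idea of what such indexing does using a Z-order (Morton) curve. This widely used generic technique maps several dimensions onto a single sort key, so rows that are close in every dimension tend to end up close together in storage. It is a stand-in chosen for brevity, not qbeast-spark's own index. The object and method names are hypothetical.

```scala
// Hypothetical illustration of Z-order (Morton) encoding, a generic
// multi-dimensional indexing technique. NOT qbeast-spark's own algorithm.
object ZOrderSketch {

  /** Interleaves the low 16 bits of x and y into one Morton code:
    * bit i of x goes to bit 2*i, bit i of y goes to bit 2*i + 1. */
  def mortonCode(x: Int, y: Int): Long = {
    require(x >= 0 && x < (1 << 16) && y >= 0 && y < (1 << 16),
      "coordinates must be non-negative and fit in 16 bits")
    var code = 0L
    for (i <- 0 until 16) {
      code |= ((x >> i) & 1).toLong << (2 * i)
      code |= ((y >> i) & 1).toLong << (2 * i + 1)
    }
    code
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical rows with two integer dimensions
    val points = Seq((3, 5), (0, 0), (7, 7), (1, 2), (6, 1))
    // Sorting by the Morton code keeps points that are near each other
    // in both dimensions near each other in the sort order
    val sorted = points.sortBy { case (x, y) => mortonCode(x, y) }
    sorted.foreach { case (x, y) => println(s"($x, $y) -> ${mortonCode(x, y)}") }
  }
}
```

With the sample points in `main`, sorting places (0, 0) first and (7, 7) last. A real index would also have to handle non-integer columns, for example by normalizing or ranking values into a fixed bit width first.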
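The Sparkplug entry above describes "plugging" holes in data with rule-based logic. The sketch below shows that general idea in plain Scala, without Spark. Each rule pairs a condition with values that fill only the columns that are still empty. The rule shape (`PlugRule`, `condition`, `patch`) and the map-based record model are assumptions for illustration, not Sparkplug's API. A real implementation would express conditions as Spark SQL over a DataFrame.

```scala
// Hypothetical sketch of rule-based "hole plugging". NOT Sparkplug's actual API.
object PlugRulesSketch {
  // Assumption: a record is a column-name -> optional-value map
  type Record = Map[String, Option[String]]

  // Hypothetical rule: if `condition` holds, fill the columns in `patch`
  final case class PlugRule(name: String, condition: Record => Boolean, patch: Map[String, String])

  /** Applies rules in order; a matching rule fills only columns that are still empty. */
  def plug(record: Record, rules: Seq[PlugRule]): Record =
    rules.foldLeft(record) { (rec, rule) =>
      if (rule.condition(rec))
        rule.patch.foldLeft(rec) { case (r, (col, value)) =>
          if (r.getOrElse(col, None).isEmpty) r.updated(col, Some(value)) else r
        }
      else rec
    }

  def main(args: Array[String]): Unit = {
    val rules = Seq(
      PlugRule(
        name = "brand-from-title",
        condition = r => r.get("title").flatten.exists(_.contains("iPhone")),
        patch = Map("brand" -> "Apple")
      )
    )
    val records: Seq[Record] = Seq(
      Map("title" -> Some("iPhone 15 case"), "brand" -> None),  // hole gets plugged
      Map("title" -> Some("iPhone charger"), "brand" -> Some("Acme")), // existing value kept
      Map("title" -> Some("USB cable"), "brand" -> None)        // rule does not match
    )
    records.map(plug(_, rules)).foreach(println)
  }
}
```

Filling only empty columns keeps the rules from overwriting data that is already present. Rule order still matters when two rules target the same column.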

Scala spark-sql related posts

  • A glimpse into the future of data processing infrastructure.

    1 project | dev.to | 2 May 2024

Index

What are some of the best open-source spark-sql projects in Scala? This list will help you:

#  Project           Stars
1  kyuubi            2,236
2  Jupyter Scala     1,618
3  incubator-gluten  1,423
4  LearningSparkV2   1,340
5  qbeast-spark        233
6  opaque-sql          182
7  Sparkplug            29
