Scala Hadoop

Open-source Scala projects categorized as Hadoop

Top 4 Scala Hadoop Projects

  1. kyuubi

    Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

  2. spline

    Data Lineage Tracking And Visualization Solution (by AbsaOSS)

  3. parquet4s

    Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

  4. seq-datasource-v2

    Sequence Data Source for Apache Spark

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Scala Hadoop related posts

  • Unveiling the Apache License 2.0: A Deep Dive into Open Source Freedom

    3 projects | dev.to | 11 Mar 2025
  • How to Install PySpark on Your Local Machine

    2 projects | dev.to | 9 Dec 2024
  • Unveiling the Analytics Industry in Bangalore

    3 projects | /r/u_Khushisondhi7 | 23 Mar 2023
  • Big Data Processing, EMR with Spark and Hadoop | Python, PySpark

    2 projects | dev.to | 27 Mar 2022
  • Spark for beginners - and you

    3 projects | dev.to | 22 Dec 2021
  • Spark is lit once again

    6 projects | dev.to | 29 Oct 2021
  • Advice for storing tick data in Google Cloud

    1 project | /r/scala | 25 Jan 2021

Index

What are some of the best open-source Hadoop projects in Scala? This list will help you:

#  Project            Stars
1  kyuubi             2,202
2  spline               630
3  parquet4s            291
4  seq-datasource-v2     10

Did you know that Scala is the 32nd most popular programming language based on number of references?