Scala Big Data

Open-source Scala projects categorized as Big Data

Top 23 Scala Big Data Projects

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: "xAI will open source Grok" | news.ycombinator.com | 2024-03-11
  • kafka-manager

    CMAK is a tool for managing Apache Kafka clusters

    Project mention: FLaNK Stack Weekly 16 October 2023 | dev.to | 2023-10-17
  • delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs (by delta-io)

    Project mention: Delta Lake vs. Parquet: A Comparison | news.ycombinator.com | 2024-01-19

    Delta is pretty great, lets you do upserts into tables in Databricks much easier than without it.

    I think the website is here: https://delta.io

  • SynapseML

    Simple and Distributed Machine Learning

    Project mention: FLaNK Stack Weekly for 12 September 2023 | dev.to | 2023-09-12
  • Scalding

    A Scala API for Cascading

  • Scio

    A Scala API for Apache Beam and Google Cloud Dataflow.

  • Jupyter Scala

    A Scala kernel for Jupyter

  • Reactive-kafka

    Alpakka Kafka connector - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.

  • adam

    ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

  • H2O

    Sparkling Water provides H2O functionality inside a Spark cluster

  • BIDMach

    CPU and GPU-accelerated Machine Learning Library

  • Gearpump

    Lightweight real-time big data streaming engine over Akka

  • Vegas

    The missing Matplotlib for Scala + Spark (by vegas-viz)

  • spark-rapids

    Spark RAPIDS plugin - accelerate Apache Spark with GPUs

  • delta-sharing

    An open protocol for secure data sharing

    Project mention: Azure data lake - Data Share | /r/dataengineering | 2023-06-29
  • nussknacker

    Low-code tool for automating actions on real-time data | Stream processing for users.

  • metorikku

    A simplified, lightweight ETL Framework based on Apache Spark

  • Sparkta

    Real Time Analytics and Data Pipelines based on Spark Streaming (by Stratio)

  • Scoobi

    A Scala productivity framework for Hadoop. (by NICTA)

  • qbeast-spark

    Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

  • Clustering4Ever

    C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

  • Schemer

    Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.

  • Scoozie

    Scala DSL on top of Oozie XML

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

What are some of the best open-source Big Data projects in Scala? This list will help you:

Rank  Project          Stars
1     Apache Spark     38,320
2     kafka-manager    11,670
3     delta            6,897
4     SynapseML        4,967
5     Scalding         3,470
6     Scio             2,520
7     Jupyter Scala    1,562
8     Reactive-kafka   1,418
9     adam             967
10    H2O              952
11    BIDMach          913
12    Gearpump         765
13    Vegas            729
14    spark-rapids     720
15    delta-sharing    674
16    nussknacker      609
17    metorikku        576
18    Sparkta          524
19    Scoobi           482
20    qbeast-spark     190
21    Clustering4Ever  128
22    Schemer          112
23    Scoozie          82