Scala Python

Open-source Scala projects categorized as Python

Top 11 Scala Python Projects

  1. Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: Every Database Will Support Iceberg — Here's Why | dev.to | 2025-04-22

    Apache Iceberg defines a table format that separates how data is stored from how data is queried. Any engine that implements the Iceberg integration — Spark, Flink, Trino, DuckDB, Snowflake, RisingWave — can read and/or write Iceberg data directly.

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

  3. Mill

    Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. 2-4x faster than Gradle and 4-10x faster than Maven for common workflows, Mill aims to make your project’s build process performant, maintainable, and flexible.

    Project mention: The next generation of Bazel builds | news.ycombinator.com | 2025-04-10

    A big problem with Bazel not mentioned here is the complexity. It's just really hard for many people to grasp, and adopting Bazel at the two places I worked was a ~10 person-year effort for the rollout with ongoing maintenance after. That's a lot of effort!

    IMO Bazel has a lot of good ideas to it: hierarchical graph-based builds, pure hermetic build steps, and so on. Especially at the time, these were novel ideas. But in Bazel they are buried behind a sea of other concepts that may not be so critical: `query` vs `aquery` vs `cquery`, action-graph vs target-graph, providers vs outputs, etc. Some of these are necessary for ultra-large-scale builds, some are compromises due to legacy, but for the vast majority of non-Google-scale companies there may be a better way.

    But I'm hoping the next generation of build tools can simplify things enough that you don't need a person-decade of engineering work to adopt it. My own OSS project Mill (https://mill-build.org/) is one attempt in that direction, by re-using ideas from functional and object-oriented programming to hopefully make build graphs easier to describe and work with

  4. mleap

    MLeap: Deploy ML Pipelines to Production

  5. Cortex

    Cortex: a Powerful Observable Analysis and Active Response Engine (by TheHive-Project)

  6. adam

    ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

  7. sparkMeasure

    This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.

  8. scalapy

    Use the world of Python from the comfort of Scala!

  9. InfluxDB

    InfluxDB: high-performance time series database. Collect, organize, and act on massive volumes of high-resolution data to power real-time intelligent systems.

  10. Vyxal

    A code-golfing language experience that has aspects of traditional programming languages - terse yet convenient.

  11. spark-extension

    A library that provides useful extensions to Apache Spark and PySpark.

  12. kukulcan

    A REPL for Apache Kafka

  13. stasis

    Backup and recovery system with emphasis on security and privacy (by sndnv)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Scala Python related posts

  • How to Reduce Big Data Analytics Costs by 90% with Karpenter and Spark

    3 projects | dev.to | 21 Apr 2025
  • Apache Spark VS cocoindex - a user suggested alternative

    2 projects | 1 Apr 2025
  • The Application of Java Programming In Data Analysis and Artificial Intelligence

    1 project | dev.to | 10 Mar 2025
  • Apache Spark: Revolutionizing Big Data with Sustainable Open Source Funding

    1 project | dev.to | 6 Mar 2025
  • Run PySpark Local Python Windows Notebook

    2 projects | dev.to | 21 Jan 2025
  • Infraestrutura para análise de dados com Jupyter, Cassandra, Pyspark e Docker (Infrastructure for data analysis with Jupyter, Cassandra, PySpark, and Docker)

    2 projects | dev.to | 15 Jan 2025
  • His Startup Is Now Worth $62B. It Gave Away Its First Product Free

    1 project | news.ycombinator.com | 17 Dec 2024

Index

What are some of the best open-source Python projects in Scala? This list will help you:

#    Project           Stars
1    Apache Spark      40,958
2    Mill               2,381
3    mleap              1,516
4    Cortex             1,411
5    adam               1,020
6    sparkMeasure         742
7    scalapy              562
8    Vyxal                283
9    spark-extension      222
10   kukulcan             116
11   stasis                94

Did you know that Scala is the 37th most popular programming language based on number of references?