Top 11 Scala Python Projects
Apache Iceberg defines a table format that separates how data is stored from how data is queried. Any engine that implements the Iceberg integration — Spark, Flink, Trino, DuckDB, Snowflake, RisingWave — can read and/or write Iceberg data directly.
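To make the storage/query separation concrete, here is a minimal Python sketch under heavily simplified assumptions: an in-memory dict stands in for object storage, and a list of snapshots stands in for table metadata. `ToyCatalog`, `Snapshot`, and `scan` are hypothetical names, not Iceberg APIs. Real Iceberg tracks data files through metadata files, manifest lists, and manifests.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Tuple[object, ...]


@dataclass(frozen=True)
class Snapshot:
    """One committed version of the table: an id plus its immutable data files."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyCatalog:
    # Hypothetical stand-ins: `storage` plays the object store, `snapshots` the metadata log.
    storage: Dict[str, List[Row]] = field(default_factory=dict)
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        # An empty table is modeled as snapshot 0 with no files.
        return self.snapshots[-1] if self.snapshots else Snapshot(0, ())

    def append(self, rows: List[Row]) -> Snapshot:
        # 1. Write a new immutable data file (existing files are never modified).
        path = f"data/file-{len(self.storage)}.rows"
        self.storage[path] = list(rows)
        # 2. Commit a new snapshot referencing the previous files plus the new one.
        base = self.current()
        snap = Snapshot(base.snapshot_id + 1, base.data_files + (path,))
        self.snapshots.append(snap)
        return snap


def scan(catalog: ToyCatalog, snapshot_id: Optional[int] = None) -> List[Row]:
    """Engine-side read: resolve a snapshot from metadata, then read only its files."""
    if snapshot_id is None:
        snap = catalog.current()
    else:
        snap = next(s for s in catalog.snapshots if s.snapshot_id == snapshot_id)
    return [row for path in snap.data_files for row in catalog.storage[path]]
```

Readers touch data only through a committed snapshot. As a result, any engine that understands the metadata can query the table, and older snapshots remain readable, which is the idea behind Iceberg's time travel.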
Mill
Mill is a fast JVM build tool that supports Java, Scala, Kotlin and many other languages. It is 2-4x faster than Gradle and 4-10x faster than Maven for common workflows, and it aims to make your project's build process performant, maintainable, and flexible.
A big problem with Bazel not mentioned here is the complexity. It's just really hard for many people to grasp, and adopting Bazel at the two places I worked was a ~10 person-year effort for the rollout with ongoing maintenance after. That's a lot of effort!
IMO, Bazel has a lot of good ideas in it: hierarchical graph-based builds, pure hermetic build steps, and so on. Especially at the time, these were novel ideas. But in Bazel they are buried behind a sea of other concepts that may not be so critical: `query` vs `aquery` vs `cquery`, action-graph vs target-graph, providers vs outputs, etc. Some of these are necessary for ultra-large-scale builds and some are compromises due to legacy, but for the vast majority of non-Google-scale companies there may be a better way.
But I'm hoping the next generation of build tools can simplify things enough that you don't need a person-decade of engineering work to adopt one. My own OSS project Mill (https://mill-build.org/) is one attempt in that direction: it reuses ideas from functional and object-oriented programming to hopefully make build graphs easier to describe and work with.
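The two ideas the comment credits to Bazel, graph-based builds and pure (hermetic) build steps, can be illustrated with a toy incremental builder. This is a hypothetical Python sketch, not Bazel's or Mill's API. `Source`, `Task`, and `Builder` are made-up names, each step is a plain function assumed to be pure, and hermeticity is approximated by caching each step's output under a hash of its declared inputs.

```python
import hashlib
from typing import Callable, Dict, List, Tuple, Union


class Source:
    """A leaf input (e.g. a source file); its content may change between builds."""
    def __init__(self, name: str, content: str) -> None:
        self.name = name
        self.content = content


class Task:
    """A build step; `fn` is assumed pure: its output depends only on dep outputs."""
    def __init__(self, name: str, deps: List[Union["Task", Source]],
                 fn: Callable[..., str]) -> None:
        self.name = name
        self.deps = deps
        self.fn = fn


class Builder:
    def __init__(self) -> None:
        # (task name, hash of input values) -> cached output
        self.cache: Dict[Tuple[str, str], str] = {}
        self.executed: List[str] = []  # log of steps that actually ran

    def build(self, node: Union[Task, Source]) -> str:
        if isinstance(node, Source):
            return node.content
        # Build dependencies first: a depth-first walk of the graph.
        inputs = [self.build(dep) for dep in node.deps]
        digest = hashlib.sha256(repr(inputs).encode()).hexdigest()
        key = (node.name, digest)
        if key not in self.cache:
            # Run the step only when its inputs differ from any earlier build.
            self.executed.append(node.name)
            self.cache[key] = node.fn(*inputs)
        return self.cache[key]
```

Consider a `Source` feeding a hypothetical `compile` task, which feeds a `jar` task. The first `build(jar)` runs both steps, an immediate rebuild runs neither, and editing the source reruns both. Because steps are keyed only on their inputs, a step whose inputs are unchanged is never rerun.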
adam
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Vyxal
A code-golfing language that borrows aspects of traditional programming languages: terse yet convenient.
Scala Python related posts
How to Reduce Big Data Analytics Costs by 90% with Karpenter and Spark
Apache Spark VS cocoindex - a user suggested alternative
2 projects | 1 Apr 2025
The Application of Java Programming In Data Analysis and Artificial Intelligence
Apache Spark: Revolutionizing Big Data with Sustainable Open Source Funding
Run PySpark Local Python Windows Notebook
Data analysis infrastructure with Jupyter, Cassandra, PySpark, and Docker
His Startup Is Now Worth $62B. It Gave Away Its First Product Free
Index
What are some of the best open-source Scala projects related to Python? This list, ranked by stars, will help you find them:
# | Project | Stars |
---|---|---|
1 | Apache Spark | 40,958 |
2 | Mill | 2,381 |
3 | mleap | 1,516 |
4 | Cortex | 1,411 |
5 | adam | 1,020 |
6 | sparkMeasure | 742 |
7 | scalapy | 562 |
8 | Vyxal | 283 |
9 | spark-extension | 222 |
10 | kukulcan | 116 |
11 | stasis | 94 |