Top 23 Python Distributed Projects

Ray

42 30,879 10.0 Python

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

22. Ray | Github | tutorial
nni

5 13,708 6.7 Python

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
optuna

34 9,615 9.9 Python

A hyperparameter optimization framework

Project mention: Optuna – A Hyperparameter Optimization Framework | news.ycombinator.com | 2024-04-06

I didn’t even know WandB did hyperparameter optimization, I figured it was a neural network visualizer based on 2 minute papers. Didn’t seem like many alternatives out there to Optuna with TPE + persistence in conditional continuous & discrete spaces.
Anyway, it’s doable to make a multi objective decide_to_prune function with Optuna, here’s an example https://github.com/optuna/optuna/issues/3450#issuecomment-19...
modin

11 9,465 9.6 Python

Modin: Scale your Pandas workflows by changing a single line of code

Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15
scrapy-redis

4 5,447 5.0 Python

Redis-based components for Scrapy.

Project mention: How to make scrapy run multiple times on the same URLs? | /r/scrapy | 2023-06-26
Gerapy

1 3,205 6.4 Python

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
lingvo

1 2,781 8.7 Python

Lingvo
hatchet

16 2,683 9.6 Python

A distributed, fault-tolerant task queue

Project mention: Ask HN: Who is hiring? (April 2024) | news.ycombinator.com | 2024-04-01

Hatchet (https://hatchet.run) | New York City | Full-time
We're hiring a founding engineer to help us with development on our open-source, distributed task queue: https://github.com/hatchet-dev/hatchet.
We recently launched on HN, you can check out our launch here: https://news.ycombinator.com/item?id=39643136. We're two second-time YC founders in this for the long haul and we are just wrapping up the YC W24 batch.
As a founding engineer, you'll be responsible for contributing across the entire codebase. We'll compensate accordingly and with high equity. It's currently just the two founders + a part-time contractor. We're all technical and contribute code.
Stack: Typescript/React, Go and PostgreSQL.
To apply, email alexander [at] hatchet [dot] run, and include the following:
1. Tell us about something impressive you've built.
2. Ask a question or write a comment about the state of the project. For example: a file that stood out to you in the codebase, a Github issue or discussion that piqued your interest, a general comment on distributed systems/task queues, or why our code is bad and how you could improve it.
arq

4 1,902 2.6 Python

Fast job queuing and RPC in Python with asyncio and Redis.

Project mention: Future Plan for Arq | news.ycombinator.com | 2024-03-18
fugue

11 1,869 6.7 Python

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22
PySR

7 1,850 9.6 Python

High-Performance Symbolic Regression in Python and Julia

Project mention: Potential of the Julia programming language for high energy physics computing | news.ycombinator.com | 2023-12-04

> Yes, julia can be called from other languages rather easily
This seems false to me. StaticCompiler.jl [1] puts in their limitations that "GC-tracked allocations and global variables do not work with compile_executable or compile_shlib. This has some interesting consequences, including that all functions within the function you want to compile must either be inlined or return only native types (otherwise Julia would have to allocate a place to put the results, which will fail)." PackageCompiler.jl [2] has the same limitations if I'm not mistaken. So then you have to fall back to distributing the Julia "binary" with a full Julia runtime, which is pretty heavy. There are some packages which do this. For example, PySR [3] does this.
There is some word going around though that there is an even better static compiler in the making, but as long as that one is not publicly available I'd say that Julia cannot easily be called from other languages.
[1]: https://github.com/tshort/StaticCompiler.jl
[2]: https://github.com/JuliaLang/PackageCompiler.jl
[3]: https://github.com/MilesCranmer/PySR
MLBox

1 1,474 0.0 Python

MLBox is a powerful automated machine learning (AutoML) library for Python.
quokka

23 1,081 8.3 Python

Making data lake work for time series (by marsupialtail)

Project mention: How Query Engines Work | news.ycombinator.com | 2023-09-08

An awesome read!
Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark https://github.com/marsupialtail/quokka/blob/master/blog/why...
code2vec

3 1,072 2.5 Python

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"

Project mention: Word2vec | news.ycombinator.com | 2023-10-09
pottery

5 1,002 7.3 Python

Redis for humans. 🌎🌍🌏

Project mention: Is Redis om production ready? Or will it be production ready anytime soon? | /r/redis | 2023-05-12

However, as an alternative, consider my library, Pottery. Pottery offers some similar functionality to Redis OM, and Pottery is production ready.
evotorch

14 967 5.2 Python

Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
bagua

6 865 4.8 Python

Bagua Speeds up PyTorch
runhouse

5 702 9.8 Python

The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.

Project mention: Better GPU Cluster Scheduling with Runhouse | dev.to | 2024-03-15

With Runhouse, it’s easy to send code to your compute no matter where it lives, and efficiently utilize your resources across multiple callers scheduling jobs (e.g. researchers, pipelines, inference services, etc). We believe less is more when it comes to AI DevOps, so we don’t make any assumptions about the structure of your code or the infrastructure to which you’re sending it.
optuna-examples

2 587 8.8 Python

Examples for https://github.com/optuna/optuna
Pyrlang

4 586 3.7 Python

Erlang node implemented in Python 3.5+ (Asyncio-based)
wakaq

3 563 8.7 Python

Background task queue for Python backed by Redis, a super minimal Celery

Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08
modal-examples

9 545 9.5 Python

Examples of programs built using Modal

Project mention: Show HN: Real-time image autocomplete in <100 lines of code with SDXL Lightning | news.ycombinator.com | 2024-02-23

We made a small app for SDXL Lightning, running your own Python code on GPUs. It generates images in real time.
https://potatoes.ai/
We know there was a fal.ai post yesterday, and that got a lot of interest, but we also made this demo yesterday and didn't share — just wanted to mention it as an alternative option for people who like running their own code and custom models instead of using a prebuilt API provider.
The backend code is open-source too and you can deploy it yourself: https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/stable_diffusion/stable_diffusion_xl_lightning.py
AgileRL

12 488 9.8 Python

Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools.

Project mention: [P] Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework | /r/MachineLearning | 2023-10-15

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions counts repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-06.

Python Distributed related posts

Optuna – A Hyperparameter Optimization Framework
2 projects | news.ycombinator.com | 6 Apr 2024
Future Plan for Arq
1 project | news.ycombinator.com | 18 Mar 2024
How to test optimal parameters
2 projects | /r/algotrading | 9 Dec 2023
Optuna – A Hyperparameter Optimization Framework
1 project | news.ycombinator.com | 8 Dec 2023
Word2vec
1 project | news.ycombinator.com | 9 Oct 2023
How Query Engines Work
2 projects | news.ycombinator.com | 8 Sep 2023
Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models
1 project | news.ycombinator.com | 11 Aug 2023

Index

What are some of the best open-source distributed projects in Python? This list will help you:

#	Project	Stars
1	Ray	30,879
2	nni	13,708
3	optuna	9,615
4	modin	9,465
5	scrapy-redis	5,447
6	Gerapy	3,205
7	lingvo	2,781
8	hatchet	2,683
9	arq	1,902
10	fugue	1,869
11	PySR	1,850
12	MLBox	1,474
13	quokka	1,081
14	code2vec	1,072
15	pottery	1,002
16	evotorch	967
17	bagua	865
18	runhouse	702
19	optuna-examples	587
20	Pyrlang	586
21	wakaq	563
22	modal-examples	545
23	AgileRL	488