Top 23 Python Distributed Projects
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
optuna
Project mention: Optuna – A Hyperparameter Optimization Framework | news.ycombinator.com | 2024-04-06
I didn’t even know WandB did hyperparameter optimization, I figured it was a neural network visualizer based on 2 minute papers. Didn’t seem like many alternatives out there to Optuna with TPE + persistence in conditional continuous & discrete spaces.
Anyway, it’s doable to make a multi objective decide_to_prune function with Optuna, here’s an example https://github.com/optuna/optuna/issues/3450#issuecomment-19...
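The linked issue comment is truncated above, so the following is not that code. As a minimal sketch of what a multi-objective pruning rule could look like, assuming lower is better for every objective, the hypothetical `decide_to_prune` helper below prunes a trial only when it is worse than the median of earlier trials on all objectives at the same step. The function name echoes the comment; the median rule, the `min_trials` parameter, and the sample numbers are assumptions for illustration.

```python
from statistics import median
from typing import Sequence


def decide_to_prune(
    current: Sequence[float],
    history: Sequence[Sequence[float]],
    min_trials: int = 3,
) -> bool:
    """Hypothetical rule: prune if worse than the median on every objective.

    `current` holds this trial's intermediate value per objective at one step.
    `history` holds earlier trials' values at the same step.
    Lower is assumed to be better for all objectives.
    """
    if len(history) < min_trials:
        return False  # too few earlier trials to compare against
    # zip(*history) regroups the values per objective instead of per trial.
    medians = [median(column) for column in zip(*history)]
    return all(value > med for value, med in zip(current, medians))


# Made-up intermediate values: (validation loss, latency in ms).
history = [(0.30, 120.0), (0.25, 150.0), (0.40, 100.0)]
print(decide_to_prune((0.50, 200.0), history))  # worse on both objectives
print(decide_to_prune((0.20, 200.0), history))  # better on the first objective
```

Inside an Optuna objective, a rule like this would typically be followed by raising `optuna.TrialPruned()` when it returns `True`.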
-
hatchet
Project mention: Running Durable Workflows in Postgres Using DBOS | news.ycombinator.com | 2024-12-10
Disclaimer: I'm a co-founder of Hatchet (https://github.com/hatchet-dev/hatchet), which is a Postgres-backed task queue that supports durable execution.
> Because a step transition is just a Postgres write (~1ms) versus an async dispatch from an external orchestrator (~100ms), it means DBOS is 25x faster than AWS Step Functions
Durable execution engines deployed as an external orchestrator will always be slower than direct DB writes, but the 1ms delay versus ~100ms doesn't seem inherent to the orchestrator being external. In the case of Hatchet, pushing work takes ~15ms and invoking the work takes ~1ms if deployed in the same VPC, and 90% of that execution time is on the database. In the best case, the external orchestrator should take 2x as long to write a step transition (round-trip network call to the orchestrator + database write), so an ideal external orchestrator would be ~2ms of latency here.
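The 2x estimate can be written out as simple arithmetic. This is a sketch only: the ~1ms database write comes from the quoted claim, while the ~1ms same-VPC round trip is an assumption chosen to match the comment's "2x" figure, and `external_step_latency_ms` is a hypothetical helper name.

```python
def external_step_latency_ms(db_write_ms: float, network_round_trip_ms: float) -> float:
    """Latency of one step transition through an external orchestrator:
    one network round trip to the orchestrator plus one database write."""
    return network_round_trip_ms + db_write_ms


DB_WRITE_MS = 1.0            # from the quoted claim (~1ms Postgres write)
ASSUMED_ROUND_TRIP_MS = 1.0  # assumption: round trip costs about one DB write

ideal = external_step_latency_ms(DB_WRITE_MS, ASSUMED_ROUND_TRIP_MS)
print(f"ideal external orchestrator: {ideal} ms, {ideal / DB_WRITE_MS}x a direct write")
```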
There are also some tradeoffs to a library-only mode that aren't discussed. How would work that requires global coordination between workers behave in this model? Let's say, for example, a global rate limit -- you'd ideally want to avoid contention on rate limit rows, assuming they're stored in Postgres, but each worker attempting to acquire a rate limit simultaneously would slow down start time significantly (and place additional load on the DB). Whereas with a single external orchestrator (or leader election), you can significantly increase throughput by acquiring rate limits as part of a push-based assignment process.
The same problem of coordination arises if many workers are competing for the same work -- for example if a machine crashes while doing work, as described in the article. I'm assuming there's some kind of polling happening which uses FOR UPDATE SKIP LOCKED, which concerns me as you start to scale up the number of workers.
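To make the polling concern concrete, here is an in-memory analogy of the `FOR UPDATE SKIP LOCKED` pattern, not Postgres code and not how DBOS or Hatchet are implemented. Each hypothetical `Task` carries a `threading.Lock` standing in for a row lock, and `claim_next` skips any task another worker currently holds instead of waiting for it. All names here are assumptions for illustration.

```python
import threading
from typing import Optional


class Task:
    def __init__(self, task_id: int) -> None:
        self.task_id = task_id
        self.lock = threading.Lock()  # stands in for a Postgres row lock
        self.done = False


def claim_next(tasks: list[Task]) -> Optional[Task]:
    """Return the first unfinished task whose lock is free, skipping locked ones.

    Rough analogue of: SELECT ... WHERE NOT done FOR UPDATE SKIP LOCKED LIMIT 1
    """
    for task in tasks:
        if task.lock.acquire(blocking=False):  # never wait on a held lock
            if not task.done:
                return task  # caller keeps the lock until the work is finished
            task.lock.release()
    return None


def worker(name: str, tasks: list[Task], claimed: list[tuple[str, int]]) -> None:
    # Every poll rescans the task list, which is the cost that grows with scale.
    while (task := claim_next(tasks)) is not None:
        claimed.append((name, task.task_id))
        task.done = True
        task.lock.release()


tasks = [Task(i) for i in range(20)]
claimed: list[tuple[str, int]] = []
threads = [
    threading.Thread(target=worker, args=(f"w{i}", tasks, claimed)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every task is claimed exactly once, whichever worker got it.
print(sorted(task_id for _, task_id in claimed) == list(range(20)))
```

Skipping rather than blocking keeps workers from queuing behind each other, but each worker still scans and attempts locks on every poll, so the total polling work grows with the number of workers.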
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
-
code2vec
TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
-
evotorch
Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
-
Project mention: Show HN: RAG app example with self-hosted embedding and LLM services | news.ycombinator.com | 2024-08-13
Python Distributed discussion
Python Distributed related posts
-
Ask HN: Going beyond Pandas for analysis, how to stay sane?
-
Show HN: Dataclr – Python library simplifying feature selection for ML
-
Dataclr – New feature selection algorithm for ML achieving SOTA results
-
Github's Top 31 items of Dec 18, 2024
-
AIM Weekly 28 Oct 2024
-
Multimodal Madness! Create a Product Recommender for Smart Shopping
-
Comparison: Dask vs. Ray
Index
What are some of the best open-source Distributed projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Ray | 36,132 |
2 | optuna | 11,583 |
3 | modin | 10,063 |
4 | scrapy-redis | 5,583 |
5 | hatchet | 4,637 |
6 | Gerapy | 3,414 |
7 | lingvo | 2,833 |
8 | arq | 2,390 |
9 | fugue | 2,055 |
10 | trainer | 1,721 |
11 | MLBox | 1,503 |
12 | quokka | 1,157 |
13 | pottery | 1,133 |
14 | code2vec | 1,126 |
15 | evotorch | 1,044 |
16 | runhouse | 1,014 |
17 | bagua | 879 |
18 | modal-examples | 807 |
19 | optuna-examples | 729 |
20 | AgileRL | 701 |
21 | Pyrlang | 613 |
22 | wakaq | 581 |
23 | malib | 518 |