Python Distributed

Open-source Python projects categorized as Distributed

Top 23 Python Distributed Projects

  1. Ray

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: Ask HN: What Open Source Projects Need Help? | news.ycombinator.com | 2024-11-16

    I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...

  3. optuna

    A hyperparameter optimization framework

    Project mention: Optuna – A Hyperparameter Optimization Framework | news.ycombinator.com | 2024-04-06

    I didn’t even know WandB did hyperparameter optimization; I figured it was a neural network visualizer, based on Two Minute Papers. There didn’t seem to be many alternatives out there to Optuna with TPE + persistence in conditional continuous & discrete spaces.

    Anyway, it’s doable to make a multi-objective decide_to_prune function with Optuna; here’s an example: https://github.com/optuna/optuna/issues/3450#issuecomment-19...
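
To make the idea of a multi-objective pruning decision concrete, here is a minimal sketch in plain Python. It is not the code from the linked issue and it does not call Optuna's API; the function signature, the `history` structure, and the "worse than the median on every objective" rule are all assumptions chosen for illustration.

```python
from statistics import median


def decide_to_prune(current, history, directions):
    """Hypothetical multi-objective pruning rule (not Optuna's API).

    current:    tuple of intermediate values, one per objective
    history:    list of such tuples from earlier trials at the same step
    directions: "minimize" or "maximize" for each objective

    Returns True only if the current trial is strictly worse than the
    median of earlier trials on every objective.
    """
    if not history:
        # Nothing to compare against yet, so never prune.
        return False
    for i, direction in enumerate(directions):
        med = median(h[i] for h in history)
        if direction == "minimize":
            at_least_median = current[i] <= med
        else:
            at_least_median = current[i] >= med
        if at_least_median:
            # Competitive on at least one objective: keep the trial.
            return False
    return True


# Assumed example data: (loss, accuracy) pairs from three earlier trials.
earlier = [(0.5, 0.5), (0.6, 0.6), (0.4, 0.7)]
goals = ("minimize", "maximize")
print(decide_to_prune((0.9, 0.1), earlier, goals))
print(decide_to_prune((0.3, 0.1), earlier, goals))
```

Requiring a trial to lose on every objective is a deliberately conservative choice; a real study might prefer a dominance-based or weighted rule instead.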

  4. modin

    Modin: Scale your Pandas workflows by changing a single line of code

  5. scrapy-redis

    Redis-based components for Scrapy.

  6. hatchet

    A distributed, fault-tolerant task queue

    Project mention: Running Durable Workflows in Postgres Using DBOS | news.ycombinator.com | 2024-12-10

    Disclaimer: I'm a co-founder of Hatchet (https://github.com/hatchet-dev/hatchet), which is a Postgres-backed task queue that supports durable execution.

    > Because a step transition is just a Postgres write (~1ms) versus an async dispatch from an external orchestrator (~100ms), it means DBOS is 25x faster than AWS Step Functions

    Durable execution engines deployed as an external orchestrator will always be slower than direct DB writes, but the 1ms delay versus ~100ms doesn't seem inherent to the orchestrator being external. In the case of Hatchet, pushing work takes ~15ms and invoking the work takes ~1ms if deployed in the same VPC, and 90% of that execution time is on the database. In the best case, the external orchestrator should take 2x as long to write a step transition (round-trip network call to the orchestrator + database write), so an ideal external orchestrator would be ~2ms of latency here.

    There are also some tradeoffs to a library-only mode that aren't discussed. How would work that requires global coordination between workers behave in this model? Let's say, for example, a global rate limit -- you'd ideally want to avoid contention on rate limit rows, assuming they're stored in Postgres, but each worker attempting to acquire a rate limit simultaneously would slow down start time significantly (and place additional load on the DB). Whereas with a single external orchestrator (or leader election), you can significantly increase throughput by acquiring rate limits as part of a push-based assignment process.

    The same problem of coordination arises if many workers are competing for the same work -- for example if a machine crashes while doing work, as described in the article. I'm assuming there's some kind of polling happening which uses FOR UPDATE SKIP LOCKED, which concerns me as you start to scale up the number of workers.
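
The polling pattern the comment guesses at can be illustrated with a small in-memory simulation. This is only a sketch: no Postgres is involved, a `threading.Lock` per task stands in for a row lock, and the names `Task`, `claim_next`, `worker`, and `run_demo` are hypothetical. The point it shows is the skip-locked behavior itself, where a worker passes over rows that another worker holds instead of waiting on them.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Task:
    """One queue row; `lock` stands in for a database row lock."""
    task_id: int
    done: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)


def claim_next(tasks):
    """Return the first unfinished task whose lock is free, with the lock held.

    Tasks locked by another worker are skipped rather than waited on.
    Returns None when nothing is claimable right now.
    """
    for task in tasks:
        if task.done:
            continue
        if task.lock.acquire(blocking=False):
            if task.done:
                # Finished by another worker between the check and the acquire.
                task.lock.release()
                continue
            return task
    return None


def worker(name, tasks, log):
    """Poll until no claimable task remains, recording who ran what."""
    while True:
        task = claim_next(tasks)
        if task is None:
            return
        try:
            task.done = True
            log.append((name, task.task_id))
        finally:
            task.lock.release()


def run_demo(num_tasks=20, num_workers=4):
    """Run several polling workers over one shared task list."""
    tasks = [Task(i) for i in range(num_tasks)]
    log = []
    threads = [
        threading.Thread(target=worker, args=(f"w{i}", tasks, log))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    results = run_demo()
    print(len(results), "tasks completed")
```

In this simulation every worker scans from the head of the same list on each poll, so the amount of repeated scanning grows with the number of workers. That may be one way to picture the scaling concern raised above, although a real database behaves differently in the details.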

  7. Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  8. lingvo

    Lingvo

  10. arq

    Fast job queuing and RPC in python with asyncio and redis.

  11. fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  12. trainer

    Distributed ML Training and Fine-Tuning on Kubernetes

  13. MLBox

    MLBox is a powerful Automated Machine Learning python library.

  14. quokka

    Making data lake work for time series (by marsupialtail)

  15. pottery

    Redis for humans. 🌎🌍🌏

  16. code2vec

    TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"

  17. evotorch

    Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.

  18. runhouse

    Distribute and run AI workloads magically in Python, like PyTorch for ML infra.

    Project mention: Show HN: RAG app example with self-hosted embedding and LLM services | news.ycombinator.com | 2024-08-13
  19. bagua

    Bagua Speeds up PyTorch

  20. optuna-examples

    Examples for https://github.com/optuna/optuna

  21. AgileRL

    Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools.

  22. Pyrlang

    Erlang node implemented in Python 3.5+ (Asyncio-based)

  23. wakaq

    Background task queue for Python backed by Redis, a super minimal Celery

  24. malib

    A parallel framework for population-based multi-agent reinforcement learning.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python Distributed related posts

  • Ask HN: Going beyond Pandas for analysis, how to stay sane?

    1 project | news.ycombinator.com | 14 Feb 2025
  • Show HN: Dataclr – Python library simplifying feature selection for ML

    1 project | news.ycombinator.com | 6 Jan 2025
  • Dataclr – New feature selection algorithm for ML achieving SOTA results

    1 project | news.ycombinator.com | 5 Jan 2025
  • Github's Top 31 items of Dec 18, 2024

    1 project | dev.to | 18 Dec 2024
  • AIM Weekly 28 Oct 2024

    21 projects | dev.to | 28 Oct 2024
  • Multimodal Madness! Create a Product Recommender for Smart Shopping

    6 projects | dev.to | 7 Aug 2024
  • Comparison: Dask vs. Ray

    1 project | news.ycombinator.com | 14 Jun 2024

Index

What are some of the best open-source Distributed projects in Python? This list will help you:

#   Project          Stars
1   Ray              36,132
2   optuna           11,583
3   modin            10,063
4   scrapy-redis      5,583
5   hatchet           4,637
6   Gerapy            3,414
7   lingvo            2,833
8   arq               2,390
9   fugue             2,055
10  trainer           1,721
11  MLBox             1,503
12  quokka            1,157
13  pottery           1,133
14  code2vec          1,126
15  evotorch          1,044
16  runhouse          1,014
17  bagua               879
18  modal-examples      807
19  optuna-examples     729
20  AgileRL             701
21  Pyrlang             613
22  wakaq               581
23  malib               518

