Python Distributed

Open-source Python projects categorized as Distributed

Top 23 Python Distributed Projects

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: Open Source Advent Fun Wraps Up! | 2024-01-05

  • nni

    An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.

    Project mention: Filter Pruning for PyTorch | /r/deeplearning | 2023-04-13

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Project mention: The Distributed Tensor Algebra Compiler (2022) | 2023-06-15

  • optuna

    A hyperparameter optimization framework

    Project mention: How to test optimal parameters | /r/algotrading | 2023-12-09

  • scrapy-redis

    Redis-based components for Scrapy.

    Project mention: How to make scrapy run multiple times on the same URLs? | /r/scrapy | 2023-06-26

  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • lingvo


  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Project mention: FLaNK Stack Weekly 22 January 2024 | 2024-01-22

  • arq

    Fast job queuing and RPC in python with asyncio and redis.

    Project mention: The Many Problems with Celery | /r/Python | 2023-05-22

  • PySR

    High-Performance Symbolic Regression in Python and Julia

    Project mention: Potential of the Julia programming language for high energy physics computing | 2023-12-04

    > Yes, julia can be called from other languages rather easily

    This seems false to me. StaticCompiler.jl [1] puts in their limitations that "GC-tracked allocations and global variables do not work with compile_executable or compile_shlib. This has some interesting consequences, including that all functions within the function you want to compile must either be inlined or return only native types (otherwise Julia would have to allocate a place to put the results, which will fail)." PackageCompiler.jl [2] has the same limitations if I'm not mistaken. So then you have to fall back to distributing the Julia "binary" with a full Julia runtime, which is pretty heavy. There are some packages which do this. For example, PySR [3] does this.

    There is some word going around though that there is an even better static compiler in the making, but as long as that one is not publicly available I'd say that Julia cannot easily be called from other languages.




  • MLBox

    MLBox is a powerful Automated Machine Learning python library.

  • quokka

    Making data lakes work for time series (by marsupialtail)

    Project mention: How Query Engines Work | 2023-09-08

    An awesome read!

    Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark.

  • code2vec

    TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"

    Project mention: Word2vec | 2023-10-09

  • pottery

    Redis for humans. 🌎🌍🌏

    Project mention: Is Redis om production ready? Or will it be production ready anytime soon? | /r/redis | 2023-05-12

    However, as an alternative, consider my library, Pottery. Pottery offers some similar functionality to Redis OM, and Pottery is production ready.

  • evotorch

    Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.

  • bagua

    Bagua Speeds up PyTorch

  • Pyrlang

    Erlang node implemented in Python 3.5+ (Asyncio-based)

  • wakaq

    Background task queue for Python backed by Redis, a super minimal Celery

    Project mention: Ask HN: What apps have you created for your own use? | 2023-12-12

  • optuna-examples

    Examples for Optuna

  • malib

    A parallel framework for population-based multi-agent reinforcement learning.

  • machin

    Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

  • dask-sql

    Distributed SQL Engine in Python using Dask

    Project mention: FLaNK Stack Weekly for 20 June 2023 | 2023-06-20

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-23.


What are some of the best open-source Distributed projects in Python? This list will help you:

#   Project          Stars
1   Ray              30,029
2   nni              13,598
3   modin            9,332
4   optuna           9,300
5   scrapy-redis     5,416
6   Gerapy           3,170
7   lingvo           2,776
8   fugue            1,833
9   arq              1,802
10  PySR             1,659
11  MLBox            1,465
12  quokka           1,064
13  code2vec         1,057
14  pottery          978
15  evotorch         948
16  bagua            868
17  Pyrlang          577
18  wakaq            558
19  optuna-examples  553
20  modal-examples   473
21  malib            454
22  machin           381
23  dask-sql         355