Top 23 Python Distributed Computing Projects

ColossalAI

42 37,951 9.7 Python

Making large AI models cheaper, faster and more accessible

Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

catalyst

1 3,229 0.0 Python

Accelerated deep learning R&D (by catalyst-team)

Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
alpa

4 2,989 5.1 Python

Training and serving large-scale neural networks with auto parallelization.
inference

2 2,701 9.8 Python

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Project mention: GreptimeAI + Xinference - Efficient Deployment and Monitoring of Your LLM Applications | dev.to | 2024-01-24

Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications. It provides a RESTful API compatible with OpenAI API, Python SDK, CLI, and WebUI. Furthermore, it integrates third-party developer tools like LangChain, LlamaIndex, and Dify, facilitating model integration and development.

fugue

11 1,883 6.4 Python

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22

distributed

3 1,544 9.6 Python

A distributed task scheduler for Dask
vizier

5 1,174 9.3 Python

Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
SaaSHub

www.saashub.com featured

SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
AI-Horde

16 984 9.4 Python

A crowdsourced distributed cluster for AI art and text generation

Project mention: Nvidia Announces Financial Results for Second Quarter Fiscal 2024 | news.ycombinator.com | 2023-08-23

Also the Horde for Stable Diffusion, pretty good concept: https://github.com/Haidra-Org/AI-Horde/

couler

1 890 5.2 Python

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Project mention: (Not) to Write a Pipeline | news.ycombinator.com | 2023-06-27

author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lamda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear and things like structuring messages/events, subclassing workers, repeating/retrying tasks, is just harder to mess up.

bagua

6 865 4.8 Python

Bagua Speeds up PyTorch
tdigest

0 376 0.0 Python

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed enviroments like PySpark (by CamDavidsonPilon)
machinaris

41 343 8.9 Python

An easy-to-use WebUI for crypto plotting and farming. Offers Bladebit, Gigahorse, MadMax, Chiadog and Plotman in a Docker container. Supports Chia, MMX, Chives, Flax, and HDDCoin among others.

Project mention: Chia farming on Home Assistant Operating System (Linux)? | /r/chia | 2023-05-15

No need to worry, HAOS has docker. You can use docker to get chia or use Machinaris

sparktorch

1 335 2.5 Python

Train and run Pytorch models on Apache Spark.
arkouda

1 225 9.4 Python

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

Project mention: Mojo is now available on Mac | news.ycombinator.com | 2023-10-19

Those interested in the intersection between Python, HPC, and data science may want to take a look at Arkouda, which is a Python package for data science at massive scales (TB of memory) at interactive rates (seconds), powered by Chapel:
* https://github.com/Bears-R-Us/arkouda

stable-diffusion-webui-distributed

1 163 8.9 Python

Chains stable-diffusion-webui instances together to facilitate faster image generation.

Project mention: The next version of Stable Diffusion ("SDXL") that is currently beta tested with a bot in the official Discord looks super impressive! Here's a gallery of some of the best photorealistic generations posted so far on Discord. And it seems the open-source release will be very soon, in just a few days. | /r/StableDiffusion | 2023-06-20

mlToolKits

1 75 0.0 Python

learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.
wrapyfi

4 70 9.6 Python

Python Wrapper for Message-Oriented and Robotics Middleware
redis-dict

1 37 7.5 Python

Python Redis Dictionary. Pythonic wrapper that provides name spacing for everyone favorite caching database Redis.

Project mention: We applied advanced fuzzing techniques to cURL | news.ycombinator.com | 2024-03-01

[0]https://github.com/Attumm/redis-dict/blob/main/tests.py

tune

1 33 1.3 Python

An abstraction layer for parameter tuning (by fugue-project)
FindTheMag2

13 27 9.0 Python

A tool to determine optimal projects for Gridcoin & BOINC crunchers. Maximize your magnitude!

Project mention: Need assistance | /r/cryptomining | 2023-12-06

No mining will currently generate profit if you pay for electric. If you want an estimate of GRC you can use this tool. https://github.com/makeasnek/FindTheMag2

rxray

1 12 3.9 Python

Ray distributed computing integration for RxPY
py-inventa

2 7 0.0 Python

A Python library for microservice registry and executing RPC (Remote Procedure Call) over Redis.
FindTheMag

10 7 2.8 Python

A tool to determine optimal projects for Gridcoin crunchers. Maximize your magnitude!
SaaSHub

www.saashub.com featured

SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Python Distributed Computing related posts

Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

4 projects | news.ycombinator.com | 7 Jun 2023
about making a game.

2 projects | /r/ArtificialInteligence | 6 Jun 2023
TIL : about the game "Foldit", a puzzle game about protein folding. In 2011, its gamers helped decipher a protein of a HIV-like virus, solving a scientific problem that went unsolved for 15 years in as little as 10 days.

5 projects | /r/todayilearned | 22 May 2023
Alternatives to Kaggle and Collab?

1 project | /r/StableDiffusion | 25 Apr 2023
Shuffling large data at constant memory in Dask

1 project | /r/Python | 17 Apr 2023
What is Midjourney doing better than us?

3 projects | /r/StableDiffusion | 4 Apr 2023
Dectralized AI solutions/building blocks?

1 project | /r/Rad_Decentralization | 30 Mar 2023
A note from our sponsor - SaaSHub
www.saashub.com | 10 May 2024

SaaSHub helps you find the best software and product alternatives Learn more →

Index

What are some of the best open-source Distributed Computing projects in Python? This list will help you:

	Project	Stars
1	ColossalAI	37,951
2	catalyst	3,229
3	alpa	2,989
4	inference	2,701
5	fugue	1,883
6	distributed	1,544
7	vizier	1,174
8	AI-Horde	984
9	couler	890
10	bagua	865
11	tdigest	376
12	machinaris	343
13	sparktorch	335
14	arkouda	225
15	stable-diffusion-webui-distributed	163
16	mlToolKits	75
17	wrapyfi	70
18	redis-dict	37
19	tune	33
20	FindTheMag2	27
21	rxray	12
22	py-inventa	7
23	FindTheMag	7

Python Distributed Computing

Top 23 Python Distributed Computing Projects

Python Distributed Computing related posts

Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

about making a game.

TIL : about the game "Foldit", a puzzle game about protein folding. In 2011, its gamers helped decipher a protein of a HIV-like virus, solving a scientific problem that went unsolved for 15 years in as little as 10 days.

Alternatives to Kaggle and Collab?

Shuffling large data at constant memory in Dask

What is Midjourney doing better than us?

Dectralized AI solutions/building blocks?

Index