Top 23 Python Distributed Computing Projects
-
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
-
vizier
Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
-
couler
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
-
tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark (by CamDavidsonPilon)
-
machinaris
An easy-to-use WebUI for crypto plotting and farming. Offers Bladebit, Gigahorse, MadMax, Chiadog and Plotman in a Docker container. Supports Chia, MMX, Chives, Flax, and HDDCoin among others.
-
stable-diffusion-webui-distributed
Chains stable-diffusion-webui instances together to facilitate faster image generation.
-
mlToolKits
learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.
-
redis-dict
Python Redis Dictionary. Pythonic wrapper that provides namespacing for everyone's favorite caching database, Redis.
-
FindTheMag2
A tool to determine optimal projects for Gridcoin & BOINC crunchers. Maximize your magnitude!
-
py-inventa
A Python library for microservice registry and executing RPC (Remote Procedure Call) over Redis.
Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09
Project mention: GreptimeAI + Xinference - Efficient Deployment and Monitoring of Your LLM Applications | dev.to | 2024-01-24
Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications. It provides a RESTful API compatible with OpenAI API, Python SDK, CLI, and WebUI. Furthermore, it integrates third-party developer tools like LangChain, LlamaIndex, and Dify, facilitating model integration and development.
Project mention: Nvidia Announces Financial Results for Second Quarter Fiscal 2024 | news.ycombinator.com | 2023-08-23
Also the Horde for Stable Diffusion, pretty good concept: https://github.com/Haidra-Org/AI-Horde/
author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lambda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear and things like structuring messages/events, subclassing workers, repeating/retrying tasks, is just harder to mess up.
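To make the "retry stuff" in the comment above concrete, here is a minimal standard-library sketch of the kind of retry-with-backoff policy a workflow engine lets you declare in config instead of hand-writing. It is not the API of Argo Workflows, Airflow, or any other tool named here; `run_with_retry`, `max_attempts`, `base_delay`, and `sleep` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration only. The wait before retry k
    is base_delay * 2**(k - 1); the last failure is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    # Only reached when max_attempts < 1, so the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the policy testable without real waiting. Logging, state tracking, and rate limiting are exactly the parts this sketch omits and that the comment argues a workflow engine provides from the start.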
No need to worry, HAOS has docker. You can use docker to get chia or use Machinaris
Those interested in the intersection between Python, HPC, and data science may want to take a look at Arkouda, which is a Python package for data science at massive scales (TB of memory) at interactive rates (seconds), powered by Chapel:
* https://github.com/Bears-R-Us/arkouda
Project mention: The next version of Stable Diffusion ("SDXL") that is currently beta tested with a bot in the official Discord looks super impressive! Here's a gallery of some of the best photorealistic generations posted so far on Discord. And it seems the open-source release will be very soon, in just a few days. | /r/StableDiffusion | 2023-06-20
[0] https://github.com/Attumm/redis-dict/blob/main/tests.py
No mining will currently generate profit if you pay for electricity. If you want an estimate of GRC you can use this tool: https://github.com/makeasnek/FindTheMag2
Python Distributed Computing related posts
-
Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
-
about making a game.
-
TIL : about the game "Foldit", a puzzle game about protein folding. In 2011, its gamers helped decipher a protein of a HIV-like virus, solving a scientific problem that went unsolved for 15 years in as little as 10 days.
-
Alternatives to Kaggle and Colab?
-
Shuffling large data at constant memory in Dask
-
What is Midjourney doing better than us?
-
Decentralized AI solutions/building blocks?
-
Index
What are some of the best open-source Distributed Computing projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | ColossalAI | 37,951 |
2 | catalyst | 3,229 |
3 | alpa | 2,989 |
4 | inference | 2,701 |
5 | fugue | 1,883 |
6 | distributed | 1,544 |
7 | vizier | 1,174 |
8 | AI-Horde | 984 |
9 | couler | 890 |
10 | bagua | 865 |
11 | tdigest | 376 |
12 | machinaris | 343 |
13 | sparktorch | 335 |
14 | arkouda | 225 |
15 | stable-diffusion-webui-distributed | 163 |
16 | mlToolKits | 75 |
17 | wrapyfi | 70 |
18 | redis-dict | 37 |
19 | tune | 33 |
20 | FindTheMag2 | 27 |
21 | rxray | 12 |
22 | py-inventa | 7 |
23 | FindTheMag | 7 |