Python Distributed Computing

Open-source Python projects categorized as Distributed Computing

Top 23 Python Distributed Computing Projects

  • ColossalAI

    Making large AI models cheaper, faster and more accessible

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • catalyst

    Accelerated deep learning R&D (by catalyst-team)

  • Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09
  • alpa

    Training and serving large-scale neural networks with auto parallelization.

  • inference

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

  • Project mention: GreptimeAI + Xinference - Efficient Deployment and Monitoring of Your LLM Applications | dev.to | 2024-01-24

    Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications. It provides a RESTful API compatible with OpenAI API, Python SDK, CLI, and WebUI. Furthermore, it integrates third-party developer tools like LangChain, LlamaIndex, and Dify, facilitating model integration and development.

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  • Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22
  • distributed

    A distributed task scheduler for Dask

  • vizier

    Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.

  • AI-Horde

    A crowdsourced distributed cluster for AI art and text generation

  • Project mention: Nvidia Announces Financial Results for Second Quarter Fiscal 2024 | news.ycombinator.com | 2023-08-23

    Also the Horde for Stable Diffusion, pretty good concept: https://github.com/Haidra-Org/AI-Horde/

  • couler

    Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

  • Project mention: (Not) to Write a Pipeline | news.ycombinator.com | 2023-06-27

    author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.

    it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.

    i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lamda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear and things like structuring messages/events, subclassing workers, repeating/retrying tasks, is just harder to mess up.

  • bagua

    Bagua Speeds up PyTorch

  • tdigest

    t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark (by CamDavidsonPilon)

  • machinaris

    An easy-to-use WebUI for crypto plotting and farming. Offers Bladebit, Gigahorse, MadMax, Chiadog and Plotman in a Docker container. Supports Chia, MMX, Chives, Flax, and HDDCoin among others.

  • Project mention: Chia farming on Home Assistant Operating System (Linux)? | /r/chia | 2023-05-15

    No need to worry, HAOS has docker. You can use docker to get chia or use Machinaris

  • sparktorch

    Train and run PyTorch models on Apache Spark.

  • arkouda

    Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale

  • Project mention: Mojo is now available on Mac | news.ycombinator.com | 2023-10-19

    Those interested in the intersection between Python, HPC, and data science may want to take a look at Arkouda, which is a Python package for data science at massive scales (TB of memory) at interactive rates (seconds), powered by Chapel:

    * https://github.com/Bears-R-Us/arkouda

  • stable-diffusion-webui-distributed

    Chains stable-diffusion-webui instances together to facilitate faster image generation.

  • Project mention: The next version of Stable Diffusion ("SDXL") that is currently beta tested with a bot in the official Discord looks super impressive! Here's a gallery of some of the best photorealistic generations posted so far on Discord. And it seems the open-source release will be very soon, in just a few days. | /r/StableDiffusion | 2023-06-20
  • mlToolKits

    learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.

  • wrapyfi

    Python Wrapper for Message-Oriented and Robotics Middleware

  • redis-dict

    Python Redis Dictionary. Pythonic wrapper that provides namespacing for everyone's favorite caching database, Redis.

  • Project mention: We applied advanced fuzzing techniques to cURL | news.ycombinator.com | 2024-03-01

    [0]https://github.com/Attumm/redis-dict/blob/main/tests.py

  • tune

    An abstraction layer for parameter tuning (by fugue-project)

  • FindTheMag2

    A tool to determine optimal projects for Gridcoin & BOINC crunchers. Maximize your magnitude!

  • Project mention: Need assistance | /r/cryptomining | 2023-12-06

    No mining will currently generate profit if you pay for electric. If you want an estimate of GRC you can use this tool. https://github.com/makeasnek/FindTheMag2

  • rxray

    Ray distributed computing integration for RxPY

  • py-inventa

    A Python library for microservice registry and executing RPC (Remote Procedure Call) over Redis.

  • FindTheMag

    A tool to determine optimal projects for Gridcoin crunchers. Maximize your magnitude!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
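
The redis-dict entry in the list above describes a dictionary-style wrapper that adds namespaces on top of a key-value store. As a rough illustration of that idea only, here is a minimal sketch in plain Python. It is not redis-dict's actual API: the class name `NamespacedDict` is hypothetical, and an ordinary in-memory `dict` stands in for the Redis server so the example runs without any dependencies.

```python
from collections.abc import MutableMapping


class NamespacedDict(MutableMapping):
    """Dict-like view that stores every key under a '<namespace>:' prefix.

    Hypothetical illustration; a plain dict stands in for the Redis backend.
    """

    def __init__(self, store, namespace):
        self._store = store              # shared backing store (stand-in for Redis)
        self._prefix = f"{namespace}:"   # prefix that isolates this namespace

    def _full(self, key):
        # Build the key actually written to the shared store.
        return self._prefix + str(key)

    def __getitem__(self, key):
        return self._store[self._full(key)]

    def __setitem__(self, key, value):
        self._store[self._full(key)] = value

    def __delitem__(self, key):
        del self._store[self._full(key)]

    def __iter__(self):
        # Yield only keys in this namespace, with the prefix stripped.
        n = len(self._prefix)
        return (k[n:] for k in list(self._store) if k.startswith(self._prefix))

    def __len__(self):
        return sum(1 for _ in self)


backend = {}                                  # one shared store
users = NamespacedDict(backend, "users")
jobs = NamespacedDict(backend, "jobs")

users["42"] = "alice"
jobs["42"] = "resize-image"                   # same key, different namespace
```

Both views write to the same backing store, but the prefix keeps `users["42"]` and `jobs["42"]` from colliding, which is the property a namespaced wrapper is meant to provide.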

Python Distributed Computing related posts

  • Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

    4 projects | news.ycombinator.com | 7 Jun 2023
  • about making a game.

    2 projects | /r/ArtificialInteligence | 6 Jun 2023
  • TIL : about the game "Foldit", a puzzle game about protein folding. In 2011, its gamers helped decipher a protein of a HIV-like virus, solving a scientific problem that went unsolved for 15 years in as little as 10 days.

    5 projects | /r/todayilearned | 22 May 2023
  • Alternatives to Kaggle and Collab?

    1 project | /r/StableDiffusion | 25 Apr 2023
  • Shuffling large data at constant memory in Dask

    1 project | /r/Python | 17 Apr 2023
  • What is Midjourney doing better than us?

    3 projects | /r/StableDiffusion | 4 Apr 2023
  • Dectralized AI solutions/building blocks?

    1 project | /r/Rad_Decentralization | 30 Mar 2023

Index

What are some of the best open-source Distributed Computing projects in Python? This list will help you:

Rank Project Stars
1 ColossalAI 37,951
2 catalyst 3,229
3 alpa 2,989
4 inference 2,701
5 fugue 1,883
6 distributed 1,544
7 vizier 1,174
8 AI-Horde 984
9 couler 890
10 bagua 865
11 tdigest 376
12 machinaris 343
13 sparktorch 335
14 arkouda 225
15 stable-diffusion-webui-distributed 163
16 mlToolKits 75
17 wrapyfi 70
18 redis-dict 37
19 tune 33
20 FindTheMag2 27
21 rxray 12
22 py-inventa 7
23 FindTheMag 7
