Python Distributed

Open-source Python projects categorized as Distributed

Top 23 Python Distributed Projects

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.

    Project mention: Is dynamic action masking possible in Rllib? | reddit.com/r/reinforcementlearning | 2023-01-23

  • nni

    An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Project mention: Modern Polars: an extensive side-by-side comparison of Polars and Pandas | news.ycombinator.com | 2023-01-07

    Yeah, tried Polars a couple of times: the API seems worse than Pandas to me too. eg the decision only to support autoincrementing integer indexes seems like it would make debugging "hmmm, that answer is wrong, what exactly did I select?" bugs much more annoying. Polars docs write "blazingly fast" all over them but I doubt that is a compelling point for people using single-node dataframe libraries. It isn't for me.

    Modin (https://github.com/modin-project/modin) seems more promising at this point, particularly since a migration path for standing Pandas code is highly desirable.

  • optuna

    A hyperparameter optimization framework

    Project mention: How to tune more than 2 hyperparameters in Grid Search in Python? | reddit.com/r/learnmachinelearning | 2023-02-04

  • scrapy-redis

    Redis-based components for Scrapy.

    Project mention: Ask HN: What are the best tools for web scraping in 2022? | news.ycombinator.com | 2022-08-10

    11. With some work, you can use Scrapy for distributed projects that are scraping thousands (millions) of domains. We are using https://github.com/rmax/scrapy-redis.

  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • lingvo

    Lingvo

    Project mention: Voice assistant that can be taught how to swear (Part 1) | dev.to | 2022-04-08

    To calculate the Word Error Rate I took a Python script from the tensorflow/lingvo project and rewrote it in JS. In essence, it is just a simple solution to the Edit Distance task, with an additional error count for each of the three types: deletion, insertion, and replacement. In the end, I did not use the most intelligent method of comparing texts, and yet it was sufficient to later add parameters to queries to Speech-to-Text.
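The Word Error Rate calculation described in that post can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lingvo script itself: the function name `word_error_rate` and the returned dictionary keys are hypothetical, words are split on whitespace, and what the post calls "replacement" is counted as a substitution.

```python
def word_error_rate(reference: str, hypothesis: str) -> dict:
    """Word-level edit distance with per-type error counts (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    n, m = len(ref), len(hyp)

    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back through the table to attribute each edit to a type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1

    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": (subs + dels + ins) / n,
    }


result = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(result)
```

The rate is the total number of edits divided by the number of reference words, so it can exceed 1.0 when the hypothesis contains many extra words.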

  • arq

    Fast job queuing and RPC in python with asyncio and redis.

  • MLBox

    MLBox is a powerful Automated Machine Learning python library.

    Project mention: Feeling starting out | reddit.com/r/datascience | 2022-03-22

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark, Dask and Ray without any rewrites.

    Project mention: Ask HN: How do you test SQL? | news.ycombinator.com | 2023-01-31

  • PySR

    High-Performance Symbolic Regression in Python

    Project mention: [D] Is there any research into using neural networks to discover classical algorithms? | reddit.com/r/MachineLearning | 2023-01-01

    I first learned about it with PySR https://github.com/MilesCranmer/PySR, they have an arxiv paper with some use cases as well.

  • code2vec

    TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"

    Project mention: Why is everyone freaking out about Chat GPT? | reddit.com/r/cscareerquestions | 2022-12-07

    This isn't a "mathematician's calculator" or a new language or standard for computer science people. This is a thing that you tell what you want it to do, and it does it. Yes, it'd need heavy guidance to get a full product out even if it commits no breaking bugs .... NOW... at this specific point in time. For comparison's sake, this was roughly the state of the art THREE YEARS AGO: https://code2vec.org/ i.e., a model that blurted out some terms it thought could describe your function. Compare it to what the big models do now and....

  • pottery

    Redis for humans. 🌎🌍🌏

    Project mention: Is there any way for hGetAll to return a key-value pair list instead of a simple list? | reddit.com/r/redis | 2022-12-18

    This isn’t for Node.js… But if you’re using Python, you might want to check out Pottery. Pottery provides the functionality you’re describing and much more.

  • evotorch

    Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.

    Project mention: [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. | reddit.com/r/MachineLearning | 2023-01-26

  • bagua

    Bagua Speeds up PyTorch

  • Pyrlang

    Erlang node implemented in Python 3.5+ (Asyncio-based)

    Project mention: Office files processing | reddit.com/r/elixir | 2023-01-17

    - Use Ports to call CLIs that take care of this (e.g. Poppler for PDFs, LibreOffice in `--headless` mode)
    - Use jInterface to start up a JVM with Apache POI to work on this specific workflow (I have an example here to work with Java Image API). You can also do this with other languages (Golang, Python, and others).

  • wakaq

    Distributed background task queue for Python backed by Redis, a super minimal Celery

    Project mention: Building a distributed task queue in Python | news.ycombinator.com | 2022-09-05

    https://github.com/wakatime/wakaq/blob/main/wakaq/__init__.p...

    and

    https://github.com/wakatime/wakaq/blob/main/wakaq/worker.py

    is the meat of it. The blog post talks about the Redis data structures used, and there's not much to it beyond that.

  • malib

    A parallel framework for population-based multi-agent reinforcement learning.

  • machin

    Reinforcement learning library (framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

  • optuna-examples

    Examples for https://github.com/optuna/optuna

    Project mention: [D]How to optimize an ANN? | reddit.com/r/MachineLearning | 2022-08-12

    Check out the examples for Optuna, a popular hyper parameter tuning package. It has examples for most popular ML frameworks including Xgboost, so you can see how it compares to an ANN framework like Keras or PyTorch.

  • FedScale

    FedScale is a scalable and extensible open-source federated learning (FL) platform.

    Project mention: University of Michigan Researchers Open-Source ‘FedScale’: a Federated Learning (FL) Benchmarking Suite with Realistic Datasets and a Scalable Runtime to Enable Reproducible FL Research on Privacy-Preserving Machine Learning | reddit.com/r/machinelearningnews | 2022-07-23

  • lithops

    A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

    Project mention: Lithops: A multi-cloud framework for embarrassingly parallel jobs | news.ycombinator.com | 2023-01-14

  • squirrel-core

    A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

    Project mention: [P] Squirrel: A new OS library for fast & flexible large-scale data loading | reddit.com/r/MachineLearning | 2022-04-11

    Today we open-sourced Squirrel, a data infrastructure library that my colleagues and I have been working on over the past 1.5 years: https://github.com/merantix-momentum/squirrel-core

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-04.

Index

What are some of the best open-source Distributed projects in Python? This list will help you:

Rank  Project          Stars
1     Ray              23,900
2     nni              12,450
3     modin             8,314
4     optuna            7,545
5     scrapy-redis      5,232
6     Gerapy            2,904
7     lingvo            2,670
8     arq               1,458
9     MLBox             1,390
10    fugue             1,180
11    PySR                977
12    code2vec            930
13    pottery             819
14    evotorch            812
15    bagua               792
16    Pyrlang             527
17    wakaq               513
18    malib               354
19    machin              348
20    optuna-examples     322
21    FedScale            277
22    lithops             256
23    squirrel-core       250