Top 23 Python Distributed Projects
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
Project mention: Is dynamic action masking possible in Rllib? | reddit.com/r/reinforcementlearning | 2023-01-23
-
nni
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
-
modin
Project mention: Modern Polars: an extensive side-by-side comparison of Polars and Pandas | news.ycombinator.com | 2023-01-07
Yeah, tried Polars a couple of times: the API seems worse than Pandas to me too. eg the decision only to support autoincrementing integer indexes seems like it would make debugging "hmmm, that answer is wrong, what exactly did I select?" bugs much more annoying. Polars docs write "blazingly fast" all over them but I doubt that is a compelling point for people using single-node dataframe libraries. It isn't for me.
Modin (https://github.com/modin-project/modin) seems more promising at this point, particularly since a migration path for standing Pandas code is highly desirable.
-
optuna
Project mention: How to tune more than 2 hyperparameters in Grid Search in Python? | reddit.com/r/learnmachinelearning | 2023-02-04
-
scrapy-redis
Project mention: Ask HN: What are the best tools for web scraping in 2022? | news.ycombinator.com | 2022-08-10
11. With some work, you can use Scrapy for distributed projects that are scraping thousands (millions) of domains. We are using https://github.com/rmax/scrapy-redis.
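As a rough illustration of what "some work" might start with, here is a minimal `settings.py` fragment that points a Scrapy project at a shared Redis queue through scrapy-redis. The setting names follow the scrapy-redis README; the Redis URL is a placeholder assumption, and a crawl at the scale the commenter describes would need more tuning than this.

```python
# settings.py fragment (a sketch, assuming scrapy-redis is installed).

# Route request scheduling through Redis so several spider processes
# can pull from the same queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all workers using a Redis-backed filter.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and dedup set in Redis between runs so a crawl can resume.
SCHEDULER_PERSIST = True

# Placeholder connection string (assumption): replace with your Redis instance.
REDIS_URL = "redis://localhost:6379/0"
```

Every worker started with these settings shares one request queue and one duplicate filter, which is what lets the crawl spread across machines.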
-
-
lingvo
To calculate the Word Error Rate I took a python script from the tensorflow/lingvo project and rewrote it in js. In essence, it is just a simple solution of the Edit Distance task, in addition to error calculation for each of the three types: deletion, insertion, and replacement. In the end, I did not use the most intelligent method of comparing texts, and yet it was sufficient to later on add parameters to queries to Speech-to-Text.
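The approach described here, edit distance plus a per-type error count, can be sketched in a few lines of Python. This is not the lingvo script or the commenter's JavaScript port; the names `word_error_rate` and `WerResult` are hypothetical, and the split into substitutions, deletions, and insertions reflects one optimal alignment when several exist.

```python
from dataclasses import dataclass


@dataclass
class WerResult:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def wer(self) -> float:
        errors = self.substitutions + self.deletions + self.insertions
        if self.reference_length == 0:
            # Undefined for an empty reference; report 0 or infinity.
            return 0.0 if errors == 0 else float("inf")
        return errors / self.reference_length


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    ref = reference.split()
    hyp = hypothesis.split()

    # Each cell holds (total_cost, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j]. Only two rows are kept in memory.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                best = prev[j - 1]  # words match: no extra cost
            else:
                c, s, d, n = prev[j - 1]
                substitution = (c + 1, s + 1, d, n)
                c, s, d, n = prev[j]
                deletion = (c + 1, s, d + 1, n)
                c, s, d, n = curr[j - 1]
                insertion = (c + 1, s, d, n + 1)
                # Tuples compare by total cost first.
                best = min(substitution, deletion, insertion)
            curr.append(best)
        prev = curr

    _, subs, dels, ins = prev[-1]
    return WerResult(subs, dels, ins, len(ref))


result = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here `result` records one substitution ("sat" to "sit") and one deletion (the second "the"), so `result.wer` is 2 errors over 6 reference words.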
-
-
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark, Dask and Ray without any rewrites.
-
PySR
Project mention: [D] Is there any research into using neural networks to discover classical algorithms? | reddit.com/r/MachineLearning | 2023-01-01
I first learned about it with PySR https://github.com/MilesCranmer/PySR, they have an arxiv paper with some use cases as well.
-
code2vec
TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
Project mention: Why is everyone freaking out about Chat GPT? | reddit.com/r/cscareerquestions | 2022-12-07
This isn't a "mathematician's calculator" or a new language or standard for computer science people. This is a thing that you tell it what you want it to do and it does it, yes it'd need heavy guidance to get a full product out even if it commits no breaking bugs .... NOW... at this specific point in time. For comparison's sake, this was roughly the state of the art THREE YEARS AGO: https://code2vec.org/ Ie a model that blurted out some terms it thought could describe your function. Compare it to what the big models do now and....
-
pottery
Project mention: Is there any way for hGetAll to return a key-value pair list instead of a simple list? | reddit.com/r/redis | 2022-12-18
This isn’t for Node.js… But if you’re using Python, you might want to check out Pottery. Pottery provides the functionality you’re describing and much more.
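For context on the question, a raw Redis `HGETALL` reply is a flat array of alternating field names and values, and client libraries decide whether to pair them up. The sketch below shows that pairing step in plain Python. It is not Pottery's API, and the helper name `pair_up` is hypothetical.

```python
def pair_up(flat_reply):
    """Turn [field1, value1, field2, value2, ...] into a dict."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of items (field/value pairs)")
    items = iter(flat_reply)
    # zip() pulls two consecutive items from the same iterator per pair.
    return dict(zip(items, items))


pairs = pair_up(["name", "Ada", "lang", "Python"])
```

Calling `list(pairs.items())` then gives the key-value pair list that the question asks for.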
-
evotorch
Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
Project mention: [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. | reddit.com/r/MachineLearning | 2023-01-26
-
-
- Use Ports to call CLIs that take care of this (e.g. Poppler for PDFs, LibreOffice in `--headless` mode).
- Use jInterface to start up a JVM with Apache POI to work on this specific workflow (I have an example here to work with Java Image API). You can also do this with other languages (Golang, Python, and others).
-
wakaq
https://github.com/wakatime/wakaq/blob/main/wakaq/__init__.p...
and
https://github.com/wakatime/wakaq/blob/main/wakaq/worker.py
is the meat of it. The blog post talks about the Redis data structures used, and there's not much to it beyond that.
-
-
machin
Reinforcement learning library (framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
-
optuna-examples
Check out the examples for Optuna, a popular hyperparameter tuning package. It has examples for most popular ML frameworks, including XGBoost, so you can see how it compares to an ANN framework like Keras or PyTorch.
-
FedScale
Project mention: University of Michigan Researchers Open-Source ‘FedScale’: a Federated Learning (FL) Benchmarking Suite with Realistic Datasets and a Scalable Runtime to Enable Reproducible FL Research on Privacy-Preserving Machine Learning | reddit.com/r/machinelearningnews | 2022-07-23
-
lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Project mention: Lithops: A multi-cloud framework for embarrassingly parallel jobs | news.ycombinator.com | 2023-01-14
-
squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
Project mention: [P] Squirrel: A new OS library for fast & flexible large-scale data loading | reddit.com/r/MachineLearning | 2022-04-11
Today we open-sourced Squirrel, a data infrastructure library that my colleagues and I have been working on over the past 1.5 years: https://github.com/merantix-momentum/squirrel-core
Python Distributed related posts
- How to tune more than 2 hyperparameters in Grid Search in Python?
- Is dynamic action masking possible in Rllib?
- AWS re:Invent 2022 Recap | Data & Analytics services
- [D] Is there any research into using neural networks to discover classical algorithms?
- Polars: The Next Big Python Data Science Library... written in RUST?
- Suggestion to optimize algo
- Is there any way for hGetAll to return a key-value pair list instead of a simple list?
-
A note from our sponsor - Sonar
www.sonarsource.com | 8 Feb 2023
Index
What are some of the best open-source Distributed projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Ray | 23,900 |
2 | nni | 12,450 |
3 | modin | 8,314 |
4 | optuna | 7,545 |
5 | scrapy-redis | 5,232 |
6 | Gerapy | 2,904 |
7 | lingvo | 2,670 |
8 | arq | 1,458 |
9 | MLBox | 1,390 |
10 | fugue | 1,180 |
11 | PySR | 977 |
12 | code2vec | 930 |
13 | pottery | 819 |
14 | evotorch | 812 |
15 | bagua | 792 |
16 | Pyrlang | 527 |
17 | wakaq | 513 |
18 | malib | 354 |
19 | machin | 348 |
20 | optuna-examples | 322 |
21 | FedScale | 277 |
22 | lithops | 256 |
23 | squirrel-core | 250 |