Ray
maddpg
Metric | Ray | maddpg
---|---|---
Mentions | 42 | 2
Stars | 30,988 | 1,516
Growth | 2.8% | 3.9%
Activity | 10.0 | 0.0
Last commit | about 5 hours ago | 18 days ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Ray
-
Open Source Advent Fun Wraps Up!
22. Ray | Github | tutorial
-
Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models
Training times for GSM8k are mentioned here: https://github.com/ray-project/ray/tree/master/doc/source/te...
- Ray – an open source project for scaling AI workloads
-
Methods to keep agents inside grid world.
Here's a reference from RLlib that points to docs and an example, and here's one from one of my projects that includes all my own implementations
-
TransformerXL + PPO Baseline + MemoryGym
RLlib
- Is dynamic action masking possible in Rllib?
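For readers unfamiliar with the term in the question above, here is a minimal, library-independent sketch of dynamic action masking. It does not use RLlib's API; `masked_softmax`, `grid_mask`, and the four-action grid world are hypothetical names and assumptions chosen only for illustration. The mask is recomputed from the current state at every step, and invalid actions receive zero probability.

```python
import math


def grid_mask(row, col, n_rows, n_cols):
    """Valid moves for actions ordered (up, down, left, right).

    Hypothetical grid world: a move is invalid if it would leave the grid.
    """
    return [row > 0, row < n_rows - 1, col > 0, col < n_cols - 1]


def masked_softmax(logits, mask):
    """Softmax over logits in which masked-out actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are set to -inf, so exp() maps them to exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Agent in the top-left corner of a 3x3 grid: "up" and "left" are masked.
probs = masked_softmax([0.0, 0.0, 0.0, 0.0], grid_mask(0, 0, 3, 3))
print(probs)
```

Sampling from `probs` can then never pick a move that leaves the grid, because the mask depends on the agent's position at that step.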
-
AWS re:Invent 2022 Recap | Data & Analytics services
⦿ AWS Glue Data Quality - Automatic data quality rule recommendations based on your data
⦿ AWS Glue for Ray - Data integration with Ray (ray.io), a popular new open-source compute framework that helps you scale Python workloads
-
Think about it for a second
https://ray.io (just dropping the link)
-
Elixir Livebook now as a desktop app
I've wondered whether it's easier to add data analyst stuff to Elixir that Python seems to have, or add features to Python that Erlang (and by extension Elixir) provides out of the box.
From what I can see, if you want easier multiprocessing in Python (let's say running async), you have to use something like ray core[0], and then if you want multiple machines you need redis(?). Elixir/Erlang supports this out of the box.
Explorer[1] is an interesting approach: it uses Rust via Rustler (an Elixir library for calling Rust code) and uses Polars as its dataframe library. I think Rustler needs to be reworked for this use case, as it can be slow to return data. I made initial improvements that drastically improve encoding (https://github.com/elixir-nx/explorer/pull/282 and https://github.com/elixir-nx/explorer/pull/286; tl;dr: 20+ seconds down to 3).
-
Learn various techniques to reduce data processing time by using multiprocessing, joblib, and tqdm concurrent
Adding these for anyone who had a similar question about Ray vs dask 1, 2, 3
maddpg
-
How is the backward pass performed in MADDPG algorithm from MARL
I'm using the MADDPG algorithm from https://github.com/openai/maddpg/blob/master/maddpg/trainer/maddpg.py. I understand the forward pass for both the actor and critic networks, but not how the two networks are updated. At lines 188 and 191 the authors compute the critic loss and the actor loss; can anyone explain how the critic and actor networks are then updated?
Also, as far as I understand, when the number of agents increases from 3 to 6 for a simple spread policy in MADDPG, the computation time for the Q loss and P loss at lines 188 and 191 increases super-linearly. I assume this is because both the Q loss and the P loss use the Q values, and the input dimension used to calculate the Q values increases linearly with the number of agents. It would be great if anyone could help me understand this backpropagation phase better, and why the computation time grows super-linearly.
I also added a timer to track the computation time of the Q loss and P loss over 60,000 episodes with the simple spread policy (3 agents, 3 landmarks, 0 adversaries). Thanks in advance for the help!
**Q loss**: 3 agents 74.31 sec, 6 agents 243.31 sec (about 3x)
**P loss**: 3 agents 114.86 sec, 6 agents 321.76 sec (about 3x)
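One way to reason about the scaling question above is a rough operation count. This is a sketch under stated assumptions, not a diagnosis of the linked code: it assumes one centralized critic per agent whose input is the concatenation of every agent's observation and action, and it counts only the multiply-adds in each critic's first layer. The function names, the hidden width of 64, and the observation and action sizes are hypothetical.

```python
def critic_first_layer_cost(n_agents, obs_dim, act_dim, hidden=64):
    """Multiply-adds in the first layer of ONE centralized critic.

    Assumption: the critic input concatenates all agents' observations
    and actions, so its width grows linearly with n_agents.
    """
    input_dim = n_agents * (obs_dim + act_dim)
    return input_dim * hidden


def total_first_layer_cost(n_agents, obs_dim, act_dim, hidden=64):
    """Cost summed over all agents, each with its own centralized critic."""
    return n_agents * critic_first_layer_cost(n_agents, obs_dim, act_dim, hidden)


# Hypothetical sizes; obs_dim is held fixed to isolate the effect of n_agents.
small = total_first_layer_cost(3, obs_dim=18, act_dim=5)
large = total_first_layer_cost(6, obs_dim=18, act_dim=5)
print(large / small)  # 4.0
```

Under these assumptions, doubling the number of agents quadruples the first-layer work: the input width doubles and there are twice as many critics. Later layers and fixed per-call overhead do not grow the same way, so a measured ratio between 2x and 4x would be consistent with this picture. If each agent's observation also grows with the number of agents, the growth would be steeper.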
-
How to get my multi-agents more collaborative?
Another thing is that I don't use a single centralized critic; I use one for each agent (they are all centralized). You could use parameter sharing for agents of the same type if you want. A great start would be to look at how MADDPG works in an implementation (original, tf2, pytorch-1, pytorch-2), see how the actor and the critic are trained, and then adapt the ideas to your MA-PPO implementation.
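For readers following that advice, the training of a per-agent centralized critic and a decentralized actor can be summarized by the update rules from the MADDPG paper ("Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"). The notation here is an assumption made for readability and does not come from the posts: $\mathbf{x}$ is the joint observation, $o_i$ and $a_i$ are agent $i$'s observation and action, $\mu_i$ is its deterministic policy with parameters $\theta_i$, $Q_i$ is its centralized critic with parameters $\phi_i$, primes mark target networks or next-step quantities, and $\mathcal{D}$ is the replay buffer.

```latex
% Critic i: regress the centralized Q-value onto a one-step TD target
\mathcal{L}(\phi_i) = \mathbb{E}_{(\mathbf{x}, a, r, \mathbf{x}') \sim \mathcal{D}}
  \left[ \left( Q_i(\mathbf{x}, a_1, \dots, a_N) - y \right)^2 \right],
\qquad
y = r_i + \gamma \, Q_i'(\mathbf{x}', a_1', \dots, a_N') \Big|_{a_j' = \mu_j'(o_j')}

% Actor i: ascend the critic's value with respect to agent i's own action
\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{(\mathbf{x}, a) \sim \mathcal{D}}
  \left[ \nabla_{\theta_i} \mu_i(o_i) \,
         \nabla_{a_i} Q_i(\mathbf{x}, a_1, \dots, a_N) \Big|_{a_i = \mu_i(o_i)} \right]

% Target networks: slow exponential tracking of the learned parameters
\theta_i' \leftarrow \tau \, \theta_i + (1 - \tau) \, \theta_i',
\qquad
\phi_i' \leftarrow \tau \, \phi_i + (1 - \tau) \, \phi_i'
```

The critic conditions on every agent's action during training, while each actor only needs its own observation at execution time. That split is what would carry over when adapting the idea to another actor-critic method.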
What are some alternatives?
optuna - A hyperparameter optimization framework
pymarl - Python Multi-Agent Reinforcement Learning framework
stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
multiagent-particle-envs - Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
Faust - Python Stream Processing
gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners"
gevent - Coroutine-based concurrency library for Python
transferlearning - Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, codes, datasets, applications, tutorials.
stable-baselines - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
SCOOP (Scalable COncurrent Operations in Python) - SCOOP (Scalable COncurrent Operations in Python)
Thespian Actor Library - Python Actor concurrency library
Dask - Parallel computing with task scheduling