on-policy vs pymarl2
| | on-policy | pymarl2 |
|---|---|---|
| Mentions | 12 | 1 |
| Stars | 1,125 | 556 |
| Growth | 7.8% | - |
| Activity | 4.9 | 5.0 |
| Latest commit | 10 days ago | 4 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
on-policy
- How do you compute rewards when you are using parallel environments?
- Renderer of the environment does not work?
I am trying to feed the agents with visual observations and am thus using the renderer of this environment (https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/envs/mpe/rendering.py), but the image I get back is not rendered correctly.
- Stuck on this error for days: I can't use importlib the right way
- Difference between setup.py, environments.yaml and requirements.txt
- Ubuntu terminal crashes when I launch a deep reinforcement learning model
I am trying to run this code on my Ubuntu machine (https://github.com/marlbenchmark/on-policy).
- "chmod" is not recognized as an internal or external command, operable program or batch file
If you don't want to install a Linux VM, the other option is to read the source of the train_mpe.sh script and write your own version as a Windows batch file.
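If you go that route, a small Python launcher can also stand in for the shell script on Windows. The sketch below is only a hedged illustration: the flag names (--env_name, --algorithm_name, --scenario_name, --num_agents, --seed) are assumptions about what train_mpe.sh passes to train/train_mpe.py, so check them against the actual script before relying on it.

```python
# launch_mpe.py -- hypothetical Windows-friendly stand-in for train_mpe.sh.
# The script path and the flag names are assumptions; verify them against
# the real train_mpe.sh in the on-policy repository before use.
import subprocess
import sys

args = [
    sys.executable, "train/train_mpe.py",
    "--env_name", "MPE",
    "--algorithm_name", "rmappo",
    "--scenario_name", "simple_spread",
    "--num_agents", "3",
    "--seed", "1",
]

# check=True makes the launcher fail loudly if training exits with an error.
subprocess.run(args, check=True)
```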
- Confused between "centralized critic" and "centralized training decentralized execution"
Sorry, this was the paper: https://arxiv.org/abs/2104.07750. But I guess you already answered my question. Indeed, agents receive a global observation, but cannot directly observe other agents' actions, states, or rewards, and do not share parameters. So if I understand correctly, what they're using here is independent PPO with a global observation, but no centralized critic. Which is what MAPPO (https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/algorithm/r_actor_critic.py) does: a centralized observation space but (if I'm correct) a decentralized critic.
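The distinction the thread is circling around can be shown in a few lines: under centralized training with decentralized execution, each actor still acts from its own local observation, while the critic may additionally be fed global information during training. The following is a minimal, illustrative PyTorch sketch, not the repo's actual r_actor_critic.py classes; the network sizes and observation/state dimensions are made-up assumptions.

```python
# Minimal contrast between a decentralized and a centralized critic.
# Not the repo's classes; observation/state/action sizes are illustrative.
import torch
import torch.nn as nn

obs_dim, state_dim, act_dim = 18, 54, 5   # e.g. state = concatenation of 3 agents' obs

# Actor: always conditions on the agent's own (local) observation.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

# Independent PPO: the critic sees the same local observation as the actor.
decentralized_critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# CTDE / MAPPO-style: the critic sees the global state, but only during training.
centralized_critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

local_obs = torch.randn(1, obs_dim)       # available at execution time
global_state = torch.randn(1, state_dim)  # available only to the trainer

action_logits = actor(local_obs)          # execution is decentralized in both cases
value_independent = decentralized_critic(local_obs)
value_centralized = centralized_critic(global_state)
```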
- Why is this implementation of PPO using a replay buffer?
I don't see the buffer being cleared anywhere, but it looks to me like it may not need to be... For example, the implementation of SeparatedReplayBuffer receives the episode_length (or "horizon", as it is sometimes called) and sets the size of the buffer accordingly when it is initialized. That way, the number of samples collected before each policy/value update is constant. You just need one giant tensor block to collect all your samples, and after doing a network update, why clear them out? Just overwrite the existing samples, since you know you'll collect exactly the same number of new samples.
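That fixed-horizon, overwrite-in-place behavior is easy to see in a stripped-down sketch. The class below is only an illustration of the idea, not the repo's SeparatedReplayBuffer, and the shapes are made up: the buffer is allocated once from the episode length, and a wrapping write index means each new rollout simply replaces the old samples with no explicit clearing step.

```python
# Toy rollout buffer illustrating the fixed-horizon, overwrite-in-place idea.
# Not the repo's SeparatedReplayBuffer; shapes and sizes are illustrative.
import numpy as np

class RolloutBuffer:
    def __init__(self, episode_length, obs_dim, act_dim):
        # Allocate one block per quantity, sized by the rollout horizon.
        self.obs = np.zeros((episode_length, obs_dim), dtype=np.float32)
        self.actions = np.zeros((episode_length, act_dim), dtype=np.float32)
        self.rewards = np.zeros((episode_length, 1), dtype=np.float32)
        self.episode_length = episode_length
        self.step = 0

    def insert(self, obs, action, reward):
        self.obs[self.step] = obs
        self.actions[self.step] = action
        self.rewards[self.step] = reward
        # Wrapping the write index overwrites last iteration's samples,
        # so there is never an explicit "clear the buffer" step.
        self.step = (self.step + 1) % self.episode_length

buf = RolloutBuffer(episode_length=200, obs_dim=18, act_dim=5)
for _ in range(buf.episode_length):        # exactly one horizon's worth of samples
    buf.insert(np.zeros(18), np.zeros(5), 0.0)
# ... run the PPO/value update here, then keep inserting:
# the next 200 steps simply overwrite these samples.
```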
- MARL top conference papers are ridiculous
https://github.com/marlbenchmark/on-policy (MAPPO-FP)
pymarl2
- MARL top conference papers are ridiculous
https://github.com/hijkzzz/pymarl2 (RIIT)
What are some alternatives?
gym-pybullet-drones - PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
nlp-recipes - Natural Language Processing Best Practices & Examples
DI-engine - OpenDILab Decision AI Engine
auto-sklearn - Automated Machine Learning with scikit-learn
ai-economist - Foundation is a flexible, modular, and composable framework to model socio-economic behaviors and dynamics with both agents and governments. This framework can be used in conjunction with reinforcement learning to learn optimal economic policies, as done by the AI Economist (https://www.einstein.ai/the-ai-economist).
Mava - 🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX
fast-reid - SOTA Re-identification Methods and Toolbox
SimpleView - Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"
Emergent-Multiagent-Strategies - Emergence of complex strategies through multiagent competition