maddpg
Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
Cooperative agents are a research field of their own. Check out some recent papers like QMIX; the paper is linked in this repo: https://github.com/oxwhirl/pymarl/
QMIX is indeed a great paper. I'm planning on using it with RLlib on my env, though it takes some work to adapt and to understand the subtleties ;) (such as the agent groups: https://github.com/ray-project/ray/blob/936cb5929c455102d5638ff5d59c80c4ae94770f/rllib/env/multi_agent_env.py#L82 )
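For anyone unfamiliar with the agent-groups idea linked above: a group id maps to a list of agent ids, and the grouped env then exposes one tuple observation (and expects one tuple action) per group instead of per agent. A minimal illustrative sketch of just that mapping, not RLlib's actual code:

```python
# Illustrative sketch of the "agent groups" concept from RLlib's MultiAgentEnv:
# a group id maps to a list of agent ids; the grouped env presents one tuple
# observation per group instead of one observation per agent.
groups = {"team_1": ["agent_0", "agent_1"], "team_2": ["agent_2"]}

def group_obs(per_agent_obs, groups):
    """Collapse per-agent observations into per-group tuples."""
    return {
        gid: tuple(per_agent_obs[aid] for aid in members)
        for gid, members in groups.items()
    }

obs = {"agent_0": [0.1], "agent_1": [0.2], "agent_2": [0.3]}
print(group_obs(obs, groups))
# {'team_1': ([0.1], [0.2]), 'team_2': ([0.3],)}
```

In RLlib itself this is what `MultiAgentEnv.with_agent_groups` sets up for algorithms like QMIX, which need a whole team's joint observation at once.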
Another thing: I don't use a single centralized critic. I use one per agent (all of them centralized), and you could use parameter sharing between agents of the same type if you want. A good start would be to look at how MADDPG works in an implementation (original, tf2, pytorch-1, pytorch-2), see how the actor and the critic are trained, and then adapt those ideas to your MA-PPO implementation.
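To make the "one centralized critic per agent, shared between same-type agents" layout concrete, here is a minimal sketch of just the wiring. All names are illustrative, and `Critic` is a placeholder for your actual value network, not a real learning rule:

```python
# Hedged sketch: MADDPG-style per-agent centralized critics with optional
# parameter sharing. Each critic sees the JOINT observations and actions of
# all agents (that's what makes it centralized), but agents of the same type
# can point at the same critic object so its parameters are shared.
class Critic:
    """Placeholder for a value network Q(o_1..o_N, a_1..a_N)."""

    def __init__(self, joint_obs_dim, joint_act_dim):
        # A real critic would hold network weights; here we only record shapes.
        self.in_dim = joint_obs_dim + joint_act_dim

    def value(self, joint_obs, joint_act):
        # Stand-in for a forward pass over the concatenated joint input.
        return sum(joint_obs) + sum(joint_act)

# Hypothetical agent types for a predator-prey style env.
agent_types = {"agent_0": "predator", "agent_1": "predator", "agent_2": "prey"}

shared = {}   # one critic per agent TYPE
critics = {}  # one critic handle per agent (possibly shared)
for aid, atype in agent_types.items():
    if atype not in shared:
        shared[atype] = Critic(joint_obs_dim=6, joint_act_dim=3)
    critics[aid] = shared[atype]

# Same type -> same critic object (shared parameters); different type -> not.
assert critics["agent_0"] is critics["agent_1"]
assert critics["agent_0"] is not critics["agent_2"]
```

The actors stay decentralized (each conditions only on its own observation); only the critics get the joint input, which is the core MADDPG idea you would carry over to MA-PPO.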