on-policy vs DI-engine
| | on-policy | DI-engine |
|---|---|---|
| Mentions | 12 | 3 |
| Stars | 1,125 | 2,553 |
| Growth | 7.8% | 2.8% |
| Activity | 4.9 | 8.7 |
| Last commit | 10 days ago | 4 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
on-policy
- How do you compute rewards when you are using parallel environments?
- Renderer of the environment does not work?
I am trying to feed the agents visual observations and am therefore using the renderer of this environment (https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/envs/mpe/rendering.py), but the image I get back is broken.
- Stuck on this error for days: I can't use importlib the right way
- Difference between setup.py, environments.yaml and requirements.txt
- Ubuntu terminal crashes when I launch a deep reinforcement learning model
I am trying to run this code (https://github.com/marlbenchmark/on-policy) on my Ubuntu machine.
- "chmod" is not recognized as an internal or external command, operable program or batch file
If you don't want to install a Linux VM, the other option is to read the source of the train_mpe.sh script and write your own version as a Windows batch file.
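A third option is to skip batch files entirely and drive the training script from Python, which works the same on Windows and Linux. The sketch below shows the general shape of such a launcher; the script path and every flag name are assumptions for illustration, not the repo's actual interface, so check train_mpe.sh for the real arguments.

```python
# Hypothetical cross-platform replacement for train_mpe.sh (illustrative only:
# the script path and flag names below are assumptions, not the repo's API).
import os
import subprocess
import sys

env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "0"  # select the GPU, as the shell script does

subprocess.run(
    [
        sys.executable, "onpolicy/scripts/train/train_mpe.py",  # assumed path
        "--env_name", "MPE",               # assumed flag names
        "--scenario_name", "simple_spread",
        "--seed", "1",
    ],
    env=env,
    check=True,  # raise if training exits with a non-zero status
)
```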
- Confused between "centralized critic" and "centralized training decentralized execution"
Sorry, this was the paper: https://arxiv.org/abs/2104.07750. But I guess you already answered my question. Indeed, agents receive a global observation but cannot directly observe other agents' actions, states, or rewards, and do not share parameters. So if I understand correctly, what they're using here is independent PPO with a global observation but no centralized critic, which is what MAPPO (https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/algorithm/r_actor_critic.py) does: a centralized observation space but (if I'm correct) a decentralized critic.
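For reference, here is a minimal sketch of the centralized-training-decentralized-execution split being discussed. All class names and shapes are made up for illustration and are not the repo's classes: each actor conditions only on its own local observation (so it can act at execution time), while a single critic conditions on the global state and is needed only during training.

```python
# CTDE sketch (all names and shapes are illustrative, not the repo's classes).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized: sees only its own agent's local observation."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))

    def forward(self, local_obs):
        return torch.distributions.Categorical(logits=self.net(local_obs))

class CentralCritic(nn.Module):
    """Centralized: sees the global state, used only while training."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, global_state):
        return self.net(global_state)

n_agents, obs_dim, act_dim = 3, 18, 5
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]  # no parameter sharing
critic = CentralCritic(n_agents * obs_dim)  # e.g. concatenated observations
```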
- Why is this implementation of PPO using a replay buffer?
I don't see the buffer being cleared anywhere, but it looks to me like it may not need to be. For example, the implementation of SeparatedReplayBuffer receives the episode_length (or "horizon", as it is sometimes called) and sets the size of the buffer accordingly when it's initialized. That way, the number of samples collected before each policy/value update is constant. You just need one giant tensor block to collect all your samples; after a network update, why clear them out? Just overwrite the existing samples, since you know you'll collect exactly the same number of new samples.
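To make that concrete, here is a minimal sketch of such a fixed-size rollout buffer. The field names are illustrative, not SeparatedReplayBuffer's actual attributes: the storage is allocated once for exactly episode_length steps and simply overwritten on the next collection pass, so there is nothing to clear.

```python
# Fixed-size on-policy rollout buffer sketch (illustrative, not the repo's class).
import numpy as np

class RolloutBuffer:
    def __init__(self, episode_length, n_envs, obs_dim):
        # One big preallocated block; its size never changes after init.
        self.obs = np.zeros((episode_length, n_envs, obs_dim), dtype=np.float32)
        self.rewards = np.zeros((episode_length, n_envs), dtype=np.float32)
        self.step = 0

    def insert(self, obs, reward):
        # Overwrite whatever last iteration left at this slot.
        self.obs[self.step] = obs
        self.rewards[self.step] = reward
        # Wrap back to 0 after the update, instead of clearing anything.
        self.step = (self.step + 1) % self.obs.shape[0]
```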
- MARL top conference papers are ridiculous
https://github.com/marlbenchmark/on-policy (MAPPO-FP)
DI-engine
- Anyone have experience with DI-Engine?
I posted a while back asking people what frameworks they were using for RL research. Recently I stumbled upon DI-Engine, which looks promising! It is actively maintained, with a diverse set of algorithms already implemented.
- TransformerXL + PPO Baseline + MemoryGym
- Struggling with algorithm generality? Try DI engine; here is the solution
What are some alternatives?
gym-pybullet-drones - PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
stable-baselines - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
auto-sklearn - Automated Machine Learning with scikit-learn
pytorch-a2c-ppo-acktr-gail - PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
tianshou - An elegant PyTorch deep reinforcement learning library.
seed_rl - SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.
stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
myosuite - MyoSuite is a collection of environments/tasks to be solved by musculoskeletal models simulated with the MuJoCo physics engine and wrapped in the OpenAI gym API.
godot_rl_agents - An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
brain-agent - Brain Agent for Large-Scale and Multi-Task Agent Learning
ml-agents - The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)