PPO-for-Beginners vs R-NaD

| | PPO-for-Beginners | R-NaD |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 653 | 30 |
| Growth | - | - |
| Activity | 4.2 | 4.7 |
| Last commit | 5 months ago | about 1 year ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
PPO-for-Beginners
Why does this PPO implementation calculate the advantage only once per rollout?
I am looking at this PPO implementation, which follows the pseudocode given in Spinning Up. This implementation has been really easy to follow, and I understand almost everything. However, I am lost at line 103, where the author computes the normalized advantage before the rollout …
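The pattern the question is asking about can be sketched as follows: the advantage estimates are computed and normalized once per collected rollout, and that same batch of advantages is then reused for every epoch of the clipped-surrogate update. This is a minimal illustration, not the repository's actual code; the reward/value arrays, `gamma`, and the `1e-10` epsilon are illustrative assumptions.

```python
import numpy as np

def compute_advantages(rewards, values, gamma=0.95):
    """Advantages for one rollout, computed a single time before updating.

    Uses discounted rewards-to-go minus a value baseline, then normalizes
    the whole batch. Hypothetical sketch; gamma and epsilon are assumptions.
    """
    rtgs = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    # Accumulate discounted rewards-to-go from the end of the rollout.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtgs[t] = running
    adv = rtgs - values
    # Normalize once; the same advantage batch is reused across all
    # update epochs on this rollout rather than recomputed each epoch.
    return (adv - adv.mean()) / (adv.std() + 1e-10)

rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
A_k = compute_advantages(rewards, values)  # mean ~0, std ~1 after normalizing
```

Because the policy that collected the rollout is held fixed while its data is consumed, recomputing advantages inside the epoch loop would not change them; computing them once per rollout is therefore sufficient.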
R-NaD
What are some alternatives?
stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
pytorch-learn-reinforcement-learning - A collection of various RL algorithms, such as policy gradients, DQN, and PPO. The goal of this repo is to make it a go-to resource for learning about RL: how to visualize, debug, and solve RL problems. It additionally includes playground.py for learning more about OpenAI gym, etc.
PantheonRL - PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.
PPO-PyTorch - Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
Simple-MADRL-Chess - MADRL project solving chess environment using PPO with two different methods: 2 agents/networks and a single agent/network.
cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
stable-baselines3-contrib - Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
warp-drive - Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)