PPO-for-Beginners vs stable-baselines3-contrib

| | PPO-for-Beginners | stable-baselines3-contrib |
| --- | --- | --- |
| Mentions | 1 | 6 |
| Stars | 653 | 429 |
| Growth | - | 4.4% |
| Activity | 4.2 | 6.7 |
| Latest Commit | 5 months ago | 12 days ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

PPO-for-Beginners

Why does this PPO implementation calculate the Advantage only once per rollout?
I am looking at this PPO implementation, which follows the pseudocode given in Spinning Up. This implementation has been really easy to follow, and I understand almost everything. However, I am lost at line 103, where the author computes the normalized advantage before the rollout -
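The excerpt cuts off before the code, but the pattern it refers to is common: the advantage estimate is computed a single time from the freshly collected batch, normalized, and then reused unchanged across every PPO update epoch. A minimal sketch of that pattern, assuming a rewards-to-go return estimate in the style of PPO-for-Beginners (names like `compute_rtgs`, `batch_rews`, and `V` are illustrative, not the repo's exact code):

```python
import torch

def compute_rtgs(batch_rews, gamma=0.95):
    """Rewards-to-go: the discounted return from each timestep to the
    end of its episode, computed once per collected rollout."""
    rtgs = []
    for ep_rews in reversed(batch_rews):        # episodes, newest first
        discounted = 0.0
        for rew in reversed(ep_rews):           # timesteps, backwards
            discounted = rew + gamma * discounted
            rtgs.insert(0, discounted)
    return torch.tensor(rtgs, dtype=torch.float32)

# batch_rews: list of per-episode reward lists from one rollout
batch_rews = [[1.0, 0.0, 1.0], [0.5, 0.5]]
batch_rtgs = compute_rtgs(batch_rews)

# Critic values for the rollout states (stand-in tensor here)
V = torch.zeros_like(batch_rtgs)

# The advantage is computed ONCE per rollout and then frozen: it acts as
# a constant target reused across all PPO update epochs on this batch.
A_k = batch_rtgs - V.detach()
A_k = (A_k - A_k.mean()) / (A_k.std() + 1e-10)  # normalize for stability
print(A_k)
```

Since the batch is on-policy and PPO's clipped ratio already bounds how far the policy can drift during the update epochs, recomputing or re-normalizing the advantage inside the epoch loop is generally considered unnecessary.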

stable-baselines3-contrib

Problem with Truncated Quantile Critics (TQC) and n-step learning algorithm
# https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/tqc/tqc.py :
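The quoted snippet is truncated to the file reference above, so for context, here is what "n-step learning" changes relative to the 1-step TD target that TQC (like SAC) normally uses. This is a hedged, generic sketch, not code from tqc.py; `rewards`, `dones`, and `next_values` are illustrative names:

```python
import numpy as np

def n_step_target(rewards, dones, next_values, gamma=0.99, n=3):
    """Generic n-step TD target (illustrative, not tqc.py code):
    G_t = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n}),
    truncated early if the episode terminates inside the window."""
    T = len(rewards)
    targets = np.zeros(T, dtype=np.float32)
    for t in range(T):
        g, discount, terminated = 0.0, 1.0, False
        last = t
        for k in range(n):
            if t + k >= T:
                break
            last = t + k
            g += discount * rewards[t + k]
            discount *= gamma
            if dones[t + k]:
                terminated = True
                break
        if not terminated:
            # next_values[i] approximates V(s_{i+1}); bootstrap from the
            # last state actually reached inside the window
            g += discount * next_values[last]
        targets[t] = g
    return targets

# Toy usage: a single 4-step episode.
rewards = [1.0, 0.0, 0.0, 1.0]
dones = [False, False, False, True]
next_values = [0.5, 0.5, 0.5, 0.0]
print(n_step_target(rewards, dones, next_values, n=3))
```

The subtlety such posts usually run into is exactly the truncation handled above: when a terminal state falls inside the n-step window, the bootstrap term must be dropped, otherwise the critic target leaks value across episode boundaries.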

Understanding Action Masking in RLlib
Here's a theoretical overview and an implementation of action masking for PPO.
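The linked post's code isn't reproduced here, but the core trick is standard: add a very large negative number to the logits of invalid actions before sampling, so their probability becomes effectively zero. A minimal PyTorch sketch with illustrative names (not RLlib's actual model API):

```python
import torch
from torch.distributions import Categorical

def masked_distribution(logits, action_mask):
    """action_mask holds 1.0 for valid actions and 0.0 for invalid ones.
    log(0) = -inf is clamped to a large negative value, so the softmax
    assigns masked actions ~zero probability."""
    inf_mask = torch.clamp(torch.log(action_mask), min=-1e10)
    return Categorical(logits=logits + inf_mask)

logits = torch.tensor([2.0, 0.5, -1.0, 0.0])
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])   # actions 1 and 3 are illegal
dist = masked_distribution(logits, mask)
print(dist.sample().item(), dist.probs)      # masked actions get ~0 probability
```

Masking the logits (rather than resampling rejected actions) matters for PPO in particular: the log-probabilities stored at collection time and those recomputed during the update both come from the same masked distribution, so the importance ratio stays well-defined.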

PPO rollout buffer for turn-based two-player game with varying turn lengths
Simplified version of rollout collection (adapted from ppo_mask.py line 282):
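The snippet itself didn't survive extraction. As a stand-in, here is a hedged sketch of the general maskable rollout-collection pattern the post is adapting; `env.action_masks()` follows the sb3_contrib convention, while `policy.predict_with_mask` and the buffer signature are hypothetical simplifications, not the actual sb3_contrib source:

```python
def collect_rollout(env, policy, buffer, n_steps):
    """Simplified maskable rollout collection. At every step the env is
    asked which actions are currently legal; the mask is stored with the
    transition so the PPO update can rebuild the same masked distribution."""
    obs = env.reset()
    for _ in range(n_steps):
        action_mask = env.action_masks()   # sb3_contrib-style mask hook
        # hypothetical helper: sample a legal action, return value/log-prob
        action, value, log_prob = policy.predict_with_mask(obs, action_mask)
        next_obs, reward, done, info = env.step(action)
        buffer.add(obs, action, reward, done, value, log_prob, action_mask)
        obs = env.reset() if done else next_obs
    return buffer
```

For a turn-based two-player game with varying turn lengths, the relevant point is that the mask is part of each stored transition, so uneven turn boundaries do not break the buffer layout.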

GitHub Copilot: your AI pair programmer
Transformers (GPT-3) aren't quite _supervised_, but they do require valid samples.
Agree 100% with RL being the path forward. You probably have already seen ( https://venturebeat.com/2021/06/09/deepmind-says-reinforceme... ). Personally I'm really stoked for this https://github.com/Stable-Baselines-Team/stable-baselines3-c... , which will make it a lot easier for rubes like me to use RL.

[P] Stable-Baselines3 v1.0 - Reliable implementations of RL algorithms
But as we already have vanilla DQN and QR-DQN (in our contrib repo: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib ), I think it is already a good start for off-policy discrete-action algorithms. (QR-DQN is usually competitive vs DQN+extensions.)
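For readers who want to try the contrib algorithms mentioned in this thread, usage mirrors the main Stable-Baselines3 API. A minimal sketch (the environment and hyperparameters are illustrative, not tuned):

```python
from sb3_contrib import QRDQN

# QR-DQN on a discrete-action task; the sb3_contrib API mirrors SB3's.
model = QRDQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
model.save("qrdqn_cartpole")
```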
What are some alternatives?
stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
muzero-general - MuZero
pytorch-learn-reinforcement-learning - A collection of various RL algorithms such as policy gradients, DQN, and PPO. The goal of this repo is to be a go-to resource for learning about RL: how to visualize, debug, and solve RL problems. It additionally includes playground.py for learning more about OpenAI Gym, etc.
TabNine - AI Code Completions
PPO-PyTorch - Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
R-NaD - Experimentation with Regularized Nash Dynamics on a GPU-accelerated game
copilot-cli - The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
rl-baselines3-zoo - A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
dreamerv2 - Mastering Atari with Discrete World Models
robot-gym - RL applied to robotics.