pytorch-a2c-ppo-acktr-gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
For PPO, I used this repo, which includes most of the common tricks, such as GAE and normalized rewards. I have verified that it works on the classic Pendulum-v0 task and on Atari games (Pong and Breakout).
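To make the GAE mention concrete, here is a minimal sketch of generalized advantage estimation in plain Python. It is not taken from the repo: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions chosen for illustration. The `dones` flags reset the running estimate at episode boundaries, which is one common way to handle episodes of different lengths within a single rollout.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE over one rollout of length T.

    rewards[t], values[t]: reward and value estimate at step t.
    dones[t]: True if the episode ended at step t.
    last_value: value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        # A finished episode contributes no bootstrap value and no carry-over.
        mask = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets are the advantages added back onto the value estimates.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=5.0,
        gamma=1.0,
        lam=1.0,
    )
    print(adv, ret)
```

With `lam=1.0` this reduces to discounted returns minus the value baseline, and with `lam=0.0` it reduces to the one-step TD error, so `lam` trades variance against bias. Normalizing the resulting advantages per batch is a separate step that this sketch leaves out.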
Related posts
- How does advantage estimation is done when episodes are of variable length in PPO?
- [P] 10x faster reinforcement learning hyperparameter optimization than SOTA - now with distributed training!
- TransformerXL + PPO Baseline + MemoryGym
- Python libraries for solving reinforcement learning problems implemented in OpenAI gym
- How do I change the maximum number of steps for training