How to tune hyperparameters in A2C/PPO?

This page summarizes the projects mentioned and recommended in the original post on /r/reinforcementlearning

  • baselines

    OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

  • I'm currently working with A2C. The model was able to learn OpenAI Gym Pong; I ran this as a sanity check that I haven't introduced any bugs. Now I'm trying to get the model to play Breakout, but even after 10M steps it has made no significant progress. I'm using the baseline hyperparameters, which can be found here: https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py, except that my buffer size has ranged from 512 to 4096. I've noticed that entropy decreases extremely slowly for buffer sizes in that interval. So my questions are: how do I make entropy decrease, and how do I increase the reward per buffer? I've tried decreasing the entropy coefficient to almost zero, but the model still behaves very strangely. (See the loss sketch below the project list for where these hyperparameters enter the update.)

  • ppo-implementation-details

    The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

  • You might find our PPO blog post helpful - https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
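
To make the question above concrete, here is a minimal sketch (PyTorch, not the baselines code itself) of how the entropy coefficient and the other A2C hyperparameters enter the loss. The defaults in the linked baselines/a2c/a2c.py are roughly nsteps=5 (rollout length per environment), ent_coef=0.01, vf_coef=0.5, and lr=7e-4, so a "buffer" of 512-4096 steps is a much larger batch than those defaults were tuned for. Tensor names below (logits, values, actions, returns) are placeholders for your own rollout data, not part of any library API.

    import torch
    import torch.nn.functional as F

    # Hyperparameters, mirroring the baselines A2C defaults (assumed from a2c.py)
    ent_coef = 0.01   # weight of the entropy bonus; larger keeps the policy more random
    vf_coef = 0.5     # weight of the value (critic) loss
    nsteps = 5        # rollout length per environment ("buffer size" per update)

    def a2c_loss(logits, values, actions, returns):
        """One A2C objective for a flattened rollout batch.

        logits:  (batch, n_actions) raw policy outputs
        values:  (batch,) critic estimates V(s)
        actions: (batch,) sampled actions
        returns: (batch,) n-step bootstrapped returns
        """
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        advantages = returns - values.detach()        # advantage estimate

        pg_loss = -(advantages * log_probs).mean()    # policy-gradient term
        value_loss = F.mse_loss(values, returns)      # critic regression
        entropy = dist.entropy().mean()               # exploration bonus

        # Subtracting ent_coef * entropy rewards high entropy, so lowering
        # ent_coef is what allows entropy to fall faster during training.
        return pg_loss + vf_coef * value_loss - ent_coef * entropy

This is only an illustration of where the coefficients sit in the objective; the actual baselines implementation additionally clips gradients (max_grad_norm=0.5 by default) and linearly anneals the learning rate.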


