How to tune hyperparameters in A2C-PPO?


baselines

Mentions: 14 · Stars: 15,339 · Activity: 0.0 · Language: Python

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

I'm currently working with A2C. The model was able to learn Pong, which I ran as a sanity check to make sure I haven't introduced any bugs. Now I'm trying to make the model play Breakout, but even after 10M steps it has not made any significant progress. I'm using the baseline hyperparameters, which can be found here https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py, except that my buffer size has ranged from 512 to 4096. I've noticed that with buffer sizes in that range, entropy decreases extremely slowly. So my questions are: how do I make entropy decrease, and how do I increase rewards per buffer? I've tried decreasing the entropy coefficient to almost zero, but the agent still behaves very strangely.
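To make the entropy discussion concrete, here is a minimal sketch of a per-batch A2C loss in plain Python, showing where the entropy coefficient enters. It follows the general structure of the loss in the linked a2c.py (policy-gradient term, value term weighted by `vf_coef`, entropy bonus weighted by `ent_coef`), but the function names are hypothetical and the defaults `ent_coef=0.01` and `vf_coef=0.5` should be checked against that file. A useful reference point: a uniform policy over n actions has entropy ln(n), so a 4-action policy starts at about 1.386 nats.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one row of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_entropy(logits: Sequence[float]) -> float:
    """Entropy H(pi) = -sum_a pi(a) * log pi(a) of a categorical policy, in nats."""
    logp = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in logp)


def a2c_loss(batch_logits, actions, advantages, values, returns,
             ent_coef=0.01, vf_coef=0.5):
    """Hypothetical helper: return (total, pg_loss, value_loss, mean_entropy).

    All terms are averaged over the batch. The entropy bonus is *subtracted*,
    so a larger ent_coef rewards keeping the policy close to uniform.
    """
    n = len(actions)
    pg_loss = value_loss = entropy = 0.0
    for logits, a, adv, v, ret in zip(batch_logits, actions, advantages, values, returns):
        logp = log_softmax(logits)
        pg_loss += -adv * logp[a]       # policy-gradient term; advantage treated as a constant
        value_loss += (ret - v) ** 2    # squared error of the critic's value estimate
        entropy += categorical_entropy(logits)
    pg_loss /= n
    value_loss /= n
    entropy /= n
    total = pg_loss - ent_coef * entropy + vf_coef * value_loss
    return total, pg_loss, value_loss, entropy


# Example: a uniform 4-action policy has entropy ln(4), about 1.386 nats.
print(round(categorical_entropy([0.0, 0.0, 0.0, 0.0]), 3))  # prints 1.386
```

Because the entropy term is only a bonus, setting `ent_coef` near zero removes the pressure to stay random; it does not actively push entropy down. How fast entropy falls is then driven by the policy-gradient term. Note also that with a much larger rollout buffer the agent performs fewer gradient updates per environment step, which is one plausible reason entropy decreases slowly.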

ppo-implementation-details

18 551 0.0 Python

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

You might find our PPO blog post helpful - https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
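As a hedged illustration of two details the linked post discusses, the clipped surrogate objective and advantage normalization, here is a plain-Python sketch. The function names are hypothetical, and `clip_coef=0.1` is only a placeholder starting value (the PPO paper uses 0.1, linearly annealed, for Atari), not a recommendation taken from the post.

```python
import math
from typing import List, Sequence


def normalize_advantages(adv: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift advantages to zero mean and scale to unit standard deviation."""
    mean = sum(adv) / len(adv)
    std = math.sqrt(sum((a - mean) ** 2 for a in adv) / len(adv))
    return [(a - mean) / (std + eps) for a in adv]


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_coef: float = 0.1) -> float:
    """Clipped surrogate policy loss (to be minimized), averaged over the batch."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        unclipped = ratio * adv
        clipped = max(min(ratio, 1.0 + clip_coef), 1.0 - clip_coef) * adv
        total += -min(unclipped, clipped)  # take the pessimistic of the two objectives
    return total / len(advantages)
```

Taking the minimum means a large probability ratio cannot earn extra credit on positive advantages, while it is still fully penalized on negative ones; that is what keeps each PPO update close to the policy that collected the data.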


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.



This page summarizes the projects mentioned and recommended in the original post on /r/reinforcementlearning. Post date: 15 Jun 2022.
