Hi everyone! I'm a complete newbie to DRL, so please forgive my lack of understanding of some things here. I'm training a RecurrentPPO from SB3-contrib on E.Leurent's highway-env [https://github.com/eleurent/highway-env] (I customized the action space to be more high-level). During training I get the desired behavioural outcome from the agent, but I noticed that some of the model's training metrics look quite off compared to the trends I've found online (especially the explained variance). I just wanted an opinion from some more experienced folks here! Can I fix this trend through hyperparameter tuning, or do I have to modify the reward function somehow, for example? How can I improve the training? I'm happy to provide any details. I'm sharing the tensorboard plots obtained for RecurrentPPO.
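For anyone trying to interpret that metric: to the best of my knowledge, SB3 computes explained variance as `1 - Var[returns - value_predictions] / Var[returns]`, so 1.0 means the value function predicts returns perfectly, 0 means it's no better than always predicting the mean return, and negative values mean it's actively worse than the mean. A minimal NumPy sketch (not SB3's actual code, just the same formula):

```python
import numpy as np

def explained_variance(y_pred, y_true):
    # 1 - Var[y_true - y_pred] / Var[y_true]
    # 1.0  -> value function explains returns perfectly
    # 0.0  -> no better than predicting the mean return
    # <0   -> worse than predicting the mean return
    var_y = np.var(y_true)
    if var_y == 0:
        return float("nan")  # undefined when returns are constant
    return 1.0 - np.var(y_true - y_pred) / var_y

returns = np.array([1.0, 2.0, 3.0, 4.0])
print(explained_variance(returns, returns))           # perfect: 1.0
print(explained_variance(np.full(4, 2.5), returns))   # mean-only: 0.0
print(explained_variance(-returns, returns))          # worse than mean: negative
```

A persistently low or negative explained variance usually means the critic can't predict returns, which can come from a very noisy or sparse reward rather than from bad hyperparameters alone, so it's worth checking both.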