Hi everyone! I'm a complete newbie to DRL, so please forgive my lack of understanding of some things here. I'm training a RecurrentPPO from SB3-contrib on E.Leurent's highway-env [https://github.com/eleurent/highway-env] (I customized the action space to be more high-level). During training I get the desired behavioural outcome from the agent, but I noticed that some of the model's training metrics look quite off compared to the trends I've found online (especially the explained variance). I just wanted an opinion from some more experienced folks here! Can I fix this trend through hyperparameter tuning, or do I have to modify the reward function somehow, for example? How can I improve the training? I'm happy to provide any details. I'm sharing the tensorboard plots obtained for RecurrentPPO.
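For anyone trying to interpret that metric: to the best of my knowledge, SB3 computes explained variance as `1 - Var[returns - value_predictions] / Var[returns]`, so 1.0 means the value function predicts returns perfectly, 0 means it's no better than always predicting the mean return, and negative values mean it's actively worse than the mean. A minimal NumPy sketch (not SB3's actual code, just the same formula):

```python
import numpy as np

def explained_variance(y_pred, y_true):
    # 1 - Var[y_true - y_pred] / Var[y_true]
    # 1.0  -> value function explains returns perfectly
    # 0.0  -> no better than predicting the mean return
    # <0   -> worse than predicting the mean return
    var_y = np.var(y_true)
    if var_y == 0:
        return float("nan")  # undefined when returns are constant
    return 1.0 - np.var(y_true - y_pred) / var_y

returns = np.array([1.0, 2.0, 3.0, 4.0])
print(explained_variance(returns, returns))           # perfect: 1.0
print(explained_variance(np.full(4, 2.5), returns))   # mean-only: 0.0
print(explained_variance(-returns, returns))          # worse than mean: negative
```

A persistently low or negative explained variance usually means the critic can't predict returns, which can come from a very noisy or sparse reward rather than from bad hyperparameters alone, so it's worth checking both.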