Question about the old policy and new policy in TRPO code

This page summarizes the projects mentioned and recommended in the original post on /r/reinforcementlearning

  • pytorch-trpo

    PyTorch implementation of Trust Region Policy Optimization

  • The code implements TRPO. In "get_kl", I can't see the difference between "mean0, log_std0, std0" and "mean1, log_std1, std1": aren't they equal in the code? Likewise, in "get_loss", aren't the log_probs of the old policy and the new policy equal? Thanks for the help!

  • stable-baselines3

    PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

  • Good point... I'll check in more detail when I get a chance later today! I would suggest looking at a more recent implementation like https://github.com/DLR-RM/stable-baselines3 or https://github.com/thu-ml/tianshou if you're trying to build something. https://spinningup.openai.com/en/latest/algorithms/trpo.html is particularly good for understanding the algorithm.

  • tianshou

    An elegant PyTorch deep reinforcement learning library.

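One possible reading of the "get_kl" / "get_loss" question, offered as a sketch rather than a statement about the repository: TRPO implementations commonly copy the current policy outputs as constants for the "old" policy, so the old and new quantities are equal in value at the moment they are computed but not in how they depend on the parameters. The pure-Python example below assumes that setup for a one-dimensional Gaussian policy and uses finite differences in place of autograd. All names and numbers in it (gaussian_kl, gaussian_log_prob, surrogate, the sample action and advantage) are hypothetical and not taken from pytorch-trpo.

```python
import math

def gaussian_kl(mean_old, log_std_old, mean_new, log_std_new):
    """KL(old || new) between two 1-D Gaussians."""
    std_old = math.exp(log_std_old)
    std_new = math.exp(log_std_new)
    return (log_std_new - log_std_old
            + (std_old ** 2 + (mean_old - mean_new) ** 2) / (2.0 * std_new ** 2)
            - 0.5)

def gaussian_log_prob(x, mean, log_std):
    """Log-density of x under a 1-D Gaussian."""
    std = math.exp(log_std)
    return (-((x - mean) ** 2) / (2.0 * std ** 2)
            - log_std - 0.5 * math.log(2.0 * math.pi))

# Hypothetical current policy output and one sampled transition.
mean_old, log_std_old = 0.3, -0.5
action, advantage = 0.8, 2.0
eps = 1e-3  # finite-difference step

# --- KL term: the "old" mean is frozen, only the "new" mean varies. ---
def kl_of_new_mean(m):
    return gaussian_kl(mean_old, log_std_old, m, log_std_old)

kl_value = kl_of_new_mean(mean_old)  # zero: both policies coincide here
kl_grad = (kl_of_new_mean(mean_old + eps)
           - kl_of_new_mean(mean_old - eps)) / (2.0 * eps)  # also zero
kl_curv = (kl_of_new_mean(mean_old + eps) - 2.0 * kl_value
           + kl_of_new_mean(mean_old - eps)) / eps ** 2  # nonzero curvature

# --- Surrogate loss: the old log-prob is computed once and kept fixed. ---
fixed_log_prob = gaussian_log_prob(action, mean_old, log_std_old)

def surrogate(m):
    ratio = math.exp(gaussian_log_prob(action, m, log_std_old) - fixed_log_prob)
    return -advantage * ratio

surr_value = surrogate(mean_old)  # ratio is 1, so the loss is -advantage
surr_grad = (surrogate(mean_old + eps)
             - surrogate(mean_old - eps)) / (2.0 * eps)  # nonzero gradient

print("KL value, gradient, curvature:", kl_value, kl_grad, kl_curv)
print("surrogate value, gradient:", surr_value, surr_grad)
```

Under these assumptions the KL value and its gradient vanish at the current parameters, but its curvature does not, and that curvature is what a trust-region step relies on. Similarly, the probability ratio equals 1 before the update, yet the loss still has a nonzero gradient, and the ratio moves away from 1 once a line search tries new parameters.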

