Overall loss in PPO, why does it matter?

ppo-implementation-details

18 557 0.0 Python

The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"

I am using Phil Tabor's implementation and this site (and sometimes the OpenAI repository) as base code, but I can't figure out how TensorFlow/PyTorch knows which loss belongs to which network. When the losses are split, you create two separate gradient tapes, but when an overall loss is used, how does the model know which part propagates to which network and which doesn't?
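The question above can be checked with a small experiment. The following is a minimal PyTorch sketch, not code from either repository: the two `nn.Linear` networks, the stand-in losses, and the 0.5 weight are all illustrative assumptions. It shows that autograd follows the computation graph, so each term of a summed loss only contributes gradients to the parameters that were used to compute that term.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two independent toy networks (hypothetical sizes, for illustration only).
actor = nn.Linear(4, 2)
critic = nn.Linear(4, 1)

obs = torch.randn(8, 4)

# Stand-in losses. A real PPO implementation would use the clipped surrogate
# objective and a value-regression loss here.
actor_loss = actor(obs).pow(2).mean()
critic_loss = critic(obs).pow(2).mean()

# Gradients of each loss on its own, kept for comparison.
g_actor_only = torch.autograd.grad(
    actor_loss, list(actor.parameters()), retain_graph=True
)
g_critic_only = torch.autograd.grad(
    critic_loss, list(critic.parameters()), retain_graph=True
)

# One combined loss and a single backward pass.
total_loss = actor_loss + 0.5 * critic_loss
total_loss.backward()

# The actor only receives the gradient of the actor term,
# the critic only receives the (scaled) gradient of the critic term.
actor_match = all(
    torch.allclose(p.grad, g) for p, g in zip(actor.parameters(), g_actor_only)
)
critic_match = all(
    torch.allclose(p.grad, 0.5 * g) for p, g in zip(critic.parameters(), g_critic_only)
)
print(actor_match, critic_match)
```

With fully separate networks, summing the losses therefore gives the same per-network gradients as splitting them, up to the weighting coefficient. The combined form starts to matter when the actor and critic share layers, because the shared parameters then receive gradients from both terms.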

Youtube-Code-Repository

5 838 1.6 Python

Repository for most of the code from my YouTube channel

Phil Tabor's implementation calculates the actor and critic losses separately (line 95 onward) and does not compute the overall loss of equation 9.
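For comparison, a combined loss could look like the sketch below. It assumes that "equation 9" refers to the combined objective in the PPO paper (clipped surrogate, value-function error, and entropy bonus). The function name `ppo_total_loss`, its argument names, and the default coefficients `clip_eps=0.2`, `c1=0.5`, and `c2=0.01` are illustrative choices, not values taken from either repository.

```python
import torch


def ppo_total_loss(new_logp, old_logp, advantages, values, returns, entropy,
                   clip_eps=0.2, c1=0.5, c2=0.01):
    """Combined PPO loss to minimize (sign-flipped version of the objective)."""
    # Probability ratio between the new and the old policy.
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Clipped surrogate objective, negated because optimizers minimize.
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Squared-error value loss.
    value_loss = (returns - values).pow(2).mean()
    # Entropy bonus encourages exploration, so it is subtracted from the loss.
    entropy_bonus = entropy.mean()
    return policy_loss + c1 * value_loss - c2 * entropy_bonus


# Toy usage: identical old/new policies, unit advantages, perfect value estimates.
logp = torch.zeros(4)
loss = ppo_total_loss(
    logp, logp,
    advantages=torch.ones(4),
    values=torch.zeros(4),
    returns=torch.zeros(4),
    entropy=torch.zeros(4),
)
print(loss.item())
```

Computing the two losses separately, as described above, amounts to optimizing the policy and value terms with their own optimizers instead of a single weighted sum.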

Related posts

Intrinsic Curiosity Module Pytorch multithreading cpu unable to fix seeds
1 project | /r/reinforcementlearning | 1 Apr 2022

PPO cannot play CartPole ?
1 project | /r/reinforcementlearning | 1 Nov 2021

Lunar Lander using Deep Q-Learning
3 projects | /r/deeplearning | 18 Mar 2021

Rl algorithm implemented
2 projects | /r/reinforcementlearning | 18 Jul 2021

Recapping the AI, Machine Learning and Data Science Meetup — May 2, 2024
2 projects | dev.to | 2 May 2024

This page summarizes the projects mentioned and recommended in the original post on /r/reinforcementlearning
reinforcement-learning monte-carlo-methods qlearning-algorithm convolutional-neural-networks sarsa
Post date: 28 Apr 2023
