ppo-implementation-details
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization".
I am using Phil Tabor's implementation as my base code, along with this site (and sometimes the OpenAI repository), but I can't figure out how TensorFlow/PyTorch knows which loss belongs to which network. When the loss is split, you create two separate tape.gradient calls. But when a single overall loss is used, how does the framework know which part of it should propagate to the actor, which to the critic, and which shouldn't propagate at all?
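The short answer is that autograd records, for every tensor, the operations and parameters it was computed from. Calling backward on a sum sends each term's gradient only to the parameters that term actually depends on, so a critic loss built purely from the critic's output contributes nothing to the actor's weights. Below is a minimal PyTorch sketch of this, assuming separate actor and critic networks. The toy network sizes, the random data, and the coefficients c1 = 0.5 and c2 = 0.01 are illustrative assumptions, not values taken from Phil Tabor's code or the OpenAI repository. It compares one combined backward pass against two separate ones.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical toy setup: separate actor and critic networks, one batch of fake data.
obs_dim, n_actions, batch = 4, 3, 16
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))

obs = torch.randn(batch, obs_dim)
actions = torch.randint(0, n_actions, (batch,))
advantages = torch.randn(batch)
returns = torch.randn(batch)
with torch.no_grad():  # old-policy log-probs are constants, not part of the graph
    old_log_probs = Categorical(logits=actor(obs)).log_prob(actions)

def ppo_terms(clip_eps=0.2):
    """Clipped surrogate loss, value loss and entropy; each records its own graph."""
    dist = Categorical(logits=actor(obs))
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    actor_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    critic_loss = (critic(obs).squeeze(-1) - returns).pow(2).mean()
    entropy = dist.entropy().mean()
    return actor_loss, critic_loss, entropy

def grads(module):
    # Snapshot the accumulated .grad of every parameter in the module.
    return [p.grad.clone() for p in module.parameters()]

c1, c2 = 0.5, 0.01  # assumed loss coefficients

# 1) One combined loss, one backward pass.
actor.zero_grad(); critic.zero_grad()
a_loss, v_loss, ent = ppo_terms()
(a_loss + c1 * v_loss - c2 * ent).backward()
combined_actor, combined_critic = grads(actor), grads(critic)

# 2) Separate losses, separate backward passes.
actor.zero_grad(); critic.zero_grad()
a_loss, v_loss, ent = ppo_terms()
(a_loss - c2 * ent).backward()
v_loss.backward()
split_actor, split_critic = grads(actor), grads(critic)

# Actor gradients are identical; critic gradients differ only by the factor c1.
print(all(torch.allclose(a, b) for a, b in zip(combined_actor, split_actor)))         # True
print(all(torch.allclose(a, c1 * b) for a, b in zip(combined_critic, split_critic)))  # True

# The critic loss never touches the actor's weights: autograd reports them as unused.
_, v_loss, _ = ppo_terms()
unused = torch.autograd.grad(v_loss, list(actor.parameters()), allow_unused=True)
print(all(g is None for g in unused))  # True
```

If the actor and critic share a trunk, the shared layers receive the sum of both terms' gradients. In that case the weighting c1 also affects how those shared layers are updated.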
Phil Tabor's implementation calculates the actor and critic losses separately (from line 95 onward) and never calculates equation 9.
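Assuming "equation 9" refers to the combined objective in the PPO paper (Schulman et al., 2017), it is shown below. Here c_1 and c_2 are coefficients, S is an entropy bonus, and the objective is maximized, so implementations usually minimize its negative:

```latex
L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2\, S[\pi_\theta](s_t) \right],
\qquad L_t^{VF}(\theta) = \left( V_\theta(s_t) - V_t^{targ} \right)^2

\text{loss}(\theta) = -L_t^{CLIP+VF+S}(\theta)
  = \hat{\mathbb{E}}_t\left[ -L_t^{CLIP}(\theta) + c_1 L_t^{VF}(\theta) - c_2\, S[\pi_\theta](s_t) \right]
```

When the policy and value function have no parameters in common, the gradient of this sum with respect to the policy's parameters comes only from the L^{CLIP} and entropy terms. Computing the two losses separately therefore gives the same policy update.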