PyTorch implementation of "Maximum a Posteriori Policy Optimization" (MPO) with Retrace for discrete Gym environments.
Why do you think https://github.com/google-deepmind/acme is a good alternative to MPO?
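To make the Retrace part of the description concrete, here is a minimal sketch of how Retrace(λ) targets for a discrete-action critic can be computed in PyTorch. This is not the repository's actual code; the function name `retrace_targets` and its signature are hypothetical, and it assumes a single trajectory with Q-values and target-policy probabilities already evaluated at each step.

```python
import torch

def retrace_targets(q, rewards, target_pi, behaviour_logp, actions,
                    gamma=0.99, lam=0.95):
    """Compute Retrace(lambda) targets for one trajectory of discrete actions.

    q:              (T+1, A) Q-values from the target critic
    rewards:        (T,)     rewards r_t
    target_pi:      (T+1, A) target-policy probabilities pi(.|s_t)
    behaviour_logp: (T,)     log mu(a_t|s_t) under the behaviour policy
    actions:        (T,)     integer action indices a_t
    """
    T = rewards.shape[0]
    idx = torch.arange(T)
    q_a = q[idx, actions]                       # Q(s_t, a_t)
    pi_a = target_pi[idx, actions]              # pi(a_t | s_t)
    # Truncated importance weights c_t = lam * min(1, pi/mu)
    c = lam * torch.clamp(pi_a / behaviour_logp.exp(), max=1.0)
    # Expected Q under the target policy, used as the bootstrap value
    v = (target_pi * q).sum(-1)                 # (T+1,)
    delta = rewards + gamma * v[1:] - q_a       # one-step TD errors
    # Backward recursion: acc_t = delta_t + gamma * c_{t+1} * acc_{t+1}
    targets = torch.empty(T)
    acc = torch.tensor(0.0)
    for t in reversed(range(T)):
        if t + 1 < T:
            acc = delta[t] + gamma * c[t + 1] * acc
        else:
            acc = delta[t]
        targets[t] = q_a[t] + acc
    return targets
```

The truncation `min(1, pi/mu)` is what lets Retrace use off-policy replay data safely: it cuts the traces whenever the behaviour policy diverges from the target policy, while remaining unbiased in the on-policy limit.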