Policy-based optimization: single-step policy gradient seen as an evolution strategy
Why do you think that https://github.com/kaifishr/RocketLander is a good alternative to PBO?