stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
I noticed that off-policy algorithms such as DQN, DDPG and TD3 are implemented with a single environment in the various baselines and stable-baselines implementations. Even if more environments were added, I don't think it would affect performance, because it would only add more fresh samples to the replay buffer(s). What are some ways to improve speed without major changes to the algorithms? The only thing I can think of is adding an on-policy update like in ACER, but that changes the algorithms, and I don't know whether it would improve or worsen model convergence.
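To make the concern concrete, here is a rough sketch of what I mean, not code from baselines or stable-baselines. `ToyEnv` and `collect` are hypothetical names I made up, the environment is a trivial random walk, and the "update" is only a counter standing in for a gradient step. It assumes one update per loop iteration no matter how many environments are stepped.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in for a real environment (not a Gym API)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        # Noisy random walk; reward is higher the closer we stay to zero.
        next_state = self.state + action + self.rng.uniform(-0.1, 0.1)
        reward = -abs(next_state)
        transition = (self.state, action, reward, next_state)
        self.state = next_state
        return transition


def collect(num_envs, num_steps, buffer_size=10_000, batch_size=32, seed=0):
    """Step num_envs environments into one shared replay buffer.

    Returns (stored transitions, number of placeholder updates).
    """
    rng = random.Random(seed)
    envs = [ToyEnv(seed + i) for i in range(num_envs)]
    buffer = deque(maxlen=buffer_size)
    updates = 0
    for _ in range(num_steps):
        # Every environment contributes one fresh transition per iteration.
        for env in envs:
            action = rng.choice([-1.0, 1.0])
            buffer.append(env.step(action))
        # Still only one update per iteration, regardless of num_envs.
        if len(buffer) >= batch_size:
            batch = rng.sample(list(buffer), batch_size)
            updates += 1  # placeholder for a real gradient step on `batch`
    return len(buffer), updates


if __name__ == "__main__":
    print(collect(num_envs=1, num_steps=100))
    print(collect(num_envs=4, num_steps=100))
```

With these settings, going from 1 to 4 environments fills the buffer four times as fast (100 vs. 400 transitions), but the number of updates barely moves (69 vs. 93, and only because the buffer reaches the batch size sooner). That is the behaviour I am asking how to get around.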