The ks_env.train() method seems to be the one from kaggle_environments.Environment:
Q1. However, I'm confused: why does the environment contain a train() method? Why should the environment be responsible for training? I feel train() should be part of the model: the article above uses the PPO algorithm, which has a train() method. PPO.train() gets called when we call PPO.learn(), which makes sense.
I haven't read the code, but stable-baselines doesn't support multi-agent environments (https://github.com/hill-a/stable-baselines/issues/423), so I think they're trying to make multi-agent learning easier with Environment.train().
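Conceptually, Environment.train() folds the other seats' agents into the environment, so the one remaining seat looks like an ordinary single-agent env with reset()/step(). Here's a minimal sketch of that idea in plain Python (the class and function names are illustrative, not the actual kaggle_environments implementation):

```python
class TwoPlayerGame:
    """Toy turn-based game: each player picks a number; the higher number wins."""
    def __init__(self):
        self.done = False

    def reset(self):
        self.done = False
        return 0  # dummy shared observation

    def step(self, actions):
        # actions: [p0_action, p1_action]; reward from player 0's perspective
        reward = 1 if actions[0] > actions[1] else -1 if actions[0] < actions[1] else 0
        self.done = True
        return 0, reward, self.done


def train(env, agents):
    """Sketch of the Environment.train() pattern: seats with a fixed agent
    are played automatically inside the wrapper; the seat marked None is
    exposed through a standard single-agent reset()/step() interface."""
    seat = agents.index(None)  # the seat the learner controls

    class Trainer:
        def reset(self):
            return env.reset()

        def step(self, action):
            # Fill in the other seats by querying their fixed agents,
            # then step the underlying multi-agent environment once.
            joint = [action if i == seat else agent(None)
                     for i, agent in enumerate(agents)]
            return env.step(joint)

    return Trainer()


# Usage: learn seat 0 against a fixed "always play 1" opponent.
trainer = train(TwoPlayerGame(), [None, lambda obs: 1])
trainer.reset()
obs, reward, done = trainer.step(2)  # our 2 beats the opponent's 1
```

From the learner's point of view the opponent is just part of the environment dynamics, which is why putting train() on the Environment (rather than the model) makes single-agent libraries usable here.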
Multi-agent isn't supported by default in Stable Baselines, but you can make it work with PettingZoo. This example trains a single policy to control every agent in an environment (parameter sharing). You could use these SuperSuit wrappers to work with other methods (self-play, independent learning, etc.), but you would probably need to write some custom training code: https://github.com/PettingZoo-Team/SuperSuit#parallel-environment-vectorization
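For context, "parameter sharing" just means one set of policy parameters is used for every agent, and the agents' experience is pooled into one training stream, which is what lets a single-agent algorithm like PPO train in a multi-agent environment. A toy sketch of the concept (this is not the SuperSuit/SB3 code, just an illustration; the update rule is a stand-in for a real PPO step):

```python
class SharedPolicy:
    """One set of parameters used by every agent (parameter sharing)."""
    def __init__(self):
        self.threshold = 0.0  # a single shared "parameter"

    def act(self, obs):
        return 1 if obs > self.threshold else 0

    def update(self, experience):
        # Stand-in for a gradient step: nudge the threshold toward the mean
        # observation of rewarded transitions. A real setup would run PPO here.
        rewarded = [obs for obs, act, rew in experience if rew > 0]
        if rewarded:
            self.threshold = sum(rewarded) / len(rewarded) - 1.0


policy = SharedPolicy()

# Every agent acts with the SAME policy, and their transitions are pooled
# into one batch -- the vectorization that SuperSuit's wrappers arrange
# for Stable Baselines under the hood.
observations = {"agent_0": 0.5, "agent_1": -0.3, "agent_2": 1.2}
actions = {name: policy.act(obs) for name, obs in observations.items()}
experience = [(obs, actions[name], 1 if actions[name] == 1 else 0)
              for name, obs in observations.items()]
policy.update(experience)
```

Self-play and independent learning differ exactly here: self-play trains against frozen copies of the policy, and independent learning gives each agent its own parameters, which is why those variants need custom training loops on top of the wrappers.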