A custom MARL (multi-agent reinforcement learning) environment where multiple agents trade against one another (self-play) in a zero-sum continuous double auction. Ray [RLlib] is used for training.
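To make the "zero-sum continuous double auction" idea concrete, here is a minimal Python sketch of a matching engine with price-time priority. It is an illustration only, not the code of this environment: the class `ToyCDA`, its `submit` method, and the agent names are hypothetical, and the real environment's observation, action, and reward design is not shown. The sketch assumes limit orders only and that trades execute at the resting (maker) order's price.

```python
import heapq
from collections import defaultdict


class ToyCDA:
    """Hypothetical toy continuous double auction (not the repo's implementation)."""

    def __init__(self):
        self.bids = []  # max-heap emulated with negated prices: (-price, seq, agent, qty)
        self.asks = []  # min-heap: (price, seq, agent, qty)
        self.cash = defaultdict(float)     # net cash flow per agent
        self.inventory = defaultdict(int)  # net position per agent
        self.seq = 0                       # arrival counter for time priority

    def submit(self, agent, side, price, qty):
        """Match an incoming limit order, rest any remainder, return the trades."""
        trades = []
        book = self.asks if side == "buy" else self.bids
        while qty > 0 and book:
            key, seq, maker, maker_qty = book[0]
            maker_price = key if side == "buy" else -key
            crosses = price >= maker_price if side == "buy" else price <= maker_price
            if not crosses:
                break
            fill = min(qty, maker_qty)
            buyer, seller = (agent, maker) if side == "buy" else (maker, agent)
            # Cash and inventory only move between agents, so both sum to zero.
            self.cash[buyer] -= fill * maker_price
            self.cash[seller] += fill * maker_price
            self.inventory[buyer] += fill
            self.inventory[seller] -= fill
            trades.append((buyer, seller, maker_price, fill))
            qty -= fill
            if fill == maker_qty:
                heapq.heappop(book)
            else:
                # Partially filled maker keeps its price and time priority.
                heapq.heapreplace(book, (key, seq, maker, maker_qty - fill))
        if qty > 0:
            self.seq += 1
            if side == "buy":
                heapq.heappush(self.bids, (-price, self.seq, agent, qty))
            else:
                heapq.heappush(self.asks, (price, self.seq, agent, qty))
        return trades


cda = ToyCDA()
cda.submit("a", "sell", 10, 5)
cda.submit("b", "sell", 11, 5)
trades = cda.submit("c", "buy", 11, 7)  # sweeps a's ask, then part of b's
total_cash = sum(cda.cash.values())
total_inventory = sum(cda.inventory.values())
```

Because every fill debits one agent exactly what it credits another, `total_cash` and `total_inventory` stay at zero no matter how many orders are submitted, which is the sense in which such a market is zero-sum among its participants.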
Why do you think that https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt is a good alternative to gym-continuousDoubleAuction?