Fine-tune LLM agents with online reinforcement learning
Why do you think that https://github.com/xoolive/traffic is a good alternative to LlamaGym?