Hi, if you are going to train a deep RL algorithm on a real robot and you are a beginner, I suggest you try out tmrl. It lets you run a readily available algorithm (Soft Actor-Critic) in real time on a real video game (TrackMania). The game acts as a real-world-like proxy for all the concerns you will encounter on a real robot, and from there you can fairly easily develop a robot-learning pipeline for your own robot. The repo has a huge tutorial for exactly this purpose.
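For anyone new to SAC, the core of its critic update is a "soft" Bellman target that adds an entropy bonus to the usual bootstrapped value. Below is a minimal sketch in plain Python for a single transition, assuming two target critics (clipped double-Q) and a fixed temperature `alpha`. The function name and arguments are hypothetical and this is not tmrl's implementation; real code would do this over batches of tensors.

```python
def sac_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target for one transition (hypothetical helper).

    next_q1, next_q2: target-critic estimates Q1'(s', a'), Q2'(s', a'), with a' ~ pi(.|s')
    next_log_prob:    log pi(a' | s') for that sampled a'
    """
    # Clipped double-Q: take the smaller estimate to curb overestimation.
    min_q = min(next_q1, next_q2)
    # Entropy-regularized value of s': subtracting alpha * log pi rewards randomness.
    soft_value = min_q - alpha * next_log_prob
    # Do not bootstrap past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * soft_value


print(sac_target(1.0, False, 10.0, 12.0, -1.0))  # about 11.098
print(sac_target(1.0, True, 10.0, 12.0, -1.0))   # 1.0 (terminal, no bootstrap)
```

Both critics are then regressed toward this target, and the actor is trained to maximize `min(Q1, Q2) - alpha * log pi`.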
I'd say this is a great path, but I'd also look at the basic on-policy actor-critic policy-gradient methods like A2C and eventually PPO. Someone recommended SAC, which is also really good. There are tons of environments in PettingZoo (https://github.com/Farama-Foundation/PettingZoo) as well if you want to mess with those. You can also check out Stable Baselines3 (https://github.com/DLR-RM/stable-baselines3), which is pretty popular. If you want to get more into the theory, I recommend reading Sutton and Barto's book, Reinforcement Learning: An Introduction.
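To make the A2C-to-PPO step concrete, here is a minimal sketch of the losses in plain Python. It assumes per-step lists of rewards, critic values and log-probabilities, simple one-step (TD) advantages instead of GAE, and a clip range of 0.2. The helper names are hypothetical and are not Stable Baselines3's API. The only difference between the two policy losses is that PPO weights the advantage by a clipped probability ratio between the new and old policy.

```python
import math


def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advs = []
    for r, v, nv, d in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if d else gamma * nv  # no bootstrap after a terminal step
        advs.append(r + bootstrap - v)
    return advs


def a2c_policy_loss(log_probs, advantages):
    """A2C-style policy loss: -mean(log pi(a|s) * A)."""
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(advantages)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip=0.2):
    """PPO clipped surrogate loss: -mean(min(ratio * A, clip(ratio) * A))."""
    total = 0.0
    for new_lp, old_lp, a in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * a, clipped * a)  # pessimistic bound
    return -total / len(advantages)


advs = td_advantages([1.0, 0.0], [0.5, 0.2], [2.0, 0.0], [False, True], gamma=0.5)
print(advs)  # [1.5, -0.2]
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], advs))  # ratio 1, so this is -mean(A)
```

Because the ratio is clipped, one PPO update cannot push the policy far from the one that collected the data. That is why PPO can safely reuse a rollout for several gradient epochs, while A2C uses each rollout once.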
Related posts
- How to proceed further? (Learning RL)
- PPO rollout buffer for turn-based two-player game with varying turn lengths
- Question about the old policy and new policy in TRPO code
- Show HN: An end-to-end reinforcement learning library for infinite horizon tasks
- Problem with Truncated Quantile Critics (TQC) and n-step learning algorithm.