stable-baselines3 vs baselines
| | stable-baselines3 | baselines |
|---|---|---|
| Mentions | 46 | 14 |
| Stars | 7,894 | 15,339 |
| Growth | 5.2% | 1.0% |
| Activity | 8.2 | 0.0 |
| Latest commit | 7 days ago | 5 months ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
stable-baselines3
- Sim-to-real RL pipeline for open-source wheeled bipeds
The latest release (v3.0.0) of Upkie's software brings a functional sim-to-real reinforcement learning pipeline based on Stable Baselines3, with standard sim-to-real tricks. The pipeline trains on the Gymnasium environments distributed in upkie.envs (setup: pip install upkie) and is implemented in the PPO balancer. Here is a policy running on an Upkie.
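As a rough sketch of what training on the upkie.envs Gymnasium environments with Stable Baselines3 could look like (the `upkie.envs.register()` call and the env ID are assumptions based on the excerpt, not verified against the upkie package):

```python
import gymnasium as gym
import upkie.envs
from stable_baselines3 import PPO

upkie.envs.register()  # assumption: registers the Upkie environments with Gymnasium

# "UpkieGroundVelocity-v1" is an illustrative ID; check upkie.envs for the real ones
env = gym.make("UpkieGroundVelocity-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_upkie_balancer")
```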
- [P] PettingZoo 1.24.0 has been released (including Stable-Baselines3 tutorials)
PettingZoo 1.24.0 is now live! This release includes Python 3.11 support, updated Chess and Hanabi environment versions, and many bugfixes, documentation updates and testing expansions. We are also very excited to announce 3 tutorials using Stable-Baselines3, and a full training script using CleanRL with TensorBoard and WandB.
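Roughly, those tutorials train a single SB3 policy on a parallel PettingZoo environment through SuperSuit's vector-env converters; a sketch under that assumption (the Waterworld choice and the parameters are illustrative, not quoted from the tutorials):

```python
import supersuit as ss
from pettingzoo.sisl import waterworld_v4
from stable_baselines3 import PPO

# Parallel PettingZoo env -> SB3-compatible vectorized env
env = waterworld_v4.parallel_env()
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 4, num_cpus=1, base_class="stable_baselines3")

# One shared policy controls every agent
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)
```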
- [Question] Why are there so few algorithms implemented in SB3?
I am wondering why there are so few algorithms in Stable Baselines 3 (SB3, https://github.com/DLR-RM/stable-baselines3/tree/master). I was expecting algorithms like ICM, HIRO, DIAYN, ... Why are there no model-based, skill-chaining, or hierarchical-RL algorithms implemented there?
- Stable baselines! Where my people at?
Discord is more focused, and they have a page for people who want to contribute: https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md
- SB3 - NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), <class 'numpy.float32'>) observation space is not supported
Therefore, I traced this error to the ReplayBuffer imported from `SB3`. This is the problem function -
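For reference, the Box bounds in the error match Pendulum-v1's observation space, and SB3's standard ReplayBuffer does accept Box observations; a minimal sketch like the one below (the choice of env is an assumption) can help isolate whether a different, Dict-only buffer is being selected instead:

```python
import gymnasium as gym
from stable_baselines3.common.buffers import ReplayBuffer

# Pendulum-v1 has the Box([-1. -1. -8.], [1. 1. 8.], (3,)) observation space
# quoted in the error message.
env = gym.make("Pendulum-v1")

buffer = ReplayBuffer(
    buffer_size=10_000,
    observation_space=env.observation_space,
    action_space=env.action_space,
)
print(buffer.observations.shape)  # (10000, 1, 3)
```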
- Exporting an A2C model created with stable-baselines3 to PyTorch
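One common route, sketched below for a standard MlpPolicy (the CartPole choice is illustrative): the trained SB3 model already wraps a plain `torch.nn.Module` in `model.policy`, which can be saved, scripted, or exported on its own.

```python
import torch
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v1").learn(10_000)

# model.policy is an ordinary torch.nn.Module
policy = model.policy.to("cpu").eval()
torch.save(policy.state_dict(), "a2c_policy.pt")

# The saved weights can then be loaded into a freshly constructed policy,
# scripted with torch.jit, or exported with torch.onnx.export.
```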
- Shimmy 1.0: Gymnasium & PettingZoo bindings for popular external RL environments
Have you ever wanted to use dm-control with stable-baselines3? Within reinforcement learning (RL), a number of APIs are used to implement environments, with limited ability to convert between them. This makes training agents across different APIs very difficult and has resulted in a fractured ecosystem.
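A sketch of that conversion, assuming Shimmy's dm_control registration scheme (the exact env ID may differ across versions):

```python
import gymnasium as gym
import shimmy  # pip install "shimmy[dm-control]"; makes the dm_control/* IDs available
from stable_baselines3 import SAC

# Shimmy exposes dm_control tasks as Gymnasium environments under the
# "dm_control/" namespace; the ID below is illustrative.
env = gym.make("dm_control/cartpole-balance-v0")

# dm_control observations arrive as a Dict space, so use MultiInputPolicy
model = SAC("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```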
- Stable-Baselines3 v1.8 Release
Changelog: https://github.com/DLR-RM/stable-baselines3/releases/tag/v1.8.0
- [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up
Great project! One question, though: is there any reason why you are not using existing RL implementations, such as stable baselines, instead of creating your own?
- Is stable-baselines3 compatible with gymnasium/gymnasium-robotics?
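Short answer from the SB3 side: the 2.x releases target Gymnasium directly, and the goal-conditioned gymnasium-robotics tasks work with MultiInputPolicy (optionally with HER). A minimal sketch, with the env ID and registration call treated as assumptions that vary by version:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer

# Older gymnasium-robotics versions register on import; newer Gymnasium
# releases also provide an explicit registration helper.
if hasattr(gym, "register_envs"):
    gym.register_envs(gymnasium_robotics)

env = gym.make("FetchReach-v2")  # illustrative ID; check your installed version

model = SAC(
    "MultiInputPolicy",  # Dict obs: observation / achieved_goal / desired_goal
    env,
    replay_buffer_class=HerReplayBuffer,
    verbose=1,
)
model.learn(total_timesteps=10_000)
```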
baselines
- What if I can't reproduce a UMAP exactly for a paper in revision?
I do not for the life of me pretend to really understand either GPUs or floating-point arithmetic, but I think the basic problem is that floating point arithmetic isn't associative, so the order of operations matters immensely. For reasons I don't fully grok, GPUs don't always dispatch the same inputs the same way, so these floating-point non-associativity discrepancies can pile up. See also here and here.
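A tiny illustration of that non-associativity (plain IEEE-754 behaviour, independent of the GPU angle):

```python
a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed by -1e16 before the cancellation
```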
- How to proceed further? (Learning RL)
Ah sorry, I understood your post. It has helped me to code quite a few of them from scratch, but you can also check out https://github.com/openai/baselines or similar
- Does the value of the reward matter?
Yes this is a good point, I always normalize my rewards such that *returns* are around -3 to 3. The baselines implementation has a good example of this. Aside from normalizing returns it's common to also normalize the advantages. Together this should allow any scale of rewards (I have games where scores range from 0-20 and games that range from 0-600,000 and haven't found a problem so long as I normalize everything :) )
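A generic sketch of what that normalization usually looks like just before the policy-gradient step (not taken from any particular codebase):

```python
import numpy as np

def normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale returns or advantages to zero mean and unit variance."""
    return (x - x.mean()) / (x.std() + eps)

# Typical use inside an update:
#   advantages = normalize(advantages)
# Reward/return normalization is often done instead with a running estimate of
# the return standard deviation, as VecNormalize-style wrappers do.
```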
- How to tune hyperparameters in A2C/PPO?
I'm currently working with A2C. The model was able to learn OpenAI Gym Pong; I ran this as a sanity check that I haven't introduced any bugs. Now I'm trying to make the model play Breakout, but even after 10M steps it has not made any significant progress. I'm using the baseline hyperparameters, which can be found here: https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py, except my buffer size has ranged from 512 to 4096. I've noticed that entropy decreases extremely slowly for buffer sizes in that range. So my questions are: how do I make entropy decrease, and how do I increase the reward per buffer? I've tried decreasing the entropy coefficient to almost zero, but it still acts very weirdly.
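For reference, my reading of the defaults in that a2c.py (5-step rollouts, learning rate 7e-4, entropy coefficient 0.01, value coefficient 0.5, gamma 0.99), expressed as an SB3 configuration; treat the values and the Atari setup as an illustrative sketch rather than a quote of the file:

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing + 4-frame stacking across 16 parallel envs
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", n_envs=16), n_stack=4)

model = A2C(
    "CnnPolicy",
    env,
    n_steps=5,          # rollout length per environment ("buffer size" per env)
    learning_rate=7e-4,
    ent_coef=0.01,      # entropy bonus; lowering it makes entropy decay faster
    vf_coef=0.5,
    max_grad_norm=0.5,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
```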
- Boycotting 2.0 or rather PoS
I used a multitude of agents to train it but the best I found was A3C, there are a bunch of examples here you can use to test their performance (although they may require some tweaking).
- How to speed up off-policy algorithms?
I noticed that off-policy algorithms, including DQN, DDPG and TD3, are implemented with a single environment in both baselines and stable-baselines. Even if more environments were added, this wouldn't affect performance, because it would only add more fresh samples to the replay buffer(s). What are some ways to improve speed without major changes to the algorithms? The only thing I could think of is adding an on-policy update like in ACER, but that would change the algorithms and I don't know whether it would improve or worsen convergence.
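For what it's worth, recent SB3 versions do let off-policy algorithms collect from several environments at once, and the usual speed knobs are `train_freq` and `gradient_steps`; a sketch with illustrative values:

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("Pendulum-v1", n_envs=4)  # 4 parallel copies feeding one buffer

model = SAC(
    "MlpPolicy",
    env,
    train_freq=1,       # train after every vectorized step
    gradient_steps=-1,  # as many gradient steps as transitions collected
    verbose=1,
)
model.learn(total_timesteps=50_000)
```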
- Any beginner resources for RL in Robotics?
OpenAI baselines https://github.com/openai/baselines
- Atari BreakoutDeterministic-v4
Without seeing your source code/hyperparams it's going to be difficult to give you advice. I would say to compare against good open-source implementations such as OpenAI Baselines and make sure you have implemented it correctly.
- Convergence of the PPO
It might be worth comparing your implementation to the OpenAI Baselines PPO1 & PPO2 ones to see if they have the same side effect: https://github.com/openai/baselines
- Using CNN in Reinforcement Learning
For Atari games, RGB is often not useful. However, channels are still needed because a single-frame observation is often not a good proxy of a state. For example in games like Breakout or Pong, a single screenshot doesn't tell you the direction to which the ball is moving. So typically we preprocess the observation and use channels to stack a few recent frames. You can take a look at https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
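A minimal sketch of that preprocessing with Gymnasium's built-in wrappers (wrapper names have shifted between versions, e.g. FrameStack vs FrameStackObservation, so treat these as illustrative):

```python
import gymnasium as gym
import numpy as np
from gymnasium.wrappers import AtariPreprocessing, FrameStack

# Grayscale + downscale to 84x84, then stack the 4 most recent frames so the
# observation carries motion information (e.g. which way the ball is moving).
env = gym.make("BreakoutNoFrameskip-v4")
env = AtariPreprocessing(env, grayscale_obs=True, screen_size=84)
env = FrameStack(env, num_stack=4)

obs, info = env.reset()
print(np.asarray(obs).shape)  # (4, 84, 84)
```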
What are some alternatives?
Ray - Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
dm_control - Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
stable-baselines - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
gym - A toolkit for developing and comparing reinforcement learning algorithms.
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
gdrl - Grokking Deep Reinforcement Learning
cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Robotics Library (RL) - The Robotics Library (RL) is a self-contained C++ library for rigid body kinematics and dynamics, motion planning, and control.
tianshou - An elegant PyTorch deep reinforcement learning library.
ppo-implementation-details - The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Super-mario-bros-PPO-pytorch - Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
gym-solutions - OpenAI Gym Solutions