OpenAI Baselines: high-quality implementations of reinforcement learning algorithms (by openai)

Baselines Alternatives

Similar projects and alternatives to baselines

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a better baselines alternative or higher similarity.

baselines reviews and mentions

Posts with mentions or reviews of baselines. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-10-03.
  • What if I can't reproduce a UMAP exactly for a paper in revision?
    1 project | /r/bioinformatics | 7 Dec 2022
    I do not for the life of me pretend to really understand either GPUs or floating-point arithmetic, but I think the basic problem is that floating point arithmetic isn't associative, so the order of operations matters immensely. For reasons I don't fully grok, GPUs don't always dispatch the same inputs the same way, so these floating-point non-associativity discrepancies can pile up. See also here and here.
  • How to proceed further? (Learning RL)
    3 projects | /r/reinforcementlearning | 3 Oct 2022
    Ah sorry I understood your post. It has helped me to code quite a few of them from scratch but you can also check out or similar
  • Does the value of the reward matter?
    1 project | /r/reinforcementlearning | 23 Jun 2022
    Yes this is a good point, I always normalize my rewards such that *returns* are around -3 to 3. The baselines implementation has a good example of this. Aside from normalizing returns it's common to also normalize the advantages. Together this should allow any scale of rewards (I have games where scores range from 0-20 and games that range from 0-600,000 and haven't found a problem so long as I normalize everything :) )
  • How to tune hyperparameters in A2C-ppo?
    2 projects | /r/reinforcementlearning | 15 Jun 2022
    I'm currently working with A2C. The model was able to learn OpenAI Pong; I ran this as a sanity check that I haven't introduced any bugs. Now I'm trying to make the model play Breakout, but after 10M steps it still has not made any significant progress. I'm using the baseline hyperparameters, which can be found here, except that my buffer size has ranged from 512 to 4096. I've noticed that entropy decreases extremely slowly for buffer sizes in that range. So my questions are: how do I make entropy decrease, and how do I increase rewards per buffer? I've tried decreasing the entropy coefficient to almost zero, but it still behaves very strangely.
  • Boycotting 2.0 or rather PoS
    2 projects | /r/EtherMining | 15 May 2021
    I used a multitude of agents to train it but the best I found was A3C, there are a bunch of examples here you can use to test their performance (although they may require some tweaking).
  • How to speed up off-policy algorithms?
    2 projects | /r/reinforcementlearning | 21 Apr 2021
    I noticed that off-policy algorithms including DQN, DDPG and TD3 in different baselines and stable-baselines are implemented with a single environment. And even if more environments were added, this won't affect performance because this will only be adding more fresh samples to replay buffer(s). What are some ways to improve speed without major changes to the algorithms? The only thing that I could think of is adding an on-policy update like in ACER but this is going to change the algorithms and I don't know whether it will improve/worsen model convergence.
  • Any beginner resources for RL in Robotics?
    3 projects | /r/robotics | 19 Apr 2021
    OpenAI baselines
  • Atari BreakoutDeterministic-v4
    1 project | /r/reinforcementlearning | 1 Apr 2021
    Without seeing your source code/hyperparams it's going to be difficult to give you advice. I would say to compare against good open-source implementations such as OpenAI Baselines and make sure you have implemented it correctly.
  • Convergence of the PPO
    2 projects | /r/reinforcementlearning | 27 Mar 2021
    It might be worth comparing your implementation to the OpenAI Baselines PPO1 and PPO2 ones to see if they have the same side effect:
  • Using CNN in Reinforcement Learning
    1 project | /r/reinforcementlearning | 3 Mar 2021
    For Atari games, RGB is often not useful. However, channels are still needed because a single-frame observation is often not a good proxy for the state. For example, in games like Breakout or Pong, a single screenshot doesn't tell you the direction in which the ball is moving. So typically we preprocess the observation and use channels to stack a few recent frames. You can take a look at
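
The frame-stacking idea from the "Using CNN in Reinforcement Learning" post can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Baselines implementation: the `FrameStack` class and the `ball_frame` helper are hypothetical names, and frames are plain nested lists rather than image arrays.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent frames and expose them as one stacked observation.

    Hypothetical helper for illustration; not the Baselines wrapper itself.
    """

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with the first frame so the stack always has
        # exactly k channels, even at the start of an episode.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return self.observation()

    def observation(self):
        # Channels-first layout: a list of k frames, oldest first.
        return list(self.frames)


def ball_frame(position, width=5):
    """Toy 1 x width grayscale 'screen' with a single bright pixel (assumed)."""
    return [[1 if x == position else 0 for x in range(width)]]


stack = FrameStack(k=4)
obs = stack.reset(ball_frame(0))
for pos in (1, 2, 3):
    obs = stack.step(ball_frame(pos))

# Position of the ball in each stacked channel, oldest first.
positions = [frame[0].index(1) for frame in obs]
print(positions)
```

With four stacked frames the ball sits at a different position in each channel, so a network can infer direction and speed from the differences between channels, which a single frame cannot provide.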


Basic baselines repo stats

openai/baselines is an open source project licensed under MIT License which is an OSI approved license.

The primary programming language of baselines is Python.
