d3rlpy
Coursera_Reinforcement_Learning
| | d3rlpy | Coursera_Reinforcement_Learning |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 1,215 | 197 |
| Growth | - | - |
| Activity | 9.1 | 10.0 |
| Last commit | 10 days ago | over 4 years ago |
| Language | Python | Jupyter Notebook |
| License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
d3rlpy
- Python libraries for solving reinforcement learning problems implemented in OpenAI gym
- Conservative Q Learning TD error not converging
Hi, I am using the discrete conservative Q learning implementation in the d3rlpy library (https://github.com/takuseno/d3rlpy) to train a policy offline to optimize mechanical ventilation treatment by using the MIMIC-III dataset (https://physionet.org/content/mimiciii-demo/1.4/).
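For context on the question above, here is a minimal sketch of a discrete CQL-style objective for a single transition. It is not d3rlpy's code: the function names, the squared TD loss, and the use of the same Q-values as the bootstrap target (no target network) are simplifying assumptions made for illustration. It shows that the minimized quantity is the TD loss plus a conservative penalty, not the TD term alone.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def discrete_cql_loss(q_values, action, reward, next_q_values, done,
                      gamma=0.99, alpha=1.0):
    """Return (td_loss, penalty, total) for one transition.

    q_values / next_q_values: Q(s, .) and Q(s', .) as lists of floats.
    action: index of the action stored in the dataset.
    All names here are hypothetical; this is a sketch, not a library API.
    """
    # One-step bootstrapped target (simplified: no target network).
    target = reward if done else reward + gamma * max(next_q_values)
    td_error = target - q_values[action]
    td_loss = 0.5 * td_error ** 2

    # Conservative penalty: pushes down Q over all actions, pushes up
    # Q of the dataset action. It is always >= 0.
    penalty = logsumexp(q_values) - q_values[action]

    return td_loss, penalty, td_loss + alpha * penalty
```

Because the penalty is weighted by `alpha` and added to the TD loss, logging the two terms separately can help tell whether a non-converging curve comes from the TD term or from the trade-off with the penalty.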
Coursera_Reinforcement_Learning
- Python libraries for solving reinforcement learning problems implemented in OpenAI gym
I meant State Of The Art (SOTA) ;) Look here for a simple implementation of Expected Sarsa (you can also find Sarsa on GitHub): https://github.com/LucasBoTang/Coursera_Reinforcement_Learning/blob/master/02Sample-based_Learning_Methods/02Q-Learning_and_Expected_Sarsa.ipynb
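As a rough illustration of what an Expected Sarsa update looks like, here is a minimal tabular sketch. It is not taken from the linked notebook: the function names, the epsilon-greedy target policy, and the list-of-lists Q-table are assumptions made for this example.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values.

    Ties between greedy actions share the greedy probability mass equally.
    """
    n = len(q_values)
    best = max(q_values)
    greedy = [i for i, q in enumerate(q_values) if q == best]
    probs = [epsilon / n] * n
    for i in greedy:
        probs[i] += (1.0 - epsilon) / len(greedy)
    return probs


def expected_sarsa_update(Q, state, action, reward, next_state, done,
                          alpha=0.1, gamma=0.99, epsilon=0.1):
    """Apply one Expected Sarsa update to Q in place; return the TD error.

    Q is a list of per-state lists of action values (hypothetical layout).
    """
    if done:
        expected_next = 0.0
    else:
        # Expectation of Q(s', .) under the epsilon-greedy policy,
        # instead of the sampled next action (Sarsa) or the max (Q-learning).
        probs = epsilon_greedy_probs(Q[next_state], epsilon)
        expected_next = sum(p * q for p, q in zip(probs, Q[next_state]))
    td_error = reward + gamma * expected_next - Q[state][action]
    Q[state][action] += alpha * td_error
    return td_error
```

With `epsilon=0` the expectation reduces to the maximum over next-state values, so the same function then performs a Q-learning update.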
What are some alternatives?
cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
rlai - This is a Python implementation of concepts and algorithms described in "Reinforcement Learning: An Introduction" (Sutton and Barto, 2018, 2nd edition).
exorl - ExORL: Exploratory Data for Offline Reinforcement Learning
Minari - A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
pytorch-a2c-ppo-acktr-gail - PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).