| | Deep-Reinforcement-Learning-Algorithms | DeepRL-TensorFlow2 |
|---|---|---|
| Mentions | 3 | 2 |
| Stars | 903 | 607 |
| Growth | 0.0% | 0.0% |
| Activity | 3.6 | 0.0 |
| Last commit | about 4 years ago | about 3 years ago |
| Language | Jupyter Notebook | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Deep-Reinforcement-Learning-Algorithms
**Is there a canonical simple "hello world" neural network design? Something beyond AND/OR logic, a handful of nodes that does something mildly "useful"?**
I guess the most spectacular in terms of performance-to-"brain size" ratio is a 2-neuron, 8-weight network: https://github.com/Rafael1s/Deep-Reinforcement-Learning-Algorithms/tree/master/CartPole-Policy-Based-Hill-Climbing
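The repo linked above implements that idea as policy-based hill climbing. As a rough illustration (not the repo's exact code), here is a minimal sketch, assuming the classic pre-gymnasium Gym API and a `run_episode` helper defined just for this example; the entire "network" is a 4x2 weight matrix, i.e. 8 weights, one output per action:

```python
# Minimal hill-climbing sketch for CartPole-v0 (classic Gym API assumed).
# The whole policy is a 4x2 weight matrix: 8 weights, no biases, no hidden layer.
import gym
import numpy as np

def run_episode(env, weights):
    """Play one greedy episode with the given 4x2 weight matrix; return total reward."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = int(np.argmax(state @ weights))   # score both actions, take the best
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

env = gym.make("CartPole-v0")
best_weights = 1e-4 * np.random.randn(4, 2)
best_reward = run_episode(env, best_weights)
noise_scale = 1e-2

for episode in range(1000):
    # Perturb the best weights found so far; keep the perturbation only if it helps.
    candidate = best_weights + noise_scale * np.random.randn(4, 2)
    reward = run_episode(env, candidate)
    if reward >= best_reward:
        best_reward, best_weights = reward, candidate
        noise_scale = max(noise_scale / 2, 1e-3)   # narrow the search on success
    else:
        noise_scale = min(noise_scale * 2, 2.0)    # widen the search on failure
    if best_reward >= 195.0:                       # CartPole-v0 "solved" threshold
        break
```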
**Training time of CartPole is way too long**
It can be solved in 113 episodes by the Hill Climbing algorithm (https://github.com/Rafael1s/Deep-Reinforcement-Learning-Algorithms/tree/master/CartPole-Policy-Based-Hill-Climbing) or by Double Deep Q-Learning in 612 episodes (https://github.com/Rafael1s/Deep-Reinforcement-Learning-Algorithms/tree/master/Cartpole-Double-Deep-Q-Learning).
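The detail that distinguishes Double DQN from vanilla DQN is the bootstrap target: the online network selects the next action while the target network evaluates it, which reduces Q-value overestimation. A sketch of just that computation in PyTorch, with `online_net` and `target_net` as hypothetical Q-networks mapping a batch of states to per-action values:

```python
# Double DQN target sketch (PyTorch). online_net/target_net are hypothetical
# Q-networks; rewards and dones are 1-D float tensors over the batch.
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # Select the next action with the online network...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but evaluate it with the target network (the "double" trick).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```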
**Need help with PyTorch script for Actor_Critic implementation of MountainCar env**
You can find the solution for the MountainCar env here: https://github.com/Rafael1s/Deep-Reinforcement-Learning-Algorithms/tree/master/MountainCarContinuous-TD3
This solution is implemented using PyTorch. The TD3 model is the successor to the DDPG algorithm, using the Actor-Critic architecture.
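TD3's three additions over DDPG are twin critics, target-policy smoothing, and delayed actor updates. A condensed PyTorch sketch of the critic target showing the first two (the network names are placeholders, not the repo's):

```python
# TD3 critic-target sketch (PyTorch). actor_target, critic1_target and
# critic2_target are placeholder target networks.
import torch

def td3_targets(actor_target, critic1_target, critic2_target,
                rewards, next_states, dones,
                gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        next_actions = actor_target(next_states)
        # Target-policy smoothing: clipped noise makes the target robust
        # to sharp peaks in the learned Q-function.
        noise = (torch.randn_like(next_actions) * noise_std).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-max_action, max_action)
        # Twin critics: taking the minimum of the two estimates combats overestimation.
        q1 = critic1_target(next_states, next_actions)
        q2 = critic2_target(next_states, next_actions)
        return rewards + gamma * (1.0 - dones) * torch.min(q1, q2)
```

The third trick, delayed updates, simply means the actor and the target networks are updated only once every few critic steps.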
DeepRL-TensorFlow2
**PPO implementation in TensorFlow2**
I've been searching for a clean, good, and understandable implementation of PPO for a continuous action space in TF2, one clear enough for me to apply my own modifications. The closest thing I have found is this code, which seems not to work properly even on a simple Gym CartPole env (issues discussed in the GitHub repo suggest the same problem), so I have some doubts :). I was wondering whether you could recommend an implementation that you trust and suggest :)
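Whatever implementation ends up being trusted, the part worth checking first is the clipped surrogate objective, since that is where broken PPO ports most often go wrong. A minimal TF2 sketch of just that loss, assuming per-sample log-probabilities from a Gaussian policy are already available (old ones stored at rollout time, new ones from the current policy):

```python
# PPO clipped surrogate loss sketch (TensorFlow 2). All inputs are 1-D float
# tensors over the batch; advantages are assumed already normalized.
import tensorflow as tf

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = tf.exp(new_log_probs - old_log_probs)            # pi_new(a|s) / pi_old(a|s)
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # The elementwise minimum means the update never profits from pushing
    # the ratio outside the [1 - eps, 1 + eps] trust region.
    return -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
```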
**Question about using tf.stop_gradient in separate Actor-Critic networks for an A2C implementation in TF2**
I have been looking at this implementation of A2C. Here the author of the code uses stop_gradient only on the critic network at L90 but not in the actor network at L61 for the continuous case. However, it is used in both the actor and critic networks for the discrete case. Can someone explain to me why?
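The usual reason for that pattern is that the critic's value estimate appears inside the actor's loss through the advantage, and tf.stop_gradient keeps actor updates from back-propagating into the critic; exactly where it is applied can differ between the discrete and continuous heads of the same codebase without changing the gradients that matter. A minimal sketch of the pattern, not that repo's exact code:

```python
# Why stop_gradient appears in A2C losses (TF2 sketch). The advantage is
# built from the critic's output but must act as a constant in the actor loss,
# otherwise actor gradients would also flow into the critic network.
import tensorflow as tf

def a2c_losses(log_probs, values, returns):
    advantages = tf.stop_gradient(returns - values)            # constant for the actor
    actor_loss = -tf.reduce_mean(log_probs * advantages)       # policy-gradient term
    critic_loss = tf.reduce_mean(tf.square(returns - values))  # gradients flow here
    return actor_loss, critic_loss
```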
What are some alternatives?
Popular-RL-Algorithms - PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet..
Reinforcement-Learning - Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning
Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020 - Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. ICAIF 2020. Please star. [Moved to: https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020]
ydata-synthetic - Synthetic data generators for tabular and time-series data
rl_lib - Series of deep reinforcement learning algorithms 🤖
IRL - Algorithms for Inverse Reinforcement Learning