Top 23 Python reinforcement-learning Projects
-
nn
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
reinforcement-learning-an-introduction
Python Implementation of Reinforcement Learning: An Introduction
-
stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
-
wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Project mention: Bullish on AI infrastructure, bearish on AI developer frameworks | dev.to | 2025-01-31
Experiment tracking and reproducibility: Tools like Weights & Biases solve the hard problem of managing hundreds of experiments with varying hyperparameters, dataset splits, and evaluation results. This is critical for teams working collaboratively on model improvements.
-
Gymnasium
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
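The "API standard" mentioned above centres on two calls: `reset`, which returns an observation and an info dict, and `step`, which returns an observation, a reward, `terminated` and `truncated` flags, and an info dict. The sketch below only mirrors that shape in plain Python so it runs without any install; it does not import Gymnasium, and `CoinGuessEnv` and `run_episode` are hypothetical names made up for illustration.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment mimicking the Gymnasium reset/step shape."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0
        self._coin = 0

    def reset(self, seed=None, options=None):
        # Re-seed only when a seed is given, so episodes can be reproduced.
        if seed is not None:
            self._rng = random.Random(seed)
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        observation = 0  # nothing useful to observe before the first guess
        return observation, {}

    def step(self, action):
        # Reward 1.0 for guessing the hidden coin, 0.0 otherwise.
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        observation = self._coin  # reveal the coin that was just guessed
        self._coin = self._rng.randint(0, 1)
        terminated = False  # this toy task has no natural end state
        truncated = self._steps >= self.max_steps  # time limit reached
        return observation, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Run one episode and return (total_reward, number_of_steps)."""
    observation, info = env.reset(seed=seed)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return total_reward, steps
```

Calling `run_episode(CoinGuessEnv(), lambda obs: 0, seed=0)` plays a single five-step episode with a policy that always guesses 0. A real Gymnasium environment would additionally declare observation and action spaces, which this sketch leaves out.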
-
Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23
Is t5x an encoder/decoder architecture?
Some more general options.
The Flax ecosystem
https://github.com/google/flax?tab=readme-ov-file
or dm-haiku
https://github.com/google-deepmind/dm-haiku
were some of the best developed communities in the Jax AI field
Perhaps the “trax” repo? https://github.com/google/trax
Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...
Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py
-
PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
-
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
-
trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Project mention: Recapping the AI, Machine Learning and Data Science Meetup — May 2, 2024 | dev.to | 2024-05-02
Transformer Reinforcement Learning X on GitHub
-
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
-
dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
-
pytorch-a2c-ppo-acktr-gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Python reinforcement-learning discussion
Python reinforcement-learning related posts
-
AgiBot X1, a modular humanoid robot with high dof
-
Gymnasium Release v1.0
-
Ask HN: Best way to learn robotics with a 10 year old?
-
Deep Reinforcement Learning: Zero to Hero
-
Recapping the AI, Machine Learning and Data Science Meetup — May 2, 2024
-
Bayesianbandits: A Pythonic microframework for multi-armed bandit problems
-
Adding Weapons
-
Index
What are some of the best open-source reinforcement-learning projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | nn | 58,489 |
2 | Ray | 35,405 |
3 | d2l-en | 24,927 |
4 | reinforcement-learning-an-introduction | 13,802 |
5 | stable-baselines3 | 9,731 |
6 | wandb | 9,457 |
7 | machine_learning_examples | 8,489 |
8 | Gymnasium | 8,160 |
9 | trax | 8,159 |
10 | pysc2 | 8,064 |
11 | PaLM-rlhf-pytorch | 7,747 |
12 | TensorLayer | 7,338 |
13 | cleanrl | 6,285 |
14 | keras-rl | 5,529 |
15 | trlx | 4,563 |
16 | OpenRLHF | 4,534 |
17 | stable-baselines | 4,169 |
18 | dm_control | 3,914 |
19 | ElegantRL | 3,847 |
20 | pytorch-a2c-ppo-acktr-gail | 3,612 |
21 | polyaxon | 3,604 |
22 | acme | 3,582 |
23 | football | 3,377 |