Top 23 Python reinforcement-learning Projects
-
nn
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
-
unsloth
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
Project mention: I Trained an LLM on 75K of My Own Messages So It Would Stop Writing Like a Chatbot | dev.to | 2026-05-08
Training: unsloth + trl (SFTTrainer). Unsloth handles the 4-bit quantization and gradient checkpointing; trl handles the training loop.
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Project mention: GSoC 2026 Predictions: 30 NEW AI/ML/Security Organizations You Should Start Contributing to NOW! | dev.to | 2026-02-06
Main: https://github.com/ray-project/ray ⭐ 34k+
-
sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
Project mention: DeepSeek makes the V4 Pro price discount permanent | news.ycombinator.com | 2026-05-22
There are several things at play:
Inference stack efficiency: Many of these providers take off-the-shelf sglang / vllm / trtllm and hope for the best. Meanwhile, the DeepSeek team is known for pushing the boundary of optimizations.
Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek 3.2, GLM 5, DeepSeek V4. Only now is it slowly starting to get optimized in the major inference engines: (https://github.com/sgl-project/sglang/issues/19380 https://github.com/sgl-project/sglang/pull/22851 etc.). Of course, DS V4 adds extra optimizations into the model architecture on top of DSA, and those will take more time to be taken full advantage of by the open source inference engines.
Privacy: Betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.
And a few other things (scale (matters a lot for MoEs), reliability, soft enterprise lock-in, etc.)
---
There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is a much better model, and because Z.AI raised their price as well.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
Microsoft's agent-lightning project offers a comprehensive toolkit aimed at accelerating the process of building, testing, and deploying AI Agents. This open-source initiative highlights the industry's commitment to enabling faster development and implementation of advanced AI capabilities, providing developers with robust resources to streamline AI agent creation.
-
reinforcement-learning-an-introduction
Python Implementation of Reinforcement Learning: An Introduction
-
Project mention: One Open Source Project per Day #74: ai-engineering-from-scratch - Build AI Full-stack Skills from Ground Up | dev.to | 2026-05-23
git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py
-
stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
-
Gymnasium
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
-
wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Project mention: The $100 ChatGPT: Why Karpathy's nanochat Represnts the Next Big Thing | dev.to | 2026-05-04
Each stage is comprehensible. Each stage is hackable. You can literally watch it get smarter in real-time through the wandb plots.
-
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
-
ART
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!
Huh?
I thought the latest advance in computing (spring 2025) was self-play / reinforcement learning. Like we ran out of training data a few years ago.
https://github.com/OpenPipe/ART
Reinforcement learning here means having the large language model devise puzzles that it then solves, with the results scored via llm-as-judge.
The definition of llm-as-judge is that your llm generates 8-12 trajectories and a different llm judges the result. I'd use an oracle like Windows or Linux operating system execution for the problem of ISA-assembly creation.
The winning entries are used to train the large language model.
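The loop described in this comment (sample several trajectories, score them with a separate judge or an execution oracle, keep the winners as training data) can be sketched roughly as follows. This is a minimal, library-free illustration and not ART's actual API: `sample_trajectories`, `select_winners`, `toy_generate`, and `toy_judge` are hypothetical names, and the "model" and "judge" are toy stand-ins for real LLM calls.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical type aliases for this sketch only.
Generator = Callable[[str, random.Random], str]   # (prompt, rng) -> trajectory
Judge = Callable[[str, str], float]               # (prompt, trajectory) -> score


def sample_trajectories(generate: Generator, prompt: str, n: int,
                        rng: random.Random) -> List[str]:
    """Draw n candidate trajectories for one prompt (the comment suggests 8-12)."""
    return [generate(prompt, rng) for _ in range(n)]


def select_winners(trajectories: List[str], judge: Judge, prompt: str,
                   top_k: int) -> List[Tuple[float, str]]:
    """Score every trajectory with the judge and keep the top_k highest-scoring."""
    scored = [(judge(prompt, t), t) for t in trajectories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for the policy LLM: answers 'a+b', sometimes off by one."""
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b + rng.choice([-1, 0, 0, 1]))


def toy_judge(prompt: str, answer: str) -> float:
    """Toy stand-in for the judge: an exact oracle, like executing the code."""
    a, b = (int(x) for x in prompt.split("+"))
    return 1.0 if int(answer) == a + b else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_trajectories(toy_generate, "2+3", n=8, rng=rng)
    winners = select_winners(candidates, toy_judge, "2+3", top_k=2)
    # In a real pipeline the winners would feed the next training step.
    for score, trajectory in winners:
        print(score, trajectory)
```

Swapping `toy_judge` for an execution-based check mirrors the commenter's suggestion to prefer an oracle over a second LLM where one is available.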
-
OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
-
While mainstream AI chatter circles ever-larger models, two research drops last week point to something more tactical: faster, cheaper ways to customize and train what you already have. Sakana AI's Text-to-LoRA (T2L) slashes adapter creation to a single prompt, and the AReaL framework squeezes 2-3× more throughput from your RLHF cluster. Let's unpack the wins and risks.
-
trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
-
dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
-
Python reinforcement-learning discussion
Python reinforcement-learning related posts
-
I'm Scared About Biological Computing
-
The Evolution of GUI Agents: From RPA Scripts to AI That Sees Your Screen
-
Open Source Project of the Day (Part 10): AgentEvolver - Self-Evolving Agent System for Autonomous Learning and Evolution
-
Simular Agent S hits 72.6% success on 369 real computer tasks (human: 72.36%)
-
Learning to Model the World with Language
-
maze VS pi-optimal - a user suggested alternative
2 projects | 30 Oct 2025
-
Daily Artificial Intelligence Digest - Oct 26, 2025
-
Index
What are some of the best open-source reinforcement-learning projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | nn | 66,902 |
| 2 | unsloth | 65,904 |
| 3 | Ray | 42,791 |
| 4 | sglang | 28,872 |
| 5 | d2l-en | 28,853 |
| 6 | agent-lightning | 17,276 |
| 7 | reinforcement-learning-an-introduction | 14,640 |
| 8 | ai-engineering-from-scratch | 13,774 |
| 9 | stable-baselines3 | 13,381 |
| 10 | Gymnasium | 12,001 |
| 11 | wandb | 11,104 |
| 12 | cleanrl | 9,911 |
| 13 | ART | 9,893 |
| 14 | OpenRLHF | 9,596 |
| 15 | machine_learning_examples | 8,877 |
| 16 | pysc2 | 8,276 |
| 17 | TensorLayer | 7,389 |
| 18 | keras-rl | 5,554 |
| 19 | AReaL | 5,252 |
| 20 | xtuner | 5,151 |
| 21 | trlx | 4,743 |
| 22 | dm_control | 4,607 |
| 23 | ElegantRL | 4,337 |