Top 23 Python Transformer Projects
-
Project mention: Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs | news.ycombinator.com | 2025-09-18
-
nn
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
-
PEFT is more niche as it's a specific library (https://github.com/huggingface/peft) that creates LoRAs, but for a machine learning engineer working with models outside of just prompting an API, it's within scope to know why/how a LoRA should be used.
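As a rough illustration of why a LoRA is attractive, the sketch below uses plain Python rather than the peft API. It assumes the usual low-rank formulation: the base weight W stays frozen and only a rank-r update B·A, scaled by alpha/r, is trained. All function names, sizes, and values here are hypothetical.

```python
# Minimal LoRA sketch in plain Python (illustrative only; this is not the peft API).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, k); B has shape (d, r)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]

def trainable_params(d, k, r):
    """Compare full fine-tuning (d * k weights) with a rank-r adapter (r * (d + k))."""
    return {"full": d * k, "lora": r * (d + k)}

# Hypothetical 4096 x 4096 projection with a rank-8 adapter.
counts = trainable_params(4096, 4096, 8)
print(counts, f"ratio={counts['lora'] / counts['full']:.4%}")
```

The parameter count is the practical argument: the adapter trains a small fraction of the weights, and the update can be merged back into W after training.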
-
For kernel-level performance tuning, you can use the occupancy calculator, as pointed out by jplusqualt, or profile your kernel with Nsight Compute, which will give you a ton of info.
But for model-wide performance, you basically have to come up with your own calculation to estimate the FLOPs required by your model and, based on that, figure out how well your model is maxing out the GPU's capabilities (MFU/HFU).
Here is a more in-depth example on how you might do this: https://github.com/stas00/ml-engineering/tree/master/trainin...
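A back-of-the-envelope version of that calculation might look like the sketch below. It assumes the common approximation of about 6 FLOPs per parameter per token for training a dense transformer (forward plus backward), which ignores attention-score FLOPs. HFU would additionally count recomputed FLOPs, for example from activation checkpointing. The function names and the run's numbers are hypothetical.

```python
# Rough Model FLOPs Utilization (MFU) estimate.
# Assumption: ~6 FLOPs per parameter per token for a dense transformer
# training step; attention-score FLOPs and recomputation are ignored.

def model_flops_per_token(n_params):
    """Approximate training FLOPs needed to process one token."""
    return 6 * n_params

def mfu(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Achieved model FLOP/s divided by the cluster's theoretical peak FLOP/s."""
    achieved = model_flops_per_token(n_params) * tokens_per_second
    return achieved / (n_gpus * peak_flops_per_gpu)

# Hypothetical run: 7B parameters, 20,000 tokens/s across 8 GPUs,
# each assumed to peak at 312e12 dense BF16 FLOP/s.
estimate = mfu(n_params=7e9, tokens_per_second=20_000, n_gpus=8,
               peak_flops_per_gpu=312e12)
print(f"MFU ~ {estimate:.1%}")
```

Measured throughput (tokens per second) comes from your own training logs, and the peak figure comes from the vendor's spec sheet for the precision you train in.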
-
DocsGPT
Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.
-
RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
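To make the "train in parallel, run as an RNN with constant state" idea concrete, here is a toy linear recurrence. It is not RWKV's actual formulation, just a minimal sketch under the assumption of a single scalar state with a fixed decay. The function names and inputs are hypothetical.

```python
# Toy linear recurrence (NOT RWKV's actual formulation): the same outputs
# computed step by step with one running state, or per position from the
# whole sequence at once.

def recurrent_form(keys, values, decay):
    """Inference-style pass: one running state, no per-token cache."""
    state = 0.0
    outputs = []
    for k, v in zip(keys, values):
        state = decay * state + k * v
        outputs.append(state)
    return outputs

def parallel_form(keys, values, decay):
    """Training-style pass: each position is an independent weighted sum."""
    return [
        sum(decay ** (t - i) * keys[i] * values[i] for i in range(t + 1))
        for t in range(len(keys))
    ]

keys = [0.5, 1.0, 0.25, 2.0]
values = [1.0, -1.0, 4.0, 0.5]
rec = recurrent_form(keys, values, decay=0.9)
par = parallel_form(keys, values, decay=0.9)
print(all(abs(r - p) < 1e-9 for r, p in zip(rec, par)))
```

Because every position in the parallel form depends only on the inputs, positions can be computed independently during training, while the recurrent form needs only the previous state at inference time.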
Project mention: RWKV-7 beats Llama 3.2 with 3x fewer training tokens and formally exceeds TC^0 | news.ycombinator.com | 2026-02-23
-
Project mention: One Open Source Project per Day #74: ai-engineering-from-scratch - Build AI Full-stack Skills from Ground Up | dev.to | 2026-05-23
git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py
-
Star the Speech Brain repository ⭐
-
segmentation_models.pytorch
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
-
OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
-
presidio
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Microsoft's Presidio identifies and redacts PII well, including a fair number of international and borderline-uncommon formats. Yelp's detect-secrets is the other obvious building block — but it's a detector, not a redactor. It finds credentials so a pre-commit baseline can block them; it doesn't rewrite anything on the wire. Wire either into a proxy like LiteLLM or Bifrost and you get detection plus outbound redaction.
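The detector-versus-redactor distinction can be sketched in a few lines of plain Python. This does not use the Presidio or detect-secrets APIs. The two regex patterns are simplified assumptions for illustration, and `detect` and `redact` are hypothetical names.

```python
import re

# Hypothetical, simplified patterns; real detectors use far richer rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def detect(text):
    """Detector: report (label, start, end) findings without changing the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.start(), match.end()))
    return sorted(findings, key=lambda finding: finding[1])

def redact(text):
    """Redactor: rewrite outbound text, replacing each finding with a placeholder."""
    pieces, cursor = [], 0
    for label, start, end in detect(text):
        if start < cursor:
            continue  # skip findings that overlap an earlier replacement
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)

prompt = "Contact jane@example.com, key AKIAABCDEFGHIJKLMNOP"
print(detect(prompt))
print(redact(prompt))
```

A pre-commit hook would stop at `detect` and fail the commit, whereas a proxy in front of an LLM API would call something like `redact` on each outbound request body.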
-
Project mention: Sana high-resolution image and video generation from | news.ycombinator.com | 2026-05-30
-
Project mention: Gradient Descent on Token Input Embeddings: A ModernBERT experiment | dev.to | 2025-06-23
ModernBERT-large was chosen because it is a relatively lightweight model with a strong visualization suite and a simplified attention mask (full cross-attention) that is easy to reason about. It would be interesting to see whether the results in this post hold across other models.
-
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
-
mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Project mention: The Free, Open-Source Alternative to ElevenLabs Is Finally Here | dev.to | 2026-05-24
uv pip install "git+https://github.com/Blaizzy/mlx-audio" --prerelease=allow
uv pip install soundfile
-
courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)
-
train-llm-from-scratch
A straightforward method for training your LLM, from downloading data to generating text.
Project mention: Train LLMs from Scratch, Hermes Agent WebUI, & Efficient OlmoEarth v1.1 for Local AI | dev.to | 2026-05-31
Source: https://github.com/FareedKhan-dev/train-llm-from-scratch
-
Python Transformers discussion
Python Transformers related posts
-
Seasons time-lapse - alignment
-
The Free, Open-Source Alternative to ElevenLabs Is Finally Here
-
Advanced Quantization Algorithm for LLMs
-
Wikipedia survives while the rest of the internet breaks
-
Anthropic: Persona Vectors
-
🚀 25+ Open Source AI APIs, Models & Tools (with GitHub Repo Links)
-
FlashMoE: DeepSeek-R1 671B and Qwen3MoE 235B with 1~2 Intel B580 GPU in IPEX-LLM
-
Index
What are some of the best open-source Transformer projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | LlamaFactory | 72,081 |
| 2 | nn | 66,926 |
| 3 | peft | 21,267 |
| 4 | ml-engineering | 18,080 |
| 5 | DocsGPT | 17,925 |
| 6 | Megatron-LM | 16,685 |
| 7 | RWKV-LM | 14,559 |
| 8 | ai-engineering-from-scratch | 13,774 |
| 9 | PaddleNLP | 12,952 |
| 10 | txtai | 12,649 |
| 11 | speechbrain | 11,610 |
| 12 | segmentation_models.pytorch | 11,609 |
| 13 | OpenRLHF | 9,627 |
| 14 | presidio | 8,565 |
| 15 | Sana | 8,228 |
| 16 | bertviz | 8,086 |
| 17 | BERTopic | 7,680 |
| 18 | gpt-neox | 7,442 |
| 19 | mlx-audio | 7,345 |
| 20 | courses | 6,435 |
| 21 | argos-translate | 6,141 |
| 22 | train-llm-from-scratch | 5,647 |
| 23 | alignment-handbook | 5,609 |