Python Transformers

Open-source Python projects categorized as Transformers

Top 23 Python Transformer Projects

  • nn

    🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

  • LLaMA-Factory

    A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

    Project mention: Llama-Factory: A WebUI for Efficient Fine-Tuning of 100+ LLMs | | 2024-07-17
  • vit-pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
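
    The front end of a ViT is easy to sketch in plain Python: split the image into non-overlapping P×P patches and flatten each into a vector, giving the token sequence the transformer encoder consumes. This toy version omits the learned linear projection, class token, and position embeddings the real layer adds:

```python
# Toy sketch of ViT patch extraction: cut an H x W grid into
# non-overlapping P x P patches and flatten each one row-major.
def patchify(img, P):
    H, W = len(img), len(img[0])
    assert H % P == 0 and W % P == 0, "image must divide evenly into patches"
    patches = []
    for i in range(0, H, P):
        for j in range(0, W, P):
            patches.append([img[i + di][j + dj]
                            for di in range(P) for dj in range(P)])
    return patches

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 "image"
seq = patchify(img, 2)
assert len(seq) == 4 and len(seq[0]) == 4   # 4 patches of 4 values each
assert seq[0] == [0, 1, 4, 5]               # top-left 2x2 patch
```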

  • peft

    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

    Project mention: LoftQ: LoRA-fine-tuning-aware Quantization | | 2023-12-19
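
    PEFT's flagship method, LoRA, freezes the base weight matrix W and trains only a low-rank update B·A scaled by alpha/r. The arithmetic is simple enough to sketch without the library (pure Python; `lora_effective_weight` is an illustrative helper, not a peft API):

```python
# Minimal LoRA arithmetic: effective weight = W + (alpha/r) * B @ A.
# W stays frozen; only the small matrices A and B would be trained.
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    scale = alpha / r
    BA = matmul(B, A)                       # low-rank update, same shape as W
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

W = [[1.0, 0.0], [0.0, 1.0]]                # frozen 2x2 base weight
A = [[0.5, 0.5]]                            # r=1 down-projection (1x2)
B = [[0.0], [0.0]]                          # up-projection, zero-initialized (2x1)
# With B zero-initialized (as in LoRA), training starts from the base model:
assert lora_effective_weight(W, A, B, alpha=8, r=1) == W
```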
  • haystack

    :mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | | 2024-04-28

    I was confused for a bit but there is no relation to


  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

    Project mention: Do LLMs need a context window? | | 2023-12-25 lists a number of implementations of various versions of RWKV:

    > RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)

    > RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

    > So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

    > "Our latest version is RWKV-6."
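
    The RNN/GPT duality quoted above can be illustrated with a toy exponentially decayed sum (a stand-in for the idea only, not RWKV's actual WKV kernel): the same sequence of states can be computed one step at a time, or each position can be computed independently, which is what makes training parallelizable.

```python
# Toy RNN-mode vs "GPT"-mode computation of the same decayed sum.
import math

def rnn_mode(xs, decay=0.9):
    state, out = 0.0, []
    for x in xs:                    # O(T) sequential steps, O(1) state
        state = decay * state + x
        out.append(state)
    return out

def gpt_mode(xs, decay=0.9):
    # state_t = sum_{i<=t} decay^(t-i) * x_i  -- every t is independent,
    # so all positions can be computed in parallel during training.
    return [sum(decay ** (t - i) * xs[i] for i in range(t + 1))
            for t in range(len(xs))]

xs = [1.0, 2.0, -0.5, 3.0]
assert all(math.isclose(a, b) for a, b in zip(rnn_mode(xs), gpt_mode(xs)))
```

    Both modes produce identical hidden states, so a model trained in the parallel form can be served with the cheap step-by-step recurrence at inference time.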

  • PaddleNLP

    👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.

  • ml-engineering

    Machine Learning Engineering Open Book

    Project mention: Accelerators | | 2024-02-22
  • speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | | 2024-02-28
  • PaLM-rlhf-pytorch

    Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Project mention: txtai 7.3 released: Adds new RAG Web Apps and streaming LLM/RAG support | | 2024-07-15
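
    At the heart of any embeddings database is vector similarity search. A minimal pure-Python sketch of the idea (illustrative only; this is not txtai's actual API, and the vectors here are hypothetical):

```python
# Nearest-neighbour search over stored embeddings by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(index, query, k=1):      # index: list of (doc_id, vector)
    ranked = sorted(index, key=lambda iv: cosine(iv[1], query), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

index = [("doc-cat", [1.0, 0.1]),
         ("doc-dog", [0.9, 0.2]),
         ("doc-car", [0.0, 1.0])]
assert search(index, [1.0, 0.0], k=2) == ["doc-cat", "doc-dog"]
```

    Real systems replace the linear scan with an approximate index (e.g. HNSW) and generate the vectors with a language model, but the ranking step is this.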
  • gpt-neox

    An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

    Project mention: Why YC Went to DC | | 2024-06-03

    Closest to this would be whose training data is largely public and training processes are openly discussed, planned, and evaluated on their Discord server. Much of their training dataset is available at (their onion link is considered "primary", however, due to copyright concerns)

  • bertviz

    BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

    Project mention: StreamingLLM: tiny tweak to KV LRU improves long conversations | | 2024-02-13

    This seems to work only because large GPTs have redundant, under-complex attentions. See this issue in BertViz about attention in Llama:
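
    What BertViz renders is, at bottom, the matrix of scaled dot-product attention weights. A toy sketch of computing one query's weights over a set of keys (the vectors are hypothetical):

```python
# Scaled dot-product attention weights for a single query:
# softmax(q . k_j / sqrt(d)) over all keys j.
import math

def attention_weights(q, keys):
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

w = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
assert abs(sum(w) - 1.0) < 1e-9             # weights form a distribution
assert w[0] == w[2] > w[1]                  # matching keys get more weight
```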

  • BigDL

    Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

    Project mention: LLaMA Now Goes Faster on CPUs | | 2024-03-31

    Any performance benchmark against intel's 'IPEX-LLM'[0] or others?

    [0] -

  • BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.
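
    c-TF-IDF treats all documents in a topic as one "class document" and reweights term frequencies by how class-specific each term is. A hedged pure-Python sketch of one common formulation, tf_{t,c} · log(1 + A / f_t), where A is the average word count per class and f_t the term's total frequency (this is illustrative, not BERTopic's exact implementation):

```python
# Class-based TF-IDF: score terms per topic so that words shared across
# all topics are down-weighted and topic-specific words stand out.
import math
from collections import Counter

def c_tf_idf(classes):              # classes: {topic_name: list of tokens}
    totals = Counter()
    for toks in classes.values():
        totals.update(toks)
    avg_len = sum(len(t) for t in classes.values()) / len(classes)
    scores = {}
    for name, toks in classes.items():
        tf = Counter(toks)
        scores[name] = {t: tf[t] * math.log(1 + avg_len / totals[t])
                        for t in tf}
    return scores

topics = {"vision": "image patch patch pixel".split(),
          "text":   "token token embedding image".split()}
s = c_tf_idf(topics)
# "patch" is distinctive for the vision topic; "image" appears in both.
assert s["vision"]["patch"] > s["vision"]["image"]
```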

  • DALLE-pytorch

    Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

    Project mention: The Eleuther AI Mafia | | 2023-09-03

    It all started originally on lucidrains/dalle-pytorch in the months following the release of DALL-E (1). The group started as `dalle-pytorch-replicate` but was never officially "blessed" by Phil Wang who seems to enjoy being a free agent (can't blame him). is where the discord got kicked off originally. There's a lot of other interactions between us in the github there. You should be able to find when Phil was approached by Jenia Jitsev, Jan Ebert, and Mehdi Cherti (all starting LAION members) who graciously offered the chance to replicate the DALL-E paper using their available compute at the JUWELS and JUWELS Booster HPC system. This all predates Emad's arrival. I believe he showed up around the time guided diffusion and GLIDE, but it may have been a bit earlier.

    Data work originally focused on amassing several of the bigger datasets of the time. Getting CC12M downloaded and trained on was something of an early milestone (robvanvolt's work). A lot of early work was like that though, shuffling through CC12M, COCO, etc. with the dalle-pytorch codebase until we got an avocado armchair.

    Christophe Schumann was an early contributor as well and great at organizing and rallying. He focused a lot on the early data scraping work for what would become the "LAION5B" dataset. I don't want to credit him with the coding and I'm ashamed to admit I can't recall who did much of the work there - but a distributed scraping program was developed (the name was something@home... not scraping@home?).

    The discord link on Phil Wang's readme at dalle-pytorch got a lot of traffic and a lot of people who wanted to pitch in with the scraping effort.

    Eventually a lot of people from Eleuther and many other teams mingled with us, some sort of non-profit org was created in Germany I believe for legal purposes. The dataset continued to grow and the group moved from training DALLE's to finetuning diffusion models.

    The `CompVis` team were great inspiration at the time and much of their work on VQGAN and then latent diffusion models basically kept us motivated. As I mentioned a personal motivation was Katherine Crowson's work on a variety of things like CLIP-guided vqgan, diffusion, etc.

    I believe Emad Mostaque showed up around the time GLIDE was coming out? I want to say he donated money for scrapers to be run on AWS to speed up data collection. I was largely hands off for much of the data scraping process and mostly enjoyed training new models on data we had.

    As with any online community things got pretty ill-defined, roles changed over, volunteers came/went, etc. I would hardly call this definitive and that's at least partially the reason it's hard to trace as an outsider. That much of the early history is scattered about GitHub issues and PR's can't have helped though.

  • openchat

    OpenChat: Advancing Open-source Language Models with Imperfect Data (by imoneoi)

    Project mention: Alternative of bard,bing, claude | /r/artificial | 2023-12-10

    Depending on your use case, it might be worth looking into

  • courses

    This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)

  • superduper

    🔮 SuperDuper: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.

    Project mention: Build fully portable AI applications on top of Snowflake with SuperDuperDB | | 2024-06-26

    Customize how AI and databases work together. Scale your AI projects to handle more data and users. Move AI projects between different environments easily. Extend the system with new AI features and database functionality. Check it out: Blog: Github: (leave us a star ⭐️🥳)

  • deep-daze

    Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by

  • x-transformers

    A simple but complete full-attention transformer with a set of promising experimental features from various papers

    Project mention: x-transformers | | 2024-03-31
  • marqo

    Unified embedding generation and search engine. Also available on cloud -

    Project mention: AI Search That Understands the Way Your Customer's Think | | 2024-05-28
  • llmware

    Unified framework for building enterprise RAG pipelines with small, specialized models

    Project mention: Are we all prompting wrong? Balancing Creativity and Consistency in RAG. | | 2024-06-17

    The full code for this example can be found in our GitHub repo.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months, or since we started tracking (Dec 2020).


Python Transformers related posts

  • Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars

    4 projects | | 28 May 2024
  • PaliGemma: Open-Source Multimodal Model by Google

    5 projects | | 15 May 2024
  • Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

    8 projects | | 15 May 2024
  • Rabbit R1 can be run on a Android device

    1 project | | 5 May 2024
  • OpenAdapt: AI-First Process Automation with Large Multimodal Models

    1 project | | 5 May 2024
  • Adapter between LMMs and traditional desktop and web GUI

    1 project | | 1 May 2024
  • I Witnessed the Future of AI, and It's a Broken Toy

    1 project | | 30 Apr 2024


What are some of the best open-source Transformer projects in Python? This list, ordered by GitHub stars, should help you find them:

Rank  Project            Stars
   1  nn                51,938
   2  LLaMA-Factory     26,200
   3  vit-pytorch       18,856
   4  peft              15,067
   5  haystack          14,603
   6  RWKV-LM           12,006
   7  PaddleNLP         11,743
   8  ml-engineering    10,215
   9  speechbrain        8,258
  10  PaLM-rlhf-pytorch  7,639
  11  txtai              7,516
  12  gpt-neox           6,718
  13  bertviz            6,608
  14  BigDL              6,289
  15  BERTopic           5,812
  16  DALLE-pytorch      5,534
  17  openchat           5,136
  18  courses            5,090
  19  superduper         4,553
  20  deep-daze          4,379
  21  x-transformers     4,393
  22  marqo              4,336
  23  llmware            4,242
