Transformers

Top 23 Transformer Open-Source Projects

  • nn

    🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
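
    The annotated style is easiest to appreciate on the attention block itself. Below is a minimal scaled dot-product attention sketch in plain PyTorch, written here for illustration rather than copied from the repository:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                    # attention weights per query
    return weights @ v                                     # (batch, heads, seq, d_k)

# toy shapes: batch=1, heads=2, seq=4, d_k=8
q = k = v = torch.randn(1, 2, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)         # torch.Size([1, 2, 4, 8])
```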

  • vit-pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

  • Project mention: Is it easier to go from Pytorch to TF and Keras than the other way around? | /r/pytorch | 2023-05-13

    I also need to learn PySpark, so right now I am going to download the Fashion-MNIST dataset, use PySpark to downsize each image and put them into separate folders according to their labels (just to show employers I can do some basic ETL with PySpark; not sure how I am going to load it for training in PyTorch yet, though). Then I am going to write the simplest LeNet to try to categorize the Fashion-MNIST dataset (results will most likely be bad, but that's okay). Next, I'll try to learn transfer learning in PyTorch for CNNs, or maybe skip ahead to ViT. Ideally, at this point I want to study the attention mechanism a bit more and try to implement SimpleViT, which I saw here: https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/simple_vit.py
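
    If you get to that step, a minimal vit-pytorch sketch along the lines of its README looks like the following; the hyperparameters are illustrative placeholders, not values tuned for Fashion-MNIST:

```python
import torch
from vit_pytorch import ViT

# illustrative settings; adjust image_size/channels/num_classes for your dataset
model = ViT(
    image_size=224,
    patch_size=16,
    num_classes=10,
    dim=256,
    depth=6,
    heads=8,
    mlp_dim=512,
)

images = torch.randn(4, 3, 224, 224)   # (batch, channels, height, width)
logits = model(images)                 # (4, 10) class logits
```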

  • LLaMA-Factory

    Unify Efficient Fine-Tuning of 100+ LLMs

  • Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

    Depends what model you want to train, and how well you want your computer to keep working while you're doing it.

    If you're interested in large language models there's a table of vram requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.

    You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs uses the GPU, after all.

    Spend a bit more and you'll probably have a better time.

    [1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
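
    A rough sanity check of that 8 GB figure, assuming the "most basic type" means 4-bit QLoRA on a 7B model (all numbers below are coarse estimates, not taken from the linked table):

```python
# back-of-the-envelope QLoRA memory estimate for a 7B-parameter model
params = 7e9
weight_bytes = params * 0.5          # 4-bit quantized base weights ~= 3.5 GB
lora_params = 20e6                   # assumed adapter size; depends on rank and target modules
adapter_bytes = lora_params * 2      # fp16 adapter weights
optimizer_bytes = lora_params * 8    # Adam states kept only for the adapter
total_gb = (weight_bytes + adapter_bytes + optimizer_bytes) / 1e9
print(f"~{total_gb:.1f} GB before activations and KV cache")   # ~3.7 GB, leaving headroom under 8 GB
```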

  • CVPR2024-Papers-with-Code

    A collection of CVPR 2024 papers and open-source projects

  • peft

    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

  • Project mention: LoftQ: LoRA-fine-tuning-aware Quantization | news.ycombinator.com | 2023-12-19
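
    A minimal LoRA sketch with peft; the base checkpoint and hyperparameters are placeholders rather than anything from the LoftQ thread:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")   # placeholder model

config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()        # only a fraction of a percent of weights are trainable
```
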
  • haystack

    🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

  • Project mention: Release Radar • March 2024 Edition | dev.to | 2024-04-07

    View on GitHub

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • Project mention: Do LLMs need a context window? | news.ycombinator.com | 2023-12-25

    https://github.com/BlinkDL/RWKV-LM#rwkv-discord-httpsdiscord... lists a number of implementations of various versions of RWKV.

    https://github.com/BlinkDL/RWKV-LM#rwkv-parallelizable-rnn-w... :

    > RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)

    > RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

    > So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

    > "Our latest version is RWKV-6,*

  • PaddleNLP

    👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
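
    A minimal Taskflow sketch; the task name follows the PaddleNLP documentation, but exact task availability depends on the installed version:

```python
from paddlenlp import Taskflow

# sentiment analysis is one of the out-of-the-box Taskflow tasks (Chinese-language model)
senta = Taskflow("sentiment_analysis")
print(senta("这个产品用起来真的很流畅"))   # -> label and confidence score
```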

  • ml-engineering

    Machine Learning Engineering Open Book

  • Project mention: Accelerators | news.ycombinator.com | 2024-02-22
  • tokenizers

    💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

  • Project mention: HF Transfer: Speed up file transfers | /r/rust | 2023-07-07

    Hugging Face seems to like Rust. They also wrote Tokenizers in Rust.
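
    A small sketch of the Python bindings over the Rust core, training a BPE tokenizer from scratch; the corpus path is a placeholder:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)   # placeholder training corpus

encoding = tokenizer.encode("Hugging Face seems to like Rust.")
print(encoding.tokens)
```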

  • speechbrain

    A PyTorch-based Speech Toolkit

  • Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
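
    A minimal pretrained-ASR sketch; note the interface module moved from speechbrain.pretrained to speechbrain.inference around the 1.0 release, so the import path below depends on your version, and the audio path is a placeholder:

```python
# SpeechBrain 1.0 path; older releases use `from speechbrain.pretrained import EncoderDecoderASR`
from speechbrain.inference.ASR import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("my_audio.wav"))   # placeholder audio file
```
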
  • PaLM-rlhf-pytorch

    Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

  • Project mention: How should I get an in-depth mathematical understanding of generative AI? | /r/datascience | 2023-05-18

    ChatGPT isn't open-sourced, so we don't know what the actual implementation is. I think you can read Open Assistant's source code for application design. If that is too much, try Open Chat Toolkit's source code for developer tools. If you need a very bare implementation, you should go for lucidrains/PaLM-rlhf-pytorch.

  • Transformers-Tutorials

    This repository contains demos I made with the Transformers library by HuggingFace.

  • Project mention: AI enthusiasm #6 - Finetune any LLM you want💡 | dev.to | 2024-04-16

    Most of this tutorial is based on Hugging Face course about Transformers and on Niels Rogge's Transformers tutorials: make sure to check their work and give them a star on GitHub, if you please ❤️
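
    A typical fine-tuning pattern with the Transformers library looks like the condensed sketch below; the checkpoint and dataset names are placeholders for whatever you are actually working with:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

name = "distilbert-base-uncased"                              # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

dataset = load_dataset("imdb")                                # placeholder dataset
tokenized = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset for a quick run
    tokenizer=tokenizer,
)
trainer.train()
```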

  • transformers.js

    State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

  • Project mention: Transformers.js: Machine Learning for the Web | news.ycombinator.com | 2024-04-11

    We have some other WebGPU demos, including:

    - WebGPU embedding benchmark: https://huggingface.co/spaces/Xenova/webgpu-embedding-benchm...

    - Real-time object detection: https://huggingface.co/spaces/Xenova/webgpu-video-object-det...

    - Real-time background removal: https://huggingface.co/spaces/Xenova/webgpu-video-background...

    - WebGPU depth estimation: https://huggingface.co/spaces/Xenova/webgpu-depth-anything

    - Image background removal: https://huggingface.co/spaces/Xenova/remove-background-webgp...

    You can follow the progress for full WebGPU support in the v3 development branch (https://github.com/xenova/transformers.js/pull/545).

    To answer your question, while there are certain ops missing, the main limitation at the moment is for models with decoders... which are not very fast (yet) due to inefficient buffer reuse and many redundant copies between CPU and GPU. We're working closely with the ORT team to fix these issues though!

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
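
    A minimal index-and-search sketch with txtai; the (id, text, tags) tuple form works across versions, and newer releases also accept plain lists of strings:

```python
from txtai.embeddings import Embeddings   # newer releases also allow `from txtai import Embeddings`

data = ["Maine man wins $1M from lottery ticket",
        "Dishwasher repair tips",
        "US tops 5 million confirmed virus cases"]

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index([(uid, text, None) for uid, text in enumerate(data)])

# returns (id, score) pairs for the closest documents
print(embeddings.search("public health story", 1))
```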

  • gpt-neox

    An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

  • Project mention: FLaNK Stack 26 February 2024 | dev.to | 2024-02-26
  • bertviz

    BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

  • Project mention: StreamingLLM: tiny tweak to KV LRU improves long conversations | news.ycombinator.com | 2024-02-13

    This seems only to work cause large GPTs have redundant, undercomplex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
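
    To poke at attention patterns yourself, a minimal head_view sketch follows; any transformers checkpoint that returns attentions works, the one below is just a small placeholder, and the view renders inside a notebook:

```python
from bertviz import head_view
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"                       # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)            # interactive per-layer, per-head attention view
```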

  • openvino

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

  • Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
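
    A minimal inference sketch using the current Python API (the `import openvino as ov` style); the IR file path is a placeholder:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")              # placeholder OpenVINO IR file
compiled = core.compile_model(model, "CPU")       # or "GPU", "AUTO", ...

input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)   # shape must match the model
result = compiled([input_tensor])[compiled.output(0)]
print(result.shape)
```
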
  • BigDL

    Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.

  • Project mention: LLaMA Now Goes Faster on CPUs | news.ycombinator.com | 2024-03-31

    Any performance benchmark against intel's 'IPEX-LLM'[0] or others?

    [0] - https://github.com/intel-analytics/ipex-llm
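
    A minimal low-bit loading sketch; note the library was renamed from bigdl-llm to ipex-llm, so the import path below is an assumption that depends on which package version is installed, and the checkpoint is a placeholder:

```python
# bigdl-llm style; newer releases use `from ipex_llm.transformers import AutoModelForCausalLM`
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"            # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(name, load_in_4bit=True)   # INT4 weights for Intel CPU/GPU
tokenizer = AutoTokenizer.from_pretrained(name)

inputs = tokenizer("What is RWKV?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```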

  • BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.

  • Project mention: how can a top2vec output be improved | /r/learnmachinelearning | 2023-07-04

    Try experimenting with different hyperparameters, clustering algorithms and embedding representations. Try https://github.com/MaartenGr/BERTopic/tree/master/bertopic
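
    Concretely, "experimenting with different hyperparameters, clustering algorithms and embedding representations" in BERTopic means swapping out its components; the choices below are illustrative defaults, not recommendations:

```python
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer
from sklearn.datasets import fetch_20newsgroups
from umap import UMAP

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")   # swap the embedding representation here
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0)
hdbscan_model = HDBSCAN(min_cluster_size=10)                # or substitute another clustering algorithm

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    min_topic_size=10,
)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())
```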

  • DALLE-pytorch

    Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

  • Project mention: The Eleuther AI Mafia | news.ycombinator.com | 2023-09-03

    It all started originally on lucidrains/dalle-pytorch in the months following the release of DALL-E (1). The group started as `dalle-pytorch-replicate` but was never officially "blessed" by Phil Wang who seems to enjoy being a free agent (can't blame him).

    https://github.com/lucidrains/DALLE-pytorch/issues/116 is where the discord got kicked off originally. There's a lot of other interactions between us in the github there. You should be able to find when Phil was approached by Jenia Jitsev, Jan Ebert, and Mehdi Cherti (all starting LAION members) who graciously offered the chance to replicate the DALL-E paper using their available compute at the JUWELS and JUWELS Booster HPC system. This all predates Emad's arrival. I believe he showed up around the time guided diffusion and GLIDE, but it may have been a bit earlier.

    Data work originally focused on amassing several of the bigger datasets of the time. Getting CC12M downloaded and trained on was something of an early milestone (robvanvolt's work). A lot of early work was like that though, shuffling through CC12M, COCO, etc. with the dalle-pytorch codebase until we got an avocado armchair.

    Christophe Schumann was an early contributor as well and great at organizing and rallying. He focused a lot on the early data scraping work for what would become the "LAION5B" dataset. I don't want to credit him with the coding and I'm ashamed to admit I can't recall who did much of the work there - but a distributed scraping program was developed (the name was something@home... not scraping@home?).

    The discord link on Phil Wang's readme at dalle-pytorch got a lot of traffic and a lot of people who wanted to pitch in with the scraping effort.

    Eventually a lot of people from Eleuther and many other teams mingled with us, some sort of non-profit org was created in Germany I believe for legal purposes. The dataset continued to grow and the group moved from training DALLE's to finetuning diffusion models.

    The `CompVis` team were great inspiration at the time and much of their work on VQGAN and then latent diffusion models basically kept us motivated. As I mentioned a personal motivation was Katherine Crowson's work on a variety of things like CLIP-guided vqgan, diffusion, etc.

    I believe Emad Mostaque showed up around the time GLIDE was coming out? I want to say he donated money for scrapers to be run on AWS to speed up data collection. I was largely hands off for much of the data scraping process and mostly enjoyed training new models on data we had.

    As with any online community, things got pretty ill-defined: roles changed over, volunteers came and went, etc. I would hardly call this definitive, and that's at least partially the reason it's hard to trace as an outsider. That much of the early history is scattered across GitHub issues and PRs can't have helped, though.

  • openchat

    OpenChat: Advancing Open-source Language Models with Imperfect Data (by imoneoi)

  • Project mention: Alternative of bard,bing, claude | /r/artificial | 2023-12-10

    Depending on your use case, https://openchat.team/ might be worth looking into

  • courses

    This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)

  • Project mention: If you are looking for free courses about AI, LLMs, CV, or NLP, I created the repository with links to resources that I found super high quality and helpful. The link is in the comment. | /r/ChatGPT | 2023-07-02

    I found it: https://github.com/SkalskiP/courses

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-16.

Index

What are some of the best open-source Transformer projects? This list will help you:

 #   Project                     Stars
 1   nn                         47,503
 2   vit-pytorch                17,790
 3   LLaMA-Factory              16,319
 4   CVPR2024-Papers-with-Code  15,923
 5   peft                       13,670
 6   haystack                   13,564
 7   RWKV-LM                    11,579
 8   PaddleNLP                  11,386
 9   ml-engineering              9,680
10   tokenizers                  8,375
11   speechbrain                 7,836
12   PaLM-rlhf-pytorch           7,587
13   Transformers-Tutorials      7,460
14   transformers.js             7,341
15   txtai                       6,910
16   gpt-neox                    6,556
17   bertviz                     6,356
18   openvino                    5,864
19   BigDL                       5,857
20   BERTopic                    5,519
21   DALLE-pytorch               5,492
22   openchat                    4,947
23   courses                     4,436
