Top 23 attention-mechanism Open-Source Projects
-
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
-
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
-
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
-
awesome-graph-classification
A collection of important graph embedding, classification and representation learning papers with implementations.
-
Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
-
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
-
awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
-
a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
-
pytorch-GAT
My implementation of the original GAT paper (Veličković et al.). I've additionally included the playground.py file for visualizing the Cora dataset, GAT embeddings, an attention mechanism, and entropy histograms. I've supported both Cora (transductive) and PPI (inductive) examples!
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
alphafold2
To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released
-
flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
-
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
-
perceiver-pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
-
awesome-transformer-nlp
A curated list of NLP resources focused on Transformer networks, attention mechanism, GPT, BERT, ChatGPT, LLMs, and transfer learning.
-
CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
-
PaLM-pytorch
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways
-
tab-transformer-pytorch
Implementation of TabTransformer, attention network for tabular data, in Pytorch
-
TimeSformer-pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
Project mention: Is it easier to go from Pytorch to TF and Keras than the other way around? | /r/pytorch | 2023-05-13

I also need to learn PySpark, so right now I am going to download the Fashion-MNIST dataset, use PySpark to downsize each image and put them into separate folders according to their labels (just to show employers I can do some basic ETL with PySpark; not sure how I am going to load it for training in PyTorch yet, though). Then I am going to write the simplest LeNet to try to categorize the Fashion-MNIST dataset (results will most likely be bad, but that's okay). Next, I'll try to learn transfer learning in PyTorch for a CNN, or maybe skip ahead to ViT. Ideally at this point I want to study the attention mechanism a bit more and try to implement SimpleViT, which I saw here: https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/simple_vit.py
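For reference, instantiating SimpleViT from vit-pytorch looks roughly like the sketch below; the hyperparameters are illustrative, not tuned for Fashion-MNIST.

```python
import torch
from vit_pytorch import SimpleViT  # pip install vit-pytorch

# Illustrative hyperparameters -- image_size / patch_size / num_classes would
# need adjusting for a real dataset such as Fashion-MNIST (28x28, 10 classes).
model = SimpleViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 10,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048
)

img = torch.randn(1, 3, 256, 256)   # one dummy RGB image
logits = model(img)                 # shape: (1, 10)
```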
https://github.com/BlinkDL/RWKV-LM#rwkv-discord-httpsdiscord... lists a number of implementations of various versions of RWKV.
https://github.com/BlinkDL/RWKV-LM#rwkv-parallelizable-rnn-w... :
> RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)
> RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
> So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
> "Our latest version is RWKV-6,*
It all started on lucidrains/dalle-pytorch in the months following the release of DALL-E (1). The group started as `dalle-pytorch-replicate` but was never officially "blessed" by Phil Wang, who seems to enjoy being a free agent (can't blame him).
https://github.com/lucidrains/DALLE-pytorch/issues/116 is where the discord got kicked off originally. There are a lot of other interactions between us in the GitHub issues there. You should be able to find when Phil was approached by Jenia Jitsev, Jan Ebert, and Mehdi Cherti (all starting LAION members), who graciously offered the chance to replicate the DALL-E paper using their available compute at the JUWELS and JUWELS Booster HPC system. This all predates Emad's arrival. I believe he showed up around the time of guided diffusion and GLIDE, but it may have been a bit earlier.
Data work originally focused on amassing several of the bigger datasets of the time. Getting CC12M downloaded and trained on was something of an early milestone (robvanvolt's work). A lot of early work was like that though, shuffling through CC12M, COCO, etc. with the dalle-pytorch codebase until we got an avocado armchair.
Christophe Schumann was an early contributor as well and great at organizing and rallying. He focused a lot on the early data scraping work for what would become the "LAION5B" dataset. I don't want to credit him with the coding and I'm ashamed to admit I can't recall who did much of the work there - but a distributed scraping program was developed (the name was something@home... not scraping@home?).
The discord link on Phil Wang's readme at dalle-pytorch got a lot of traffic and a lot of people who wanted to pitch in with the scraping effort.
Eventually a lot of people from Eleuther and many other teams mingled with us, and some sort of non-profit org was created in Germany, I believe for legal purposes. The dataset continued to grow, and the group moved from training DALL-Es to finetuning diffusion models.
The `CompVis` team were great inspiration at the time and much of their work on VQGAN and then latent diffusion models basically kept us motivated. As I mentioned a personal motivation was Katherine Crowson's work on a variety of things like CLIP-guided vqgan, diffusion, etc.
I believe Emad Mostaque showed up around the time GLIDE was coming out? I want to say he donated money for scrapers to be run on AWS to speed up data collection. I was largely hands off for much of the data scraping process and mostly enjoyed training new models on data we had.
As with any online community, things got pretty ill-defined: roles changed over, volunteers came and went, etc. I would hardly call this account definitive, and that's at least partially the reason it's hard to trace as an outsider. That much of the early history is scattered about GitHub issues and PRs can't have helped, though.
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them; whisper-timestamped seems to be the best one. [0]
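For context, whisper-timestamped is meant to sit on top of openai-whisper with a drop-in style API; a minimal sketch along the lines of its README (the file name and model size below are placeholders):

```python
import whisper_timestamped as whisper  # pip install whisper-timestamped

# "audio.wav" and the "small" model size are placeholders.
audio = whisper.load_audio("audio.wav")
model = whisper.load_model("small", device="cpu")
result = whisper.transcribe(model, audio, language="en")

# The result is Whisper-style JSON, extended with per-word timestamps
# and confidence scores inside each segment.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"], word["confidence"])
```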
Project mention: Meta introduces Voicebox: state-of-the-art generative AI model for speech | news.ycombinator.com | 2023-06-19got a response here https://github.com/lucidrains/soundstorm-pytorch/discussions...
attention-mechanism related posts
- x-transformers
- Do LLMs need a context window?
- Metal-flash-attention: Faster alternative to Metal Performance Shaders
- Paving the way to efficient architectures: StripedHyena-7B
- Understanding Deep Learning
- Q-Transformer: Scalable Reinforcement Learning via Autoregressive Q-Functions
- Large Language Models for Compiler Optimization
Index
What are some of the best open-source attention-mechanism projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | vit-pytorch | 17,790 |
2 | RWKV-LM | 11,579 |
3 | DALLE-pytorch | 5,487 |
4 | awesome-graph-classification | 4,698 |
5 | Awesome-Transformer-Attention | 4,213 |
6 | x-transformers | 4,109 |
7 | GAT | 3,045 |
8 | awesome-speech-recognition-speech-synthesis-papers | 2,865 |
9 | a-PyTorch-Tutorial-to-Image-Captioning | 2,591 |
10 | pytorch-GAT | 2,222 |
11 | reformer-pytorch | 2,050 |
12 | whisper-timestamped | 1,481 |
13 | alphafold2 | 1,458 |
14 | flamingo-pytorch | 1,134 |
15 | soundstorm-pytorch | 1,110 |
16 | perceiver-pytorch | 1,046 |
17 | performer-pytorch | 1,029 |
18 | awesome-transformer-nlp | 1,023 |
19 | CoCa-pytorch | 972 |
20 | RETRO-pytorch | 824 |
21 | PaLM-pytorch | 802 |
22 | tab-transformer-pytorch | 696 |
23 | TimeSformer-pytorch | 658 |