Top 23 Python attention-mechanism Projects
-
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
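For orientation, the repository's README documents a minimal usage pattern along these lines (the hyperparameter values here are illustrative, not prescriptive):

```python
import torch
from vit_pytorch import ViT

# A small ViT: 256x256 images cut into 32x32 patches, a single transformer encoder on top
v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
)

img = torch.randn(1, 3, 256, 256)   # (batch, channels, height, width)
preds = v(img)                      # (1, 1000) class logits
```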
-
RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNN and transformer: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
Project mention: Ask HN: Is anybody building an alternative transformer? | news.ycombinator.com | 2025-02-14
You can see all the development directly from them: https://github.com/BlinkDL/RWKV-LM
Last week version 7 was released, and each release brings significant improvements.
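To make the "constant space, no kv-cache" point above concrete, here is a deliberately simplified sketch of recurrent token mixing with a fixed-size state. It is not RWKV's actual WKV kernel (RWKV-7 uses a more elaborate time-mixing formulation); it only illustrates why inference memory does not grow with context length.

```python
import torch

def recurrent_mix(tokens, decay, key_proj, value_proj):
    """Toy linear-recurrent token mixing: the state is a single (dim,) vector,
    so memory stays constant no matter how long the sequence gets, unlike a
    transformer's kv-cache, which grows with context length."""
    dim = tokens.shape[-1]
    state = torch.zeros(dim)
    outputs = []
    for x in tokens:                      # one token at a time, O(1) memory
        k = key_proj @ x
        v = value_proj @ x
        state = decay * state + k * v     # exponential moving "memory"
        outputs.append(state)
    return torch.stack(outputs)

dim = 8
out = recurrent_mix(
    torch.randn(16, dim),                 # 16 tokens
    decay=torch.full((dim,), 0.9),
    key_proj=torch.randn(dim, dim),
    value_proj=torch.randn(dim, dim),
)
print(out.shape)  # torch.Size([16, 8])
```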
-
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
-
x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
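As a rough orientation, the project's basic decoder-only example looks roughly like the following (the sizes are illustrative):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,       # vocabulary size
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randint(0, 20000, (1, 1024))   # a batch of token ids
logits = model(x)                        # (1, 1024, 20000)
```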
-
swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Worth noting there is an interesting multi-agent open-source project named Swarms. When I saw this on X earlier I thought maybe the team had joined OpenAI, but there is no connection between the two projects.
> "Swarms: The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework"
[0] https://github.com/kyegomez/swarms
[1] https://docs.swarms.world/en/latest/
-
awesome-graph-classification
A collection of important graph embedding, classification and representation learning papers with implementations.
-
GAT
Implementation of Graph Attention Networks
-
a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
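For a sense of the API, the project's README shows usage roughly like the following; the word-level timestamps and confidence scores end up inside each segment of the returned dict (the file path and language here are placeholders):

```python
import whisper_timestamped as whisper

audio = whisper.load_audio("example.wav")          # placeholder path
model = whisper.load_model("tiny", device="cpu")
result = whisper.transcribe(model, audio, language="en")

# Each segment carries per-word timing and a confidence score
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(word["text"], word["start"], word["end"], word["confidence"])
```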
-
reformer-pytorch
Reformer, the efficient Transformer, in Pytorch
-
alphafold2
To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released
-
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
-
flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
-
CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
-
perceiver-pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
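The core idea, a small set of learned latents repeatedly cross-attending to a much larger input array, can be sketched in a few lines of plain PyTorch. This is an illustration of the mechanism only, not the library's API, and all names are made up:

```python
import torch
import torch.nn.functional as F

def cross_attend(latents, inputs, w_q, w_k, w_v):
    """One Perceiver-style cross-attention step: cost is O(num_latents * num_inputs),
    so attending to the input no longer scales quadratically with input size."""
    q = latents @ w_q                                          # (L, d)
    k = inputs @ w_k                                           # (N, d)
    v = inputs @ w_v                                           # (N, d)
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)     # (L, N)
    return latents + attn @ v                                  # residual update of the latents

d = 64
latents = torch.randn(32, d)              # 32 learned latent vectors
inputs = torch.randn(4096, d)             # e.g. 4096 flattened pixels or patches
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))

# "Iterative attention": distill the input into the latents over several steps
for _ in range(4):
    latents = cross_attend(latents, inputs, w_q, w_k, w_v)
print(latents.shape)  # torch.Size([32, 64])
```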
-
performer-pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
-
tab-transformer-pytorch
Implementation of TabTransformer, attention network for tabular data, in Pytorch
-
RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval-based Attention net, in Pytorch
-
PaLM-pytorch
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways
-
TimeSformer-pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
-
memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
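The description above is enough to sketch the idea: keep an external store of past key/value pairs, retrieve the nearest ones for each query, and attend over them alongside the local context. This is a simplified illustration (exact top-k search instead of approximate nearest neighbors, no learned gating), not the repo's implementation:

```python
import torch
import torch.nn.functional as F

def knn_augmented_attention(q, local_k, local_v, mem_k, mem_v, top_k=4):
    """Attend over local keys/values plus the top-k retrieved memories.
    The real project uses approximate nearest neighbors and a learned gate;
    this uses exact search for clarity."""
    sims = q @ mem_k.T                                   # (T, M) query-memory similarity
    idx = sims.topk(top_k, dim=-1).indices               # (T, top_k) retrieved slots
    k = torch.cat([local_k.unsqueeze(0).expand(q.shape[0], -1, -1),
                   mem_k[idx]], dim=1)                   # (T, T_local + top_k, d)
    v = torch.cat([local_v.unsqueeze(0).expand(q.shape[0], -1, -1),
                   mem_v[idx]], dim=1)
    attn = F.softmax((q.unsqueeze(1) @ k.transpose(1, 2)).squeeze(1)
                     / k.shape[-1] ** 0.5, dim=-1)       # (T, T_local + top_k)
    return (attn.unsqueeze(1) @ v).squeeze(1)            # (T, d)

d, T, M = 32, 8, 1000
q = torch.randn(T, d)
local_k, local_v = torch.randn(T, d), torch.randn(T, d)
mem_k, mem_v = torch.randn(M, d), torch.randn(M, d)      # long-term memory store
out = knn_augmented_attention(q, local_k, local_v, mem_k, mem_v)
print(out.shape)  # torch.Size([8, 32])
```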
-
TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Project mention: Tokenformer: Rethinking transformer scaling with tokenized model parameters | news.ycombinator.com | 2024-10-31
Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce Tokenformer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs. Code and models are available at https://github.com/Haiyang-W/TokenFormer
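The abstract is fairly self-contained, so here is a minimal sketch of the token-parameter attention idea it describes: input tokens act as queries against learnable parameter key/value tokens, and the model grows by appending new key/value rows rather than reshaping existing projections. This follows the abstract's wording, not the official code, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

class TokenParamAttention(torch.nn.Module):
    """Replace a linear projection with attention over parameter tokens:
    queries come from the input, keys/values are learnable parameters."""
    def __init__(self, dim, num_param_tokens):
        super().__init__()
        self.param_keys = torch.nn.Parameter(torch.randn(num_param_tokens, dim))
        self.param_values = torch.nn.Parameter(torch.randn(num_param_tokens, dim))

    def forward(self, x):                       # x: (batch, seq, dim)
        attn = F.softmax(x @ self.param_keys.T / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.param_values

    def grow(self, extra_tokens):
        """Progressive scaling: append new key/value parameter pairs
        instead of retraining reshaped projection matrices from scratch."""
        dim = self.param_keys.shape[1]
        self.param_keys = torch.nn.Parameter(
            torch.cat([self.param_keys.data, torch.zeros(extra_tokens, dim)]))
        self.param_values = torch.nn.Parameter(
            torch.cat([self.param_values.data, torch.zeros(extra_tokens, dim)]))

layer = TokenParamAttention(dim=64, num_param_tokens=128)
x = torch.randn(2, 16, 64)
y = layer(x)            # (2, 16, 64)
layer.grow(64)          # scale up without touching the already-trained rows
print(layer(x).shape)   # still (2, 16, 64)
```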
-
nuwa-pytorch
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch
Python attention-mechanism related posts
-
Ask HN: Is anybody building an alternative transformer?
-
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision
-
HMT: Hierarchical Memory Transformer for Long Context Language Processing
-
What can LLMs never do?
-
x-transformers
-
Do LLMs need a context window?
-
Paving the way to efficient architectures: StripedHyena-7B
Index
What are some of the best open-source attention-mechanism projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | vit-pytorch | 23,351 |
| 2 | RWKV-LM | 13,770 |
| 3 | DALLE-pytorch | 5,615 |
| 4 | x-transformers | 5,440 |
| 5 | swarms | 4,991 |
| 6 | awesome-graph-classification | 4,790 |
| 7 | GAT | 3,388 |
| 8 | a-PyTorch-Tutorial-to-Image-Captioning | 2,835 |
| 9 | whisper-timestamped | 2,498 |
| 10 | reformer-pytorch | 2,170 |
| 11 | alphafold2 | 1,598 |
| 12 | soundstorm-pytorch | 1,521 |
| 13 | flamingo-pytorch | 1,249 |
| 14 | CoCa-pytorch | 1,144 |
| 15 | perceiver-pytorch | 1,142 |
| 16 | performer-pytorch | 1,126 |
| 17 | tab-transformer-pytorch | 944 |
| 18 | RETRO-pytorch | 863 |
| 19 | PaLM-pytorch | 820 |
| 20 | TimeSformer-pytorch | 716 |
| 21 | memorizing-transformers-pytorch | 634 |
| 22 | TokenFormer | 563 |
| 23 | nuwa-pytorch | 550 |