Top 23 Python Transformer Projects
-
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
-
PaddleSeg
Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
-
LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
Most of this tutorial is based on the Hugging Face course on Transformers and on Niels Rogge's Transformers tutorials; make sure to check out their work and give them a star on GitHub, if you please ❤️
The easiest approach is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it with lm-evaluation-harness (https://github.com/EleutherAI/lm-evaluation-harness)
https://github.com/BlinkDL/RWKV-LM#rwkv-discord-httpsdiscord... lists a number of implementations of various versions of RWKV.
https://github.com/BlinkDL/RWKV-LM#rwkv-parallelizable-rnn-w... :
> RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)
> RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
> So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
> "Our latest version is RWKV-6,*
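The claim quoted above, that only the state at position t is needed to compute position t+1 while the same outputs can also be computed in a parallel "GPT" mode, can be illustrated with a toy example. This is a minimal sketch and not RWKV's actual formula: it uses a scalar, exponentially decayed weighted average, and the names `recurrent_mode`, `parallel_mode`, and `decay` are made up for illustration.

```python
import math

def recurrent_mode(keys, values, decay):
    """RNN-style: process tokens one at a time, carrying a fixed-size state."""
    num, den = 0.0, 0.0  # state: decayed weighted sum and its normaliser
    outputs = []
    for k, v in zip(keys, values):
        weight = math.exp(k)
        num = decay * num + weight * v  # only the previous state is needed
        den = decay * den + weight
        outputs.append(num / den)
    return outputs

def parallel_mode(keys, values, decay):
    """GPT-style: each position is computed independently from the full prefix,
    so all positions could be evaluated in parallel during training."""
    outputs = []
    for t in range(len(keys)):
        weights = [decay ** (t - i) * math.exp(keys[i]) for i in range(t + 1)]
        num = sum(w * v for w, v in zip(weights, values))
        den = sum(weights)
        outputs.append(num / den)
    return outputs

# Hypothetical toy inputs (assumed values, not taken from any model).
keys = [0.1, -0.3, 0.7, 0.2]
values = [1.0, 2.0, -1.0, 0.5]
rnn_out = recurrent_mode(keys, values, decay=0.9)
gpt_out = parallel_mode(keys, values, decay=0.9)
```

Both functions return the same sequence up to floating-point rounding. The recurrent form keeps memory constant per step, which is the property behind the "fast inference, saves VRAM" claims, while the parallel form has no sequential dependency between positions.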
PaddlePaddle/PaddleSpeech
So how long until we can do an open source Mistral Large?
We could make a start on Petals or some other open source distributed training network cluster possibly?
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
and the implementation https://github.com/google/trax/blob/master/trax/models/resea... if you are interested.
Hope you get to look into this!
Project mention: Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat | news.ycombinator.com | 2024-04-12
I wanted to write that the TGI inference engine is not open source anymore, but they have reverted the license back to Apache 2.0 for the new version, TGI v2.0: https://github.com/huggingface/text-generation-inference/rel...
Good news!
openai/jukebox: Music Generation
Project mention: StreamingLLM: tiny tweak to KV LRU improves long conversations | news.ycombinator.com | 2024-02-13
This seems to work only because large GPTs have redundant, undercomplex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
Project mention: [DISC] - The angel who came to pick me up is a Gal (Oneshot by Shiraishi Kouhei) | /r/manga | 2023-09-06
OCR works pretty well. ocr.space, ocr.best and cotrans.touhou.ai/ are all pretty nice.
Python Transformer related posts
- Mistral AI Launches New 8x22B Moe Model
- LLMs on your local Computer (Part 1)
- Voxos.ai – An Open-Source Desktop Voice Assistant
- RAG Using Structured Data: Overview and Important Questions
- I made an Educational Transformer from scratch
- How can I make a better tokenizer?
- Detexify LaTeX Handwriting Symbol Recognition
Index
What are some of the best open-source Transformer projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | transformers | 124,557 |
2 | mmdetection | 27,658 |
3 | vllm | 17,656 |
4 | best-of-ml-python | 15,302 |
5 | RWKV-LM | 11,579 |
6 | LaTeX-OCR | 10,711 |
7 | PaddleSpeech | 10,069 |
8 | petals | 8,631 |
9 | faster-whisper | 8,578 |
10 | PaddleSeg | 8,227 |
11 | LMFlow | 7,975 |
12 | trax | 7,948 |
13 | text-generation-inference | 7,722 |
14 | jukebox | 7,554 |
15 | mmsegmentation | 7,342 |
16 | GPT2-Chinese | 7,342 |
17 | bertviz | 6,356 |
18 | BERT-pytorch | 5,979 |
19 | Informer2020 | 4,890 |
20 | lm-evaluation-harness | 4,848 |
21 | OpenPrompt | 4,141 |
22 | manga-image-translator | 4,127 |
23 | SwinIR | 4,060 |