Top 23 Python vision-transformer Projects

Each entry below lists, in order: mentions, stars, activity score, and language.

mmdetection

23 27,742 8.7 Python

OpenMMLab Detection Toolbox and Benchmark
LaTeX-OCR

21 10,770 3.6 Python

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Project mention: Detexify LaTeX Handwriting Symbol Recognition | news.ycombinator.com | 2023-11-14

SwinIR

27 4,060 0.0 Python

SwinIR: Image Restoration Using Swin Transformer (official repository)
Efficient-AI-Backbones

3 3,783 4.4 Python

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
mmpretrain

2 3,156 7.8 Python

OpenMMLab Pre-training Toolbox and Benchmark
scenic

5 2,995 8.6 Python

Scenic: A Jax Library for Computer Vision Research and Beyond (by google-research)
towhee

26 2,989 8.6 Python

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14

EVA

2 1,957 6.2 Python

EVA Series: Visual Representation Fantasies from BAAI (by baaivision)
EasyCV

2 1,679 6.2 Python

An all-in-one toolkit for computer vision

Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

All in One Computer Vision https://github.com/alibaba/EasyCV

VRT

1 1,244 0.0 Python

VRT: A Video Restoration Transformer (official repository)
VoxFormer

2 961 6.9 Python

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
InternVideo

3 909 8.0 Python

Video Foundation Models & Data for Multimodal Understanding
ONE-PEACE

2 838 8.6 Python

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25

how-do-vits-work

3 784 0.0 Python

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
vit-explain

2 708 0.0 Python

Explainability for Vision Transformers
ImageNet21K

1 695 10.0 Python

Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
DAT

1 693 3.7 Python

Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention (by LeapLabTHU)
swin2sr

2 526 2.6 Python

Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop ECCV 2022. Try it out! over 3.3M runs https://replicate.com/mv-lab/swin2sr
thepipe

2 506 9.0 Python

Feed PDFs, docs, slides, web pages and more into GPT-4-Vision in one line of code ⚡

Project mention: Show HN: I just open sourced my document/website extractor for Vision-LLMs | news.ycombinator.com | 2024-04-02

parseq

1 496 6.7 Python

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Project mention: need help for license plate number segmentation | /r/deeplearning | 2023-05-31

I really recommend using scene text recognition models. They are perfect for these types of use cases: https://github.com/baudm/parseq or check https://paperswithcode.com/task/scene-text-recognition. Make sure to check the licenses, and good luck 👍🏻

GCVit

1 414 7.0 Python

[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
MPViT

1 340 1.8 Python

[CVPR 2022] MPViT: Multi-Path Vision Transformer for Dense Prediction
CrossViT

1 299 0.0 Python

Official implementation of CrossViT. https://arxiv.org/abs/2103.14899

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions counts repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python vision-transformer related posts

Show HN: I just open sourced my document/website extractor for Vision-LLMs
2 projects | news.ycombinator.com | 2 Apr 2024
[Demo] Watch Videos with ChatGPT
7 projects | /r/ChatGPT | 19 Apr 2023
[D] Off-the-shelf image saliency scoring models?
2 projects | /r/MachineLearning | 17 Apr 2023
Scratch Implementation of Vision Transformer in PyTorch
2 projects | /r/computervision | 13 Apr 2023
[R] InternVideo: General Video Foundation Models via Generative and Discriminative Learning
1 project | /r/MachineLearning | 10 Apr 2023
[P] Can I do better than this? [Image near-duplicate and similarities clustering]
1 project | /r/MachineLearning | 27 Mar 2023
[N] First-Ever Course on Transformers: NOW PUBLIC
2 projects | /r/MachineLearning | 9 Jul 2022

Index

What are some of the best open-source vision-transformer projects in Python? This list will help you:

#	Project	Stars
1	mmdetection	27,742
2	LaTeX-OCR	10,770
3	SwinIR	4,060
4	Efficient-AI-Backbones	3,783
5	mmpretrain	3,156
6	scenic	2,995
7	towhee	2,989
8	EVA	1,957
9	EasyCV	1,679
10	VRT	1,244
11	VoxFormer	961
12	InternVideo	909
13	ONE-PEACE	838
14	how-do-vits-work	784
15	vit-explain	708
16	ImageNet21K	695
17	DAT	693
18	swin2sr	526
19	thepipe	506
20	parseq	496
21	GCVit	414
22	MPViT	340
23	CrossViT	299