Top 23 Python vision-transformer Projects
-
Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
how-do-vits-work
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
-
ImageNet21K
Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
-
DAT
Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention (by LeapLabTHU)
-
swin2sr
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop, ECCV 2022. Try it out on Replicate (over 3.3M runs): https://replicate.com/mv-lab/swin2sr
-
EasyCV
All in One Computer Vision: https://github.com/alibaba/EasyCV
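Two of the projects listed here, SwinIR and swin2sr, are built on Swin Transformer blocks, which compute self-attention inside non-overlapping M×M windows of the feature map rather than across the whole image. The following is a minimal pure-Python sketch of that idea, not code taken from either repository. It assumes identity Q/K/V projections (real blocks use learned linear layers), a single attention head, no relative position bias, no shifted windows, and H and W divisible by the window size; all function names are hypothetical.

```python
import math
from typing import List

# Hypothetical, simplified sketch of Swin-style window attention.
# Assumptions: identity Q/K/V projections, one head, no position bias,
# no shifted windows, H and W divisible by the window size m.
Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def window_partition(grid: Grid, m: int) -> List[List[Vector]]:
    """Split an H x W grid into non-overlapping m x m windows.

    Returns one flat, row-major token list (length m*m) per window,
    with windows themselves ordered row-major.
    """
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[top + i][left + j]
                            for i in range(m) for j in range(m)])
    return windows


def softmax(xs: Vector) -> Vector:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with Q = K = V = tokens."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of the tokens in the window.
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens))
                    for c in range(d)])
    return out


def window_attention(grid: Grid, m: int) -> List[List[Vector]]:
    """Run self-attention independently inside each m x m window."""
    return [self_attention(win) for win in window_partition(grid, m)]


if __name__ == "__main__":
    # A 4 x 4 grid of 2-D features, split into four 2 x 2 windows.
    grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    wins = window_attention(grid, m=2)
    print(len(wins), len(wins[0]))  # prints: 4 4
```

Because each window holds only M² tokens, the total attention cost grows linearly with the number of windows (and so with H·W for a fixed M), whereas global attention over all H·W tokens grows quadratically. Real Swin blocks alternate this with shifted windows so information can flow between neighboring windows.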
Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: Show HN: I just open sourced my document/website extractor for Vision-LLMs | news.ycombinator.com | 2024-04-02
I really recommend using scene text recognition models. They are perfect for this type of use case: https://github.com/baudm/parseq, or check https://paperswithcode.com/task/scene-text-recognition. Make sure to check the licenses, and good luck 👍🏻
Python vision-transformer related posts
- Show HN: I just open sourced my document/website extractor for Vision-LLMs
- [Demo] Watch Videos with ChatGPT
- [D] Off-the-shelf image saliency scoring models?
- Scratch Implementation of Vision Transformer in PyTorch
- [R] InternVideo: General Video Foundation Models via Generative and Discriminative Learning
- [P] Can I do better than this? [Image near-duplicate and similarities clustering]
- [N] First-Ever Course on Transformers: NOW PUBLIC
Index
What are some of the best open-source vision-transformer projects in Python? This list will help you:
| # | Project | GitHub stars |
|---|---|---|
| 1 | mmdetection | 27,742 |
| 2 | LaTeX-OCR | 10,770 |
| 3 | SwinIR | 4,060 |
| 4 | Efficient-AI-Backbones | 3,783 |
| 5 | mmpretrain | 3,156 |
| 6 | scenic | 2,995 |
| 7 | towhee | 2,989 |
| 8 | EVA | 1,957 |
| 9 | EasyCV | 1,679 |
| 10 | VRT | 1,244 |
| 11 | VoxFormer | 961 |
| 12 | InternVideo | 909 |
| 13 | ONE-PEACE | 838 |
| 14 | how-do-vits-work | 784 |
| 15 | vit-explain | 708 |
| 16 | ImageNet21K | 695 |
| 17 | DAT | 693 |
| 18 | swin2sr | 526 |
| 19 | thepipe | 506 |
| 20 | parseq | 496 |
| 21 | GCVit | 414 |
| 22 | MPViT | 340 |
| 23 | CrossViT | 299 |