maxvit vs vision-transformer-from-scratch

| | maxvit | vision-transformer-from-scratch |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 421 | 85 |
| Growth | 1.9% | - |
| Activity | 0.0 | 4.9 |
| Latest commit | 11 months ago | 10 months ago |
| Language | Jupyter Notebook | Jupyter Notebook |
| License | Apache License 2.0 | MIT License |
- Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
- Activity: a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
maxvit

Google's new multi-axis computer vision approach improves high-level tasks, such as object detection, as well as motion deblurring, denoising, and deraining.
Today we present a new multi-axis approach that is simple and effective, improves on the original ViT and MLP models, can better adapt to high-resolution, dense prediction tasks, and can naturally adapt to different input sizes with high flexibility and low complexity. Based on this approach, we have built two backbone models for high-level and low-level vision tasks. We describe the first in “MaxViT: Multi-Axis Vision Transformer”, to be presented in ECCV 2022, and show it significantly improves the state of the art for high-level tasks, such as image classification, object detection, segmentation, quality assessment, and generation. The second, presented in “MAXIM: Multi-Axis MLP for Image Processing” at CVPR 2022, is based on a UNet-like architecture and achieves competitive performance on low-level imaging tasks including denoising, deblurring, dehazing, deraining, and low-light enhancement. To facilitate further research on efficient Transformer and MLP models, we have open-sourced the code and models for both MaxViT and MAXIM.
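The multi-axis idea above combines two complementary attention patterns: local attention inside non-overlapping windows ("block" attention) and sparse global attention over a dilated grid ("grid" attention). A minimal sketch of the two partitioning schemes, written in NumPy for brevity; the function names are illustrative and the official MaxViT code differs in detail:

```python
import numpy as np

def block_partition(x, p):
    # Local "block" attention windows: tile the H x W plane into p x p
    # patches; tokens inside each tile attend to one another.
    B, H, W, C = x.shape
    x = x.reshape(B, H // p, p, W // p, p, C)
    # -> (num_windows * B, p*p, C)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, p * p, C)

def grid_partition(x, g):
    # Global "grid" attention: a fixed g x g grid of tokens, strided
    # (dilated) across the whole feature map, attends jointly -- sparse
    # global mixing at the same cost as a local window.
    B, H, W, C = x.shape
    x = x.reshape(B, g, H // g, g, W // g, C)
    # -> (num_windows * B, g*g, C)
    return x.transpose(0, 2, 4, 1, 3, 5).reshape(-1, g * g, C)
```

In PyTorch the same reshapes are written with `reshape` and `permute`; alternating the two partitions in successive layers is what gives the model both local and global receptive fields at linear cost in image size.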
vision-transformer-from-scratch

[P] Implementing Vision Transformer (ViT) from Scratch using PyTorch
GitHub: https://github.com/tintn/vision-transformer-from-scratch
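The first step a from-scratch ViT implementation has to get right is patch embedding: split the image into fixed-size non-overlapping patches, flatten each patch, and linearly project it to the embedding dimension. A minimal NumPy sketch of that step; the names, shapes, and the `W_proj` matrix here are illustrative, not the linked repo's actual API:

```python
import numpy as np

def patch_embed(img, p, W_proj):
    # img: (H, W, C) image; p: patch size; W_proj: (p*p*C, D) projection.
    H, W, C = img.shape
    # Carve the image into (H/p * W/p) non-overlapping p x p patches.
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p * p * C)   # flatten each patch
    return patches @ W_proj                    # (num_patches, D) token sequence
```

The resulting token sequence, plus a learned class token and position embeddings, is what the transformer encoder then operates on; in PyTorch this whole step is commonly implemented as a single `Conv2d` with `kernel_size = stride = p`.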
What are some alternatives?
maxim - [CVPR 2022 Oral] Official repository for "MAXIM: Multi-Axis MLP for Image Processing". SOTA for denoising, deblurring, deraining, dehazing, and enhancement.
continual-pretraining-nlp-vision - Code to reproduce experiments from the paper "Continual Pre-Training Mitigates Forgetting in Language and Vision" https://arxiv.org/abs/2205.09357
vision_transformer_tf - This repository contains the TensorFlow implementation of the paper "AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE", known as Vision Transformers.
super-gradients - Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
Azure-Computer-Vision-in-a-day-workshop - Azure Computer Vision 4 (March 2023 - Florence) workshop in a day
Transformers-Tutorials - This repository contains demos I made with the Transformers library by HuggingFace.
astrophotography_stack_align - Align sequence of star field / astro images taken with a stationary camera (stationary relative to all those stars light years away).
glami-1m - The largest multilingual image-text classification dataset. It contains fashion products.
optc-box-exporter - Export your One Piece Treasure Cruise Box with just using Screenshots
HugsVision - HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision
liga-pytorch - Let Data Dance with PyTorch Models