Swin-Transformer-Object-Detection
Video-Swin-Transformer
| | Swin-Transformer-Object-Detection | Video-Swin-Transformer |
|---|---|---|
| Mentions | 4 | 7 |
| Stars | 1,710 | 1,309 |
| Growth | 0.7% | 0.5% |
| Activity | 0.0 | 0.0 |
| Latest commit | about 1 year ago | about 1 year ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Swin-Transformer-Object-Detection
- Transfer Learning on Swin Transformer as a backbone for instance segmentation using MRCNN
  I'm currently trying to apply transfer learning for instance segmentation on a set of custom fish classes. I have found the official implementation of Swin Transformer as a backbone for instance segmentation using Mask R-CNN: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection.
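That repo is built on top of mmdetection, so the usual route for custom classes is to start from one of the released Mask R-CNN + Swin configs, override the class count in both the box and mask heads, and initialize from the released checkpoint. A rough sketch of the kind of config override involved (the base config name, class names, and checkpoint path below are placeholders, not the repo's actual values):

```python
# Sketch of an mmdetection-style config override for fine-tuning a
# Mask R-CNN + Swin model on custom classes. The base config name and
# checkpoint path are placeholders; check the repo's configs/ directory
# and model zoo for the real ones.
_base_ = ['mask_rcnn_swin_fpn_coco.py']  # hypothetical base config name

classes = ('fish_species_a', 'fish_species_b')  # your custom classes

model = dict(
    roi_head=dict(
        # Both the box head and the mask head must match the new class count.
        bbox_head=dict(num_classes=len(classes)),
        mask_head=dict(num_classes=len(classes)),
    )
)

data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes),
)

# Initialize from a COCO-pretrained checkpoint so only fine-tuning is needed.
load_from = 'checkpoints/mask_rcnn_swin_tiny_coco.pth'  # placeholder path
```

With a fragment like this, only the heads are re-initialized for the new classes while the Swin backbone keeps its pretrained weights.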
- Advice on Masters project | Vision transformers
  Hi, so my project is object detection on trash in the wild, using this fairly obscure dataset: http://tacodataset.org/, and I was thinking of applying vision transformers to it for feature extraction. My plan was to take the YOLOX implementation, swap out the backbone for Swin Transformer, and run a set of comparisons/experiments for the write-up, sort of like how Swin Transformer was applied to Mask R-CNN, but I am struggling to understand where to begin.
- [P] I implemented DeepMind's "Perceiver" in PyTorch
  Yes, have a look at this paper.
- [P] Code and pretrained models for Swin Transformer are released (SOTA models on COCO and ADE20K)
  Object detection on COCO: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection
Video-Swin-Transformer
- Explanation needed
- Explanation needed [P]
- Explanation needed [R]
- Weekly Entering & Transitioning Thread | 20 Feb 2022 - 27 Feb 2022
  PROBLEM STATEMENT: Develop an efficient common strategy and a relevant implementation to extract the video-based models in the black-box and grey-box settings across the following two problem statements.
  1. Action Classification: model extraction for the Swin-T action-classification model on the Kinetics-400 dataset. Download the model from here: https://github.com/SwinTransformer/Video-Swin-Transformer
  2. Video Classification: model extraction for the MoViNet-A2-Base video-classification model on the Kinetics-600 dataset. Download the model from here: https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/3
  Black-box setting: do not use any relevant dataset; use synthetic or generated data without touching the Kinetics series datasets. Also, do not use the same model architecture as the original model to train the extracted model.
  Grey-box setting: you may use 5% of the original data (a balanced representation of classes) as a starting point to generate the attack dataset. Again, do not use the same model architecture as the original model to train the extracted model.
  Can someone explain the problem statement in an easy / understandable way? What I think is that the models have already been provided and we have to do something in the black-box and grey-box settings. Can someone explain briefly what we have to do in the black-box / grey-box?
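In both settings the core loop is the same: treat the provided model as an opaque oracle, query it with inputs you are allowed to construct, record its predictions, and train a *different* architecture on those (input, predicted-label) pairs. A toy, self-contained sketch of that loop (everything here is hypothetical and stands in for the real video models, which would be queried with clips rather than feature vectors):

```python
import random

# Toy illustration of black-box model extraction (all names hypothetical):
# the "victim" is an opaque classifier we may only query, standing in for
# the Swin-T or MoViNet models in the problem statement. We never see its
# weights or training data; we only observe labels for inputs we send.

def victim_predict(x):
    # Opaque decision rule known only to the victim's owner:
    # class 1 if the feature sum is positive, else class 0.
    return 1 if sum(x) > 0 else 0

def make_synthetic_queries(n, dim, rng):
    # Black-box setting: no access to the original (Kinetics-like) data,
    # so we fabricate random inputs to probe the victim. In the grey-box
    # setting, 5% of real data could seed this generator instead.
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

def extract(n_queries=2000, dim=4, seed=0):
    rng = random.Random(seed)
    queries = make_synthetic_queries(n_queries, dim, rng)
    labels = [victim_predict(q) for q in queries]  # the only victim access

    # Train a *different* architecture than the victim, as the rules demand:
    # here a nearest-centroid student instead of the victim's threshold rule.
    centroids = {}
    for c in (0, 1):
        pts = [q for q, y in zip(queries, labels) if y == c]
        centroids[c] = [sum(v) / len(pts) for v in zip(*pts)]

    def student_predict(x):
        def dist2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return min(centroids, key=lambda c: dist2(x, centroids[c]))

    return student_predict

student = extract()
# Measure agreement between student and victim on fresh random inputs.
rng = random.Random(1)
test_inputs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(500)]
agreement = sum(student(x) == victim_predict(x) for x in test_inputs) / len(test_inputs)
print(f"student/victim agreement: {agreement:.2f}")
```

The deliverable in the challenge is essentially this pipeline scaled up: a query-generation strategy, the logged victim outputs, and an extracted model of a different architecture whose agreement with the victim is as high as possible.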
- Action recognition models for images
  There are two main variants of the Swin Transformer: the original Swin Transformer (official implementation here) and the Video Swin Transformer (official implementation here). The two architectures are very similar; the differences are mainly in the size of the input. A Swin Transformer pretrained on ImageNet can be used as the backbone for different applications, either image- or video-based. In fact, the authors pretrained the original Swin Transformer on ImageNet, then modified the input size and fine-tuned it on video action-recognition datasets. In your case, you can use the original Swin Transformer pretrained on ImageNet and fine-tune it on your own dataset without modifying anything about the input size, since it is designed to process images.
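The input-size difference between the two variants comes down to how tokens are grouped into local attention windows: the image model partitions an H×W token grid into 2-D windows, while Video Swin partitions a T×H×W grid into 3-D windows. A minimal sketch of the 2-D partition (the grid shape, window size, and token ids are illustrative, not taken from the official code):

```python
def window_partition_2d(h, w, ws):
    """Split an h x w grid of token ids (numbered row-major) into
    non-overlapping ws x ws windows; each window is a flat list of ids.
    Assumes h and w are divisible by ws, as in the real model after padding."""
    grid = [[r * w + c for c in range(w)] for r in range(h)]
    windows = []
    for r0 in range(0, h, ws):
        for c0 in range(0, w, ws):
            windows.append([grid[r][c]
                            for r in range(r0, r0 + ws)
                            for c in range(c0, c0 + ws)])
    return windows

# 4x4 grid with 2x2 windows -> four windows of four tokens each.
wins = window_partition_2d(4, 4, 2)
print(wins[0])  # top-left window: [0, 1, 4, 5]
```

Self-attention is then computed only within each window. Video Swin applies the same idea with three nested loops over a T×H×W grid, which is why only the input handling changes while the rest of the architecture carries over.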
- [R] New Study Proposes CW Networks: Greater Expressive Power Than GNNs
  The code is available on the project's GitHub. The paper Video Swin Transformer is on arXiv.
- [R] Video Swin Transformer: SOTA on Video Recognition (84.9% top 1 on Kinetics-400 and 69.6% top 1 on Something-Something V2)
What are some alternatives?
- Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
- Swin-Transformer-Tensorflow - Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)
- YOLOX - YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3~v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
- MoViNet-pytorch - MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition
- Swin-Transformer-Semantic-Segmentation - This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.
- Swin-Transformer - This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
- Perceiver - Implementation of Perceiver, General Perception with Iterative Attention, in TensorFlow
- data - Data and code behind the articles and graphics at FiveThirtyEight
- Swin-Transformer-Serve - Deploy Swin Transformer using TorchServe
- PaddleClas - A treasure chest for visual classification and recognition powered by PaddlePaddle