dino
YOLOX
dino | YOLOX | |
---|---|---|
7 | 12 | |
5,881 | 9,030 | |
1.4% | 1.0% | |
1.0 | 1.0 | |
25 days ago | 4 days ago | |
Python | Python | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dino
- Batch-wise processing or image-by-image processing? (DINO V1)
-
[P] Image search with localization and open-vocabulary reranking.
I also implemented one based on the self attention maps from the DINO trained ViT’s. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an ok compromise between domain specialization and location specificity. I did not try any saliency or gradient based methods as i was not sure on generalization and speed respectively. I know LAVIS has an implementation of grad cam and it seems to work well in the plug'n'play vqa.
-
Unsupervised semantic segmentation
You will probably need an unwieldy amount of data and compute to reproduce it, so your best option would be to use the pretrained models available on github.
-
[D] Why Transformers are taking over the Compute Vision world: Self-Supervised Vision Transformers with DINO explained in 7 minutes!
[Full Explanation Post] [Arxiv] [Project Page]
-
A major part of real-world AI has to be solved to make unsupervised, generalized full self-driving work, as the entire road system is designed for biological neural nets with optical imagers
Except he is actually talking about the new DINO model created by facebook that was released on friday. Which is a new approach to image transformers for unsupervised segmentation. Here's its github.
-
[D] Paper Explained - DINO: Emerging Properties in Self-Supervised Vision Transformers (Full Video Analysis)
Code: https://github.com/facebookresearch/dino
- [R] DINO and PAWS: Advancing the state of the art in computer vision with self-supervised Transformers
YOLOX
-
Learning Exchange, lets training YoloX
So I am trying to do my best and train YOLOX for an object detection case using Google Colab.
-
Understanding heatmaps
https://github.com/Megvii-BaseDetection/YOLOX I have only tried the pretrained yolo X nano. I get corner responses even if the inference image is padded with a large margin which is unexpected
-
Open discussion and useful links people trying to do Object Detection
* Nice implemention of Yolo that is BSD license (not GPL) https://github.com/Megvii-BaseDetection/YOLOX
-
[P] Image search with localization and open-vocabulary reranking.
I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.
-
DeepSort with PyTorch(support yolo series)
Megvii-BaseDetection/YOLOX
- [D][P] YOLOv6: state-of-the-art object detection at 1242 FPS
-
Looking for help for hire
Modern video can be broken into a series of still problems. AI vision models can make these types of classification in as fast as video. Here is a particularly there is a controversial company from China that does this very well on faces in video and they have open sourced the models: https://github.com/Megvii-BaseDetection/YOLOX
-
High-tech
Not really a problem, see results here. Just use yolox_x. Thank you for your attention.
-
Advice on Masters project | Vision transformers
From what I understand the swin transformer outputs a single dimension feature vector and the yolo head takes inputs from 3 different layers from the backbone?? and I think I will need to write the backbone implementation here.
- Is YOLOX object detector NMS free?
What are some alternatives?
simsiam-cifar10 - Code to train the SimSiam model on cifar10 using PyTorch
tensorflow-yolov4-tflite - YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights tensorflow, tensorrt and tflite
Transformer-SSL - This is an official implementation for "Self-Supervised Learning with Swin Transformers".
Swin-Transformer-Object-Detection - This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
pytorch-metric-learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
tensorrt_demos - TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
PINTO_model_zoo - A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
unsupervised-depth-completion-visual-inertial-odometry - Tensorflow and PyTorch implementation of Unsupervised Depth Completion from Visual Inertial Odometry (in RA-L January 2020 & ICRA 2020)
yolov5 - YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
lightly - A python library for self-supervised learning on images.
RPi_64-bit_Zero-2-image - Raspberry Pi Zero 2 W 64-bit OS image with OpenCV, TensorFlow Lite and ncnn Framework.