Advice on Masters project | Vision transformers

This page summarizes the projects mentioned and recommended in the original post on /r/MLQuestions.

  • Swin-Transformer-Object-Detection

    This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

  • Hi, my project is about object detection of trash in the wild on this fairly obscure dataset: http://tacodataset.org/ and I was thinking of applying vision transformers to it for feature extraction. I was thinking of taking the YOLOX implementation, swapping out the backbone for a Swin Transformer, and performing a bunch of comparisons/experiments for the write-up. Sort of like how they applied Swin Transformers to Mask R-CNN here, but I am struggling to understand where to begin.

  • TACO

    🌮 Trash Annotations in Context Dataset Toolkit (by pedropro)

  • YOLOX

    YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

  • From what I understand, the Swin Transformer outputs a single one-dimensional feature vector, while the YOLO head takes inputs from 3 different layers of the backbone? I think I will need to write the backbone implementation here.
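
On that last point, a rough sketch may help. The single feature vector is what the classification variant of Swin returns after global pooling; the backbone itself is hierarchical and produces one feature map per stage, with each patch-merging step halving the resolution and doubling the channels. The snippet below only does the shape arithmetic, assuming Swin-T defaults (patch size 4, embedding dimension 96) and a square 640x640 input. The function names are made up for illustration and are not part of the Swin or YOLOX code bases. As far as I recall, YOLOX's neck consumes features at strides 8, 16 and 32, so the last three Swin stages are the natural candidates, but check that against the YOLOX source before relying on it.

```python
# Shape arithmetic only: no deep-learning framework is needed to run this.
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int, int]  # (stride, channels, height, width)


def swin_stage_shapes(
    img_size: int = 640,   # assumed square input, divisible by 32
    patch_size: int = 4,   # Swin-T default
    embed_dim: int = 96,   # Swin-T default
    num_stages: int = 4,
) -> List[Shape]:
    """Return (stride, channels, height, width) for each backbone stage."""
    shapes = []
    for stage in range(num_stages):
        stride = patch_size * 2 ** stage    # 4, 8, 16, 32
        channels = embed_dim * 2 ** stage   # 96, 192, 384, 768
        side = img_size // stride
        shapes.append((stride, channels, side, side))
    return shapes


def select_for_detection_head(
    shapes: Sequence[Shape],
    wanted_strides: Sequence[int] = (8, 16, 32),  # assumption about the head
) -> List[Shape]:
    """Keep only the stages whose stride the detection neck/head consumes."""
    return [s for s in shapes if s[0] in wanted_strides]


if __name__ == "__main__":
    for stride, channels, h, w in select_for_detection_head(swin_stage_shapes()):
        print(f"stride {stride:2d}: feature map {channels} x {h} x {w}")
```

Under these assumptions the three maps have 192, 384 and 768 channels. If the YOLOX neck is configured for different input channel counts (I believe its defaults are 256, 512 and 1024 scaled by the width factor), either its `in_channels` setting or a 1x1 convolution per level would have to bridge the gap. Swin's token output of shape (batch, H*W, C) would also need reshaping to (batch, C, H, W) before a convolutional neck can use it.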
