[R] end-to-end image captioning

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • a-PyTorch-Tutorial-to-Image-Captioning

    Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

  • I have found this repository: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning which seemingly requires only images and captions, but it is quite old (about 3 years) and is based on LSTMs. I was hoping there are transformer-based implementations that I could use.

  • meshed-memory-transformer

    Meshed-Memory Transformer for Image Captioning. CVPR 2020

  • I could use some up-to-date models (e.g., this one: https://github.com/aimagelab/meshed-memory-transformer), but all those I looked into require a pre-processing step that generates region features or bounding boxes. The problem is that I can't use an off-the-shelf bounding-box extraction model, as it would not perform well on the dataset I have (the images are not like COCO at all). So I was wondering whether there is a relatively up-to-date architecture that does not require this pre-processing step. That is, an implementation that requires only inputs (images) and outputs (sentences).

  • clip-glass

    Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"

  • CLIP-GLaSS
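
The post asks for a setup that needs only images and their captions, with no region features or bounding boxes. As a rough illustration of what that training data looks like, here is a minimal, model-agnostic sketch in plain Python. The file paths, captions, special-token names, and helper functions (`tokenize`, `build_vocab`, `encode`) are all hypothetical and are not taken from any of the repositories listed above; a real pipeline would likely use a proper tokenizer and an image loader.

```python
from collections import Counter

# Special tokens (hypothetical names); PAD gets id 0 so padding is easy to mask.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def tokenize(caption):
    # Naive tokenizer: lowercase and split on whitespace.
    return caption.lower().split()


def build_vocab(captions, min_freq=1):
    # Count every token, keep those seen at least min_freq times,
    # and assign ids in a deterministic (sorted) order.
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    words = sorted(w for w, n in counts.items() if n >= min_freq)
    return {w: i for i, w in enumerate([PAD, START, END, UNK] + words)}


def encode(caption, vocab, max_len):
    # Reserve two positions for START and END, truncate the rest,
    # map unknown words to UNK, then right-pad with PAD to max_len.
    tokens = tokenize(caption)[: max_len - 2]
    ids = [vocab[START]] + [vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[END]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical (image path, caption) pairs: the only supervision required.
pairs = [
    ("images/0001.png", "a red valve on a steel pipe"),
    ("images/0002.png", "a cracked valve"),
]

vocab = build_vocab([caption for _, caption in pairs])
examples = [(path, encode(caption, vocab, max_len=10)) for path, caption in pairs]
```

Each entry in `examples` pairs an image path with a fixed-length list of token ids, which is the kind of input an end-to-end captioner (an image encoder feeding a text decoder) would typically be trained on, assuming the images themselves are loaded and resized separately.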

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
