- a-PyTorch-Tutorial-to-Image-Captioning: Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
- clip-glass: Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"
I have found this repository: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning, which seems to require only images and captions, but it is quite old (about three years) and is based on LSTMs. I was hoping there are transformer-based implementations I could use instead.
I could use some more up-to-date models (e.g., this one: https://github.com/aimagelab/meshed-memory-transformer), but all the ones I looked into require a pre-processing step that generates region features/bounding boxes. The problem is that I can't use an off-the-shelf bounding-box extraction model, as it would not perform well on my dataset (the images are not like COCO at all). So I was wondering whether there is a relatively up-to-date architecture that does not require this pre-processing step, i.e., an implementation that takes only images as input and produces sentences as output.
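To make the requirement concrete, here is a rough sketch of the kind of pipeline I have in mind: a vision-transformer encoder that consumes raw pixels paired with an autoregressive text decoder, so the whole thing trains from (image, caption) pairs alone with no region-feature/bounding-box step. This is only an illustration assuming the Hugging Face transformers library; the checkpoint names (google/vit-base-patch16-224-in21k, gpt2) and the example image path are placeholders, not something from this thread.

```python
# Minimal sketch (assumed setup): ViT encoder + GPT-2 decoder via Hugging Face
# transformers. Checkpoints and "example.jpg" are placeholders for illustration.
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Build an image-to-text model; the encoder works on raw pixels (patch
# embeddings), so no bounding-box / region-feature extraction is required.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder (raw images in)
    "gpt2",                               # autoregressive text decoder
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Special tokens the decoder needs for training and generation.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Training step: only an image and its caption are required.
image = Image.open("example.jpg").convert("RGB")  # placeholder image file
pixel_values = processor(images=image, return_tensors="pt").pixel_values
labels = tokenizer("a caption for the image", return_tensors="pt").input_ids
loss = model(pixel_values=pixel_values, labels=labels).loss
loss.backward()

# Inference: generate a caption directly from pixels.
with torch.no_grad():
    generated_ids = model.generate(pixel_values, max_length=32)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

The point is just the interface: images in, sentences out, end to end, which is what I am looking for in an existing, reasonably modern implementation.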