Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D]

LLaMA-Adapter

Mentions: 2 | Stars: 5,535 | Activity: 8.1 | Language: Python

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

open_flamingo

Mentions: 4 | Stars: 3,493 | Activity: 6.8 | Language: Python

An open-source framework for training large multimodal models.

Maybe the recent OpenFlamingo gives you better results (they have a demo on HF).

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

[D] Multi modal for visual qna based on a given image. Need suggestions.
1 project | /r/MachineLearning | 2 May 2023

Open Flamingo: An open-source framework for training large multimodal models
1 project | news.ycombinator.com | 30 Mar 2023

Database of 16,000 Artists Used to Train Midjourney AI Goes Viral
1 project | news.ycombinator.com | 7 Jan 2024

Is Nicholas Renotte a good guide for a person who knows nothing about ML?
1 project | /r/learnmachinelearning | 27 Jun 2023

Generate Image from Vector Embedding
1 project | /r/StableDiffusion | 6 Jun 2023

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Tags: Computer Vision, Deep Learning, in-context-learning, language-model, multimodal-learning
Post date: 5 Jul 2023
