Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D]

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • LLaMA-Adapter

    [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

  • Try LLaMA_Adapter

  • open_flamingo

    An open-source framework for training large multimodal models.

  • Maybe the recent OpenFlamingo gives you better results (they have a demo on Hugging Face).

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • [D] Multi modal for visual qna based on a given image. Need suggestions.

    1 project | /r/MachineLearning | 2 May 2023
  • Open Flamingo: An open-source framework for training large multimodal models

    1 project | news.ycombinator.com | 30 Mar 2023
  • Database of 16,000 Artists Used to Train Midjourney AI Goes Viral

    1 project | news.ycombinator.com | 7 Jan 2024
  • Is Nicholas Renotte a good guide for a person who knows nothing about ML?

    1 project | /r/learnmachinelearning | 27 Jun 2023
  • Generate Image from Vector Embedding

    1 project | /r/StableDiffusion | 6 Jun 2023