Recommended open LLMs with image input modality?

Awesome-Multimodal-Large-Language-Models

Mentions: 2 | Stars: 8,991 | Activity: 9.7

Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation: this is pretty comprehensive. TL;DR: BLIP is probably the best, though I've heard it needs a lot of VRAM. In my experience, it's the most responsive to prompt engineering.

unilm

Mentions: 40 | Stars: 18,358 | Activity: 9.0 | Language: Python

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

It is missing Kosmos-2. I remember its image captioning (the demo is currently down) was really good, and it's almost as fast as LLaVA and LaVIN.

instructblip-pipeline

Mentions: 1 | Stars: 27 | Activity: 4.1 | Language: Python

A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.

I've been using it in oobabooga. There's a repo for the extension here: https://github.com/kjerk/instructblip-pipeline/tree/main

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

The Era of 1-Bit LLMs: Training Tips, Code and FAQ [pdf]

1 project | news.ycombinator.com | 21 Mar 2024
The Era of 1-Bit LLMs: Training Tips, Code and FAQ

1 project | news.ycombinator.com | 20 Mar 2024
The Era of 1-bit LLMs: ternary parameters for cost-effective computing

6 projects | news.ycombinator.com | 28 Feb 2024
I'm an Old Fart and AI Makes Me Sad

2 projects | news.ycombinator.com | 16 Feb 2024
On building a semantic search engine

3 projects | news.ycombinator.com | 6 Jan 2024

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA
Tags: NLP, instruction-tuning, pre-trained-model, instruction-following, unilm
Post date: 8 Jul 2023