Top 10 vision-language-model Open-Source Projects

Qwen-VL

4 3,787 8.7 Python

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Project mention: LLM for object detection/labelling | /r/LocalLLaMA | 2023-11-28

DeepSeek-VL

1 1,631 6.1 Python

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18

prismer

5 1,286 5.2 Python

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

multimodal-maestro

1 951 8.6 Python

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29

MGM

2 2,991 7.7 Python

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Project mention: Implementation for Mini-Gemini | news.ycombinator.com | 2024-04-17

AlphaCLIP

1 509 8.6 Jupyter Notebook

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15

Multi-Modality-Arena

1 374 7.7 Python

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13

GitHub: https://github.com/OpenGVLab/Multi-Modality-Arena

VPGTrans

3 264 6.6 Python

Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

multi_token

1 139 8.5 Python

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18

vision-language-examples

1 7 6.4 Python

Vision-language model example code.

Project mention: [D] how to accelerate ViT models more faster | /r/MachineLearning | 2023-07-13

I ran experiments and wrote examples on accelerating ViT (Vision Transformer) with TensorRT, FasterTransformer, and xFormers. All experiments used a single A100 as the baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

vision-language-model related posts

Implementation for Mini-Gemini

1 project | news.ycombinator.com | 17 Apr 2024

Mini-Gemini: Mining the Potential of Multi-Modality Vision Language Models

2 projects | news.ycombinator.com | 31 Mar 2024

LLM for object detection/labelling

1 project | /r/LocalLLaMA | 28 Nov 2023

VPGTrans/VPGTrans: Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

1 project | /r/AITechTips | 2 May 2023

[D] Tracking Dancing People

1 project | /r/MachineLearning | 12 Mar 2023

Meet Prismer: An Open Source Vision-Language Model with An Ensemble of Experts

1 project | /r/machinelearningnews | 11 Mar 2023

Prismer: A Vision-Language Model with Multi-Modal Experts

1 project | news.ycombinator.com | 9 Mar 2023

Index

What are some of the best open-source vision-language-model projects? This list will help you:

#	Project	Stars
1	Qwen-VL	3,787
2	DeepSeek-VL	1,631
3	prismer	1,286
4	multimodal-maestro	951
5	MGM	2,991
6	AlphaCLIP	509
7	Multi-Modality-Arena	374
8	VPGTrans	264
9	multi_token	139
10	vision-language-examples	7