Python vision-language-model

Open-source Python projects categorized as vision-language-model

Top 10 Python vision-language-model Projects

  • LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.

  • Project mention: Show HN: I Remade the Fake Google Gemini Demo, Except Using GPT-4 and It's Real | news.ycombinator.com | 2023-12-10

    Update: For anyone else facing the commercial use question on LLaVA - it is licensed under Apache 2.0. Can be used commercially with attribution: https://github.com/haotian-liu/LLaVA/blob/main/LICENSE
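
    At their core, contrastively trained vision-language models score how well each caption matches each image by cosine similarity between image and text embeddings. A minimal NumPy sketch of that matching step, with toy hypothetical embeddings (not LLaVA's actual code, which generates text rather than only scoring it):

    ```python
    import numpy as np

    def cosine_similarity_matrix(image_embs: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
        """Pairwise cosine similarity between image and text embedding rows."""
        img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
        txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
        return img @ txt.T

    # Toy 4-dim embeddings for two images and two captions (hypothetical values).
    images = np.array([[1.0, 0.0, 0.2, 0.0],
                       [0.0, 1.0, 0.0, 0.3]])
    texts = np.array([[0.9, 0.1, 0.1, 0.0],
                      [0.1, 0.8, 0.0, 0.4]])

    sims = cosine_similarity_matrix(images, texts)
    # Each image's best-matching caption is the argmax along its row.
    best = sims.argmax(axis=1)
    ```

    Here each image correctly picks the caption whose embedding points in a similar direction; real models learn those embeddings with a contrastive objective over millions of pairs.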

  • Qwen-VL

    The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.

  • Project mention: LLM for object detection/labelling | /r/LocalLLaMA | 2023-11-28
  • MGM

    Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

  • Project mention: Implementation for Mini-Gemini | news.ycombinator.com | 2024-04-17
  • DeepSeek-VL

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • multimodal-maestro

    Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

  • Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
  • Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

  • Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13

    Github: https://github.com/OpenGVLab/Multi-Modality-Arena
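
    Arena-style side-by-side leaderboards are commonly scored with Elo-style rating updates over pairwise votes. A minimal sketch of that idea (an illustration of the general technique, not Multi-Modality Arena's actual scoring code):

    ```python
    def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
        """One Elo update after a side-by-side comparison.

        score_a is 1.0 if model A wins, 0.0 if B wins, 0.5 for a tie.
        """
        expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        r_a_new = r_a + k * (score_a - expected_a)
        r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
        return r_a_new, r_b_new

    # Two models start at 1000; model A wins one pairwise vote.
    a, b = elo_update(1000.0, 1000.0, 1.0)
    ```

    The update is zero-sum (ratings always total the same), and upsets against higher-rated models move ratings further than expected wins.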

  • VPGTrans

    Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

  • multi_token

    Embed arbitrary modalities (images, audio, documents, etc) into large language models.

  • Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18
  • vision-language-examples

    Vision-language model example code.

  • Project mention: [D] how to accelerate ViT models more faster | /r/MachineLearning | 2023-07-13

    I have conducted experiments and examples on accelerating ViT (Vision Transformer) using methods such as TensorRT, FasterTransformer, and xFormers. The experiments were conducted using a single A100 as a baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration
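
    Libraries like TensorRT, FasterTransformer, and xFormers accelerate the same scaled dot-product attention that sits inside every ViT block. For orientation, a naive NumPy reference of that operation with hypothetical tensor shapes (this is the baseline such libraries optimize, not their implementation):

    ```python
    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        """Plain-NumPy reference attention over (batch, seq, dim) tensors."""
        d = q.shape[-1]
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (batch, seq, seq)
        scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v

    # Hypothetical small shapes: batch of 1, sequence of 4 patches, dim 8.
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1, 4, 8))
    k = rng.standard_normal((1, 4, 8))
    v = rng.standard_normal((1, 4, 8))
    out = scaled_dot_product_attention(q, k, v)
    ```

    Optimized kernels avoid materializing the full (seq, seq) score matrix and fuse the softmax, which is where most of the memory and latency savings come from.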

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source vision-language-model projects in Python? This list will help you:

#    Project                     Stars
1    LLaVA                      16,517
2    Qwen-VL                     3,787
3    MGM                         2,954
4    DeepSeek-VL                 1,533
5    prismer                     1,288
6    multimodal-maestro            951
7    Multi-Modality-Arena          367
8    VPGTrans                      265
9    multi_token                   137
10   vision-language-examples        7
