Top 10 Python vision-language-model Projects
- LLaVA: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
- Qwen-VL: The official repository of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language models proposed by Alibaba Cloud.
- multimodal-maestro: Effective prompting for large multimodal models such as GPT-4 Vision, LLaVA, or CogVLM. 🔥
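Prompting a large multimodal model like GPT-4 Vision boils down to packaging an image together with a text instruction in one chat message. A minimal sketch of building such a payload in the OpenAI chat-message style is shown below; the image bytes are a stand-in, and no API call is made, so treat the exact schema as an assumption to check against the provider's current docs.

```python
import base64
import json

def build_vision_prompt(image_bytes: bytes, instruction: str) -> dict:
    """Build a GPT-4 Vision style chat message that pairs an inline
    base64-encoded image (as a data URL) with a text instruction."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {
                "type": "image_url",
                # Data-URL form avoids hosting the image anywhere.
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }

# Example with a 1-byte stand-in for real PNG data.
msg = build_vision_prompt(b"\x89", "Count the objects in this image.")
print(json.dumps(msg, indent=2))
```

The resulting dict can be dropped into the `messages` list of a chat-completions request; libraries like multimodal-maestro layer visual markers and region prompts on top of this same basic structure.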
- Multi-Modality-Arena: Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Project mention: Show HN: I Remade the Fake Google Gemini Demo, Except Using GPT-4 and It's Real | news.ycombinator.com | 2023-12-10
Update: For anyone else facing the commercial-use question on LLaVA: it is licensed under Apache 2.0 and can be used commercially with attribution: https://github.com/haotian-liu/LLaVA/blob/main/LICENSE
Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13
GitHub: https://github.com/OpenGVLab/Multi-Modality-Arena
Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18
I have put together experiments and examples on accelerating ViT (Vision Transformer) with TensorRT, FasterTransformer, and xFormers, benchmarked against a single A100 baseline: https://github.com/bnabis93/vision-language-examples/tree/main/acceleration
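Comparisons like the one above come down to measuring per-call latency of a model's forward pass under each backend. A minimal, stdlib-only harness of the kind such experiments use is sketched below; the workload is a hypothetical stand-in, and in a real run you would pass in the TensorRT- or xFormers-accelerated ViT forward call instead.

```python
import time
import statistics

def benchmark(fn, *, warmup: int = 3, iters: int = 20) -> dict:
    """Time a callable: run warmup passes first (to amortize JIT/cache
    effects), then report median and p95 latency in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }

# Stand-in workload; replace with e.g. `lambda: model(batch)`.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

Note that for GPU backends the timed callable must also synchronize the device (e.g. `torch.cuda.synchronize()`), otherwise the harness only measures kernel-launch time.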
Python vision-language-model related posts
- Implementation for Mini-Gemini
- Mini-Gemini: Mining the Potential of Multi-Modality Vision Language Models
- LLM for object detection/labelling
- VPGTrans/VPGTrans: Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
- [D] Tracking Dancing People
- Meet Prismer: An Open Source Vision-Language Model with An Ensemble of Experts
- Prismer: A Vision-Language Model with Multi-Modal Experts
Index
What are some of the best open-source vision-language-model projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | LLaVA | 16,517
2 | Qwen-VL | 3,787
3 | MGM | 2,954
4 | DeepSeek-VL | 1,533
5 | prismer | 1,288
6 | multimodal-maestro | 951
7 | Multi-Modality-Arena | 367
8 | VPGTrans | 265
9 | multi_token | 137
10 | vision-language-examples | 7