Top 10 vision-language-model Open-Source Projects
-
Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
-
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA, or CogVLM.
-
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Project mention: Show HN: Multimodal Maestro - Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don't Want to Miss | dev.to | 2024-04-15
Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 | /r/MachineLearning | 2023-08-13 (GitHub: https://github.com/OpenGVLab/Multi-Modality-Arena)
Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18
I have put together experiments and examples on accelerating ViT (Vision Transformer) inference using methods such as TensorRT, FasterTransformer, and xFormers. The experiments use a single A100 as the baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration
vision-language-model related posts
-
Implementation for Mini-Gemini
-
Mini-Gemini: Mining the Potential of Multi-Modality Vision Language Models
-
LLM for object detection/labelling
-
VPGTrans/VPGTrans: Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
-
[D] Tracking Dancing People
-
Meet Prismer: An Open Source Vision-Language Model with An Ensemble of Experts
-
Prismer: A Vision-Language Model with Multi-Modal Experts
-
Index
What are some of the best open-source vision-language-model projects? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | Qwen-VL | 3,787 |
| 2 | DeepSeek-VL | 1,631 |
| 3 | prismer | 1,286 |
| 4 | multimodal-maestro | 951 |
| 5 | MGM | 2,991 |
| 6 | AlphaCLIP | 509 |
| 7 | Multi-Modality-Arena | 374 |
| 8 | VPGTrans | 264 |
| 9 | multi_token | 139 |
| 10 | vision-language-examples | 7 |