vision-language-model

Open-source projects categorized as vision-language-model

Top 10 vision-language-model Open-Source Projects

  • Qwen-VL

    The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

  • Project mention: LLM for object detection/labelling | /r/LocalLLaMA | 2023-11-28
  • DeepSeek-VL

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • multimodal-maestro

    Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

  • Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
  • MGM

    Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

  • Project mention: Implementation for Mini-Gemini | news.ycombinator.com | 2024-04-17
  • AlphaCLIP

    [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

  • Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15

  • Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

  • Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13

    Github: https://github.com/OpenGVLab/Multi-Modality-Arena

  • VPGTrans

    Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

  • multi_token

    Embed arbitrary modalities (images, audio, documents, etc) into large language models.

  • Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18
  • vision-language-examples

    Vision-language model example code.

  • Project mention: [D] how to accelerate ViT models more faster | /r/MachineLearning | 2023-07-13

    I have conducted experiments and examples on accelerating ViT (Vision Transformer) using methods such as TensorRT, FasterTransformer, and xFormers. The experiments were conducted using a single A100 as a baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

vision-language-model related posts

  • Implementation for Mini-Gemini

    1 project | news.ycombinator.com | 17 Apr 2024
  • Mini-Gemini: Mining the Potential of Multi-Modality Vision Language Models

    2 projects | news.ycombinator.com | 31 Mar 2024
  • LLM for object detection/labelling

    1 project | /r/LocalLLaMA | 28 Nov 2023
  • VPGTrans/VPGTrans: Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

    1 project | /r/AITechTips | 2 May 2023
  • [D] Tracking Dancing People

    1 project | /r/MachineLearning | 12 Mar 2023
  • Meet Prismer: An Open Source Vision-Language Model with An Ensemble of Experts

    1 project | /r/machinelearningnews | 11 Mar 2023
  • Prismer: A Vision-Language Model with Multi-Modal Experts

    1 project | news.ycombinator.com | 9 Mar 2023

Index

What are some of the best open-source vision-language-model projects? This list will help you:

 #  Project                     Stars
 1  Qwen-VL                     3,787
 2  DeepSeek-VL                 1,631
 3  prismer                     1,286
 4  multimodal-maestro            951
 5  MGM                         2,991
 6  AlphaCLIP                     509
 7  Multi-Modality-Arena          374
 8  VPGTrans                      264
 9  multi_token                   139
10  vision-language-examples        7
