Python vision-language-model

Open-source Python projects categorized as vision-language-model

Top 10 Python vision-language-model Projects

  • LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.

  • Project mention: Show HN: I Remade the Fake Google Gemini Demo, Except Using GPT-4 and It's Real | news.ycombinator.com | 2023-12-10

    Update: For anyone else facing the commercial use question on LLaVA - it is licensed under Apache 2.0. Can be used commercially with attribution: https://github.com/haotian-liu/LLaVA/blob/main/LICENSE
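
    At their core, contrastively trained vision-language models score how well each caption matches each image by cosine similarity between image and text embeddings. A minimal NumPy sketch of that matching step, with toy hypothetical embeddings (not LLaVA's actual code, which generates text rather than only scoring it):

    ```python
    import numpy as np

    def cosine_similarity_matrix(image_embs: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
        """Pairwise cosine similarity between image and text embedding rows."""
        img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
        txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
        return img @ txt.T

    # Toy 4-dim embeddings for two images and two captions (hypothetical values).
    images = np.array([[1.0, 0.0, 0.2, 0.0],
                       [0.0, 1.0, 0.0, 0.3]])
    texts = np.array([[0.9, 0.1, 0.1, 0.0],
                      [0.1, 0.8, 0.0, 0.4]])

    sims = cosine_similarity_matrix(images, texts)
    # Each image's best-matching caption is the argmax along its row.
    best = sims.argmax(axis=1)
    ```

    Here each image correctly picks the caption whose embedding points in a similar direction; real models learn those embeddings with a contrastive objective over millions of pairs.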

  • Qwen-VL

    The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.

  • Project mention: LLM for object detection/labelling | /r/LocalLLaMA | 2023-11-28
  • MGM

    Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

  • Project mention: Implementation for Mini-Gemini | news.ycombinator.com | 2024-04-17
  • DeepSeek-VL

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • multimodal-maestro

    Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

  • Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
  • Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

  • Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13

    Github: https://github.com/OpenGVLab/Multi-Modality-Arena
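
    Arena-style side-by-side leaderboards are commonly scored with Elo-style rating updates over pairwise votes. A minimal sketch of that idea (an illustration of the general technique, not Multi-Modality Arena's actual scoring code):

    ```python
    def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
        """One Elo update after a side-by-side comparison.

        score_a is 1.0 if model A wins, 0.0 if B wins, 0.5 for a tie.
        """
        expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        r_a_new = r_a + k * (score_a - expected_a)
        r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
        return r_a_new, r_b_new

    # Two models start at 1000; model A wins one pairwise vote.
    a, b = elo_update(1000.0, 1000.0, 1.0)
    ```

    The update is zero-sum (ratings always total the same), and upsets against higher-rated models move ratings further than expected wins.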

  • VPGTrans

    Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

  • multi_token

    Embed arbitrary modalities (images, audio, documents, etc) into large language models.

  • Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18
  • vision-language-examples

    Vision-language model example code.

  • Project mention: [D] how to accelerate ViT models more faster | /r/MachineLearning | 2023-07-13

    I have conducted experiments and examples on accelerating ViT (Vision Transformer) using methods such as TensorRT, FasterTransformer, and xFormers. The experiments were conducted using a single A100 as a baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration
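
    Libraries like TensorRT, FasterTransformer, and xFormers accelerate the same scaled dot-product attention that sits inside every ViT block. For orientation, a naive NumPy reference of that operation with hypothetical tensor shapes (this is the baseline such libraries optimize, not their implementation):

    ```python
    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        """Plain-NumPy reference attention over (batch, seq, dim) tensors."""
        d = q.shape[-1]
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (batch, seq, seq)
        scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v

    # Hypothetical small shapes: batch of 1, sequence of 4 patches, dim 8.
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1, 4, 8))
    k = rng.standard_normal((1, 4, 8))
    v = rng.standard_normal((1, 4, 8))
    out = scaled_dot_product_attention(q, k, v)
    ```

    Optimized kernels avoid materializing the full (seq, seq) score matrix and fuse the softmax, which is where most of the memory and latency savings come from.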

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source vision-language-model projects in Python? This list will help you:

#    Project                     Stars
1    LLaVA                      16,517
2    Qwen-VL                     3,787
3    MGM                         2,954
4    DeepSeek-VL                 1,533
5    prismer                     1,288
6    multimodal-maestro            951
7    Multi-Modality-Arena          367
8    VPGTrans                      265
9    multi_token                   137
10   vision-language-examples        7
