Python vision-and-language

Open-source Python projects categorized as vision-and-language

Top 14 Python vision-and-language Projects

  1. Multimodal-GPT

    Multimodal-GPT

  2. maestro

    Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL (by roboflow)

  3. prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  4. ONE-PEACE

    A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

  5. DoRA

    [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (by NVlabs)

    Project mention: FLaNK-AIM Weekly 06 May 2024 | dev.to | 2024-05-06

  6. pytorch_mgie

    A Gradio demo of MGIE

    Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13

  7. ChatGPT-OpenAI-Smart-Speaker

    This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

  8. CLIP-Caption-Reward

    PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

  9. VL_adapter

    PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

  10. ALPRO

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

  11. VLDet

    [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

  12. multimodal

    A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)

  13. robo-vln

    PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

  14. zeroshot-storytelling

    GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python vision-and-language related posts

  • Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?

    1 project | /r/GoogleColab | 25 Jun 2023
  • most sane web3 job listing

    2 projects | /r/ProgrammerHumor | 29 May 2023
  • I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?

    2 projects | /r/cscareerquestions | 26 May 2023
  • Two-minute Daily AI Update (Date: 5/15/2023)

    1 project | /r/ChatGPT | 15 May 2023
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    1 project | /r/MachineLearning | 14 May 2023
  • Is there a process that is the opposite of image generation?

    1 project | /r/StableDiffusion | 31 Jan 2023
  • Blip-2: harvesting development of pretrained vision models for LLM training

    1 project | news.ycombinator.com | 31 Jan 2023

Index

What are some of the best open-source vision-and-language projects in Python? This list will help you:

#   Project                        Stars
1   Multimodal-GPT                 1,478
2   maestro                        1,617
3   prismer                        1,305
4   ONE-PEACE                      1,006
5   DoRA                             690
6   pytorch_mgie                     346
7   ChatGPT-OpenAI-Smart-Speaker     270
8   CLIP-Caption-Reward              241
9   VL_adapter                       205
10  ALPRO                            187
11  VLDet                            183
12  multimodal                        79
13  robo-vln                          73
14  zeroshot-storytelling             15
