Python vision-and-language

Open-source Python projects categorized as vision-and-language

Top 14 Python vision-and-language Projects

  1. maestro

    Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL (by roboflow)

  2. Multimodal-GPT

    Multimodal-GPT

  3. prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  4. ONE-PEACE

    A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

  5. DoRA

    [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (by NVlabs)

  6. top-cvpr-2023-papers

    This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]

  7. pytorch_mgie

    A Gradio demo of MGIE

  8. ChatGPT-OpenAI-Smart-Speaker

    This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

  9. CLIP-Caption-Reward

    PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

  10. VL_adapter

    PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

  11. VLDet

    [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

  12. multimodal

    A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)

  13. robo-vln

    PyTorch code for the ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

  14. zeroshot-storytelling

    GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python vision-and-language related posts

  • Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?

    1 project | /r/GoogleColab | 25 Jun 2023
  • most sane web3 job listing

    2 projects | /r/ProgrammerHumor | 29 May 2023
  • I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?

    2 projects | /r/cscareerquestions | 26 May 2023
  • Two-minute Daily AI Update (Date: 5/15/2023)

    1 project | /r/ChatGPT | 15 May 2023
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    1 project | /r/MachineLearning | 14 May 2023
  • Is there a process that is the opposite of image generation?

    1 project | /r/StableDiffusion | 31 Jan 2023
  • Blip-2: harvesting development of pretrained vision models for LLM training

    1 project | news.ycombinator.com | 31 Jan 2023

Index

What are some of the best open-source vision-and-language projects in Python? This list will help you:

# Project Stars
1 maestro 2,630
2 Multimodal-GPT 1,509
3 prismer 1,308
4 ONE-PEACE 1,050
5 DoRA 834
6 top-cvpr-2023-papers 655
7 pytorch_mgie 346
8 ChatGPT-OpenAI-Smart-Speaker 298
9 CLIP-Caption-Reward 245
10 VL_adapter 205
11 VLDet 187
12 multimodal 82
13 robo-vln 79
14 zeroshot-storytelling 15
