Python vision-and-language

Open-source Python projects categorized as vision-and-language

Top 12 Python vision-and-language Projects

  • Multimodal-GPT

    Multimodal-GPT

    Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19
  • prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • Oscar

    Oscar and VinVL

  • ONE-PEACE

    A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

    Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
  • pytorch_mgie

    A Gradio demo of MGIE

    Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13
  • CLIP-Caption-Reward

    PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

  • VL_adapter

    PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

  • ALPRO

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

  • VLDet

    [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

  • multimodal

    A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)

  • robo-vln

    PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

  • zeroshot-storytelling

    GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source vision-and-language projects in Python? This list will help you:

Rank Project Stars
1 Multimodal-GPT 1,401
2 prismer 1,287
3 Oscar 1,027
4 ONE-PEACE 838
5 pytorch_mgie 320
6 CLIP-Caption-Reward 220
7 VL_adapter 193
8 ALPRO 180
9 VLDet 169
10 multimodal 70
11 robo-vln 61
12 zeroshot-storytelling 15
