Top 12 Python vision-and-language Projects

Multimodal-GPT

Mentions: 4 · Stars: 1,401 · Activity: 5.4 · Language: Python

Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19

prismer

Mentions: 5 · Stars: 1,287 · Activity: 5.2 · Language: Python

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar

Mentions: 1 · Stars: 1,027 · Activity: 4.0 · Language: Python

Oscar and VinVL

ONE-PEACE

Mentions: 2 · Stars: 838 · Activity: 8.6 · Language: Python

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Project mention: A general representation model across vision, audio, language modalities | news.ycombinator.com | 2023-05-25

pytorch_mgie

Mentions: 1 · Stars: 320 · Activity: 2.6 · Language: Python

A Gradio demo of MGIE

Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13

CLIP-Caption-Reward

Mentions: 2 · Stars: 220 · Activity: 0.0 · Language: Python

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

VL_adapter

Mentions: 1 · Stars: 193 · Activity: 0.0 · Language: Python

PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR 2022)
ALPRO

Mentions: 1 · Stars: 180 · Activity: 0.0 · Language: Python

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

VLDet

Mentions: 1 · Stars: 169 · Activity: 3.1 · Language: Python

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

multimodal

Mentions: 1 · Stars: 70 · Activity: 0.0 · Language: Python

A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)

robo-vln

Mentions: 2 · Stars: 61 · Activity: 2.9 · Language: Python

PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

zeroshot-storytelling

Mentions: 1 · Stars: 15 · Activity: 0.0 · Language: Python

GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python vision-and-language related posts

Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
1 project | /r/GoogleColab | 25 Jun 2023
most sane web3 job listing
2 projects | /r/ProgrammerHumor | 29 May 2023
I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
2 projects | /r/cscareerquestions | 26 May 2023
Two-minute Daily AI Update (Date: 5/15/2023)
1 project | /r/ChatGPT | 15 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
1 project | /r/MachineLearning | 14 May 2023
Is there a process that is the opposite of image generation?
1 project | /r/StableDiffusion | 31 Jan 2023
Blip-2: harvesting development of pretrained vision models for LLM training
1 project | news.ycombinator.com | 31 Jan 2023

Index

What are some of the best open-source vision-and-language projects in Python? This list will help you:

Rank	Project	Stars
1	Multimodal-GPT	1,401
2	prismer	1,287
3	Oscar	1,027
4	ONE-PEACE	838
5	pytorch_mgie	320
6	CLIP-Caption-Reward	220
7	VL_adapter	193
8	ALPRO	180
9	VLDet	169
10	multimodal	70
11	robo-vln	61
12	zeroshot-storytelling	15