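Top 16 vision-and-language Open-Source Projects

Each entry below lists the project name, a stats line (mentions, stars, activity score, primary language), and the repository description.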

LAVIS

18 8,738 6.3 Jupyter Notebook

LAVIS - A One-stop Library for Language-Vision Intelligence

Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11

Multimodal-GPT

4 1,407 5.4 Python

Multimodal-GPT

Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19

prismer

5 1,288 5.2 Python

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar

1 1,027 4.0 Python

Oscar and VinVL
ONE-PEACE

2 847 8.6 Python

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25

AlphaCLIP

1 498 8.6 Jupyter Notebook

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15

pytorch_mgie

1 320 2.6 Python

A Gradio demo of MGIE

Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13

conceptual-12m

1 305 0.0

Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
CLIP-Caption-Reward

2 225 0.0 Python

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
VL_adapter

1 193 0.0 Python

PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
ALPRO

1 180 0.0 Python

Align and Prompt: Video-and-Language Pre-training with Entity Prompts
VLDet

1 169 3.1 Python

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
DallEval

1 133 3.6 Jupyter Notebook

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
multimodal

1 70 0.0 Python

A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)
robo-vln

2 61 2.9 Python

PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
zeroshot-storytelling

1 15 0.0 Python

GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

vision-and-language related posts

[D] Why is most Open Source AI happening outside the USA?
2 projects | /r/MachineLearning | 6 Dec 2023

Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
1 project | /r/GoogleColab | 25 Jun 2023

most sane web3 job listing
2 projects | /r/ProgrammerHumor | 29 May 2023

I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
2 projects | /r/cscareerquestions | 26 May 2023

Two-minute Daily AI Update (Date: 5/15/2023)
1 project | /r/ChatGPT | 15 May 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
1 project | /r/MachineLearning | 14 May 2023

Is there a process that is the opposite of image generation?
1 project | /r/StableDiffusion | 31 Jan 2023

Index

What are some of the best open-source vision-and-language projects? This list will help you:

#	Project	Stars
1	LAVIS	8,738
2	Multimodal-GPT	1,407
3	prismer	1,288
4	Oscar	1,027
5	ONE-PEACE	847
6	AlphaCLIP	498
7	pytorch_mgie	320
8	conceptual-12m	305
9	CLIP-Caption-Reward	225
10	VL_adapter	193
11	ALPRO	180
12	VLDet	169
13	DallEval	133
14	multimodal	70
15	robo-vln	61
16	zeroshot-storytelling	15