Top 16 vision-and-language Open-Source Projects
-
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
-
CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
-
VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR 2022)
-
DallEval
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
-
multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Install with "pip install multimodal" (by cdancette)
-
robo-vln
PyTorch code for the ICRA'21 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19
Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15
Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13
Vision-and-language related posts
-
[D] Why is most Open Source AI happening outside the USA?
-
Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
-
most sane web3 job listing
-
I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
-
Two-minute Daily AI Update (Date: 5/15/2023)
-
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
-
Is there a process that is the opposite of image generation?
-
Index
What are some of the best open-source vision-and-language projects? The table below ranks them by GitHub stars:
Rank | Project | Stars |
---|---|---|
1 | LAVIS | 8,738 |
2 | Multimodal-GPT | 1,407 |
3 | prismer | 1,288 |
4 | Oscar | 1,027 |
5 | ONE-PEACE | 847 |
6 | AlphaCLIP | 498 |
7 | pytorch_mgie | 320 |
8 | conceptual-12m | 305 |
9 | CLIP-Caption-Reward | 225 |
10 | VL_adapter | 193 |
11 | ALPRO | 180 |
12 | VLDet | 169 |
13 | DallEval | 133 |
14 | multimodal | 70 |
15 | robo-vln | 61 |
16 | zeroshot-storytelling | 15 |