Top 12 Python vision-language Projects
-
GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
-
OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
-
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
-
ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
LViT
[IEEE Transactions on Medical Imaging/TMI 2023] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
-
ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
Python vision-language discussion
Python vision-language related posts
-
The vision AI checkup – take your LLM to the optometrist
-
MetaCLIP – Meta AI Research
-
A general representation modal across vision, audio, language modalities
-
[D] Object Detection Machine Learning
-
Code for RTIC Is Released
Index
What are some of the best open-source vision-language projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | GroundingDINO | 10,063 |
| 2 | marqo | 5,022 |
| 3 | OFA | 2,557 |
| 4 | Video-ChatGPT | 1,501 |
| 5 | ONE-PEACE | 1,062 |
| 6 | TinyLLaVA_Factory | 985 |
| 7 | SEED | 642 |
| 8 | LViT | 388 |
| 9 | image-captioning | 50 |
| 10 | rtic-gcn-pytorch | 21 |
| 11 | ZeroGen | 14 |
| 12 | vision-language-examples | 10 |