Top 14 Python vision-and-language Projects
-
maestro
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL (by roboflow)
-
ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (by NVlabs)
-
Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13
-
ChatGPT-OpenAI-Smart-Speaker
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.
-
CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
-
VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
-
multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)
-
robo-vln
PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Python vision-and-language discussion
Python vision-and-language related posts
-
Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
-
most sane web3 job listing
-
I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
-
Two-minute Daily AI Update (Date: 5/15/2023)
-
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
-
Is there a process that is the opposite of image generation?
-
Blip-2: harvesting development of pretrained vision models for LLM training
Index
What are some of the best open-source vision-and-language projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Multimodal-GPT | 1,478 |
| 2 | maestro | 1,617 |
| 3 | prismer | 1,305 |
| 4 | ONE-PEACE | 1,006 |
| 5 | DoRA | 690 |
| 6 | pytorch_mgie | 346 |
| 7 | ChatGPT-OpenAI-Smart-Speaker | 270 |
| 8 | CLIP-Caption-Reward | 241 |
| 9 | VL_adapter | 205 |
| 10 | ALPRO | 187 |
| 11 | VLDet | 183 |
| 12 | multimodal | 79 |
| 13 | robo-vln | 73 |
| 14 | zeroshot-storytelling | 15 |