Top 14 Python vision-and-language Projects
-
maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL (by roboflow)
-
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (by NVlabs)
-
top-cvpr-2023-papers
This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]
-
ChatGPT-OpenAI-Smart-Speaker
This AI smart speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice- and vision-driven conversations, with additional web search capabilities via OpenAI and LangChain agents.
-
CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
-
VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
-
multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)
-
robo-vln
PyTorch code for the ICRA'21 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Python vision-and-language discussion
Python vision-and-language related posts
-
Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
-
most sane web3 job listing
-
I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
-
Two-minute Daily AI Update (Date: 5/15/2023)
-
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
-
Is there a process that is the opposite of image generation?
-
Blip-2: harvesting development of pretrained vision models for LLM training
-
A note from our sponsor - Civic Auth
www.civic.com | 31 Aug 2025
Index
What are some of the best open-source vision-and-language projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | maestro | 2,630 |
2 | Multimodal-GPT | 1,509 |
3 | prismer | 1,308 |
4 | ONE-PEACE | 1,050 |
5 | DoRA | 834 |
6 | top-cvpr-2023-papers | 655 |
7 | pytorch_mgie | 346 |
8 | ChatGPT-OpenAI-Smart-Speaker | 298 |
9 | CLIP-Caption-Reward | 245 |
10 | VL_adapter | 205 |
11 | VLDet | 187 |
12 | multimodal | 82 |
13 | robo-vln | 79 |
14 | zeroshot-storytelling | 15 |