Top 12 Python vision-and-language Projects
-
ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
-
VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
-
multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)
-
robo-vln
PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19
Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13
Python vision-and-language related posts
- Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?
- most sane web3 job listing
- I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?
- Two-minute Daily AI Update (Date: 5/15/2023)
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- Is there a process that is the opposite of image generation?
- Blip-2: harvesting development of pretrained vision models for LLM training
Index
What are some of the best open-source vision-and-language projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Multimodal-GPT | 1,401 |
2 | prismer | 1,287 |
3 | Oscar | 1,027 |
4 | ONE-PEACE | 838 |
5 | pytorch_mgie | 320 |
6 | CLIP-Caption-Reward | 220 |
7 | VL_adapter | 193 |
8 | ALPRO | 180 |
9 | VLDet | 169 |
10 | multimodal | 70 |
11 | robo-vln | 61 |
12 | zeroshot-storytelling | 15 |