Top 18 Python multi-modal Projects
-
-
CogVLM is very good in my (brief) testing: https://github.com/THUDM/CogVLM
The model weights seem to be under a non-commercial license, not true open source, but it is "open access" as you requested.
-
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
-
Project mention: AI Search That Understands the Way Your Customers Think | news.ycombinator.com | 2024-05-28
-
Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
-
-
Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
-
-
-
LISA
-
GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
-
MotionGPT
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
-
-
transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Project mention: Transfusion: Predict the Next Token and Diffuse Images with One Multimodal Model | news.ycombinator.com | 2024-09-10
Doesn't appear to be any weights uploaded anywhere that I can find.
There are the starts of two (non-original-author) public implementations available on Github, but again -- doesn't appear to be any pretrained weights in either.
* https://github.com/lucidrains/transfusion-pytorch
* https://github.com/VachanVY/Transfusion.torch
-
-
Project mention: Zetascale, Build high-performance AI models with modular building blocks | news.ycombinator.com | 2024-02-09
-
-
vlm-api
REST API for computing cross-modal similarity between images and text using the ColPali vision-language model
Project mention: Show HN: Documind – Open-source AI tool to turn documents into structured data | news.ycombinator.com | 2024-11-18
VLMs are cool - they generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. Here's an open-source API demo I built if you want to try it out: https://github.com/DataFog/vlm-api
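The patch-embedding matching described in the comment above can be illustrated with a small sketch. This is not code from vlm-api or ColPali; it assumes ColBERT-style late interaction (each query token is matched to its best-scoring image patch and the maxima are summed), and the function names, the toy 2-dimensional vectors, and the 2x2 patch grid are all made up for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def patch_similarities(query_tokens: List[Vector], patches: List[Vector]) -> List[List[float]]:
    """Similarity matrix indexed as [query token][image patch]."""
    return [[dot(q, p) for p in patches] for q in query_tokens]


def maxsim_score(query_tokens: List[Vector], patches: List[Vector]) -> float:
    """Late-interaction score: best patch per query token, summed over tokens."""
    sims = patch_similarities(query_tokens, patches)
    return sum(max(row) for row in sims)


def heatmap(query_tokens: List[Vector], patches: List[Vector], grid_width: int) -> List[List[float]]:
    """For each patch, keep its strongest match to any query token,
    then reshape the flat patch list into rows of `grid_width` patches."""
    sims = patch_similarities(query_tokens, patches)
    best = [max(row[p] for row in sims) for p in range(len(patches))]
    return [best[i:i + grid_width] for i in range(0, len(best), grid_width)]


# Hypothetical toy data: 2 query-token embeddings, 4 patch embeddings (2x2 grid).
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_patches = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.5], [0.5, 0.5]]

if __name__ == "__main__":
    print(maxsim_score(toy_query, toy_patches))
    for row in heatmap(toy_query, toy_patches, grid_width=2):
        print(row)
```

In a real system the embeddings would come from the vision-language model, and the heatmap would be upsampled and overlaid on the page image; both steps are omitted here.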
Python multi-modal discussion
Python multi-modal related posts
-
Show HN: I built an open source AI video search engine to learn more about AI
-
CogAgent-18B – visual-based GUI Agent capabilities
-
What do you think. When should we expect the next SDXL version?
-
Shining the spotlight on CogVLM
-
Gemini: Google's most capable AI model yet
-
Open-source LLMs with Image Interpretation
-
FLaNK Stack Weekly for 27 November 2023
-
A note from our sponsor - SaaSHub
www.saashub.com | 3 Dec 2024
Index
What are some of the best open-source multi-modal projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | modelscope | 7,057 |
2 | CogVLM | 6,147 |
3 | DALLE-pytorch | 5,578 |
4 | marqo | 4,662 |
5 | Chinese-CLIP | 4,611 |
6 | DeepKE | 3,591 |
7 | Video-LLaVA | 3,024 |
8 | docarray | 2,988 |
9 | CogVLM2 | 2,139 |
10 | LISA | 1,895 |
11 | GPTDiscord | 1,826 |
12 | MotionGPT | 1,514 |
13 | SALMONN | 1,065 |
14 | transfusion-pytorch | 752 |
15 | UniControl | 624 |
16 | zeta | 430 |
17 | VLDet | 185 |
18 | vlm-api | 3 |