Top 10 Python vision-language Projects
-
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
-
Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
-
OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
-
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
LViT
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Some of the foundation/base models include:
* GroundedSAM (Segment Anything Model)
* DETIC
* GroundingDINO
We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo
Project mention: A general representation model across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: List of Stable Diffusion research softwares that I don't think gotten widespread adoption. | /r/StableDiffusion | 2023-12-10
https://github.com/AILab-CVC/SEED
I ran experiments and wrote examples on accelerating ViT (Vision Transformer) with TensorRT, FasterTransformer, and xFormers. All experiments used a single A100 as a baseline.
https://github.com/bnabis93/vision-language-examples/tree/main/acceleration
Index
What are some of the best open-source vision-language projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | GroundingDINO | 4,978 |
2 | marqo | 4,111 |
3 | Chinese-CLIP | 3,590 |
4 | OFA | 2,323 |
5 | ONE-PEACE | 838 |
6 | SEED | 464 |
7 | LViT | 249 |
8 | image-captioning | 29 |
9 | rtic-gcn-pytorch | 20 |
10 | vision-language-examples | 7 |