Top 5 vision-language-transformer Open-Source Projects
-
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
-
BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
-
APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception (by shenyunhang)
-
UPop
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Some of the foundation/base models include:
* GroundedSAM (Segment Anything Model)
* DETIC
* GroundingDINO
I suggest trying BLIP for this. I've had really good results from that.
https://github.com/salesforce/BLIP
I built a tiny Python CLI wrapper for it to make it easier to try: https://github.com/simonw/blip-caption
https://github.com/shenyunhang/APE (super new, idk usability on this one yet)
Project mention: Show HN: Compress vision-language and unimodal AI models by structured pruning | news.ycombinator.com | 2023-07-31
What are some of the best open-source vision-language-transformer projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | LAVIS | 8,781 |
2 | GroundingDINO | 5,075 |
3 | BLIP | 4,278 |
4 | APE | 424 |
5 | UPop | 82 |