| | ImageNet21K | OFA |
|---|---|---|
| Mentions | 1 | 3 |
| Stars | 695 | 2,331 |
| Growth | 2.9% | 1.2% |
| Activity | 10.0 | 2.8 |
| Latest commit | over 1 year ago | 21 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ImageNet21K
- Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models (ViT-B/32, using the ImageNet-21k dataset)
OFA
- [R][P] Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework + VQA Hugging Face Spaces Demo (github: https://github.com/OFA-Sys/OFA)
- OFA: model that does text-to-image as well as other tasks. From: [R] Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. Shocking performance in text-to-image synthesis and open-domain tasks.
What are some alternatives?
vision_transformer
GroundingDINO - Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LMOps - General technology for enabling AI capabilities with LLMs and MLLMs
ONE-PEACE - A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Fashion12K_german_queries
MAGIC - Language Models Can See: Plugging Visual Controls in Text Generation
mPLUG-Owl - mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
UPop - [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
docarray - Represent, send, store and search multimodal data