OFA vs mPLUG-Owl
| | OFA | mPLUG-Owl |
|---|---|---|
| Mentions | 3 | 2 |
| Stars | 2,318 | 1,892 |
| Growth | 2.2% | 6.0% |
| Activity | 5.8 | 8.0 |
| Latest commit | 6 months ago | 13 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects being tracked.
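As a rough illustration of how metrics like these can be computed, here is a minimal Python sketch. The exponential weighting and the `half_life_days` parameter are assumptions made for illustration; the exact formula behind the 0-10 activity scale is not published on this page.

```python
from datetime import datetime, timezone

def activity_score(commit_dates, half_life_days=30.0):
    """Recency-weighted activity: each commit contributes
    0.5 ** (age_in_days / half_life_days), so a commit from today
    counts fully while older commits decay toward zero. Ranking the
    raw scores across all tracked projects would yield a relative
    scale like the 0-10 one described above. (Assumed weighting,
    not the site's published formula.)"""
    now = datetime.now(timezone.utc)
    total = 0.0
    for d in commit_dates:
        age_days = (now - d).total_seconds() / 86400.0
        total += 0.5 ** (age_days / half_life_days)
    return total

def monthly_stars_growth(stars_now, stars_a_month_ago):
    """Month-over-month growth in stars, as a percentage.
    E.g. 2,318 stars now vs. roughly 2,268 a month ago gives ~2.2%."""
    return 100.0 * (stars_now - stars_a_month_ago) / stars_a_month_ago
```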
OFA

- [R][P] Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework + VQA Hugging Face Spaces Demo
  GitHub: https://github.com/OFA-Sys/OFA
- OFA: model that does text-to-image as well as other tasks (see the sketch after this list)
  From: "[R] Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. Shocking performance in text-to-image synthesis and open-domain tasks."
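The unifying idea named in the paper title is that every task, captioning, VQA, visual grounding, and even image generation, is cast as plain sequence-to-sequence generation with the task expressed as an instruction in the input. The sketch below illustrates that pattern only; `Seq2SeqModel` is a hypothetical stand-in, not OFA's actual API (see the repo linked above for real usage).

```python
# Illustrative sketch of the unified seq2seq pattern (hypothetical API,
# not OFA's real interface): one model, many tasks, where tasks differ
# only in the instruction text fed to the encoder.

class Seq2SeqModel:
    def generate(self, prompt: str, image=None) -> str:
        # A real model would encode the (instruction, image) pair and
        # decode an answer token sequence; this stub just echoes.
        return f"<output for: {prompt!r}>"

model = Seq2SeqModel()

# Captioning, VQA, grounding, and text-to-image all use the same call:
caption = model.generate("what does the image describe?", image="img.jpg")
answer = model.generate("how many dogs are in the picture?", image="img.jpg")
region = model.generate('which region does the text "a red car" describe?', image="img.jpg")
pixels = model.generate('what is the complete image? caption: "a sunset over the sea"')
```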
mPLUG-Owl

- Unleash the Power of Video-LLaMA: Revolutionizing Language Models with Video and Audio Understanding!
  "We extend our deepest gratitude to the extraordinary projects that have influenced and contributed to the development of Video-LLaMA. We're indebted to MiniGPT-4, FastChat, BLIP-2, EVA-CLIP, ImageBind, LLaMA, VideoChat, LLaVA, WebVid, and mPLUG-Owl for their invaluable contributions. Special thanks to Midjourney for creating the stunning Video-LLaMA logo, encapsulating the essence of our groundbreaking project."
- [P] mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
What are some alternatives?
ImageNet21K - Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
GroundingDINO - Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Video-LLaMA - [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
ONE-PEACE - A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
LLMSurvey - The official GitHub page for the survey paper "A Survey of Large Language Models".
MAGIC - Language Models Can See: Plugging Visual Controls in Text Generation
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
UPop - [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
ExpertLLaMA - An open-source chatbot built with ExpertPrompting that achieves 96% of ChatGPT's capability.
ChatPDF - Chat with any PDF. Easily upload the PDF documents you'd like to chat with. Instant answers. Ask questions, extract information, and summarize documents with AI. Sources included.
CodeCapybara - Open-source Self-Instruction Tuning Code LLM