|  | Otter | Video-LLaMA |
| --- | --- | --- |
| Mentions | 4 | 8 |
| Stars | 3,447 | 2,423 |
| Growth | - | 4.6% |
| Activity | 9.1 | 8.4 |
| Last commit | about 2 months ago | 6 months ago |
| Language | Python | Python |
| License | MIT License | BSD 3-Clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
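The Growth and Activity metrics above can be sketched as simple computations. This is a minimal illustration only, not the tracker's actual formula: the exponential half-life used to weight recent commits more heavily is an assumed parameter.

```python
def mom_growth_pct(stars_prev_month: float, stars_now: float) -> float:
    """Month-over-month star growth, as a percentage."""
    return (stars_now - stars_prev_month) / stars_prev_month * 100.0

def activity_score(commit_ages_days, half_life_days: float = 30.0) -> float:
    """Weight each commit by exponential decay of its age, so recent
    commits contribute more than older ones. The half-life is an assumed
    parameter; the tracker's real weighting is not published."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)

# e.g. mom_growth_pct(2317, 2423) is about 4.6, consistent with the table
# above if Video-LLaMA had roughly 2,317 stars a month earlier (an assumption).
```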
Otter
- OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs
  - 👀 Video-LLaMA - Empower large language models with video and audio understanding capability. (link)
  - 🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
  - 🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
  - 🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos. (link)
  - 💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.
- Multimodal models and "active" learning
- Otter: A Multi-Modal Model with In-Context Instruction Tuning
Otter is a multi-modal model built on OpenFlamingo (an open-source version of DeepMind's Flamingo) and trained on a dataset of multi-modal instruction-response pairs. Otter demonstrates strong proficiency in multi-modal perception, reasoning, and in-context learning.
The GitHub repo includes Hugging Face links to the model weights: https://github.com/Luodian/Otter
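The checkpoints linked from the README are hosted on the Hugging Face Hub, which serves raw repo files under a predictable `resolve` URL scheme. A minimal sketch of building such a download URL; the repo id shown is a placeholder, not the real Otter checkpoint name (check the README for that):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face Hub 'resolve' URL for a file in a model repo.
    The Hub serves files at https://huggingface.co/<repo>/resolve/<rev>/<file>."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Placeholder repo id for illustration only:
print(hub_file_url("some-org/otter-checkpoint", "config.json"))
```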
Video-LLaMA
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- Video-LLaMA: Instruction-Tuned Audio-Visual Language Model for Video Understanding
- Unleash the Power of Video-LLaMA: Revolutionizing Language Models with Video and Audio Understanding!
Prepare to be blown away by the cutting-edge Video-LLaMA project! We're pushing the boundaries of language models by equipping them with the remarkable ability to comprehend video and audio. Get ready for an extraordinary adventure! 🌟
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Source Code: The codebase for pre-training and fine-tuning Video-LLaMA, along with the model weights, is available on GitHub: https://github.com/DAMO-NLP-SG/Video-LLaMA
- Video-ChatGPT: Redefining Interactions with Visual Data
Tons of cool stuff happening in the space; also recently saw the Video-LLaMA version of this - https://github.com/DAMO-NLP-SG/Video-LLaMA
- Meet Video-LLaMA: A Multi-Modal Framework that Empowers Large Language Models (LLMs) with the Capability of Understanding both Visual and Auditory Content in the Video
Code: https://github.com/DAMO-NLP-SG/Video-LLaMA
What are some alternatives?
LLaMA-Adapter - [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
mPLUG-Owl - mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Sophia - Effortless plug-and-play optimizer that cuts model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Awesome-Multimodal-Large-Language-Models - ✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Chinese-LLaMA-Alpaca - Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
LinkedInGPT - Skynet
MiniGPT-4-discord-bot - A true multimodal LLaMA derivative -- on Discord!