Otter vs Awesome-Multimodal-Large-Language-Models
| | Otter | Awesome-Multimodal-Large-Language-Models |
|---|---|---|
| Mentions | 4 | 2 |
| Stars | 3,447 | 8,991 |
| Growth | - | - |
| Activity | 9.1 | 9.7 |
| Latest commit | about 2 months ago | 7 days ago |
| Language | Python | - |
| License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Otter
OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs
- 👀 Video-LLaMA - Empower large language models with video and audio understanding capability. (link)
- 🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
- 🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
- 🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos. (link)
- 💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.
- Multimodal models and "active" learning
- Otter: A Multi-Modal Model with In-Context Instruction Tuning
Otter is a multi-modal model built on OpenFlamingo (an open-source version of DeepMind's Flamingo) and trained on a dataset of multi-modal instruction-response pairs. Otter demonstrates strong proficiency in multi-modal perception, reasoning, and in-context learning.
The GitHub repo includes Hugging Face links to the model: https://github.com/Luodian/Otter
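For context, the "in-context instruction tuning" Otter is named for interleaves image-grounded instruction-response exemplars before the final query, Flamingo-style, so the model can pick up a task from a few demonstrations. The sketch below only illustrates that prompt layout; the special-token strings (`<image>`, `<answer>`, `<|endofchunk|>`) follow the format shown in the Otter README and should be treated as an assumption, not a stable API.

```python
# Sketch of Flamingo/Otter-style in-context instruction prompting.
# The special tokens below follow the format shown in the Otter README;
# treat the exact strings as an assumption rather than a guaranteed API.

def build_icl_prompt(exemplars, query_instruction):
    """Interleave image-grounded (instruction, response) exemplars before
    the final query so the model can infer the task from context."""
    chunks = [
        f"<image>User: {instr} GPT:<answer> {resp}<|endofchunk|>"
        for instr, resp in exemplars
    ]
    # The final chunk is left open so the model generates the answer.
    chunks.append(f"<image>User: {query_instruction} GPT:<answer>")
    return "".join(chunks)

prompt = build_icl_prompt(
    exemplars=[("What is happening in this image?",
                "Two cats are sleeping on a pink couch.")],
    query_instruction="What is happening in this image?",
)
print(prompt)
```

Each `<image>` placeholder is paired with one image in the interleaved input; the exemplar pairs are what let the model follow a new instruction format without any weight updates.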
Awesome-Multimodal-Large-Language-Models
Don't we need a leaderboard for visual models?
There is this one: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation
There is also a leaderboard from OpenCompass (possibly outdated): https://mmbench.opencompass.org.cn/leaderboard
Recommended open LLMs with image input modality?
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation is pretty comprehensive. tl;dr: BLIP is probably the best, though I've heard it needs a lot of VRAM. In my experience it's the most responsive to prompt engineering.
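For reference, here is a minimal sketch of image-input prompting with BLIP-2 through Hugging Face transformers, following the library's documented usage. The checkpoint name is one of the public Salesforce releases; the image path and prompt text are placeholders.

```python
# Minimal BLIP-2 visual question answering via Hugging Face transformers.
# Checkpoint is a public Salesforce release; "photo.jpg" is a placeholder.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

image = Image.open("photo.jpg")  # placeholder image path
prompt = "Question: what is shown in this image? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)

output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True).strip())
```

The fp16 cast is what keeps the 2.7B variant within a single consumer GPU's VRAM; larger BLIP-2 checkpoints are where the VRAM complaint above really bites.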
What are some alternatives?
LLaMA-Adapter - [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
alpaca_farm - A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Chain-of-ThoughtsPapers - A trend started by "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
Video-LLaMA - [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
MindVideo - Official code base for MinD-Video
Sophia - Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
instructblip-pipeline - A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
LinkedInGPT - Skynet
Awesome-LLM-Reasoning - Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
squeezelite-esp32 - ESP32 Music streaming based on Squeezelite, with support for multi-room sync, AirPlay, Bluetooth, Hardware buttons, display and more
Awesome-Multimodal-LLM - Research Trends in LLM-guided Multimodal Learning.