Video-LLaMA vs MiniGPT-4-discord-bot

| | Video-LLaMA | MiniGPT-4-discord-bot |
|---|---|---|
| Mentions | 8 | 1 |
| Stars | 2,455 | 42 |
| Growth | 5.8% | - |
| Activity | 6.6 | 10.0 |
| Last commit | 5 days ago | about 1 year ago |
| Language | Python | Python |
| License | BSD 3-Clause "New" or "Revised" License | BSD 3-Clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
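A recency-weighted score like the one described above can be sketched with a simple exponential decay over commit dates. The actual formula behind these activity numbers is not public, so the half-life constant and decay scheme below are assumptions for illustration only:

```python
# Hypothetical sketch of a recency-weighted activity score: recent
# commits contribute more than old ones. The 90-day half-life is an
# assumed constant, not the comparison site's real parameter.
from datetime import date, timedelta

def activity_score(commit_dates, today, half_life_days=90):
    """Sum of per-commit weights that halve every `half_life_days`."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)

today = date(2024, 1, 1)
recent = [today - timedelta(days=n) for n in (1, 3, 7)]
old = [today - timedelta(days=n) for n in (300, 320, 340)]

# Three recent commits outweigh three equally numerous old ones.
print(activity_score(recent, today) > activity_score(old, today))
```

Under this scheme a commit made today contributes exactly 1.0, and each commit's weight decays smoothly toward zero as it ages.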
Video-LLaMA
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs
  - 👀 Video-LLaMA - Empower large language models with video and audio understanding capability. (link)
  - 🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
  - 🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
  - 🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos. (link)
  - 💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.
- Video-LLaMA: Instruction-Tuned Audio-Visual Language Model for Video Understanding
- Unleash the Power of Video-LLaMA: Revolutionizing Language Models with Video and Audio Understanding!
  Prepare to be blown away by the cutting-edge Video-LLaMA project! We're pushing the boundaries of language models by equipping them with the remarkable ability to comprehend video and audio. Get ready for an extraordinary adventure! 🌟
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
  Source Code: The codebase for pre-training and fine-tuning the Video-LLaMA model, as well as the model weights, is available on GitHub: https://github.com/DAMO-NLP-SG/Video-LLaMA
- Video-ChatGPT: Redefining Interactions with Visual Data
  Tons of cool stuff happening in the space; also recently saw the Video-LLaMA version of this: https://github.com/DAMO-NLP-SG/Video-LLaMA
- Meet Video-LLaMA: A Multi-Modal Framework that Empowers Large Language Models (LLMs) with the Capability of Understanding both Visual and Auditory Content in the Video
Code: https://github.com/DAMO-NLP-SG/Video-LLaMA
MiniGPT-4-discord-bot
- MiniGPT-4
  MiniGPT-4 hooks a frozen vision encoder up to a frozen LLM through a single linear layer, and trains just that tiny layer on some datasets of image-text pairs.
  But the results are pretty amazing. It completely knocks OpenFlamingo and even the original BLIP-2 models out of the park. And best of all, it arrived before OpenAI's GPT-4 image modality did. A real win for open-source AI.
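The bridge described above, a single linear layer mapping frozen vision-encoder features into the LLM's embedding space, is just a matrix multiply. A minimal numpy sketch, where the dimensions are illustrative assumptions rather than MiniGPT-4's exact values:

```python
import numpy as np

# One trainable linear layer between a frozen vision encoder and a
# frozen LLM; everything else in the pipeline stays frozen. Dimensions
# are assumptions: vision feature dim -> LLM hidden dim.
vision_dim, llm_dim = 1408, 5120

rng = np.random.default_rng(0)
W = rng.standard_normal((vision_dim, llm_dim)) * 0.02  # the only trainable weights
b = np.zeros(llm_dim)

# Frozen vision features: batch of 2 images, 32 visual tokens each.
vision_feats = rng.standard_normal((2, 32, vision_dim))

# Projected features now look like LLM token embeddings and can be
# prepended to the text prompt's embeddings.
llm_inputs = vision_feats @ W + b
```

Training then only updates `W` and `b`, which is why the approach is so cheap compared with fine-tuning either frozen backbone.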
  The repo's default inference code is kind of bad: Vicuna is loaded in fp16, so it can't fit on any consumer hardware. I created a PR on the repo to load it with int8, so hopefully by tomorrow it'll be runnable by 3090/4090 users.
  I also developed a toy Discord bot (https://github.com/152334H/MiniGPT-4-discord-bot) to show the model to some people, but inference is very slow, so I doubt I'll be hosting it publicly.
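The fp16-vs-int8 point above comes down to simple arithmetic: halving the bytes per parameter halves the weight memory. A back-of-the-envelope sketch, assuming a 13B-parameter Vicuna and ignoring overhead for activations and the KV cache:

```python
# Rough GPU memory needed just for model weights. 13B parameters is an
# assumed count for Vicuna-13B; real usage adds activation/KV overhead.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1024**3

fp16_gb = weight_memory_gb(13e9, 2)  # ~24.2 GB: weights alone overflow a 24 GB 3090/4090
int8_gb = weight_memory_gb(13e9, 1)  # ~12.1 GB: fits with headroom for activations
print(fp16_gb, int8_gb)
```

This is why int8 loading (e.g. via 8-bit quantization libraries) is the difference between "doesn't fit at all" and "runs on a single consumer card" for a 13B model.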
What are some alternatives?
mPLUG-Owl - mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
InternGPT - InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
MiniGPT-4 - Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Otter - 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
llama.go - llama.go is like llama.cpp in pure Golang!
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
ROCm - AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]
Chinese-LLaMA-Alpaca - Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment.
AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous. [Moved to: https://github.com/Significant-Gravitas/AutoGPT]
llama.cpp - LLM inference in C/C++