Video-LLaMA vs NExT-GPT

| | Video-LLaMA | NExT-GPT |
|---|---|---|
| Mentions | 8 | 1 |
| Stars | 2,455 | 2,882 |
| Growth | 5.8% | - |
| Activity | 6.6 | 9.3 |
| Latest commit | 6 days ago | 4 months ago |
| Language | Python | Python |
| License | BSD 3-Clause "New" or "Revised" License | BSD 3-Clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
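To make these metrics concrete, here is a minimal Python sketch of how a star-growth percentage and a recency-weighted activity score could be computed. The exponential half-life weighting and the example prior star count (2,320) are illustrative assumptions; the site does not publish its exact formula.

```python
from datetime import datetime, timezone

def growth_pct(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month star growth in percent.
    E.g. growth_pct(2455, 2320) ~= 5.8 (prior count assumed for illustration)."""
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago

def activity_score(commit_dates: list[datetime], half_life_days: float = 30.0) -> float:
    """Recency-weighted commit count: recent commits weigh more than older ones.
    The half-life decay is an assumed stand-in for the site's weighting."""
    now = datetime.now(timezone.utc)
    score = 0.0
    for d in commit_dates:  # timezone-aware commit datetimes
        age_days = (now - d).total_seconds() / 86400.0
        score += 0.5 ** (age_days / half_life_days)  # weight halves every half_life_days
    return score
```

Per the description above, a raw score like this would then be ranked against all tracked projects, so a value of 9.0 places a project in the top 10%.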
Video-LLaMA
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs
  - 👀 Video-LLaMA - Empower large language models with video and audio understanding capability. (link)
  - 🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
  - 🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
  - 🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos. (link)
  - 💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.
- Video-LLaMA: Instruction-Tuned Audio-Visual Language Model for Video Understanding
- Unleash the Power of Video-LLaMA: Revolutionizing Language Models with Video and Audio Understanding!
  Prepare to be blown away by the cutting-edge Video-LLaMA project! We're pushing the boundaries of language models by equipping them with the remarkable ability to comprehend video and audio. Get ready for an extraordinary adventure! 🌟
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
  Source code: the codebase for pre-training and fine-tuning Video-LLaMA, along with the model weights, is available on GitHub: https://github.com/DAMO-NLP-SG/Video-LLaMA
- Video-ChatGPT: Redefining Interactions with Visual Data
  Tons of cool stuff happening in the space; also recently saw the Video-LLaMA take on this - https://github.com/DAMO-NLP-SG/Video-LLaMA
- Meet Video-LLaMA: A Multi-Modal Framework that Empowers Large Language Models (LLMs) with the Capability of Understanding both Visual and Auditory Content in the Video
  Code: https://github.com/DAMO-NLP-SG/Video-LLaMA (a hedged sketch of this two-branch design follows this list)
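Several of the posts above describe Video-LLaMA as pairing a visual branch and an auditory branch with a frozen LLM. Below is a toy Python sketch of that two-branch layout. Everything here is an illustrative assumption - the class name, dimensions, and the mean-pooling stand-in for the Q-Formers are invented for this sketch; the actual repository uses BLIP-2-style Q-Formers and an ImageBind audio encoder.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Toy sketch: project video and audio features into an LLM's embedding
    space and prepend them to the text prompt, Video-LLaMA style.
    All sizes and modules are illustrative, not the real architecture."""

    def __init__(self, vid_dim=1408, aud_dim=1024, llm_dim=4096):
        super().__init__()
        # Stand-ins for the video/audio Q-Formers: pool over time, then project.
        self.video_proj = nn.Linear(vid_dim, llm_dim)
        self.audio_proj = nn.Linear(aud_dim, llm_dim)

    def forward(self, video_feats, audio_feats, text_embeds):
        # video_feats: (batch, frames, vid_dim) from a frozen vision encoder
        # audio_feats: (batch, clips, aud_dim) from a frozen audio encoder
        # text_embeds: (batch, tokens, llm_dim) from the LLM's embedding table
        v = self.video_proj(video_feats.mean(dim=1, keepdim=True))  # (B, 1, llm_dim)
        a = self.audio_proj(audio_feats.mean(dim=1, keepdim=True))  # (B, 1, llm_dim)
        # Prepend the audio-visual "soft prompt" to the text tokens.
        return torch.cat([v, a, text_embeds], dim=1)

# Smoke test with random tensors standing in for encoder outputs.
fusion = TwoBranchFusion()
out = fusion(torch.randn(2, 8, 1408), torch.randn(2, 4, 1024), torch.randn(2, 16, 4096))
print(out.shape)  # torch.Size([2, 18, 4096])
```

The design point this illustrates: the LLM and the modality encoders stay frozen, and only lightweight projection modules (Q-Formers in the real model) are trained to map video and audio features into the token-embedding space the LLM already understands.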
NExT-GPT
What are some alternatives?
mPLUG-Owl - mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
Otter - 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
gpt_academic - A practical interactive interface for LLMs such as GPT and GLM, specially optimized for paper reading, polishing, and writing. Modular design with custom shortcut buttons & function plugins; project analysis & self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation & summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and more.
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
InternChat - InternGPT / InternChat allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device. [Moved to: https://github.com/OpenGVLab/InternGPT]
Chinese-LLaMA-Alpaca - Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
MiniGPT-4-discord-bot - A true multimodal LLaMA derivative -- on Discord!
InternGPT - InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
unilm - Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
LLMSurvey - The official GitHub page for the survey paper "A Survey of Large Language Models".