Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Video-LLaMA extends large language models with the ability to understand both the visual and the auditory content of videos, moving beyond text-only and image-only instruction following. 🌟
1️⃣ VL Branch: Built from a ViT-G/14 image encoder and a BLIP-2 Q-Former, this branch handles visual encoding. It is trained on the WebVid-2M video-caption dataset to generate text descriptions of videos, and image-text pairs from LLaVA are added during training to strengthen its understanding of static visual concepts.
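To make the data flow of the VL Branch concrete, here is a minimal shape-level sketch in NumPy. It is an illustration, not the actual implementation: the stand-in functions use random weights, the temporal aggregation across frames is simplified to flattening, and all dimensions (frame count, patch count, query count, hidden sizes) are assumed values chosen for illustration.

```python
import numpy as np

# Hypothetical dimensions, chosen for illustration only.
N_FRAMES = 8      # frames sampled from the video
N_PATCHES = 257   # ViT patch tokens per frame (incl. class token), assumed
D_VIT = 1408      # ViT-G/14 feature width
N_QUERIES = 32    # BLIP-2 Q-Former learnable query tokens, assumed
D_QFORMER = 768   # Q-Former hidden size, assumed
D_LLM = 4096      # LLM embedding width (e.g. a 7B LLaMA), assumed

rng = np.random.default_rng(0)

def vit_encode(frames):
    """Stand-in for ViT-G/14: one feature vector per patch, per frame."""
    return rng.standard_normal((frames.shape[0], N_PATCHES, D_VIT))

def qformer_compress(patch_feats):
    """Stand-in for the Q-Former: cross-attention distills each frame's
    patch features into a fixed, small set of query embeddings."""
    # Attention weights: each query attends over all patches of a frame.
    scores = rng.standard_normal((patch_feats.shape[0], N_QUERIES, N_PATCHES))
    attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    proj = rng.standard_normal((D_VIT, D_QFORMER))
    return attn @ (patch_feats @ proj)  # (frames, queries, d_qformer)

frames = rng.standard_normal((N_FRAMES, 3, 224, 224))  # sampled video frames
patch_feats = vit_encode(frames)            # (8, 257, 1408)
frame_tokens = qformer_compress(patch_feats)  # (8, 32, 768)

# Flatten the frame axis and project into the LLM's embedding space;
# the result acts as a soft visual prompt prepended to the text tokens.
to_llm = rng.standard_normal((D_QFORMER, D_LLM))
video_tokens = frame_tokens.reshape(-1, D_QFORMER) @ to_llm
print(video_tokens.shape)  # (256, 4096)
```

The key design point this sketch shows is compression: the Q-Former reduces hundreds of patch features per frame to a few dozen query tokens, so an entire video fits in the LLM's context as a short sequence of embeddings.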
We extend our deepest gratitude to the extraordinary projects that have influenced and contributed to the development of Video-LLaMA. We're indebted to MiniGPT-4, FastChat, BLIP-2, EVA-CLIP, ImageBind, LLaMA, VideoChat, LLaVA, WebVid, and mPLUG-Owl for their invaluable contributions. Special thanks to Midjourney for creating the stunning Video-LLaMA logo, encapsulating the essence of our groundbreaking project.