-
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus support for more LMs such as MiniGPT-4, StableLM, and MOSS.
The project currently provides only a basic framework and includes two main subprojects (VideoChat and Video MiniGPT-4) that leverage existing APIs and open-source models.
-
MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Video MiniGPT-4: it implicitly encodes videos into features and feeds them into Vicuna to achieve simple Q&A. A video prompt based on MiniGPT-4 has been introduced. Since no training is used in this project, the model is insensitive to temporal information and the results still need improvement.
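The following is a minimal sketch of what this "implicit" route could look like: frames are encoded into visual features, projected into the LLM's embedding space, and prepended to the text prompt. It is not the project's actual code; the module names, dimensions, and toy encoder are assumptions for illustration.

```python
# Sketch only: frame features -> projection -> "video tokens" for an LLM.
# FrameEncoder, the 768/4096 dims, and the toy CNN are illustrative assumptions.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Stand-in for a pretrained image encoder (e.g. a ViT); here a toy CNN."""
    def __init__(self, feat_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )

    def forward(self, frames):            # (T, 3, H, W) -> (T, feat_dim)
        return self.net(frames)

class VideoToLLM(nn.Module):
    """Projects per-frame features into the LLM's embedding space."""
    def __init__(self, feat_dim=768, llm_dim=4096):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, frames):
        return self.proj(self.encoder(frames))   # (T, llm_dim) "video tokens"

frames = torch.randn(8, 3, 224, 224)      # 8 uniformly sampled frames
video_tokens = VideoToLLM()(frames)
# In the real pipeline these tokens would be concatenated with the embedded
# question and fed to Vicuna. Because nothing here is trained on video, the
# model has little sense of temporal order, matching the limitation noted above.
print(video_tokens.shape)                 # torch.Size([8, 4096])
```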
-
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
In terms of effectiveness, VideoChat can cover most Q&A, but it is still imperfect: the Q&A relies heavily on explicitly encoding the video as text and requires careful prompt design. The inference cost is also high, and there is a long way to go before real applications. Recently, the implicit encoding explored by BLIP-2, MiniGPT-4, and LLaVA has pointed to a sound and imaginative direction.
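For contrast, here is a minimal sketch (an assumed structure, not VideoChat's actual implementation) of the "explicit" route described above: per-frame captions are stitched into a timestamped text prompt, so answer quality depends heavily on the captioner and on prompt design. `caption_frame` is a placeholder for any off-the-shelf captioning or tagging model.

```python
# Sketch of text-based video Q&A: captions -> timestamped prompt -> chat LLM.
from typing import List, Tuple

def caption_frame(frame_id: int) -> str:
    # Placeholder: a real system would run an image captioning/tagging model here.
    return f"a person doing something (frame {frame_id})"

def build_video_prompt(captions: List[Tuple[float, str]], question: str) -> str:
    """Stitch timestamped captions into a single text prompt for a chat LLM."""
    lines = [f"[{t:.1f}s] {c}" for t, c in captions]
    return (
        "You are watching a video described frame by frame:\n"
        + "\n".join(lines)
        + "\n\nAnswer the question using only the descriptions above.\n"
        + f"Q: {question}\nA:"
    )

captions = [(i * 2.0, caption_frame(i)) for i in range(4)]  # one caption every 2 s
prompt = build_video_prompt(captions, "What is the person doing?")
print(prompt)  # this string would then be sent to ChatGPT or another LLM
```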
-
unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Thanks for your interest! If you have any ideas to make the demo more user-friendly, please do not hesitate to share them with us. We are open to discussing ideas about video foundation models and other topics. We have made some progress in these areas (InternVideo, VideoMAE v2, UMT, and more), and we believe that user-level intelligent video understanding is on the horizon given current LLMs, computing power, and video data.
Related posts
-
[R] InternVideo: General Video Foundation Models via Generative and Discriminative Learning
-
LM-Kit.NET VS LLamaSharp - a user suggested alternative
2 projects | 4 Sep 2024
-
Ask HN: Most successful example using LLMs in daily work/life?
-
Show HN: I Remade the Fake Google Gemini Demo, Except Using GPT-4 and It's Real
-
Image-to-Caption Generator