SaaSHub helps you find the best software and product alternatives Learn more →
InternVideo Alternatives
Similar projects and alternatives to InternVideo
-
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
-
CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
-
unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
InternVideo reviews and mentions
-
[Demo] Watch Videos with ChatGPT
Thanks for your interest! If you had any ideas to make the given demo more user-friendly, please do not hesitate to share them with us. We are open to discussing relevant ideas about video foundation models or other topics. We made some progress in these areas (InternVideo, VideoMAE v2, UMT, and more). We believe that user-level intelligent video understanding is on the horizon with the current LLM, computing power, and video data.
-
[R] InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Found relevant code at https://github.com/OpenGVLab/InternVideo + all code implementations here
The foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models simply focus on image-level pretraining and adaption, which are limited for dynamic and complex video-level understanding tasks. To fill the gap, we present general video foundation models, InternVideo, by taking advantage of both generative and discriminative self-supervised video learning. Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications. Without bells and whistles, InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications. Especially, our methods can obtain 91.1% and 77.2% top-1 accuracy on the challenging Kinetics-400 and Something-Something V2 benchmarks, respectively. All of these results effectively show the generality of our InternVideo for video understanding. The code will be released at https://github.com/OpenGVLab/InternVideo.
-
A note from our sponsor - SaaSHub
www.saashub.com | 28 Apr 2024
Stats
OpenGVLab/InternVideo is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of InternVideo is Python.
Sponsored