VideoMAEv2 vs openscene
Metric | VideoMAEv2 | openscene |
---|---|---|
Mentions | 1 | 3 |
Stars | 412 | 553 |
Growth | 8.6% | - |
Activity | 4.1 | 4.9 |
Last commit | 2 months ago | 7 months ago |
Language | Python | Python |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
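The exact formulas behind these numbers are not given here, so the following is only a minimal sketch of how such metrics could be computed. It assumes growth is the relative change in stars over one month, that commit recency is weighted with an exponential half-life, and that the activity score is a percentile rank scaled to 0-10. The function names and the 30-day half-life are hypothetical choices for illustration, not the site's actual method.

```python
from bisect import bisect_left


def month_over_month_growth(stars_now, stars_month_ago):
    """Relative star growth over one month, as a fraction (0.1 means 10%)."""
    if stars_month_ago <= 0:
        return None  # growth is undefined without a positive baseline
    return (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The half-life is an assumed parameter, chosen only for illustration.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw_score, all_raw_scores):
    """Map a raw score to a 0-10 scale by percentile rank among all projects.

    A result of 9.0 means 90% of tracked projects have a strictly lower
    raw score, i.e. the project is in the top 10%.
    """
    ranked = sorted(all_raw_scores)
    below = bisect_left(ranked, raw_score)  # count of strictly lower scores
    return round(10.0 * below / len(ranked), 1)
```

Under these assumptions, a project whose raw score exceeds that of 90 out of 100 tracked projects would receive an activity of 9.0, which matches the interpretation given above.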
VideoMAEv2
- [Demo] Watch Videos with ChatGPT
Thanks for your interest! If you have any ideas to make the demo more user-friendly, please do not hesitate to share them with us. We are open to discussing ideas about video foundation models or other topics. We have made some progress in these areas (InternVideo, VideoMAE v2, UMT, and more). We believe that user-level intelligent video understanding is on the horizon given current LLMs, computing power, and video data.
openscene
- OPENSCENE can identify objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data
Project website: github.io/openscene
- Any recent tools for LiDAR segmentation?
Can anyone recommend recent tools or models that are good at segmenting point cloud data? I'm interested in semantic segmentation, particularly of street objects such as traffic lanes, road signs, trees, and power lines. I tried a conventional approach a few years ago (YOLO plus a lot of 3D labeling and training, which was a pain), but I wanted to see if there is anything new out there. For example, I noticed Esri offers a power line extraction tool. I haven't tried it, but it looks nice. Also, the fusion of deep learning and language models is really taking off these days: https://github.com/pengsongyou/openscene.
What are some alternatives?
InternVideo - Video Foundation Models & Data for Multimodal Understanding
SadTalker - [CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
unmasked_teacher - [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Torch-Pruning - [CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
IGEV - [CVPR 2023] Iterative Geometry Encoding Volume for Stereo Matching and Multi-View Stereo
Ask-Anything - [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Instruct2Act - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
mmaction2 - OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark