Home

Awesome

🦜 VideoChat Family: Ask-Anything

Open in OpenXLab | <a href="https://discord.gg/A2Ex6Pph6A"><img src="https://img.shields.io/discord/1099920215724277770?label=Discord&logo=discord"></a> | <a href="https://arxiv.org/abs/2305.06355"><img src="https://img.shields.io/badge/cs.CV-2305.06355-b31b1b?logo=arxiv&logoColor=red"></a> | <a href="https://arxiv.org/abs/2311.17005"><img src="https://img.shields.io/badge/cs.CV-2311.17005-b31b1b?logo=arxiv&logoColor=red"></a> | <a href="https://twitter.com/opengvlab"><img src="https://img.shields.io/twitter/follow/opengvlab?style=social"></a> <br> <a href="https://huggingface.co/spaces/OpenGVLab/VideoChatGPT"><img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg" alt="Open in Spaces"> [VideoChat-7B-8Bit] End-to-end chatbot for video and image.</a> <a href="https://huggingface.co/spaces/OpenGVLab/InternVideo2-Chat-8B-HD"><img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg" alt="Open in Spaces"> [InternVideo2-Chat-8B-HD]</a>

Chinese README & Chinese Discussion Group | Paper

<!-- πŸš€: We update `video_chat` by **instruction tuning for video & image chatting** now! Find its details [here](https://arxiv.org/pdf/2305.06355.pdf). We release **instruction data** at [InternVideo](https://github.com/OpenGVLab/InternVideo/tree/main/Data/instruction_data). The old version of `video_chat` moved to `video_chat_with_chatGPT`. -->

⭐️: We are also working on an updated version, stay tuned!

:fire: Updates

<!-- # :speech_balloon: Example https://user-images.githubusercontent.com/24236723/233631602-6a69d83c-83ef-41ed-a494-8e0d0ca7c1c8.mp4 -->

πŸ”¨ Getting Started

Build video chat with:
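To try it locally, a minimal setup sketch follows. The repository URL is real; the `video_chat` subdirectory is mentioned above, but the dependency file and demo script names are assumptions — check the subdirectory's own README for the exact entry point.

```shell
# Clone the Ask-Anything repository and enter the VideoChat subproject.
git clone https://github.com/OpenGVLab/Ask-Anything.git
cd Ask-Anything/video_chat

# Install dependencies (file name is an assumption; see the subproject README).
pip install -r requirements.txt

# Launch the chat demo (script name is an assumption).
python demo.py
```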

:clapper: [End2End ChatBot]

https://github.com/OpenGVLab/Ask-Anything/assets/24236723/a8667e87-49dd-4fc8-a620-3e408c058e26

<video controls> <source src="https://github.com/OpenGVLab/Ask-Anything/assets/24236723/a8667e87-49dd-4fc8-a620-3e408c058e26" type="video/mp4"> Your browser does not support the video tag. </video>

:movie_camera: [Communication with ChatGPT]

https://user-images.githubusercontent.com/24236723/233630363-b20304ab-763b-40e5-b526-e2a6b9e9cae2.mp4

<video controls> <source src="https://user-images.githubusercontent.com/24236723/233630363-b20304ab-763b-40e5-b526-e2a6b9e9cae2.mp4" type="video/mp4"> Your browser does not support the video tag. </video>

:page_facing_up: Citation

If you find this project useful in your research, please consider citing:

@article{2023videochat,
  title={VideoChat: Chat-Centric Video Understanding},
  author={Li, KunChang and He, Yinan and Wang, Yi and Li, Yizhuo and Wang, Wenhai and Luo, Ping and Wang, Yali and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2305.06355},
  year={2023}
}

@inproceedings{li2024mvbench,
  title={{MVBench}: A comprehensive multi-modal video understanding benchmark},
  author={Li, Kunchang and Wang, Yali and He, Yinan and Li, Yizhuo and Wang, Yi and Liu, Yi and Wang, Zun and Xu, Jilan and Chen, Guo and Luo, Ping and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={22195--22206},
  year={2024}
}

🌀️ Discussion Group

If you have any questions about trying, running, or deploying the project, or any ideas or suggestions for it, feel free to join our WeChat discussion group!

[WeChat discussion group QR code]

We are hiring researchers, engineers, and interns in the General Vision Group, Shanghai AI Lab. If you are interested in working with us, please contact Yi Wang (wangyi@pjlab.org.cn).