VTimeLLM [Paper]

Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".


:loudspeaker: Latest Updates


VTimeLLM Overview :bulb:

VTimeLLM is a novel Video LLM designed for fine-grained understanding of video moments and reasoning about their temporal boundaries.

VTimeLLM adopts a boundary-aware three-stage training strategy: the first stage uses image-text pairs for feature alignment, the second uses multiple-event videos to increase temporal-boundary awareness, and the third uses high-quality video-instruction tuning to further improve temporal understanding and align the model with human intent.
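
As a quick reference, the three stages described above can be written down as a small data structure. This is only an illustrative summary of the paragraph; the `Stage` class, the `STAGES` tuple, and the `describe` helper are hypothetical names and are not part of the VTimeLLM codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage: its order, the data it uses, and its goal."""
    order: int
    data: str
    goal: str


# Hypothetical summary of the three stages, taken from the overview text.
STAGES = (
    Stage(1, "image-text pairs", "feature alignment"),
    Stage(2, "multiple-event videos", "temporal-boundary awareness"),
    Stage(3, "high-quality video-instruction tuning",
          "temporal understanding and alignment with human intent"),
)


def describe(stages=STAGES):
    """Return one human-readable line per stage, in training order."""
    ordered = sorted(stages, key=lambda s: s.order)
    return [f"Stage {s.order}: {s.data} -> {s.goal}" for s in ordered]


if __name__ == "__main__":
    for line in describe():
        print(line)
```

The actual training commands and hyperparameters for each stage are documented in train.md.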

[Figure: VTimeLLM framework]


Contributions :trophy:


Installation :wrench:

We recommend setting up a conda environment for the project:

conda create --name=vtimellm python=3.10
conda activate vtimellm

git clone https://github.com/huangb23/VTimeLLM.git
cd VTimeLLM
pip install -r requirements.txt

For training, also install the following packages:

pip install ninja
pip install flash-attn --no-build-isolation
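
After installation, a short script can confirm that the key packages are visible to the active environment. The sketch below uses only the standard library. The module names `torch` and `flash_attn` are assumptions (the import names of PyTorch and flash-attn); `check_environment` is a hypothetical helper, not something shipped with this repository.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "flash_attn")):
    """Return {module_name: bool} telling whether each module can be found.

    Uses importlib.util.find_spec, so the modules are located but not imported.
    The default module names are assumptions; adjust them to requirements.txt.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `vtimellm` environment. A `MISSING` entry for `flash_attn` only matters if you intend to train.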

Running Demo Offline :cd:

To run the demo offline, please refer to the instructions in offline_demo.md.

Training :train:

For training instructions, check out train.md.

Qualitative Analysis :mag:

A Comprehensive Evaluation of VTimeLLM's Performance across Multiple Tasks.

Video Understanding and Conversational Tasks :speech_balloon:



Creative Tasks :paintbrush:



Fine-grained Understanding Tasks :globe_with_meridians:



Video Reasoning Tasks :question:



Acknowledgements :pray:

We are grateful to the following awesome projects, on which VTimeLLM builds:

If you're using VTimeLLM in your research or applications, please cite using this BibTeX:

@inproceedings{huang2024vtimellm,
  title={{VTimeLLM}: Empower {LLM} to Grasp Video Moments},
  author={Huang, Bin and Wang, Xin and Chen, Hong and Song, Zihan and Zhu, Wenwu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14271--14280},
  year={2024}
}

License :scroll:

<a rel="license" href="https://creativecommons.org/licenses/by-nc-nd/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/4.0/80x15.png" /></a>

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License</a>.

Looking forward to your feedback, contributions, and stars! :star2: