# VTG-GPT

<a href='https://arxiv.org/abs/2403.02076'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

This is our implementation of the paper [VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT](https://arxiv.org/abs/2403.02076).

VTG-GPT leverages frozen GPTs to enable zero-shot inference without training.


## Preparation

1. Install dependencies

```shell
conda create -n vtg-gpt python=3.10
conda activate vtg-gpt
pip install -r requirements.txt
```

2. Unzip caption files

```shell
cd data/qvhighlights/caption/
unzip val.zip
```

## Inference on QVHighlights val split

```shell
# inference
python infer_qvhighlights.py val

# evaluation
bash standalone_eval/eval.sh
```

Running the above commands should reproduce the following results:

| Metrics | R1@0.5 | R1@0.7 | mAP@0.5 | mAP@0.75 | mAP@avg |
| ------- | ------ | ------ | ------- | -------- | ------- |
| Values  | 59.03  | 38.90  | 56.11   | 35.44    | 35.57   |
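For reference, R1@0.5 counts a query as correct when the top-1 predicted moment overlaps the ground-truth span with a temporal IoU of at least 0.5 (R1@0.7 uses a 0.7 threshold). A minimal illustrative sketch of that metric, not the actual code in `standalone_eval`:

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] moments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, iou_thresh=0.5):
    """Fraction of queries whose top-1 moment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_thresh for p, g in zip(preds, gts))
    return hits / len(preds)

# toy example: two queries, only the first top-1 moment is a hit
preds = [[10.0, 20.0], [5.0, 8.0]]
gts = [[12.0, 22.0], [30.0, 40.0]]
print(recall_at_1(preds, gts))  # 0.5
```

mAP@0.5 and mAP@0.75 follow the same IoU idea but average precision over ranked candidate moments rather than checking only the top-1 prediction.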

## MiniGPT-v2 for image captioning

```shell
cd minigpt
conda create --name minigptv python=3.9
conda activate minigptv
pip install -r requirements.txt
python run_v2.py
```

## Baichuan2 for query debiasing

```shell
cd Baichuan2
conda activate vtg-gpt
python rephrase_query.py
```

## Acknowledgement

We thank Youyao Jia for helpful discussions.

This code is based on Moment-DETR and SeViLA, and we used resources from MiniGPT-4, Baichuan2, and LLaMa2. We thank the authors for their awesome open-source contributions.

## Citation

If you find this project useful for your research, please cite our paper:

```bibtex
@article{xu2024vtg,
  title={VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT},
  author={Xu, Yifang and Sun, Yunzhuo and Xie, Zien and Zhai, Benxiang and Du, Sidan},
  journal={Applied Sciences},
  volume={14},
  number={5},
  pages={1894},
  year={2024},
  publisher={MDPI}
}
```