# VTG-GPT

<a href='https://arxiv.org/abs/2403.02076'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

This is our implementation of the paper [VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT](https://arxiv.org/abs/2403.02076).

VTG-GPT leverages frozen GPTs to enable zero-shot inference without training.


## Preparation

1. Install dependencies

```shell
conda create -n vtg-gpt python=3.10
conda activate vtg-gpt
pip install -r requirements.txt
```

2. Unzip caption files

```shell
cd data/qvhighlights/caption/
unzip val.zip
```

## Inference on QVHighlights val split

```shell
# inference
python infer_qvhighlights.py val

# evaluation
bash standalone_eval/eval.sh
```

Running the above commands should reproduce the following results:

| Metrics | R1@0.5 | R1@0.7 | mAP@0.5 | mAP@0.75 | mAP@avg |
| ------- | ------ | ------ | ------- | -------- | ------- |
| Values  | 59.03  | 38.90  | 56.11   | 35.44    | 35.57   |
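For reference, R1@0.5 counts a query as correct when the top-1 predicted moment overlaps the ground-truth span with a temporal IoU of at least 0.5 (R1@0.7 uses a 0.7 threshold). A minimal illustrative sketch of that metric, not the actual code in `standalone_eval`:

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] moments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, iou_thresh=0.5):
    """Fraction of queries whose top-1 moment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_thresh for p, g in zip(preds, gts))
    return hits / len(preds)

# toy example: two queries, only the first top-1 moment is a hit
preds = [[10.0, 20.0], [5.0, 8.0]]
gts = [[12.0, 22.0], [30.0, 40.0]]
print(recall_at_1(preds, gts))  # 0.5
```

mAP@0.5 and mAP@0.75 follow the same IoU idea but average precision over ranked candidate moments rather than checking only the top-1 prediction.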

## MiniGPT-v2 for image captioning

```shell
cd minigpt
conda create --name minigptv python=3.9
conda activate minigptv
pip install -r requirements.txt
python run_v2.py
```

## Baichuan2 for query debiasing

```shell
cd Baichuan2
conda activate vtg-gpt
python rephrase_query.py
```

## Acknowledgement

We thank Youyao Jia for helpful discussions.

This code is based on Moment-DETR and SeViLA, and we used resources from MiniGPT-4, Baichuan2, and LLaMa2. We thank the authors for their awesome open-source contributions.

## Citation

If you find this project useful for your research, please cite our paper:

```bibtex
@article{xu2024vtg,
  title={VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT},
  author={Xu, Yifang and Sun, Yunzhuo and Xie, Zien and Zhai, Benxiang and Du, Sidan},
  journal={Applied Sciences},
  volume={14},
  number={5},
  pages={1894},
  year={2024},
  publisher={MDPI}
}
```