Training-free Zero-Shot Video Temporal Grounding using Large-scale Pre-trained Models

In this work, we propose a training-free, zero-shot video temporal grounding approach that leverages the capabilities of large-scale pre-trained models. Without any training, our method achieves the best zero-shot performance on the Charades-STA and ActivityNet Captions datasets and demonstrates stronger generalization in cross-dataset and OOD settings.

Our paper was accepted at ECCV 2024.

*(Figure: overview of the proposed pipeline.)*

Quick Start

Requirements

Data Preparation

To reproduce the results in the paper, we provide the pre-extracted VLM features at this link and the LLM outputs in dataset/charades-sta/llm_outputs.json and dataset/activitynet/llm_outputs.json. Please download the pre-extracted features and set their paths in the data_configs.py file.
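After downloading, a quick sanity check that the feature directory is fully populated can look like the sketch below. The one-`.npy`-file-per-video layout is an assumption, not something this README specifies; adjust the extension to however the features are actually packaged.

```python
from pathlib import Path


def check_features(feature_root: str, required_videos: list[str], ext: str = ".npy") -> list[str]:
    """Return the video IDs whose feature file is missing under feature_root.

    Assumes one feature file per video named <video_id><ext>; verify this
    against the actual downloaded archive.
    """
    root = Path(feature_root)
    return [vid for vid in required_videos if not (root / f"{vid}{ext}").exists()]
```

An empty return value means every listed video has a feature file in place.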

Main Results

Standard Split

```shell
# Charades-STA dataset
python evaluate.py --dataset charades --llm_output dataset/charades-sta/llm_outputs.json

# ActivityNet dataset
python evaluate.py --dataset activitynet --llm_output dataset/activitynet/llm_outputs.json
```

| Dataset      | IoU=0.3 | IoU=0.5 | IoU=0.7 | mIoU  |
|--------------|---------|---------|---------|-------|
| Charades-STA | 67.04   | 49.97   | 24.32   | 44.51 |
| ActivityNet  | 49.34   | 27.02   | 13.39   | 34.10 |

OOD Splits

```shell
# Charades-STA OOD-1
python evaluate.py --dataset charades --split OOD-1

# Charades-STA OOD-2
python evaluate.py --dataset charades --split OOD-2

# ActivityNet OOD-1
python evaluate.py --dataset activitynet --split OOD-1

# ActivityNet OOD-2
python evaluate.py --dataset activitynet --split OOD-2
```

| Dataset            | IoU=0.3 | IoU=0.5 | IoU=0.7 | mIoU  |
|--------------------|---------|---------|---------|-------|
| Charades-STA OOD-1 | 66.05   | 45.91   | 20.78   | 43.05 |
| Charades-STA OOD-2 | 65.75   | 43.79   | 19.95   | 42.62 |
| ActivityNet OOD-1  | 43.87   | 20.41   | 11.25   | 31.72 |
| ActivityNet OOD-2  | 40.97   | 18.54   | 10.03   | 30.33 |
```shell
# Charades-CD test-ood
python evaluate.py --dataset charades --split test-ood

# Charades-CG novel-composition
python evaluate.py --dataset charades --split novel-composition

# Charades-CG novel-word
python evaluate.py --dataset charades --split novel-word
```

| Dataset                        | IoU=0.3 | IoU=0.5 | IoU=0.7 | mIoU  |
|--------------------------------|---------|---------|---------|-------|
| Charades-STA test-ood          | 65.07   | 49.24   | 23.05   | 44.01 |
| Charades-STA novel-composition | 61.53   | 43.84   | 18.68   | 40.19 |
| Charades-STA novel-word        | 68.49   | 56.26   | 28.49   | 46.90 |

Test on Custom Datasets

Feature Extraction

Please run feature_extraction.py to obtain the video features of your datasets.

```shell
python feature_extraction.py --input_root VIDEO_PATH --save_root FEATURE_SAVE_PATH
```

Data Configuration

Please add your dataset to data_configs.py. You may need to tune stride and max_stride_factor to achieve better performance.

For the format of the annotation file, refer to dataset/charades-sta/test_trivial.json.
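To make the two steps above concrete, here is a hedged sketch of what a config entry and an annotation record might look like. The field names in the config (`feature_path`, `annotation_file`) are assumptions, as is the exact annotation schema, which is modeled on the common Charades-STA JSON layout; verify both against the actual data_configs.py and dataset/charades-sta/test_trivial.json.

```python
# Hypothetical entry for data_configs.py; field names are assumptions
# inferred from the options this README mentions.
MY_DATASET_CONFIG = {
    "feature_path": "FEATURE_SAVE_PATH",          # where feature_extraction.py wrote features
    "annotation_file": "dataset/my_dataset/test.json",
    "stride": 20,                # tunable: sampling stride over the video
    "max_stride_factor": 0.5,    # tunable: caps proposal length relative to the video
}

# Annotation format sketch (one entry per video; one sentence per timestamp):
ANNOTATION_EXAMPLE = {
    "VIDEO_ID": {
        "duration": 30.96,                        # video length in seconds
        "timestamps": [[2.5, 11.0]],              # ground-truth [start, end] per query
        "sentences": ["person opens the door"],   # natural-language queries
    }
}
```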

Test without LLM

To evaluate with the VLM only (no LLM outputs), please run:

```shell
python evaluate.py --dataset DATASET --split SPLIT
```

DATASET and SPLIT are the dataset name and split that you added in data_configs.py.

Test with LLM

To obtain the LLM outputs, please run:

```shell
python get_llm_outputs.py --api_key API_KEY --input_file ANNOTATION_FILE --output_file LLM_OUTPUT_FILE
```

We have implemented models from OpenAI, Google, and Groq. Specify the provider with --model_type and a specific model with --model_name. You will need an API key for the corresponding provider and its client library installed, such as openai, google-generativeai, or groq.
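As a rough illustration of how a provider switch like --model_type can map to the dependency that must be installed, here is a hedged sketch; it is not the repository's actual code, and the mapping below only reflects the three packages named in this README.

```python
def resolve_provider_package(model_type: str) -> str:
    """Map a --model_type value to the pip package its client requires.

    Hypothetical helper for illustration; the real get_llm_outputs.py
    may organize this differently.
    """
    packages = {
        "openai": "openai",
        "google": "google-generativeai",
        "groq": "groq",
    }
    if model_type not in packages:
        raise ValueError(f"Unsupported model_type: {model_type}")
    return packages[model_type]


print(resolve_provider_package("google"))  # -> google-generativeai
```

A lookup like this fails fast on an unknown provider instead of erroring later inside an API call.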

To test the performance, please run:

```shell
python evaluate.py --dataset DATASET --split SPLIT --llm_output LLM_OUTPUT_FILE
```