Awesome
CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
In this paper, we propose Contrastive ProposalLearning (CPL) for the weakly supervised temporal sentence grounding task. We use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video that can characterize the multiple events in a long video. Then, we propose a controllable easy to hard negative proposal mining strategy to collect negative samples within the same video, which can ease the model optimization and enables CPL to distinguish highly confusing scenes. The experiments show that our method achieves state-of-the-art performance on Charades-STA and ActivityNet Captions datasets.
Our paper was accepted by CVPR-2022. [Paper] [Project Page]
Pipeline
Main Results
Charades-STA Dataset
Method | Rank1@0.3 | Rank1@0.5 | Rank1@0.7 | Rank5@0.3 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|---|---|
CPL | 66.40 | 49.24 | 22.39 | 96.99 | 84.71 | 52.37 |
CPL$^*$ | 65.99 | 49.05 | 22.61 | 96.99 | 84.71 | 52.37 |
Our trained model can be downloaded from here
ActivityNet Captions Dataset
Method | Rank1@0.1 | Rank1@0.3 | Rank1@0.5 | Rank5@0.1 | Rank5@0.3 | Rank5@0.5 |
---|---|---|---|---|---|---|
CPL | 79.86 | 53.67 | 31.24 | 87.24 | 63.05 | 43.13 |
CPL$^*$ | 82.55 | 55.73 | 31.37 | 87.24 | 63.05 | 43.13 |
Our trained model can be downloaded from here
If you can not reproduce the results with the configuration file, please try to adjust the hyperparameter lambda (e.g. from 0.125 to 0.135) in the configuration file, as in our experiments we found that the model is more sensitive to this hyperparameter.
Requiments
- pytorch
- h5py
- nltk
- fairseq
Quick Start
Data Preparation
We use the C3D feature for the ActivityNet Captions dataset. Please download from here and save as data/activitynet/c3d_features.hdf5
. We use the I3D feature provided by LGI and use this script to convert the file format to HDF5. We also provide the converted I3D feature for the Charades-STA dataset, and can be downloaded from here. We expect the directory structure to be the following:
data
├── activitynet
│ ├── sub_activitynet_v1-3.c3d.hdf5
│ ├── glove.pkl
│ ├── train_data.json
│ ├── val_data.json
│ ├── test_data.json
├── charades
│ ├── i3d_features.hdf5
│ ├── glove.pkl
│ ├── train.json
│ ├── test.json
Training
To train on the ActivityNet Captions dataset:
python train.py --config-path config/activitynet/main.json --log_dir LOG_DIR --tag TAG
To train on the Charades-STA dataset:
python train.py --config-path config/charades/main.json --log_dir LOG_DIR --tag TAG
Use --log_dir
to specify the directory where the logs are saved, and use --tag
to identify each experiment. They are both optional.
The model weights are saved in checkpoints/
by default and can be modified in the configuration file.
Inference
Our trained model are provided in checkpoints/
. Run the following commands for evaluation:
# Use loss-based strategy during inference
python train.py --config-path CONFIG_FILE --resume CHECKPOINT_FILE --eval
# Use vote-based strategy during inference
python train.py --config-path CONFIG_FILE --resume CHECKPOINT_FILE --eval --vote
The configuration file is the same as training.