[ICCV2023] D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Datasets
- Annotations. We train on the glance annotations released by ViGA, converted to the JSON format our framework expects (see the dataset directory).
- Features. For a fair comparison, we use the publicly available video features, following MMN.
Please download the video features into the dataset directory, laid out as follows.
dataset
├── Charades_STA
│ ├── vgg_rgb_features.hdf5
│ ├── glance_charades_train.json
│ ├── charades_test.json
├── ActivityNet
│ ├── sub_activitynet_v1-3.c3d.hdf5
│ ├── glance_train.json
│ ├── val.json
│ ├── test.json
├── TACoS
│ ├── tall_c3d_features.hdf5
│ ├── glance_train.json
│ ├── val.json
│ ├── test.json
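Before training, it can help to confirm that the downloaded features and annotation files landed in the right place. The following is a minimal sketch, not part of the released code: the file names are copied from the tree above, while `EXPECTED_FILES` and `find_missing_files` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
EXPECTED_FILES = {
    "Charades_STA": [
        "vgg_rgb_features.hdf5",
        "glance_charades_train.json",
        "charades_test.json",
    ],
    "ActivityNet": [
        "sub_activitynet_v1-3.c3d.hdf5",
        "glance_train.json",
        "val.json",
        "test.json",
    ],
    "TACoS": [
        "tall_c3d_features.hdf5",
        "glance_train.json",
        "val.json",
        "test.json",
    ],
}


def find_missing_files(root="dataset"):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            path = root / subdir / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    missing = find_missing_files("dataset")
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected dataset files are present.")
```

Run it from the repository root so that the relative path `dataset` resolves; it only checks that the files exist and does not validate their contents.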
Main Results
Charades-STA Dataset
Method | Rank1@0.5 | Rank1@0.7 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|
ViGA | 36.56 | 16.10 | 48.90 | 25.86 |
D3G | 41.64 | 19.60 | 79.25 | 49.30 |
ActivityNet Captions Dataset
Method | Rank1@0.3 | Rank1@0.5 | Rank1@0.7 | Rank5@0.3 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|---|---|
ViGA | 59.78 | 35.39 | 16.25 | 72.19 | 53.19 | 32.69 |
D3G | 58.25 | 36.68 | 18.54 | 87.84 | 74.21 | 52.47 |
TACoS Dataset
Method | Rank1@0.3 | Rank1@0.5 | Rank1@0.7 | Rank5@0.3 | Rank5@0.5 | Rank5@0.7 |
---|---|---|---|---|---|---|
ViGA | 20.82 | 9.52 | 3.10 | 27.92 | 15.35 | 6.10 |
D3G | 26.99 | 12.62 | 4.77 | 54.71 | 31.59 | 12.10 |
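The column headers in the tables above follow the usual "Rank n@m" convention for temporal grounding: the percentage of queries for which at least one of the top-n predicted segments has a temporal IoU of at least m with the ground-truth segment. The sketch below illustrates that definition only. It is not the evaluation code of this repository; `temporal_iou` and `rank_n_at_iou` are hypothetical names, and treating the threshold as inclusive (`>=`) is an assumption.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def rank_n_at_iou(predictions, ground_truths, n, iou_threshold):
    """Percentage of queries with a top-n proposal whose IoU reaches the threshold.

    predictions: one list per query of (start, end) proposals,
                 sorted by descending confidence.
    ground_truths: one (start, end) segment per query.
    """
    hits = 0
    for proposals, gt in zip(predictions, ground_truths):
        # Assumption: the comparison is inclusive (IoU >= threshold).
        if any(temporal_iou(p, gt) >= iou_threshold for p in proposals[:n]):
            hits += 1
    return 100.0 * hits / len(ground_truths)


if __name__ == "__main__":
    # Toy data: two queries, two ranked proposals each.
    predictions = [[(0, 10), (5, 15)], [(0, 4), (20, 30)]]
    ground_truths = [(5, 15), (20, 30)]
    print(rank_n_at_iou(predictions, ground_truths, n=1, iou_threshold=0.3))  # 50.0
    print(rank_n_at_iou(predictions, ground_truths, n=1, iou_threshold=0.5))  # 0.0
    print(rank_n_at_iou(predictions, ground_truths, n=5, iou_threshold=0.5))  # 100.0
```

In the toy data, the top-1 proposal of the first query overlaps the ground truth with IoU 1/3, so it counts at threshold 0.3 but not at 0.5, while the second-ranked proposals match exactly and are only credited once n is larger than 1.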
Training & Inference
cd scripts
### charades
sh charades_train.sh # train
sh charades_test.sh # test
### activitynet
sh anet_train.sh # train
sh anet_test.sh # test
### tacos
sh tacos_train.sh # train
sh tacos_test.sh # test