SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

This repository is an official PyTorch implementation of the ECCV 2022 paper SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding.

Introduction

We investigate a new training mechanism for improving the Transformer encoder, named Selective Retraining (SiRi), which continually updates the parameters of the encoder while periodically re-initializing the remaining parameters as training goes on. In this way, the model can be better optimized on top of an enhanced encoder. The figure below shows the training process of SiRi. For more details, please refer to our paper.

[Figure: the training process of SiRi]
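The retraining schedule described in the introduction can be sketched roughly as follows. This is a minimal, framework-agnostic illustration and not the code in this repository: the helper names `selective_reinit` and `siri_schedule`, the `train_one_round` callback, and the `transformer.encoder.` prefix are all assumptions made for the example. Parameters are modeled as a plain name-to-value mapping, which is the same shape as a PyTorch `state_dict()`.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]  # parameter name -> value (e.g. a tensor)


def selective_reinit(trained: State, initial: State,
                     keep_prefixes: Tuple[str, ...]) -> State:
    """Keep trained values for names under keep_prefixes; reset the rest.

    `keep_prefixes` is a hypothetical naming convention for the encoder
    parameters, not something taken from the repository.
    """
    return {
        name: trained[name] if name.startswith(keep_prefixes) else init_value
        for name, init_value in initial.items()
    }


def siri_schedule(initial: State,
                  train_one_round: Callable[[State], State],
                  keep_prefixes: Tuple[str, ...],
                  num_rounds: int) -> State:
    """Train for several rounds, re-initializing non-encoder parameters
    between rounds so that each round starts from an improved encoder."""
    state = dict(initial)
    for round_idx in range(num_rounds):
        state = train_one_round(state)
        # No reset after the last round: its output is the final model.
        if round_idx < num_rounds - 1:
            state = selective_reinit(state, initial, keep_prefixes)
    return state


if __name__ == "__main__":
    # Toy stand-in for training: every parameter moves by +1 per round.
    def toy_train(state: State) -> State:
        return {name: value + 1.0 for name, value in state.items()}

    init = {"transformer.encoder.w": 0.0, "transformer.decoder.w": 0.0}
    print(siri_schedule(init, toy_train, ("transformer.encoder.",), 3))
```

With a real PyTorch model, one would presumably snapshot the initial weights with `copy.deepcopy(model.state_dict())` before training (the returned tensors otherwise alias the live parameters) and apply the merged mapping with `model.load_state_dict(...)`. Optimizer state and learning-rate schedule handling between rounds are left out of this sketch.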

Updates

Installation

Environment:

Dataset preparation

For more installation details, please see the MDETR repository, on which our code is built.

Training

Evaluation

Model Zoo

TASK1: Referring Expression Comprehension

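| Model | val | testA | testB | model |
| --- | --- | --- | --- | --- |
| MDETR* + SiRi | 85.83 | 88.56 | 81.27 | gdrive |
| MDETR* + MT SiRi | 85.82 | 89.11 | 81.08 | gdrive |

| Model | val | testA | testB | model |
| --- | --- | --- | --- | --- |
| MDETR* + SiRi | 76.68 (76.63) | 82.01 (81.99) | 66.33 (66.86) | gdrive |
| MDETR* + MT SiRi | 77.47 (77.53) | 83.04 (82.47) | 67.11 (67.89) | gdrive |

| Model | val | test | model |
| --- | --- | --- | --- |
| MDETR* + SiRi | 76.63 | 76.46 | gdrive |
| MDETR* + MT SiRi | 77.39 | 76.80 | gdrive |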

TASK2: Referring Expression Segmentation

Coming soon!

Citing SiRi

@inproceedings{qu2022siri,
  title={SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding},
  author={Qu, Mengxue and Wu, Yu and Liu, Wu and Gong, Qiqi and Liang, Xiaodan and Russakovsky, Olga and Zhao, Yao and Wei, Yunchao},
  booktitle={ECCV},
  year={2022}
}

Acknowledgement

Our code is built on the previous work MDETR.