PyTorch-TransVG
An unofficial PyTorch implementation of "TransVG: End-to-End Visual Grounding with Transformers".
paper: https://arxiv.org/abs/2104.08541
<img src="https://github.com/nku-shengzheliu/Pytorch-TransVG/blob/main/pipeline.PNG" width = 60% height = 60% align=center/>
Due to some implementation details, I cannot guarantee that this code reproduces the performance reported in the paper.
If you have any questions about the code, please feel free to ask~
Update record
- 2021.5.10
- My model is still training. I will update the reproduced performance table as soon as training finishes.
- 2021.6.3
- The previously trained model was very slow to converge because the image mask in the transformer encoder was set incorrectly. I fixed this bug and am now re-training the model.
- 2021.6.6 Reproduced model performance:
Prerequisites
Create the conda environment with the environment.yaml file:
conda env create -f environment.yaml
Activate the environment with:
conda activate transvg
Installation
- Please refer to ReSC and follow its steps to prepare the submodules and associated data:
- RefCOCO, RefCOCO+, RefCOCOg, ReferItGame Dataset.
- Dataset annotations, which are stored in ./data
- Please refer to DETR and download the model weights. I used the DETR model with ResNet50, which reaches an AP of 42.0 on COCO2017. Please store it in ./saved_models/detr-r50-e632da11.pth
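Before starting a long training run, it can help to confirm that the files described above are in place. The script below is only a minimal sketch and is not part of this repository: the helper names `sha256_prefix_ok` and `check_setup` are made up for illustration, and it assumes the hex suffix in the weights file name (`e632da11`) is the leading part of the file's SHA-256 digest, which is the naming convention used for checkpoints downloaded through `torch.hub`. If that assumption does not hold for your copy, drop the hash check and keep only the existence checks.

```python
import hashlib
import re
from pathlib import Path
from typing import List

# Paths taken from this README; adjust them if your layout differs.
DATA_DIR = Path("./data")
DETR_CKPT = Path("./saved_models/detr-r50-e632da11.pth")


def sha256_prefix_ok(path: Path) -> bool:
    """Return True if the file's SHA-256 starts with the hex suffix in its name.

    Assumption: the file is named `<name>-<hash prefix>.pth`.
    """
    match = re.search(r"-([0-9a-f]{8,})\.pth$", path.name)
    if match is None:
        raise ValueError(f"no hash suffix found in {path.name!r}")
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


def check_setup(data_dir: Path = DATA_DIR, ckpt: Path = DETR_CKPT) -> List[str]:
    """Collect readable problems; an empty list means the layout looks fine."""
    problems = []
    if not data_dir.is_dir():
        problems.append(f"missing annotation directory: {data_dir}")
    if not ckpt.is_file():
        problems.append(f"missing DETR weights: {ckpt}")
    elif not sha256_prefix_ok(ckpt):
        problems.append(f"hash mismatch (incomplete download?): {ckpt}")
    return problems


if __name__ == "__main__":
    for line in check_setup() or ["setup looks complete"]:
        print(line)
```

Run it from the repository root so that the relative paths resolve as written above.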
Training
Train the model using the following command:
python train.py --data_root XXX --dataset {dataset_name} --gpu {gpu_id}
Testing
Evaluate the model using the following command:
python train.py --test --resume {saved_model_path} --data_root XXX --dataset {dataset_name} --gpu {gpu_id}
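If you launch several runs (for example one per dataset), a small wrapper can keep the training and testing invocations consistent. This is a hedged sketch, not code from this repository: `build_command` and `run` are hypothetical helpers, and they use only the flags shown in this README (`--data_root`, `--dataset`, `--gpu`, `--test`, `--resume`). The values in the usage lines are placeholders that you need to replace.

```python
import shlex
import subprocess
import sys
from typing import List, Optional


def build_command(data_root: str, dataset: str, gpu: str,
                  resume: Optional[str] = None) -> List[str]:
    """Build the train.py argument list.

    Passing `resume` adds `--test --resume <path>`, i.e. the testing command.
    """
    cmd = [sys.executable, "train.py"]
    if resume is not None:
        cmd += ["--test", "--resume", resume]
    cmd += ["--data_root", data_root, "--dataset", dataset, "--gpu", str(gpu)]
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values, mirroring the XXX / {dataset_name} / {gpu_id} above.
    run(build_command("XXX", "{dataset_name}", "{gpu_id}"))
    run(build_command("XXX", "{dataset_name}", "{gpu_id}",
                      resume="{saved_model_path}"))
```

The default `dry_run=True` only prints the commands, so you can check them before committing GPU time.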
Acknowledgement
Thanks to the authors of DETR and ReSC for their work. My code is based on their implementations.