Implicit Visual-Textual (IVT) - PyTorch
This repository is the implementation of the paper "See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval."
Installation
- Create the conda environment
conda create -n ivt python=3.7
conda activate ivt
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=10.2 -c pytorch
- Clone the repository and install the remaining dependencies
git clone https://github.com/TencentYoutuResearch/PersonRetrieval-IVT.git
cd PersonRetrieval-IVT
pip install -r requirements.txt
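After installation, it can help to confirm that the environment matches the versions pinned above. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only relies on standard PyTorch calls (`torch.__version__`, `torch.cuda.is_available()`, `torch.cuda.device_count()`).

```python
import importlib


def check_environment(expected_torch="1.10.0"):
    """Report the installed PyTorch version and visible GPUs.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    report = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the active environment.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # The version string may carry a build suffix such as "+cu102".
    report["matches_expected"] = torch.__version__.startswith(expected_torch)
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the `ivt` environment is probably not activated. A `gpu_count` below four means the multi-GPU setup described under "Training with text-based re-ID datasets" cannot be reproduced as-is.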
Getting Started
Pretrain
You can use our pre-trained model[zxvu] directly. Otherwise, you need to download the pre-training datasets: Conceptual Captions, SBU Captions, COCO, and Visual Genome.
Change the data root in pretrain_cuhk.py, then run:
python train_pretrain.py
Training with text-based re-ID datasets
# We use four V100 GPUs for training on the CUHK-PEDES and ICFG-PEDES datasets.
# Training with multiple GPUs:
sh start.sh
# Alternatively, train on a single GPU. This is slower but may give better performance.
python train_cuhkpedes_gpu.py
python train_icfg_gpu.py
# Because RSTPReid is small, we use only one V100 GPU for training.
python train_rstp.py
Trained Models
We provide our trained models at Baidu Pan[bpvu].
Text-based re-ID Datasets
You can obtain the datasets from the corresponding authors. We provide our processed JSON files at Baidu Pan[xktc].
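The layout of the processed JSON files is not described here, so it may be useful to inspect a file before pointing the training scripts at it. The sketch below is an assumption-laden illustration: the helper `summarize_annotations` and the file name in the usage line are hypothetical, and the code makes no claim about the actual schema beyond the file being valid JSON.

```python
import json
from pathlib import Path


def summarize_annotations(path):
    """Return a short structural summary of a JSON annotation file.

    Hypothetical helper; the real schema of the processed files is not
    documented in this README, so nothing about it is assumed.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }
    if isinstance(data, list) and data and isinstance(data[0], dict):
        # For a list of records, show which fields the first record has.
        summary["first_entry_keys"] = sorted(data[0].keys())
    elif isinstance(data, dict):
        # For a mapping, show up to ten top-level keys.
        summary["top_level_keys"] = sorted(data.keys())[:10]
    return summary


if __name__ == "__main__":
    # "annotations.json" is a placeholder; substitute a downloaded file.
    print(summarize_annotations("annotations.json"))
```

Checking the reported keys against what the data-loading code expects is a quick way to catch a wrong path or an incomplete download before starting a long training run.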
Citations
If you find our work helpful, please cite using this BibTeX:
@inproceedings{shu2023see,
title={See finer, see more: Implicit modality alignment for text-based person retrieval},
author={Shu, Xiujun and Wen, Wei and Wu, Haoqian and Chen, Keyu and Song, Yiran and Qiao, Ruizhi and Ren, Bo and Wang, Xiao},
booktitle={Proceedings of the European Conference on Computer Vision Workshops (ECCVW)},
pages={624--641},
year={2023},
organization={Springer}
}
Contact us
If you have any questions, comments or suggestions, please do not hesitate to contact us at shuxj@mail.ioa.ac.cn.