Attributable Visual Similarity Learning
This repository is the official PyTorch implementation of Attributable Visual Similarity Learning (CVPR 2022).
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework.
Framework
Datasets
CUB-200-2011
Download from here.
Organize the dataset as follows:
- cub200
|- train
| |- class0
| | |- image0_1
| | |- ...
| |- ...
|- test
| |- class100
| | |- image100_1
| | |- ...
| |- ...
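If it helps, below is a small sanity check (not part of this repository; the root path is a placeholder) that verifies the class-per-folder layout above. The same train/test structure applies to Cars196 below.

```python
# Minimal layout check (assumption: images are stored one folder per class,
# as in the tree above). Adjust the root to your --data_path.
from pathlib import Path

root = Path("<path-to-data>") / "cub200"
for split in ("train", "test"):
    class_dirs = sorted(d for d in (root / split).iterdir() if d.is_dir())
    n_images = sum(1 for d in class_dirs for _ in d.iterdir())
    print(f"{split}: {len(class_dirs)} classes, {n_images} images")
```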
Cars196
Download from here.
Organize the dataset as follows:
- cars196
|- train
| |- class0
| | |- image0_1
| | |- ...
| |- ...
|- test
| |- class98
| | |- image98_1
| | |- ...
| |- ...
Stanford Online Products
Download from here.
Organize the dataset as follows:
- online_products
|- images
| |- bicycle_final
| |- chair_final
| |- ...
|- Info_Files
| |- Ebay_final.txt
| |- Ebay_info.txt
| |- ...
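As with the other datasets, a small check (not part of this repository; the root path is a placeholder) that the Stanford Online Products folder matches the layout above:

```python
# Minimal layout check for the Stanford Online Products folder sketched above.
from pathlib import Path

root = Path("<path-to-data>") / "online_products"
image_dirs = [d for d in (root / "images").iterdir() if d.is_dir()]
info_files = sorted(p.name for p in (root / "Info_Files").glob("*.txt"))
print(f"{len(image_dirs)} image folders; info files: {info_files}")
```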
Requirements
To install requirements:
pip install -r requirements.txt
Training
Baseline models
To train resnet50 on Cars196 with the ProxyAnchor baseline, run the following command:
python examples/demo.py --data_path <path-to-data> --save_path <path-to-log> --device 0 --batch_size 180 --test_batch_size 180 --setting proxy_anchor --embeddings_dim 512 --proxyanchor_margin 0.1 --proxyanchor_alpha 32 --num_classes 98 --wd 0.0001 --gamma 0.5 --step 10 --lr_trunk 0.0001 --lr_embedder 0.0001 --lr_collector 0.01 --dataset cars196 --model resnet50 --delete_old --save_name proxy-anchor-resnet50-cars196-baseline --warm_up 5 --warm_up_list embedder collector
For more baseline settings, please refer to samples_baseline.
Our models
To train resnet50 on Cars196 with ProxyAnchor-AVSL, run the following command:
python examples/demo.py --data_path <path-to-data> --save_path <path-to-log> --device 0 --batch_size 180 --test_batch_size 180 --setting avsl_proxyanchor --feature_dim_list 512 1024 2048 --embeddings_dim 512 --avsl_m 0.5 --topk_corr 128 --prob_gamma 10 --index_p 2 --pa_pos_margin 1.8 --pa_neg_margin 2.2 --pa_alpha 16 --final_pa_pos_margin 1.8 --final_pa_neg_margin 2.2 --final_pa_alpha 16 --num_classes 98 --use_proxy --wd 0.0001 --gamma 0.5 --step 5 --dataset cars196 --model resnet50 --splits_to_eval test --warm_up 5 --warm_up_list embedder collector --loss0_weight=1 --loss1_weight=4 --loss2_weight=4 --lr_collector=0.1 --lr_embedder=0.0002 --lr_trunk=0.0002 \
--save_name proxy-anchor-resnet50-cars196-avsl
For more AVSL settings, please refer to samples_avsl.
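The values passed to `--feature_dim_list` (512, 1024, 2048) match the output channel counts of the last three ResNet-50 stages, presumably the levels from which AVSL draws its multi-level features. A quick way to confirm those dimensions (this sketch uses torchvision directly, not the repository's model code):

```python
# Confirm where 512/1024/2048 in --feature_dim_list come from: they are the
# output channel counts of the last three stages of a torchvision ResNet-50
# (illustrative only; this does not use the repository's code).
import torch
from torchvision.models import resnet50

model = resnet50().eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    x = model.maxpool(model.relu(model.bn1(model.conv1(x))))
    x = model.layer1(x)
    dims = []
    for stage in (model.layer2, model.layer3, model.layer4):
        x = stage(x)
        dims.append(x.shape[1])
print(dims)  # [512, 1024, 2048]
```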
Device
We tested our code on a Linux machine with an NVIDIA RTX 3090 GPU. We recommend using a GPU with more than 16 GB of memory.
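Before launching a run, you can check that PyTorch sees a GPU with enough memory (a generic check, independent of this repository):

```python
# Print the visible CUDA device and its total memory; training with the
# batch sizes above is recommended on a card with more than 16 GB.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024 ** 3:.1f} GiB")
else:
    print("No CUDA device visible.")
```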
Results
Results on CUB-200-2011:
Model name | Recall @ 1 | Recall @ 2 | Recall @ 4 | Recall @ 8 |
---|---|---|---|---|
baseline-PA | 69.7 | 80.0 | 87.0 | 92.4 |
AVSL-PA | 71.9 | 81.7 | 88.1 | 93.2 |
Results on Cars196:
Model name | Recall @ 1 | Recall @ 2 | Recall @ 4 | Recall @ 8 |
---|---|---|---|---|
baseline-PA | 87.7 | 92.9 | 95.8 | 97.9 |
AVSL-PA | 91.5 | 95.0 | 97.0 | 98.4 |
Results on Stanford Online Products:
Model name | Recall @ 1 | Recall @ 10 | Recall @ 100 |
---|---|---|---|
baseline-PA | 78.4 | 90.5 | 96.2 |
AVSL-PA | 79.6 | 91.4 | 96.4 |
BibTeX
@article{zhang2022attributable,
  title={Attributable Visual Similarity Learning},
  author={Borui Zhang and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
  journal={arXiv preprint arXiv:2203.14932},
  year={2022}
}