Attributable Visual Similarity Learning
This repository is the official PyTorch implementation of Attributable Visual Similarity Learning (CVPR 2022).
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework.
Framework
Datasets
CUB-200-2011
Download from here.
Organize the dataset as follows:
- cub200
|- train
| |- class0
| | |- image0_1
| | |- ...
| |- ...
|- test
| |- class100
| | |- image100_1
| | |- ...
| |- ...
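If it helps, below is a small sanity check (not part of this repository; the root path is a placeholder) that verifies the class-per-folder layout above. The same train/test structure applies to Cars196 below.

```python
# Minimal layout check (assumption: images are stored one folder per class,
# as in the tree above). Adjust the root to your --data_path.
from pathlib import Path

root = Path("<path-to-data>") / "cub200"
for split in ("train", "test"):
    class_dirs = sorted(d for d in (root / split).iterdir() if d.is_dir())
    n_images = sum(1 for d in class_dirs for _ in d.iterdir())
    print(f"{split}: {len(class_dirs)} classes, {n_images} images")
```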
Cars196
Download from here.
Organize the dataset as follows:
- cars196
|- train
| |- class0
| | |- image0_1
| | |- ...
| |- ...
|- test
| |- class98
| | |- image98_1
| | |- ...
| |- ...
Stanford Online Products
Download from here.
Organize the dataset as follows:
- online_products
|- images
| |- bicycle_final
| |- chair_final
| |- ...
|- Info_Files
| |- Ebay_final.txt
| |- Ebay_info.txt
| |- ...
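As with the other datasets, a small check (not part of this repository; the root path is a placeholder) that the Stanford Online Products folder matches the layout above:

```python
# Minimal layout check for the Stanford Online Products folder sketched above.
from pathlib import Path

root = Path("<path-to-data>") / "online_products"
image_dirs = [d for d in (root / "images").iterdir() if d.is_dir()]
info_files = sorted(p.name for p in (root / "Info_Files").glob("*.txt"))
print(f"{len(image_dirs)} image folders; info files: {info_files}")
```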
Requirements
To install requirements:
pip install -r requirements.txt
Training
Baseline models
To train resnet50 on Cars196 with the ProxyAnchor baseline, run the following command:
python examples/demo.py --data_path <path-to-data> --save_path <path-to-log> --device 0 --batch_size 180 --test_batch_size 180 --setting proxy_anchor --embeddings_dim 512 --proxyanchor_margin 0.1 --proxyanchor_alpha 32 --num_classes 98 --wd 0.0001 --gamma 0.5 --step 10 --lr_trunk 0.0001 --lr_embedder 0.0001 --lr_collector 0.01 --dataset cars196 --model resnet50 --delete_old --save_name proxy-anchor-resnet50-cars196-baseline --warm_up 5 --warm_up_list embedder collector
For more baseline settings, please refer to samples_baseline.
Our models
To train resnet50 on Cars196 with ProxyAnchor-AVSL, run the following command:
python examples/demo.py --data_path <path-to-data> --save_path <path-to-log> --device 0 --batch_size 180 --test_batch_size 180 --setting avsl_proxyanchor --feature_dim_list 512 1024 2048 --embeddings_dim 512 --avsl_m 0.5 --topk_corr 128 --prob_gamma 10 --index_p 2 --pa_pos_margin 1.8 --pa_neg_margin 2.2 --pa_alpha 16 --final_pa_pos_margin 1.8 --final_pa_neg_margin 2.2 --final_pa_alpha 16 --num_classes 98 --use_proxy --wd 0.0001 --gamma 0.5 --step 5 --dataset cars196 --model resnet50 --splits_to_eval test --warm_up 5 --warm_up_list embedder collector --loss0_weight=1 --loss1_weight=4 --loss2_weight=4 --lr_collector=0.1 --lr_embedder=0.0002 --lr_trunk=0.0002 \
--save_name proxy-anchor-resnet50-cars196-avsl
For more AVSL settings, please refer to samples_avsl.
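The values passed to `--feature_dim_list` (512, 1024, 2048) match the output channel counts of the last three ResNet-50 stages, presumably the levels from which AVSL draws its multi-level features. A quick way to confirm those dimensions (this sketch uses torchvision directly, not the repository's model code):

```python
# Confirm where 512/1024/2048 in --feature_dim_list come from: they are the
# output channel counts of the last three stages of a torchvision ResNet-50
# (illustrative only; this does not use the repository's code).
import torch
from torchvision.models import resnet50

model = resnet50().eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    x = model.maxpool(model.relu(model.bn1(model.conv1(x))))
    x = model.layer1(x)
    dims = []
    for stage in (model.layer2, model.layer3, model.layer4):
        x = stage(x)
        dims.append(x.shape[1])
print(dims)  # [512, 1024, 2048]
```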
Device
We tested our code on a Linux machine with an NVIDIA RTX 3090 GPU. We recommend using a GPU with more than 16 GB of memory.
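Before launching a run, you can check that PyTorch sees a GPU with enough memory (a generic check, independent of this repository):

```python
# Print the visible CUDA device and its total memory; training with the
# batch sizes above is recommended on a card with more than 16 GB.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024 ** 3:.1f} GiB")
else:
    print("No CUDA device visible.")
```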
Results
Results on CUB-200-2011:
Model name | Recall @ 1 | Recall @ 2 | Recall @ 4 | Recall @ 8 |
---|---|---|---|---|
baseline-PA | 69.7 | 80.0 | 87.0 | 92.4 |
AVSL-PA | 71.9 | 81.7 | 88.1 | 93.2 |
Results on Cars196:
Model name | Recall @ 1 | Recall @ 2 | Recall @ 4 | Recall @ 8 |
---|---|---|---|---|
baseline-PA | 87.7 | 92.9 | 95.8 | 97.9 |
AVSL-PA | 91.5 | 95.0 | 97.0 | 98.4 |
Results on Stanford Online Products:
Model name | Recall @ 1 | Recall @ 10 | Recall @ 100 |
---|---|---|---|
baseline-PA | 78.4 | 90.5 | 96.2 |
AVSL-PA | 79.6 | 91.4 | 96.4 |
BibTeX
@article{zhang2022attributable,
  title={Attributable Visual Similarity Learning},
  author={Borui Zhang and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
  journal={arXiv preprint arXiv:2203.14932},
  year={2022}
}