# Masked Images Are Counterfactual Samples for Robust Fine-tuning
This repository is the official PyTorch implementation of "Masked Images Are Counterfactual Samples for Robust Fine-tuning" [paper], accepted to CVPR 2023.
## Updates
- 2023-03-24: Code released.
## Setup

### 0. System environment
Our experiments are conducted on:
- OS: Ubuntu 20.04.4
- GPU: NVIDIA GeForce RTX 3090
### 1. Python environment
- Python 3.9
- PyTorch 1.11
- cudatoolkit 11.3.1
- torchvision 0.12.0
- tensorboard 2.8.0
- scikit-learn 1.0.2
- torchattacks
- tqdm
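If you want to reproduce this environment, a minimal conda sketch along the following lines should work (the environment name `robust-ft` is our placeholder, and the channel/CUDA configuration may need to be adapted to your system; this is not an official setup script from the repository):

```bash
# Minimal sketch of an environment matching the versions listed above.
conda create -n robust-ft python=3.9
conda activate robust-ft
conda install pytorch=1.11 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch
pip install tensorboard==2.8.0 scikit-learn==1.0.2 torchattacks tqdm
```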
### 2. Prepare datasets

The data directory (`DATA_DIR`) should contain the following sub-directories:

- `ILSVRC2012`: ImageNet
- `imagenet-a`: ImageNet-A
- `imagenet-r`: ImageNet-R
- `imagenet-sketch`: ImageNet-Sketch
- `imagenetv2-matched-frequency`: ImageNet-V2
- `objectnet-1.0`: ObjectNet
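For reference, the resulting layout under `DATA_DIR` would look roughly like this (a sketch assembled from the sub-directory names above):

```
DATA_DIR/
├── ILSVRC2012/
├── imagenet-a/
├── imagenet-r/
├── imagenet-sketch/
├── imagenetv2-matched-frequency/
└── objectnet-1.0/
```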
### 3. Set up directories in run.sh

Please modify lines 3-6 of the main script `run.sh` to set the proper directories:

- `LOG_DIR`: root directory for the logs of all experiments and runs
- `DATA_DIR`: the directory containing all datasets, as described above
- `MODEL_DIR`: the directory for pre-trained model weights (i.e., CLIP weights; they will be downloaded automatically if not already present)
- `EXP_NAME`: the experiment name, used as a sub-directory of `LOG_DIR`
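For illustration, the relevant lines of `run.sh` might then read as follows (the paths below are placeholders of our own, not values shipped with the repository):

```bash
# Hypothetical values for lines 3-6 of run.sh; adjust to your machine.
LOG_DIR=/path/to/logs        # root directory for all experiment logs
DATA_DIR=/path/to/datasets   # contains ILSVRC2012, imagenet-a, etc.
MODEL_DIR=/path/to/models    # CLIP weights are downloaded here if absent
EXP_NAME=my_experiment       # logs are written to $LOG_DIR/$EXP_NAME
```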
## Code usage

The bash script `run.sh` provides a uniform, simplified interface to the Python scripts for training and evaluation. It accepts the following arguments:

- script mode: whether to train or evaluate a model; one of `train`, `eval`, or `train-eval`
- architecture: `clip_{arch}`, where `{arch}` can be `ViT-B/32`, `ViT-B/16`, or `ViT-L/14`
- method: the training method (see `example.sh` or `run.sh` for the available options)
- masking: the masking strategy (see `example.sh`)
- seed: an integer seed (note: we use three seeds (0, 1, 2) in the paper)
- other arguments, which are passed on to the Python scripts
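Putting these together, an invocation follows this general pattern (inferred from the examples below, not a documented synopsis):

```bash
# bash run.sh <mode> <architecture> <method> <masking> <seed> [extra args...]
```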
The following commands show an example of fine-tuning a CLIP ViT-B/32 model with our proposed method, using object-mask (threshold 0.3) and single-fill. Please refer to `example.sh` for more examples.
```bash
# Build the zero-shot model
CUDA_VISIBLE_DEVICES=0 bash run.sh train 'clip_ViT-B/32' 'zeroshot' '' 0

# Fine-tune using our approach
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run.sh train 'clip_ViT-B/32' 'FT_FD_image_mask' 'ObjMaskSingleFill(0.3)' 0

# Evaluate the fine-tuned model (replace `train` with `eval`)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run.sh eval 'clip_ViT-B/32' 'FT_FD_image_mask' 'ObjMaskSingleFill(0.3)' 0
```
## Results
(WIP)
## Acknowledgement
Some of the code in this repository is based on the following repositories: