# DTL for Memory-Efficient Tuning
This repo is the official implementation of our AAAI 2024 paper "DTL: Disentangled Transfer Learning for Visual Recognition" (arXiv).
<div align="center"> <div> <a href='https://www.lamda.nju.edu.cn/fumh/' target='_blank'>Minghao Fu</a>  <a href='https://www.lamda.nju.edu.cn/zhuk/' target='_blank'>Ke Zhu</a>  <a href='https://cs.nju.edu.cn/wujx/' target='_blank'>Jianxin Wu</a> </div> <div> LAMDA, Nanjing University </div> <h3>TL;DR</h3>Unlike current efficient tuning methods such as Adapter, LoRA and VPT, which closely entangle their small trainable modules with the huge frozen backbone, we disentangle the weight update from the backbone network using a lightweight Compact Side Network (CSN). DTL not only greatly reduces the GPU memory footprint, but also achieves high accuracy in knowledge transfer.
</div>
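To make the disentanglement concrete, here is a minimal, hypothetical PyTorch sketch (not the actual DTL code; `ToyCSN`, the single injection point, and all sizes are illustrative assumptions): the frozen backbone runs under `no_grad`, so none of its activations are kept for backprop, and only the tiny side network and head receive gradients.

```python
import torch
import torch.nn as nn

class ToyCSN(nn.Module):
    """Hypothetical compact side network: per-block low-rank projections whose
    outputs are accumulated and mapped back to the backbone dimension."""
    def __init__(self, dim=768, rank=8, depth=12):
        super().__init__()
        self.down = nn.ModuleList([nn.Linear(dim, rank) for _ in range(depth)])
        self.up = nn.Linear(rank, dim)

    def forward(self, feats):
        h = 0
        for f, proj in zip(feats, self.down):
            h = h + proj(f)      # accumulate low-dimensional updates
        return self.up(h)        # correction in the backbone's feature space

# Frozen backbone (stand-in for ViT blocks). Running it under no_grad means
# its activations are never stored for backprop -- the main memory saving.
backbone = nn.ModuleList([nn.TransformerEncoderLayer(d_model=768, nhead=12)
                          for _ in range(12)])
for p in backbone.parameters():
    p.requires_grad_(False)

csn = ToyCSN()
head = nn.Linear(768, 10)                  # hypothetical task head
x = torch.randn(197, 2, 768)               # (tokens, batch, dim) dummy input

feats = []
with torch.no_grad():                      # backbone is detached from autograd
    for blk in backbone:
        x = blk(x)
        feats.append(x)

out = x + csn(feats)                       # disentangled update, added once here
loss = head(out.mean(dim=0)).sum()
loss.backward()                            # gradients reach only csn and head
```

The actual CSN design (where and how the update is injected back into the backbone) follows the paper; the sketch only illustrates why disentangling the update saves activation memory.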
## Environment
- python 3.8
- pytorch >= 1.7
- torchvision >= 0.8
- timm 0.5.4
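A quick sanity check that your environment matches the versions listed above:

```python
import torch
import torchvision
import timm

print(torch.__version__)        # expect >= 1.7
print(torchvision.__version__)  # expect >= 0.8
print(timm.__version__)         # expect 0.5.4
```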
## Data Preparation
### 1. Visual Task Adaptation Benchmark (VTAB)
Please refer to SSF or VPT for preparing the 19 datasets included in VTAB-1K. For convenience, you can download the extracted file (VTAB.zip) to easily access the datasets.
### 2. Few-Shot Classification
We follow NOAH to conduct the few-shot evaluation. There are two parts you should pay attention to:
- **Images**

  For improved organization and indexing, images from five datasets (`fgvc-aircraft`, `food101`, `oxford-flowers102`, `oxford-pets`, `standford-cars`) should be consolidated into a folder named `FGFS`.

- **Train/Val/Test splits**

  The content, copied from the `data/few-shot` directory in NOAH, should be placed in the `FGFS` folder and renamed as `few-shot_split` for path correction.
The file structure should look like:
```
FGFS
├── few-shot_split
│   ├── fgvc-aircraft
│   │   └── annotations
│   │       ├── train_meta.list.num_shot_1.seed_0
│   │       └── ...
│   │   ...
│   └── food101
│       └── annotations
│           ├── train_meta.list.num_shot_1.seed_0
│           └── ...
├── fgvc-aircraft
│   ├── img1.jpeg
│   ├── img2.jpeg
│   └── ...
│   ...
└── food101
    ├── img1.jpeg
    ├── img2.jpeg
    └── ...
```
For convenience, the extracted datasets are uploaded (FGFS.zip).
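If you want to verify the layout before training, a small hypothetical check like the one below can help. The `"<image path> <label>"` line format of the meta lists is an assumption based on NOAH-style splits, not something this repo guarantees:

```python
from pathlib import Path

# Hypothetical sanity check for the FGFS layout above. The
# "<image path> <label>" line format is an assumed NOAH-style convention.
root = Path("FGFS")
split = (root / "few-shot_split" / "food101" / "annotations"
         / "train_meta.list.num_shot_1.seed_0")

for line in split.read_text().splitlines():
    rel_path, label = line.rsplit(" ", 1)  # assumed format
    int(label)                             # labels should parse as integers
    if not (root / "food101" / rel_path).exists():
        print("missing image:", rel_path)
```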
### 3. Domain Generalization
- **Images**

  Please refer to DATASETS.md for ImageNet, ImageNet-A, ImageNet-R, ImageNet-Sketch and ImageNetV2 to download the images for these datasets. You should probably create a new directory named `DG` as the root that contains them.

- **Test splits**

  The content is copied from the `data/domain-generalization` directory. The `annotations` directory needs to be placed within the subdirectory of each dataset.
For convenience, the extracted datasets are uploaded (DG.zip).
Note: the training set of ImageNet (the `train` directory in `DG/imagenet/images`) has not been uploaded due to its large size, so you will need to prepare it yourself (e.g., via a symbolic link: `ln -s /path/to/imagenet/train DG/imagenet/images/train`).
## Usage
### Pre-trained Models
- The pre-trained weights of ViT-B/16 are stored at this link; a quick way to inspect them is sketched below.
- For Swin-B, the pre-trained weights will be automatically downloaded to the cache directory when you run the training scripts.
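A hedged sketch for inspecting the downloaded checkpoint before pointing the scripts' `load_path` at it; the file name `ViT-B_16.pth` and the state-dict layout are assumptions about the provided weights:

```python
import timm
import torch

# Hypothetical: inspect the downloaded ViT-B/16 checkpoint. The file name
# and state-dict layout are assumptions; adjust keys if it is wrapped.
model = timm.create_model("vit_base_patch16_224_in21k", pretrained=False)
state = torch.load("ViT-B_16.pth", map_location="cpu")
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```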
### Fine-tuning ViT-B/16 on VTAB
```bash
bash train_scripts/vit/vtab/$DATASET_NAME/train_dtl(+).sh
```
- Replace `$DATASET_NAME` with the name of the dataset you want to fine-tune on.
- Update the `data_dir` and `load_path` variables in the script to your specified values.
### Fine-tuning ViT-B/16 on Few-shot Learning
```bash
bash train_scripts/vit/few_shot/$DATASET_NAME/train_dtl(+)_shot_$SHOT.sh
```
- Replace `$SHOT` with the number of shots (e.g., 1, 2, 4, 8 or 16).
### Fine-tuning ViT-B/16 on Domain Generalization
```bash
bash train_scripts/vit/domain_generalization/$DATASET_NAME/train_dtl(+).sh
```
### Fine-tuning Swin-B on VTAB
```bash
bash train_scripts/swin/vtab/$DATASET_NAME/train_dtl(+).sh
```
## Citation
If this project is helpful to you, please consider citing our paper:
```bibtex
@inproceedings{fu2024dtl,
  title={DTL: Disentangled Transfer Learning for Visual Recognition},
  author={Fu, Minghao and Zhu, Ke and Wu, Jianxin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2024},
}
```