DTL for Memory-Efficient Tuning

This repo is the official implementation of our AAAI 2024 paper "DTL: Disentangled Transfer Learning for Visual Recognition" (arXiv).

<div align="center"> <div> <a href='https://www.lamda.nju.edu.cn/fumh/' target='_blank'>Minghao Fu</a>&emsp; <a href='https://www.lamda.nju.edu.cn/zhuk/' target='_blank'>Ke Zhu</a>&emsp; <a href='https://cs.nju.edu.cn/wujx/' target='_blank'>Jianxin Wu</a> </div> <div> LAMDA, Nanjing University </div>

<h3>TL;DR</h3>

Unlike current efficient tuning methods such as Adapter, LoRA, and VPT, which tightly entangle their small trainable modules with the huge frozen backbone, DTL disentangles the weight update from the backbone network using a lightweight Compact Side Network (CSN). DTL not only greatly reduces the GPU memory footprint, but also achieves high accuracy in knowledge transfer.
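To make the disentanglement idea concrete, here is a minimal NumPy sketch (not the actual DTL code): the frozen backbone runs untouched, while a tiny low-rank side branch reads each block's output and injects its accumulated correction only at the end. Because no trainable module sits inside the backbone, backbone activations need not be kept for the trainable path's gradients, which is where the memory saving comes from. All shapes and the ReLU-MLP blocks below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # feature dim, low-rank bottleneck dim of the side network

# Frozen backbone: a stack of simple blocks whose weights are never updated
backbone = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]

# Compact side network: one tiny low-rank pair (A, B) per block; these are
# the only trainable parameters, initialized to zero so training starts
# from the pre-trained backbone's behavior
csn = [(np.zeros((d, r)), np.zeros((r, d))) for _ in range(4)]

def forward(x):
    side = np.zeros_like(x)
    for W, (A, B) in zip(backbone, csn):
        x = np.maximum(x @ W, 0)    # frozen backbone block (toy ReLU MLP)
        side = side + x @ A @ B     # side branch accumulates a low-rank update
    return x + side                 # update is injected only after the backbone

x = rng.standard_normal((1, d))
y = forward(x)
```

With the side network at zero, the forward pass is exactly the frozen backbone; training then only moves the small `(A, B)` matrices.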


</div>

Environment

Data Preparation

1. Visual Task Adaptation Benchmark (VTAB)

Please refer to SSF or VPT for preparing the 19 datasets included in VTAB-1K. For convenience, you can download the extracted file (VTAB.zip) to easily access the datasets.

2. Few-Shot Classification

We follow NOAH to conduct the few-shot evaluation. There are two parts you should pay attention to:

The file structure should look like:

FGFS
├── few-shot_split
│   ├── fgvc-aircraft
│   │   └── annotations
│   │       ├── train_meta.list.num_shot_1.seed_0
│   │       └── ...
│   │    ...
│   └── food101
│       └── annotations
│           ├── train_meta.list.num_shot_1.seed_0
│           └── ...
├── fgvc-aircraft
│   ├── img1.jpeg
│   ├── img2.jpeg
│   └── ...
│   ...
└── food101
    ├── img1.jpeg
    ├── img2.jpeg
    └── ...

For convenience, the extracted datasets are uploaded (FGFS.zip).
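Given the layout above, the per-shot/per-seed annotation files follow a fixed naming pattern. A small helper like the following (an illustrative convenience, not part of the repo) can build those paths:

```python
import os

def ann_path(root, dataset, shot, seed, split="train"):
    """Build an annotation file path following the layout above,
    e.g. FGFS/few-shot_split/food101/annotations/train_meta.list.num_shot_1.seed_0"""
    return os.path.join(
        root, "few-shot_split", dataset, "annotations",
        f"{split}_meta.list.num_shot_{shot}.seed_{seed}",
    )
```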

3. Domain Generalization

For convenience, the extracted datasets are uploaded (DG.zip).

Note: The training set for ImageNet (the train directory in DG/imagenet/images) has not been uploaded due to its large file size, so you will need to prepare it yourself (e.g., via a symbolic link).
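One way to wire in your local ImageNet training set is a symbolic link; the source path below is hypothetical and should point at your own copy:

```python
import os

# Hypothetical location of your local ImageNet train split -- adjust as needed.
src = "/data/imagenet/train"
dst = "DG/imagenet/images/train"

os.makedirs(os.path.dirname(dst), exist_ok=True)
if not os.path.lexists(dst):
    os.symlink(src, dst)  # DG/imagenet/images/train now resolves to your copy
```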

Usage

Pre-trained Models

Fine-tuning ViT-B/16 on VTAB

bash train_scripts/vit/vtab/$DATASET_NAME/train_dtl(+).sh

Fine-tuning ViT-B/16 on Few-shot Learning

bash train_scripts/vit/few_shot/$DATASET_NAME/train_dtl(+)_shot_$SHOT.sh

Fine-tuning ViT-B/16 on Domain Generalization

bash train_scripts/vit/domain_generalization/$DATASET_NAME/train_dtl(+).sh

Fine-tuning Swin-B on VTAB

bash train_scripts/swin/vtab/$DATASET_NAME/train_dtl(+).sh

Citation

If you find this project helpful, please consider citing our paper:

@inproceedings{fu2024dtl,
      title={DTL: Disentangled Transfer Learning for Visual Recognition},
      author={Fu, Minghao and Zhu, Ke and Wu, Jianxin},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
      year={2024},
}

Acknowledgement

The code is built upon SSF, NOAH, VPT and timm.