On the Connection between Local Attention and Dynamic Depth-wise Convolution (ICLR 2022 spotlight) arxiv

This is the official PyTorch implementation of our paper. We simply replace local self-attention with (dynamic) depth-wise convolution, which has a lower computational cost. The performance is on par with the Swin Transformer.

Besides, the main contribution of our paper is a detailed theoretical comparison between depth-wise convolution and local self-attention from three aspects: sparse connectivity, weight sharing, and dynamic weight. With this paper, we want the community to rethink local self-attention and depth-wise convolution, as well as the basic rules for designing model architectures.

<p align="center"> <img width="600" height="300" src="figures/relation.png"> </p>
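To make the replacement concrete, here is a minimal PyTorch sketch of the two operators. It is not the code from this repository: the class names `DepthwiseConv` and `DynamicDepthwiseConv`, the 7x7 kernel (chosen to mirror the `window7` in the script names), and the `reduction` ratio of the kernel-predicting branch are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseConv(nn.Module):
    """Static depth-wise conv: one k x k kernel per channel, shared across positions."""

    def __init__(self, dim, kernel_size=7):
        super().__init__()
        # groups=dim makes every channel use its own kernel (sparse connectivity).
        self.conv = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)

    def forward(self, x):  # x: (B, C, H, W)
        return self.conv(x)


class DynamicDepthwiseConv(nn.Module):
    """Dynamic variant (illustrative): kernels are predicted per sample from pooled features."""

    def __init__(self, dim, kernel_size=7, reduction=4):
        super().__init__()
        self.k = kernel_size
        # Hypothetical kernel predictor: global pooling followed by two 1x1 convs.
        self.predict = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim * kernel_size * kernel_size, 1),
        )

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # One k x k kernel per (sample, channel) pair.
        weight = self.predict(x).reshape(b * c, 1, self.k, self.k)
        # Fold the batch into the channel axis so a single grouped conv applies
        # a different kernel to every sample and channel.
        out = F.conv2d(x.reshape(1, b * c, h, w), weight, padding=self.k // 2, groups=b * c)
        return out.reshape(b, c, h, w)


x = torch.randn(2, 8, 14, 14)
static_out = DepthwiseConv(8)(x)
dynamic_out = DynamicDepthwiseConv(8)(x)
```

Both modules keep the input shape. The static version shares its weights across all inputs, while the dynamic version computes them from each input, which is the "dynamic weight" aspect compared in the paper.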

Code and models for object detection and semantic segmentation are available in Detection and Segmentation.

Reference

@inproceedings{han2021connection,
  title={On the Connection between Local Attention and Dynamic Depth-wise Convolution},
  author={Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

1. Requirements

torch>=1.5.0, torchvision, timm, pyyaml; apex-amp

Data preparation: the ImageNet dataset should have the following structure:

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
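Before launching a long run, it can help to confirm that the dataset folder matches the layout above. The following is a small sketch using only the standard library; the helper name `check_imagenet_layout` is hypothetical and not part of this repository.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems under `root`; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each class (e.g. n01440764) should be a sub-folder holding .JPEG files.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for d in class_dirs:
            if not any(f.suffix.lower() == ".jpeg" for f in d.iterdir()):
                problems.append(f"no .JPEG files in {d}")
    return problems
```

Calling `check_imagenet_layout("/path/to/imagenet")` and printing the returned list gives a quick report. The check only looks at folder structure and file extensions, not at image contents.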

2. Training

For the tiny model, we train with batch size 128 on 8 GPUs. When training the base model, we use batch size 64 on 16 GPUs with OpenMPI to keep the total batch size unchanged. (With the same training settings, the base model could not be trained with AMP because of anomalous gradient values.)
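Reading the batch sizes above as per-GPU values (an assumption, though it is the reading under which the total stays unchanged), the arithmetic works out as follows. The function name `total_batch_size` is hypothetical.

```python
def total_batch_size(per_gpu_batch, num_gpus):
    """Effective batch size when every GPU processes `per_gpu_batch` samples per step."""
    return per_gpu_batch * num_gpus


tiny_total = total_batch_size(128, 8)   # tiny model: 8 GPUs
base_total = total_batch_size(64, 16)   # base model: 16 GPUs
print(tiny_total, base_total)
```

Both configurations give a total of 1024 samples per step, so halving the per-GPU batch while doubling the GPU count leaves the effective batch size the same.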

Please change the data path in the shell scripts first.

For the tiny model:

bash scripts/run_dwnet_tiny_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_tiny_patch4_window7_224.sh

For the base model, use multiple nodes with OpenMPI:

bash scripts/run_dwnet_base_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_base_patch4_window7_224.sh

3. Evaluation

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/change_to_config_file --resume /path/to/model --data-path /path/to/imagenet --eval
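When evaluating several checkpoints, the command above can be assembled programmatically. This is a sketch only: the helper `build_eval_command` is hypothetical, and it uses just the flags that appear in the command above, with the same placeholder paths.

```python
import shlex


def build_eval_command(cfg, checkpoint, data_path, nproc_per_node=1, master_port=12345):
    """Build the evaluation command as an argument list (safe to pass to subprocess)."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "--master_port", str(master_port),
        "main.py",
        "--cfg", cfg,
        "--resume", checkpoint,
        "--data-path", data_path,
        "--eval",
    ]


cmd = build_eval_command(
    "configs/change_to_config_file", "/path/to/model", "/path/to/imagenet"
)
print(shlex.join(cmd))
```

The list can be run with `subprocess.run(cmd, check=True)` from the repository root once the placeholders are replaced with a real config file, checkpoint, and dataset path.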

4. Models

The provided models are trained on ImageNet at resolution 224.

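| Model | #params | FLOPs | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| DWNet-tiny | 24M | 3.8G | 81.2 | github |
| dynamic DWNet-tiny | 51M | 3.8G | 81.8 | github |
| DWNet-base | 74M | 12.9G | 83.2 | github |
| dynamic DWNet-base | 162M | 13.0G | 83.2 | github |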

Detection (see Detection for details):

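| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DWNet-T | ImageNet-1K | 3x | 49.9 | 43.4 | 82M | 730G | config | github |
| DWNet-B | ImageNet-1K | 3x | 51.0 | 44.1 | 132M | 924G | config | github |
| dynamic DWNet-T | ImageNet-1K | 3x | 50.5 | 43.7 | 108M | 730G | config | github |
| dynamic DWNet-B | ImageNet-1K | 3x | 51.2 | 44.4 | 219M | 924G | config | github |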

Segmentation (see Segmentation for details):

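| Backbone | Pretrain | Lr Schd | mIoU | #params | FLOPs | config | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DWNet-T | ImageNet-1K | 160K | 45.5 | 56M | 928G | config | github |
| DWNet-B | ImageNet-1K | 160K | 48.3 | 108M | 1129G | config | github |
| dynamic DWNet-T | ImageNet-1K | 160K | 45.7 | 83M | 928G | config | github |
| dynamic DWNet-B | ImageNet-1K | 160K | 48.0 | 195M | 1129G | config | github |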

LICENSE

This repo is under the MIT license. Some code is borrowed from Swin Transformer.