Dual Aggregation Transformer for Image Super-Resolution

Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, and Fisher Yu, "Dual Aggregation Transformer for Image Super-Resolution", ICCV, 2023

[paper] [arXiv] [supplementary material] [visual results] [pretrained models]

Abstract: Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. The alternate strategy enables DAT to capture the global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements two self-attention mechanisms from corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information in the feed-forward network. Extensive experiments show that our DAT surpasses current methods.
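As a toy illustration of the alternating strategy described in the abstract, the sketch below applies plain softmax attention along the spatial dimension and along the channel dimension in consecutive blocks. This is a minimal NumPy sketch of the idea only, not the paper's implementation: it omits AIM, SGFN, multi-head windowed attention, normalization, and all learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # x: (N, C) with N spatial tokens and C channels.
    # Attention map is (N, N): every position attends to every position.
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))
    return attn @ x

def channel_attention(x):
    # Attention map is (C, C): every channel attends to every channel.
    attn = softmax(x.T @ x / np.sqrt(x.shape[0]))
    return x @ attn

def dat_stage(x, num_blocks=4):
    # Alternate the two attentions in consecutive blocks, so features are
    # aggregated across both dimensions (the inter-block aggregation idea).
    for i in range(num_blocks):
        x = x + (spatial_attention(x) if i % 2 == 0 else channel_attention(x))
    return x

x = np.random.randn(16, 8)  # 16 spatial tokens, 8 channels
print(dat_stage(x).shape)   # (16, 8)
```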


| HR | LR | SwinIR | CAT | DAT (ours) |
| :---: | :---: | :---: | :---: | :---: |
| <img src="figs/img_059_HR_x4.png" height=80> | <img src="figs/img_059_Bicubic_x4.png" height=80> | <img src="figs/img_059_SwinIR_x4.png" height=80> | <img src="figs/img_059_CAT_x4.png" height=80> | <img src="figs/img_059_DAT_x4.png" height=80> |
| <img src="figs/img_049_HR_x4.png" height=80> | <img src="figs/img_049_Bicubic_x4.png" height=80> | <img src="figs/img_049_SwinIR_x4.png" height=80> | <img src="figs/img_049_CAT_x4.png" height=80> | <img src="figs/img_049_DAT_x4.png" height=80> |

Dependencies

```shell
# Clone the github repo and go to the default directory 'DAT'.
git clone https://github.com/zhengchen1999/DAT.git
cd DAT

# Create and activate a conda environment, then install dependencies.
conda create -n DAT python=3.8
conda activate DAT
pip install -r requirements.txt
python setup.py develop
```

Contents

  1. Datasets
  2. Models
  3. Training
  4. Testing
  5. Results
  6. Citation
  7. Acknowledgements

Datasets

The training and testing sets used in this work can be downloaded as follows:

| Training Set | Testing Set | Visual Results |
| :--- | :--- | :--- |
| DIV2K (800 training images, 100 validation images) + Flickr2K (2650 images) [complete training dataset DF2K: Google Drive / Baidu Disk] | Set5 + Set14 + BSD100 + Urban100 + Manga109 [complete testing dataset: Google Drive / Baidu Disk] | Google Drive / Baidu Disk |

Download the training and testing datasets and place them in the corresponding folders under datasets/. See datasets for details of the directory structure.

Models

| Method | Params | FLOPs (G) | Dataset | PSNR (dB) | SSIM | Model Zoo | Visual Results |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| DAT-S | 11.21M | 203.34 | Urban100 | 27.68 | 0.8300 | Google Drive / Baidu Disk | Google Drive / Baidu Disk |
| DAT | 14.80M | 275.75 | Urban100 | 27.87 | 0.8343 | Google Drive / Baidu Disk | Google Drive / Baidu Disk |
| DAT-2 | 11.21M | 216.93 | Urban100 | 27.86 | 0.8341 | Google Drive / Baidu Disk | Google Drive / Baidu Disk |
| DAT-light | 573K | 49.69 | Urban100 | 26.64 | 0.8033 | Google Drive / Baidu Disk | Google Drive / Baidu Disk |

Performance is reported on Urban100 (×4). For DAT-S, DAT, and DAT-2, FLOPs are measured at an output size of 3×512×512; for DAT-light, at an output size of 3×1280×720.
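FLOPs scale with the output resolution, which is why the table states the output size used for each model, and why the DAT-light figure is not directly comparable to the others. As a back-of-the-envelope illustration (counting one multiply-add as two operations, a common convention, not necessarily the exact profiler used here), consider the cost of a single hypothetical 3×3, 64-channel convolution at each of the two output sizes:

```python
def conv_flops(c_in, c_out, k, h, w):
    """Rough FLOPs of one k x k convolution producing a c_out x h x w map
    (multiply-adds counted as 2 operations)."""
    return 2 * c_in * c_out * k * k * h * w

# Hypothetical 64-channel 3x3 layer at the two output sizes from the table:
print(conv_flops(64, 64, 3, 512, 512) / 1e9)   # ~19.3 GFLOPs at 512x512
print(conv_flops(64, 64, 3, 1280, 720) / 1e9)  # ~67.9 GFLOPs at 1280x720
```

The same layer costs roughly 3.5× more at 1280×720 than at 512×512, purely from the larger output map.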

Training

Testing

Test images with HR

Test images without HR

Results

We achieve state-of-the-art performance. Detailed results can be found in the paper. All visual results of DAT can be downloaded here.

<details> <summary>Click to expand</summary> <p align="center"> <img width="900" src="figs/Table-1.png"> </p> <p align="center"> <img width="900" src="figs/Table-2.png"> </p> <p align="center"> <img width="900" src="figs/Table-3.png"> </p> <p align="center"> <img width="900" src="figs/Figure-1.png"> </p> <p align="center"> <img width="900" src="figs/Figure-2.png"> <img width="900" src="figs/Figure-3.png"> <img width="900" src="figs/Figure-4.png"> <img width="900" src="figs/Figure-5.png"> </p> </details>

Citation

If you find the code helpful in your research or work, please cite the following paper.

```bibtex
@inproceedings{chen2023dual,
    title={Dual Aggregation Transformer for Image Super-Resolution},
    author={Chen, Zheng and Zhang, Yulun and Gu, Jinjin and Kong, Linghe and Yang, Xiaokang and Yu, Fisher},
    booktitle={ICCV},
    year={2023}
}
```

Acknowledgements

This code is built on BasicSR.