
<div align="center"> Multi-scale Attention Network for Single Image Super-Resolution </div>

<div align="center">

Yan Wang, Yusen Li, Gang Wang, Xiaoguang Liu

</div> <p align="center"> Nankai University </p> <p align="center"> <a href="https://arxiv.org/abs/2209.14145" alt="arXiv"> <img src="https://img.shields.io/badge/arXiv-2209.14145-b31b1b.svg?style=flat" /></a> <a href="https://github.com/icandle/MAN/blob/main/LICENSE" alt="license"> <img src="https://img.shields.io/badge/license-Apache--2.0-%23B7A800" /></a> <a href="https://github.com/icandle/MAN/blob/main/images/man_ntire24.pdf"> <img src="https://img.shields.io/badge/Docs-Slide&Poster-8A2BE2" /></a> </p>

Overview: To unleash the potential of ConvNets in super-resolution, we propose a multi-scale attention network (MAN) that couples a classical multi-scale mechanism with emerging large kernel attention. In particular, we propose multi-scale large kernel attention (MLKA) and a gated spatial attention unit (GSAU). Experimental results show that our MAN performs on par with SwinIR and achieves varied trade-offs between state-of-the-art performance and computation.

This repository contains PyTorch implementation for MAN (CVPRW 2024).

<details> <summary>Table of contents</summary>

  1. Requirements
  2. Datasets
  3. Implementation Details
  4. Train and Test
  5. Results and Models
  6. Acknowledgments
  7. Citation

</details>

⚙️ Requirements
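This repository is a PyTorch implementation built on the BasicSR framework (see Train and Test below), so a recent PyTorch installation plus BasicSR is assumed; please take exact package versions from the repository itself.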

🎈 Datasets

Training: DIV2K or DF2K.

Testing: Set5, Set14, BSD100, Urban100, Manga109 (Google Drive/Baidu Netdisk).

Preparation: please refer to BasicSR's Dataset Preparation guide.

🔎 Implementation Details

Network architecture: the group number (n_resgroups) is set to 1 for simplicity; the MAB number (n_resblocks) is 5/24/36 and the channel width (n_feats) is 48/60/180 for the tiny/light/base MAN, respectively (summarized in the sketch below the figure).

<p align="center"> <img src="images/MAN_arch.png" width=100% height=100% > <br /></p> <em> Overview of the proposed MAN constituted of three components: the shallow feature extraction module (SF), the deep feature extraction module (DF) based on multiple multi-scale attention blocks (MAB), and the high-quality image reconstruction module. </em>

 

Component details: three multi-scale decomposition modes are used in MLKA, and a 7×7 depth-wise convolution is used in the GSAU (a rough PyTorch sketch of the gating pattern follows the figure).

<p align="center"> <img src="images/MAN_details.png" width=60% height=60% > <br /></p> <em> Details of Multi-scale Large Kernel Attention (MLKA), Gated Spatial Attention Unit (GSAU), and Large Kernel Attention Tail (LKAT). </em> &nbsp;

▶️ Train and Test

The BasicSR framework is used to train and test our MAN.

Training with the example option

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 train.py -opt options/train_MAN.yml --launcher pytorch
```
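If the training script follows standard BasicSR behavior (an assumption, not something documented above), a single-GPU run can typically drop the distributed launcher, e.g. `python train.py -opt options/train_MAN.yml`.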

Testing with the example option

```bash
python test.py -opt options/test_MAN.yml
```

The training/testing results will be saved in the ./experiments and ./results folders, respectively.

📊 Results and Models

Pretrained models are available on Google Drive and Baidu Netdisk (password: mans for all links).

| HR (x4) | MAN-tiny | EDSR-base+ | MAN-light | EDSR+ | MAN |
| :---: | :---: | :---: | :---: | :---: | :---: |
| <img width=100% height=100% src="images/Visual_Results/U004/HR.png"> | <img width=100% height=100% src="images/Visual_Results/U004/MAN-Tiny.png"> | <img width=100% height=100% src="images/Visual_Results/U004/EDSR-Base.png"> | <img width=100% height=100% src="images/Visual_Results/U004/MAN-Light.png"> | <img width=100% height=100% src="images/Visual_Results/U004/EDSR.png"> | <img width=100% height=100% src="images/Visual_Results/U004/MAN.png"> |
| <img width=100% height=100% src="images/Visual_Results/U012/HR.png"> | <img width=100% height=100% src="images/Visual_Results/U012/MAN-Tiny.png"> | <img width=100% height=100% src="images/Visual_Results/U012/EDSR-Base.png"> | <img width=100% height=100% src="images/Visual_Results/U012/MAN-Light.png"> | <img width=100% height=100% src="images/Visual_Results/U012/EDSR.png"> | <img width=100% height=100% src="images/Visual_Results/U012/MAN.png"> |
| <img width=100% height=100% src="images/Visual_Results/U044/HR.png"> | <img width=100% height=100% src="images/Visual_Results/U044/MAN-Tiny.png"> | <img width=100% height=100% src="images/Visual_Results/U044/EDSR-Base.png"> | <img width=100% height=100% src="images/Visual_Results/U044/MAN-Light.png"> | <img width=100% height=100% src="images/Visual_Results/U044/EDSR.png"> | <img width=100% height=100% src="images/Visual_Results/U044/MAN.png"> |
| <img width=100% height=100% src="images/Visual_Results/D0850/HR.png"> | <img width=100% height=100% src="images/Visual_Results/D0850/MAN-Tiny.png"> | <img width=100% height=100% src="images/Visual_Results/D0850/EDSR-Base.png"> | <img width=100% height=100% src="images/Visual_Results/D0850/MAN-Light.png"> | <img width=100% height=100% src="images/Visual_Results/D0850/EDSR.png"> | <img width=100% height=100% src="images/Visual_Results/D0850/MAN.png"> |
| Params/FLOPs | 150K/8G | 1518K/114G | 840K/47G | 43090K/2895G | 8712K/495G |

Results of our MAN-tiny/light/base models. The Set5 validation set is used below to show general performance; visual results on the five test sets are linked in the last column.

| Methods | Params | FLOPs | PSNR/SSIM (x2) | PSNR/SSIM (x3) | PSNR/SSIM (x4) | Results |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MAN-tiny | 150K | 8.4G | 37.91/0.9603 | 34.23/0.9258 | 32.07/0.8930 | x2/x3/x4 |
| MAN-light | 840K | 47.1G | 38.18/0.9612 | 34.65/0.9292 | 32.50/0.8988 | x2/x3/x4 |
| MAN+ | 8712K | 495G | 38.44/0.9623 | 34.97/0.9315 | 32.87/0.9030 | x2/x3/x4 |

💖 Acknowledgments

We would like to thank VAN and BasicSR for their enlightening work!

🎓 Citation

@inproceedings{wang2024multi,
  title={Multi-scale Attention Network for Single Image Super-Resolution},
  author={Wang, Yan and Li, Yusen and Wang, Gang and Liu, Xiaoguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2024}
}

or

@article{wang2022multi,
  title={Multi-scale Attention Network for Single Image Super-Resolution},
  author={Wang, Yan and Li, Yusen and Wang, Gang and Liu, Xiaoguang},
  journal={arXiv preprint arXiv:2209.14145},
  year={2022}
}