[ECAI-2023] Efficient Information Modulation Network for Image Super-Resolution

Xiao Liu<sup>1</sup>, Xiangyu Liao<sup>1</sup>, Xiuya Shi<sup>1</sup>, Linbo Qing<sup>1</sup> and Chao Ren<sup>1,*</sup>

<sup>1</sup> Sichuan University, <sup>*</sup> Corresponding Author

🤗 paper 😀 Supplementary materials

<img src="images/complexity.png" alt="complexity" style="zoom:100%;"/> <hr />

:writing_hand: Changelog and ToDos

<hr />

:bulb: Abstract

main figure

Abstract: Recent research has shown that the success of Transformers comes from their macro-level framework and advanced components, not just their self-attention (SA) mechanism. Comparable results can be obtained by replacing SA with spatial pooling, shifting, MLPs, Fourier transforms, or constant matrices, all of which can encode spatial information as SA does. In light of these findings, this work focuses on combining efficient spatial information encoding with the superior macro architecture of Transformers. We rethink spatial convolution to encode spatial features more efficiently and to represent dynamic modulation values through convolutional modulation techniques. Large-kernel convolution and the Hadamard product are utilized in the proposed Multi-orders Long-range convolutional modulation (MOLRCM) layer to imitate the behavior of SA. Moreover, the MOLRCM layer achieves long-range correlation and self-adaptive behavior, similar to SA, with linear complexity. We also address the sub-optimality of the vanilla feed-forward network (FFN) with the proposed Spatial Awareness Dynamic Feature Flow Modulation (SADFFM) layer, which introduces spatial awareness and locality, improves feature diversity, and regulates information flow between layers. Experimental results show that the proposed Efficient Information Modulation Network (EIMN) performs better both quantitatively and qualitatively.
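For intuition, the core idea can be sketched in a few lines of PyTorch: a large-kernel depth-wise convolution encodes long-range spatial context, and its output gates a value branch through a Hadamard product, mimicking the query-key-value interaction of SA at linear cost. This is a minimal illustration of convolutional modulation only, not the paper's exact MOLRCM layer; the kernel size and projection layout are assumptions.

```python
import torch
import torch.nn as nn

class ConvModulation(nn.Module):
    """Minimal convolutional-modulation sketch (illustrative, not the exact MOLRCM layer)."""

    def __init__(self, dim: int, kernel_size: int = 11):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, 1)  # value branch projection
        # Large-kernel depth-wise convolution: long-range spatial encoding at linear cost.
        self.context = nn.Conv2d(dim, dim, kernel_size,
                                 padding=kernel_size // 2, groups=dim)
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.proj_in(x)           # values, analogous to SA's V
        a = self.context(x)           # spatial context, analogous to attention weights
        return self.proj_out(a * v)   # Hadamard product modulates the values


x = torch.randn(1, 32, 48, 48)
print(ConvModulation(32)(x).shape)  # -> torch.Size([1, 32, 48, 48])
```

Because every operation here is a convolution or an element-wise product, the cost grows linearly with the number of pixels, unlike the quadratic cost of full self-attention.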

<hr />

:sparkles: Synthetic Image SISR Results

<details>
<summary><strong>Quantitative Comparison with SOTA</strong> (click to expand)</summary>
<p><img src="./images/table.png" width=100% height=100%></p>
Quantitative comparison with SOTA methods on five popular benchmark datasets. Blue text indicates the best results. 'Multi-Adds' is calculated with a 1280 $\times$ 720 HR image.
</details>

<details>
<summary><strong>Qualitative Comparison with SOTA</strong> (click to expand)</summary>
<p><img src="images/manga.png" width=50% height=50%></p>
<p><img src="images/set14_barbara.png" width=50% height=50%></p>
<p><img src="images/urban_012.png" width=50% height=50%></p>
<p><img src="images/urban_014.png" width=50% height=50%></p>
<p><img src="images/urban_034.png" width=50% height=50%></p>
<p><img src="images/urban_038.png" width=50% height=50%></p>
<p><img src="images/urban_044.png" width=50% height=50%></p>
<p><img src="images/urban_062.png" width=50% height=50%></p>
<p><img src="images/urban_076.png" width=50% height=50%></p>
</details>

<details>
<summary><strong>LAM visualization analysis</strong> (click to expand)</summary>
<p><img src="images/lam-1.png" width=50% height=50%></p>
<p><img src="images/lam-2.png" width=50% height=50%></p>
<p><img src="images/lam-3.png" width=50% height=50%></p>
Results of Local Attribution Maps (LAM). A more widely distributed red area and a higher DI indicate that a wider range of pixels contributes to the reconstruction.
</details>

<hr />

:rocket: Installation

This repository is built on PyTorch 1.12.1 and trained on CentOS (kernel 4.18.0) with Python 3.7, CUDA 11.6, and cuDNN 8.0.

1. Clone our repository:

```shell
git clone https://github.com/liux520/EIMN_BasicSR.git
cd EIMN_BasicSR
```
<hr />

:computer: Usage

0. Dataset Preparation

1. Evaluation

```shell
python demo/test_on_custom_datset.py
python demo/demo.py
```
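For quick single-image inference outside the provided scripts, a loop like the sketch below is typical. The `EIMN` import path, constructor arguments, and checkpoint filename are assumptions about this repository's layout, so adapt them to the actual code.

```python
import torch
from torchvision.io import read_image
from torchvision.utils import save_image

# Hypothetical import path; adjust to the actual module in this repository.
from basicsr.archs.eimn_arch import EIMN

model = EIMN()  # assumed default constructor for the x4 model
ckpt = torch.load('EIMN_L_x4.pth', map_location='cpu')  # hypothetical checkpoint name
model.load_state_dict(ckpt.get('params', ckpt))  # BasicSR checkpoints often nest weights under 'params'
model.eval()

lr = read_image('input.png').float().unsqueeze(0) / 255.0  # 1x3xHxW, values in [0, 1]
with torch.no_grad():
    sr = model(lr).clamp(0, 1)
save_image(sr, 'output.png')
```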

2. Training

```shell
python basicsr/train.py -opt options/train/EIMN/train_EIMNNet_x2_bicubic.yml
```

<hr />

:arrow_double_down: Model Zoo

Benchmark entries in the tables below report PSNR (dB) / SSIM.

SISR-1: Bicubic degradation (Train dataset: DF2K-Large-Image)

| Model | #Params | FLOPs | Set5 | Set14 | Urban100 | Manga109 | BSDS100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| EIMN_L_x2 | 981K | 212G | 38.26/0.9620 | 34.14/0.9227 | 33.23/0.9381 | 39.42/0.9786 | 32.41/0.9034 |
| EIMN_L_x3 | 990K | 95G | 34.76/0.9304 | 30.70/0.8490 | 29.05/0.8698 | 34.60/0.9502 | 29.33/0.8127 |
| EIMN_L_x4 | 1002K | 54G | 32.63/0.9008 | 28.94/0.7897 | 26.88/0.8084 | 31.52/0.9183 | 27.82/0.7458 |
| EIMN_A_x2 | 860K | 186G | 38.26/0.9619 | 34.12/0.9222 | 33.15/0.9373 | 39.48/0.9788 | 32.40/0.9034 |
| EIMN_A_x3 | 868K | 83G | 34.70/0.9299 | 30.65/0.8481 | 28.87/0.8660 | 34.45/0.9492 | 29.31/0.8121 |
| EIMN_A_x4 | 880K | 47G | 32.53/0.8993 | 28.89/0.7882 | 26.68/0.8027 | 31.22/0.9418 | 27.79/0.7447 |
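The PSNR/SSIM values above follow the usual SR evaluation protocol: metrics are computed on the Y channel of YCbCr after cropping a `scale`-pixel border. The repository's numbers come from BasicSR's metric implementations; the function below is only a minimal sketch of that convention.

```python
import numpy as np

def psnr_y(sr: np.ndarray, hr: np.ndarray, scale: int) -> float:
    """Illustrative Y-channel PSNR with border cropping (common SR convention).

    sr, hr: HxWx3 uint8 RGB arrays of identical shape.
    """
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)

    # ITU-R BT.601 luma transform for RGB inputs in [0, 255].
    def to_y(img: np.ndarray) -> np.ndarray:
        return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                       + 24.966 * img[..., 2]) / 255.0

    # Crop a scale-pixel border before measuring, as is standard in SR evaluation.
    y_sr = to_y(sr)[scale:-scale, scale:-scale]
    y_hr = to_y(hr)[scale:-scale, scale:-scale]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```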

SISR-2: Bicubic degradation (Train dataset: Multi-scale DF2K subimages)

| Model | #Params | FLOPs | Set5 | Set14 | Urban100 | Manga109 | BSDS100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| EIMN_L_x2 | 981K | 212G | | | | | |
| EIMN_L_x3 | 990K | 95G | | | | | |
| EIMN_L_x4 | 1002K | 54G | | | | | |
| EIMN_A_x2 | 860K | 186G | | | | | |
| EIMN_A_x3 | 868K | 83G | | | | | |
| EIMN_A_x4 | 880K | 47G | | | | | |

SISR-3: Bicubic degradation (Train dataset: DIV2K-Large-Image)

| Model | #Params | FLOPs | Set5 | Set14 | Urban100 | Manga109 | BSDS100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| EIMN_L_x2 | 981K | 212G | 38.22/0.9619 | 33.93/0.9218 | 32.86/0.9354 | 39.21/0.9776 | 32.34/0.9027 |
| EIMN_L_x3 | 990K | 95G | | | | | |
| EIMN_L_x4 | 1002K | 54G | | | | | |
| EIMN_A_x2 | 860K | 186G | | | | | |
| EIMN_A_x3 | 868K | 83G | | | | | |
| EIMN_A_x4 | 880K | 47G | | | | | |

SISR-4: Bicubic degradation (Train dataset: Multi-scale DIV2K subimages)

| Model | #Params | FLOPs | Set5 | Set14 | Urban100 | Manga109 | BSDS100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| EIMN_L_x2 | 981K | 212G | 38.23/0.9619 | 33.96/0.9212 | 32.98/0.9367 | 39.33/0.9783 | 32.36/0.9029 |
| EIMN_L_x3 | 990K | 95G | | | | | |
| EIMN_L_x4 | 1002K | 54G | | | | | |
| EIMN_A_x2 | 860K | 186G | | | | | |
| EIMN_A_x3 | 868K | 83G | | | | | |
| EIMN_A_x4 | 880K | 47G | | | | | |

SISR-5: Practical degradation model (Train dataset: DF2K)

<center class="half"> <img src="./images/image0.png" width=500/> <img src="./images/image1.png" width=600/> </center> <center class="half"> <img src="./images/image3.png" width=500/> <img src="./images/image4.png" width=560/> </center> <center class="half"> <img src="./images/image5.png" width=500/> <img src="./images/image6.png" width=500/> </center> <center class="half"> <img src="./images/image7.png" width=500/> <img src="./images/image8.png" width=500/> </center>

SISR-6: Face Beauty & Acne Removal (Train dataset: here)

<center class="half"> <img src="./images/beauty0.png" width=500/> <img src="./images/beauty1.png" width=500/> </center> <center class="half"> <img src="./images/beauty2.png" width=500/> <img src="./images/beauty3.png" width=500/> </center> <center class="half"> <img src="./images/beauty4.png" width=500/> <img src="./images/beauty5.png" width=500/> </center> <center class="half"> <img src="./images/beauty6.png" width=500/> <img src="./images/beauty7.png" width=500/> </center> <hr />

:e-mail: Contact

Should you have any questions, please open an issue on this repository or contact us at liuxmail1220@gmail.com, liaoxiangyu1@stu.scu.edu.cn, or shixiuya@stu.scu.edu.cn.

<hr />

:heart: Acknowledgement

We thank the XPixelGroup for the excellent low-level vision framework BasicSR.

<hr />

:pushpin: License

This project is released under the MIT license.

:pray: Citation

If you find this work helpful, please consider citing:

```bibtex
@incollection{EIMN,
  title={Efficient Information Modulation Network for Image Super-Resolution},
  author={Liu, Xiao and Liao, Xiangyu and Shi, Xiuya and Qing, Linbo and Ren, Chao},
  booktitle={26th European Conference on Artificial Intelligence (ECAI)},
  pages={1544--1551},
  year={2023},
  publisher={IOS Press}
}
```