# Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution
Ao Li, Le Zhang, Yun Liu and Ce Zhu, "Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution", ICCV, 2023
**Abstract:** Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. Our experiments on multiple datasets demonstrate that CRAFT outperforms state-of-the-art methods by up to 0.29 dB while using fewer parameters.
<p align="center"> <img width="700" src="figs/CRAFT.png"> </p>
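For a concrete picture of the block layout described in the abstract, here is a minimal, illustrative PyTorch sketch. It is **not** the authors' implementation: the module names (HFERB, SRWAB, HFB) follow the paper, but their internals are simplified stand-ins (in particular, the real SRWAB uses shifted rectangle-window attention, not the plain global attention used here).

```python
# Illustrative sketch of the CRAFT block layout -- NOT the official code.
import torch
import torch.nn as nn

class HFERB(nn.Module):
    """High-frequency enhancement residual block (convolutional branch, simplified)."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)  # residual path preserves local detail

class SRWAB(nn.Module):
    """Global-attention stand-in for the shift rectangle window attention block."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token view
        t = t + self.attn(t, t, t)[0]      # residual global attention
        return t.transpose(1, 2).reshape(b, c, h, w)

class HFB(nn.Module):
    """Hybrid fusion block: refine the global path with high-frequency features."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Conv2d(2 * dim, dim, 1)
    def forward(self, hf, glb):
        return self.fuse(torch.cat([hf, glb], dim=1))

class CRAFTBlockSketch(nn.Module):
    def __init__(self, dim=48):
        super().__init__()
        self.hferb, self.srwab, self.hfb = HFERB(dim), SRWAB(dim), HFB(dim)
    def forward(self, x):
        hf = self.hferb(x)         # high-frequency (convolutional) features
        glb = self.srwab(x)        # global (attention) features
        return self.hfb(hf, glb)   # cross-refinement via fusion
```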
HR | LR | SwinIR | ESRT | CRAFT (ours) |
---|---|---|---|---|
<img src="figs/img012_HR.png" height=80> | <img src="figs/img012_LR.png" height=80> | <img src="figs/img012_SWINIR.png" height=80> | <img src="figs/img012_ESRT.png" height=80> | <img src="figs/img012_CRAFT.png" height=80> |
<img src="figs/YumeiroCooking_HR.png" height=80> | <img src="figs/YumeiroCooking_LR.png" height=80> | <img src="figs/YumeiroCooking_SWINIR.png" height=80> | <img src="figs/YumeiroCooking_ESRT.png" height=80> | <img src="figs/YumeiroCooking_CRAFT.png" height=80> |
## Dependencies & Installation
- Python 3.7
- PyTorch 1.10.2
- NVIDIA GPU + CUDA 11.7
```bash
# Clone the repo and enter the project directory.
git clone https://github.com/AVC2-UESTC/CRAFT-SR.git
cd CRAFT-SR
conda create -n CRAFT python=3.7
conda activate CRAFT
pip install -r requirements.txt
python setup.py develop
```
## Training

### Train with DIV2K
1. Download the training datasets and place them in `datasets/`.
2. Run the following scripts.
```bash
# Train with 4 GPUs.
# X2
bash scripts/dist_train.sh 4 options/train/CRAFT/train_CRAFT_SRx2_scratch.yml
# X3
bash scripts/dist_train.sh 4 options/train/CRAFT/train_CRAFT_SRx3_scratch.yml
# X4
bash scripts/dist_train.sh 4 options/train/CRAFT/train_CRAFT_SRx4_scratch.yml
```
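For debugging on a single GPU, repositories that follow this BasicSR-style layout usually also expose a direct training entry point. A hedged example, assuming the standard BasicSR `basicsr/train.py` script is present (check this repo for the exact path):

```bash
# Single-GPU training -- assumes the standard BasicSR entry point.
python basicsr/train.py -opt options/train/CRAFT/train_CRAFT_SRx2_scratch.yml
```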
## Testing

### Test images with HR
1. Download the pre-trained models and place them in `experiments/pretrained_models/`. We provide pre-trained models for image SR: CRAFT_MODEL_X2, CRAFT_MODEL_X3, and CRAFT_MODEL_X4.
2. Download the test datasets and place them in `datasets/benchmark`.
3. Run the following scripts.
```bash
# Test on Set5.
# X2
python inference/inference_CRAFT.py --scale 2 --model_path experiments/pretrained_models/CRAFT_MODEL_X2.pth --folder_lq datasets/benchmark/Set5/LR_bicubic/X2 --input datasets/benchmark/Set5/HR --output results/CRAFT/Set5/X2
# X3
python inference/inference_CRAFT.py --scale 3 --model_path experiments/pretrained_models/CRAFT_MODEL_X3.pth --folder_lq datasets/benchmark/Set5/LR_bicubic/X3 --input datasets/benchmark/Set5/HR --output results/CRAFT/Set5/X3
# X4
python inference/inference_CRAFT.py --scale 4 --model_path experiments/pretrained_models/CRAFT_MODEL_X4.pth --folder_lq datasets/benchmark/Set5/LR_bicubic/X4 --input datasets/benchmark/Set5/HR --output results/CRAFT/Set5/X4
```
4. The output is saved in `results/`.
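If you want to sanity-check the saved outputs against the table below, note that SR benchmarks conventionally report PSNR/SSIM on the Y (luma) channel with `scale` border pixels cropped. A minimal PSNR sketch under that assumption (this is not the repo's own metric code; the helper names are ours):

```python
# Hedged sketch of the conventional SR evaluation protocol:
# PSNR on the BT.601 Y channel, with `scale` border pixels cropped.
import numpy as np

def rgb_to_y(img):
    """BT.601 luma from an (H, W, 3) RGB uint8 array, as float64 in [16, 235]."""
    img = img.astype(np.float64)
    return (65.481 * img[..., 0] + 128.553 * img[..., 1]
            + 24.966 * img[..., 2]) / 255.0 + 16.0

def psnr_y(sr, hr, scale):
    """PSNR (dB) between SR and HR images, borders cropped by `scale`."""
    sr_y = rgb_to_y(sr)[scale:-scale, scale:-scale]
    hr_y = rgb_to_y(hr)[scale:-scale, scale:-scale]
    mse = np.mean((sr_y - hr_y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

Usage: `psnr_y(sr_img, hr_img, scale=2)`, where both arguments are uint8 RGB arrays of the same shape.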
## Results
PSNR (dB) / SSIM on five benchmark datasets:

Model | #Parameters | Set5 | Set14 | BSD100 | Urban100 | Manga109 |
---|---|---|---|---|---|---|
CRAFT-X2 | 737K | 38.23/0.9615 | 33.92/0.9211 | 32.33/0.9016 | 32.86/0.9343 | 39.39/0.9786 |
CRAFT-X3 | 744K | 34.71/0.9295 | 30.61/0.8469 | 29.24/0.8093 | 28.77/0.8635 | 34.29/0.9491 |
CRAFT-X4 | 753K | 32.52/0.8989 | 28.85/0.7872 | 27.72/0.7418 | 26.56/0.7995 | 31.18/0.9168 |
## Citation

If you find the code helpful in your research or work, please cite the following paper.
```bibtex
@inproceedings{li2023craft,
  title={Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution},
  author={Li, Ao and Zhang, Le and Liu, Yun and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={12514--12524},
  year={2023}
}
```