<p align="center"> <h1 align="center"> <img width="200" src="figs/logo.png"> <br> Hierarchical Transformer <br> for Efficient Image Super-Resolution</h1> <p align="center"> <a href="https://xiangz-0.github.io/">Xiang Zhang</a><sup>1</sup> · <a href="http://yulunzhang.com/">Yulun Zhang</a><sup>2</sup> · <a href="https://www.yf.io/">Fisher Yu</a><sup>1</sup> </p> <p align="center"> <sup>1</sup>ETH Zürich &nbsp; &nbsp; <sup>2</sup>MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University </p> <h3 align="center"> ECCV 2024 - Oral </h3> <h3 align="center"><a href="https://1drv.ms/b/c/de821e161e64ce08/EVsrOr1-PFFMsXxiRHEmKeoBSH6DPkTuN2GRmEYsl9bvDQ?e=f9wGUO">[Paper]</a> | <a href="https://1drv.ms/b/c/de821e161e64ce08/EYmRy-QOjPdFsMRT_ElKQqABYzoIIfDtkt9hofZ5YY_GjQ?e=2Iapqf">[Supp]</a> | <a href="https://www.youtube.com/watch?v=9rO0pjmmjZg">[Video]</a> | <a href="https://huggingface.co/XiangZ/hit-sr">[🤗Hugging Face]</a> | <a href="https://1drv.ms/f/c/de821e161e64ce08/EuE6xW-sN-hFgkIa6J-Y8gkB9b4vDQZQ01r1ZP1lmzM0vQ?e=aIRfCQ">[Visual Results]</a> | <a href="https://1drv.ms/f/c/de821e161e64ce08/EqakXUlsculBpo79VKpEXY4B_6OQL-fGyilrzpHaNObG1A?e=YNrqHb">[Models]</a> </h3> <div align="center"></div> </p>

Abstract: Transformers have exhibited promising performance in computer vision tasks including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention with quadratic computational complexity to window sizes, resulting in fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), boosting SR performance with multi-scale features while maintaining an efficient design. Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity to window sizes, efficiently gathering spatial and channel information from hierarchical windows. Extensive experiments verify the effectiveness and efficiency of our HiT-SR, and our improved versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light yield state-of-the-art SR results with fewer parameters, FLOPs, and faster speeds (~7x).
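The spatial-channel correlation idea can be sketched as follows: correlating the C channels of a window (a C×C attention map) rather than its N spatial positions keeps the cost linear in the window size, so large hierarchical windows stay affordable. Below is a minimal NumPy sketch of this reading of the abstract; it is illustrative only, not the authors' implementation, and `channel_correlation` is a hypothetical name.

```python
import numpy as np

def channel_correlation(q, k, v):
    """Toy channel-wise attention: a (C, C) correlation map instead of an
    (N, N) one, so the cost O(N * C^2) is linear in the window size N."""
    n, c = q.shape                            # N spatial positions, C channels
    attn = (q.T @ k) / np.sqrt(n)             # (C, C) channel correlation
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over channels
    return v @ attn.T                         # (N, C): mix channels per position

rng = np.random.default_rng(0)
n, c = 64 * 64, 16                            # a large 64x64 window, 16 channels
out = channel_correlation(rng.normal(size=(n, c)),
                          rng.normal(size=(n, c)),
                          rng.normal(size=(n, c)))
print(out.shape)                              # (4096, 16)
```

Note that doubling the window side only doubles N in the O(N·C²) term, whereas standard window attention would grow the (N, N) map quadratically.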

<!-- <p align="center"> <img width="650" src="figs/framework.png"> </p> --> <p align="center"> <img width="900" src="figs/HiT-SR.png"> </p> <!-- ![](figs/HiT-SR.png) -->

📑 Contents

- 🔥 News
- 🛠️ Setup
- 💿 Datasets
- 🚀 Models
- 🏋 Training
- 🧪 Testing
- 📊 Results
- 📎 Citation
- 🏅 Acknowledgements

🔥 News

🛠️ Setup

```shell
git clone https://github.com/XiangZ-0/HiT-SR.git
conda create -n HiTSR python=3.8
conda activate HiTSR
pip install -r requirements.txt
python setup.py develop
```

💿 Datasets

Training and testing sets can be downloaded as follows:

| Training Set | Testing Set | Visual Results |
| :--- | :--- | :--- |
| DIV2K (800 training images, 100 validation images) [organized training dataset DIV2K: One Drive] | Set5 + Set14 + BSD100 + Urban100 + Manga109 [complete testing dataset: One Drive] | One Drive |

Download the training and testing datasets and put them into the corresponding folders under datasets/. See datasets for details of the directory structure.

🚀 Models

| Method | #Param. (K) | FLOPs (G) | Dataset | PSNR (dB) | SSIM | Model Zoo | Visual Results |
| :--- | ---: | ---: | :--- | ---: | ---: | :--- | :--- |
| HiT-SIR | 792 | 53.8 | Urban100 (x4) | 26.71 | 0.8045 | One Drive | One Drive |
| HiT-SNG | 1032 | 57.7 | Urban100 (x4) | 26.75 | 0.8053 | One Drive | One Drive |
| HiT-SRF | 866 | 58.0 | Urban100 (x4) | 26.80 | 0.8069 | One Drive | One Drive |

FLOPs are computed with an output size of 1280×720.
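To see why linear complexity in the window size matters at these budgets, a back-of-envelope comparison of per-window multiply-accumulate counts can help. This is illustrative arithmetic only (C = 60 is an assumed lightweight-SR channel width), not the paper's exact FLOPs accounting.

```python
# Rough cost per window: spatial attention scales as N^2 * C,
# channel correlation as N * C^2, with N tokens and C channels.
C = 60                                   # assumed lightweight-SR channel width
for w in (8, 16, 32):                    # window side lengths
    N = w * w                            # tokens per window
    spatial = N * N * C                  # quadratic in window size
    channel = N * C * C                  # linear in window size
    print(f"{w}x{w} window: spatial {spatial:.2e} vs channel {channel:.2e} MACs")
```

The gap widens with the window: at 32×32 the spatial term is N/C ≈ 17× larger, which is why expanding hierarchical windows pair naturally with channel-wise correlation.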

🏋 Training
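The project is built on the BasicSR framework (installed via `python setup.py develop` above), so training is driven by a YAML option file. The invocation below is a sketch following BasicSR's conventions; the entry-point and config paths are placeholders — check the repository's options/ folder for the actual file names.

```shell
# Train with a BasicSR-style option file (paths are placeholders)
python basicsr/train.py -opt options/train/<train_config>.yml
```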

🧪 Testing

Test with ground-truth images

Test without ground-truth images
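Both testing modes follow the same BasicSR-style pattern; the difference lives in the option file. The commands below are a sketch with placeholder config names — consult the repository's options/ folder for the real ones.

```shell
# With ground-truth images: computes PSNR/SSIM against the HR references
python basicsr/test.py -opt options/test/<test_config_with_gt>.yml

# Without ground-truth images: only saves the super-resolved outputs
python basicsr/test.py -opt options/test/<test_config_without_gt>.yml
```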

📊 Results

We apply our HiT-SR approach to improve SwinIR-Light, SwinIR-NG, and SRFormer-Light, yielding HiT-SIR, HiT-SNG, and HiT-SRF, respectively. Compared with the original architectures, our improved models achieve better SR performance while reducing computational burden.

<p align="center"> <img width="750" src="figs/performance-comparison.png"> </p> <p align="center"> <img width="750" src="figs/efficiency-comparison.png"> </p> <p align="center"> <img width="750" src="figs/overall_improvements.png"> </p> <p align="center"> <img width="750" src="figs/convergence-comparison.png"> </p>

More detailed results can be found in the paper. All visual results can be downloaded here.

<details> <summary>More results (click to expand)</summary> <p align="center"> <img width="900" src="figs/quantitative-comparison.png"> </p> <p align="center"> <img width="900" src="figs/LAM.png"> </p> <p align="center"> <img width="900" src="figs/Quali-main.png"> </p> <p align="center"> <img width="900" src="figs/Quali-supp1.png"> </p> <p align="center"> <img width="900" src="figs/Quali-supp2.png"> </p> </details>

📎 Citation

If you find the code helpful in your research or work, please consider citing the following paper.

```bibtex
@inproceedings{zhang2024hitsr,
    title={HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution},
    author={Zhang, Xiang and Zhang, Yulun and Yu, Fisher},
    booktitle={ECCV},
    year={2024}
}
```

🏅 Acknowledgements

This project is built on DAT, SwinIR, NGramSwin, SRFormer, and BasicSR. Special thanks to their excellent works!