Home

Awesome

This repository contains the implementation of the following paper:

SRDiff: Single image super-resolution with diffusion probabilistic models

Get Started

Environment Installation

python -m venv --system-site-packages venv
source venv/bin/activate
pip install -r requirements.txt

Dataset Preparation

  1. To download DIV2K, Flickr2K and CelebA, please refer to https://github.com/andreas128/SRFlow

  2. Please put the downloaded DIV2K, Flickr2K and CelebA dataset in
    data/raw/DIV2K, data/raw/Flickr2K, data/raw/CelebA, respectively

  3. Pack to pickle for training

# pack the CelebA dataset
python data_gen/celeb_a.py --config configs/celeb_a.yaml 
# pack the DIV2K dataset
python data_gen/df2k.py --config configs/df2k4x.yaml

Pretrained Model

Please go to https://github.com/LeiaLi/SRDiff/releases/tag/v1.0.0 to download the pretrained models.

Train & Evaluate

  1. Prepare datasets. Please refer to Dataset Preparation.
  2. Modify config files.
    • CelebA
      • rrdb: configs/rrdb/celeb_a_pretrain.yaml
      • srdiff: configs/celeb_a.yaml
    • DIV2K
      • rrdb: configs/rrdb/df2k4x_pretrain.yaml
      • srdiff: configs/diffsr_df2k4x.yaml
  3. Run training / evaluation code. The code is for training on 1 GPU.

CelebA

# train rrdb-based conditional net
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/rrdb/celeb_a_pretrain.yaml --exp_name rrdb_celebA_1 --reset
# train srdiff
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_celeb.yaml --exp_name diffsr_celebA_1 --reset --hparams="rrdb_ckpt=checkpoints/rrdb_celebA_1"

# tensorboard
tensorboard --logdir checkpoints/diffsr_celebA_1

# evaluate
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_celeb.yaml --exp_name diffsr_celebA_1 --infer

# evaluate with pretrained model
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_celeb.yaml --exp_name srdiff_pretrained_celebA --infer

DIV2K

# train rrdb-based conditional net
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/rrdb/df2k4x_pretrain.yaml --exp_name rrdb_div2k_1 --reset
# train srdiff
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_df2k4x.yaml --exp_name diffsr_div2k_1 --reset --hparams="rrdb_ckpt=checkpoints/rrdb_div2k_1"

# tensorboard
tensorboard --logdir checkpoints/diffsr_div2k_1

# evaluate
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_df2k4x.yaml --exp_name diffsr_div2k_1 --infer

# evaluate with pretrained model
CUDA_VISIBLE_DEVICES=0 python tasks/trainer.py --config configs/diffsr_df2k4x.yaml --exp_name srdiff_pretrained_div2k --infer

Results

TaskPSNRSSIMLPIPSLR_PSNRTotal_loss
diffsr_celeb25.4540.7460.10653.0940.106
diffsr_div2k27.1600.7860.12953.6750.129

Citation

@article{LI202247,
title = {SRDiff: Single image super-resolution with diffusion probabilistic models},
journal = {Neurocomputing},
volume = {479},
pages = {47-59},
year = {2022},
issn = {0925-2312},
doi = {https://doi.org/10.1016/j.neucom.2022.01.029},
url = {https://www.sciencedirect.com/science/article/pii/S0925231222000522},
author = {Haoying Li and Yifan Yang and Meng Chang and Shiqi Chen and Huajun Feng and Zhihai Xu and Qi Li and Yueting Chen},
keywords = {Single image super-resolution, Diffusion probabilistic model, Diverse results, Deep learning},
abstract = {Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from given low-resolution (LR) images. It is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional methods. However, PSNR-oriented, GAN-driven and flow-based methods suffer from over-smoothing, mode collapse and large model footprint issues, respectively. To solve these problems, we propose a novel SISR diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood. Through a Markov chain, it can provide diverse and realistic super-resolution (SR) predictions by gradually transforming Gaussian noise into a super-resolution image conditioned on an LR input. In addition, we introduce residual prediction to the whole framework to speed up model convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that (1) SRDiff can generate diverse SR results with rich details and achieve competitive performance against other state-of-the-art methods, when given only one LR input; (2) SRDiff is easy to train with a small footprint(The word “footprint” in this paper represents “model size” (number of model parameters).); (3) SRDiff can perform flexible image manipulation operations, including latent space interpolation and content fusion.}
}