Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data

This repository contains the evaluation protocols for the Super-Resolution Erlangen (SupER) benchmark reported in our paper:

Köhler, T., Bätz, M., Naderi, F., Kaup, A., Maier, A., & Riess, C. (2019). "Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data." In: IEEE Transactions on Pattern Analysis and Machine Intelligence (to appear).

All datasets and the results from the benchmark can be downloaded from our project webpage. The images are all-real acquisitions collected with a Basler acA2000-50gm CMOS camera equipped with an f/1.8, 16mm fixed-focus lens. We benchmarked various single-image and multi-frame super-resolution (SR) methods on this database.

<img src="figures/flowchart.png">

Getting started

The structure for the benchmark is as follows:

You can use our evaluation framework as follows:

The data folder already contains the configurations for all datasets and evaluations (names of the processed sequences and coordinates of the image patches) stored as MAT-files. If you leave the results folder empty, you can compute all results from our data using our evaluation protocols.

Algorithms

Our repository contains the source code of the following algorithms, grouped into single-image and multi-frame SR. Third-party code is shared with permission of the authors.

| ID | Method | Reference | Single-image | Multi-frame |
|----|--------|-----------|--------------|-------------|
| 1 | EBSR | Kim and Kwon, Single-image super-resolution using sparse regression and natural image prior | x | |
| 2 | ScSR | Yang et al., Image super-resolution via sparse representation | x | |
| 3 | NBSRF | Salvador and Perez-Pellitero, Naive Bayes super-resolution forest | x | |
| 4 | VSRNet | Kappeler et al., Video super-resolution with convolutional neural networks | | x |
| 5 | NUISR | Park et al., Super-resolution image reconstruction: a technical overview | | x |
| 6 | WNUISR | Bätz et al., Multi-image super-resolution using a dual weighting scheme based on Voronoi tessellation | | x |
| 7 | HYSR | Bätz et al., Hybrid super-resolution combining example-based single-image and interpolation-based multi-image reconstruction approaches | | x |
| 8 | DBRSR | Bätz et al., Multi-image super-resolution using a locally adaptive denoising-based refinement | | x |
| 9 | SRB | Ma et al., Handling motion blur in multi-frame super-resolution | | x |
| 10 | L1BTV | Farsiu et al., Fast and robust multiframe super resolution | | x |
| 11 | IRWSR | Köhler et al., Robust multi-frame super-resolution employing iteratively re-weighted minimization | | x |
| 12 | NN | Nearest-neighbour interpolation | x | |
| 13 | BICUBIC | Bicubic interpolation | x | |
| 14 | BVSR | Liu et al., On Bayesian adaptive video super resolution | | x |
| 15 | SRCNN | Dong et al., Learning a deep convolutional network for image super-resolution | x | |
| 16 | BEPSR | Zeng and Yang, A robust multiframe super-resolution algorithm based on half-quadratic estimation with modified BTV regularization | | x |
| 17 | SESR | Huang et al., Single image super-resolution from transformed self-exemplars | x | |
| 18 | DRCN | Kim et al., Deeply-recursive convolutional network for image super-resolution | x | |
| 19 | VDSR | Kim et al., Accurate image super-resolution using very deep convolutional networks | x | |
| 20 | A+ | Timofte et al., A+: Adjusted anchored neighborhood regression for fast super-resolution | x | |

For benchmarking the SR algorithms, we use the following full-reference and no-reference image quality measures:

| ID | Method | Reference | Full-reference | No-reference |
|----|--------|-----------|----------------|--------------|
| 1 | PSNR | Peak-signal-to-noise ratio | x | |
| 2 | SSIM | Structural similarity index | x | |
| 3 | M-SSIM | Multi-scale structural similarity index | x | |
| 4 | IFC | Information fidelity criterion | x | |
| 5 | S3 | Spectral & spatial measure of local perceived sharpness | | x |
| 6 | BRISQUE | Blind/Referenceless image spatial quality evaluator | | x |
| 7 | SSEQ | Spatial/spectral entropy-based quality index | | x |
| 8 | NIQE | Naturalness image quality evaluator | | x |
| 9 | SRM | No-reference quality metric for single-image super-resolution | | x |
| 10 | LPIPS | Learned perceptual image patch similarity | x | |

Evaluation protocols for quantitative study

All evaluation scripts can be found in `matlab/quantitativeStudy`. In its current version, the benchmark is divided into six types of experiments performed on different datasets:

Specifying the evaluation parameters

The initEvaluationParametersFor<DATASET> scripts define the parameters used for our benchmark, where <DATASET> denotes one of the aforementioned experiments (baseline, photometric, compression, sequence length, computation time, simulated data). The parameters comprise:

You can modify the evaluation parameters by changing these scripts. For instance, for the baseline evaluation the parameters in initEvaluationParametersForBaseline are set as follows:

% Range of binning factors used for this evaluation.
binningFactors = [2, 3, 4];
% Range of compression levels (index 1 means uncoded; indices 2-5 mean
% coded with the corresponding QP setting using the H.265/HEVC coder).
compressions   = [NaN, 10, 20, 30, 40];
% Range of number of frames for SR at the different binning factors.
numberOfFrames = [5, ...    % Number of frames for binning factor 2
                  11, ...   % Number of frames for binning factor 3
                  17];      % Number of frames for binning factor 4
% Index of the first reference frame for sliding window processing.
startReferenceFrame = 9;

% ----- Settings considered for this evaluation -----
% This is the index of the SR method that is evaluated.
sr_method           = 0:length(SRMethods);
% This is the index of the binning factor that is evaluated.
binning_val         = 1:3;
% This is the index of the compression setting.
compress_val        = 1;
% This is the index that indicates the number of input frames.
numberOfFrames_val  = 1;
% This is the index of the scene according to datasets.mat.
scenes_val          = 1:127;
% This is the index of the sliding window that is evaluated.
sliding_val         = 1:10;

By default, all parameters are set to reproduce the results reported in our paper and all available SR methods are included. You can modify the parameters to add new scenes/sequences or to include new algorithms (see below).
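As a minimal sketch of how the index settings relate to the parameter ranges, the following standalone snippet selects entries by index and counts the index combinations that the default baseline settings span. The variable `numSRMethods` is a hypothetical stand-in for `length(SRMethods)`, assumed here to be 20 to match the table of algorithms; the other helper names (`selectedBinning`, `selectedCompression`, `isUncoded`, `numCombinations`) are also illustrative and not part of the repository.

```matlab
% Parameter ranges copied from initEvaluationParametersForBaseline.
binningFactors = [2, 3, 4];
compressions   = [NaN, 10, 20, 30, 40];

% Assumption: 20 methods, standing in for length(SRMethods).
numSRMethods = 20;

% Default index settings of the baseline evaluation.
sr_method    = 0:numSRMethods;
binning_val  = 1:3;
compress_val = 1;
scenes_val   = 1:127;
sliding_val  = 1:10;

% Index vectors select entries of the parameter ranges.
selectedBinning     = binningFactors(binning_val);   % all three binning factors
selectedCompression = compressions(compress_val);    % NaN marks uncoded data
isUncoded           = isnan(selectedCompression);

% Number of index combinations spanned by these settings.
numCombinations = numel(sr_method) * numel(binning_val) * ...
    numel(scenes_val) * numel(sliding_val);
fprintf('%d combinations, uncoded: %d\n', numCombinations, isUncoded);
```

Restricting an index vector, for example `binning_val = 1` or `scenes_val = 1:10`, shrinks the set of combinations accordingly, which can be useful for a quick test run.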

Running the evaluations

The evaluate<DATASET> scripts implement our different evaluation protocols. Run these scripts to process the given input data and to compute the SR images using the different algorithms. In case of the baseline experiments, the benchmark is performed by setting the input data, the result directory as well as all evaluation parameters as follows:

% Path to input data and result directory.
inputDir = '../../data';
datasetConfig = 'baselineDatasets';
resultDir = '../../results/baselineDatasets';

% Parameters for this evaluation.
initEvaluationParametersForBaseline;

%% Start evaluation.
processDatasets(inputDir, datasetConfig, resultDir, ...
	binningFactors, numberOfFrames, startReferenceFrame, ...
	numberOfFrames_val, sliding_val, sr_method, binning_val, scenes_val, ...
	compressions(compress_val));

You can modify the paths to the input data and the results if desired. processDatasets defines the actual processing pipeline.

Once the SR result images are computed, the evaluateQualityMeasuresFor<DATASET> scripts can be used to calculate the different image quality measures from these images. For the baseline experiments, the quality measures can be calculated as follows:

% The quality measures that are calculated.
measure_val = 1:length(qualityMeasures);

% Size of the border in pixels that is cropped from super-resolved image to 
% evaluate the quality measures without distortion by boundary artifacts.
border = 3;

%% Start evaluation.
calculateQualityMeasures(inputDir, datasetConfig, resultDir, ...
	binningFactors, numberOfFrames, numberOfFrames_val, sliding_val, sr_method, ...
	binning_val, scenes_val, compressions(compress_val), measure_val, border);

By default, the script calculates all measures (measure_val = 1:length(qualityMeasures)) cropping a three-pixel boundary from the SR images as described in the paper. You can also limit the evaluation to specific measures by changing measure_val.
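To illustrate why the border is cropped, here is a minimal sketch that computes PSNR on synthetic images after removing a three-pixel boundary. It is not the repository's implementation: the images are synthetic stand-ins, the peak value of 1 assumes intensities in [0, 1], and the helper `cropped` is a hypothetical name.

```matlab
% Synthetic stand-ins for a ground truth and an SR image with values in [0, 1].
groundTruth = repmat(linspace(0, 1, 32), 32, 1);
srImage = groundTruth;
srImage(1, :) = 0;                               % artifact confined to the boundary
srImage(10, 10) = groundTruth(10, 10) + 0.1;     % error in the image interior

% Crop the same border from both images before comparing them.
border = 3;
cropped = @(img) img(border+1:end-border, border+1:end-border);

% PSNR for a peak value of 1 (assumption: intensities in [0, 1]).
mse = mean(mean((cropped(srImage) - cropped(groundTruth)).^2));
psnrValue = 10 * log10(1 / mse);
fprintf('PSNR after cropping: %.2f dB\n', psnrValue);
```

The boundary artifact in the first row does not affect the result because it lies inside the cropped region, whereas the interior error still does.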

Handling of the results

All results (SR images and quality measures) are stored in the results folder. These items are sorted by the different datasets, scenes, and image compression levels with the directory structure <DATASET>/<SCENE>/<COMPRESSION_LEVEL>. The data is organized to facilitate a parallel execution of our scripts. There is one MAT-file for each result image and quality measure such that all items can be computed independently from each other.

The SR images for a certain dataset, scene and compression level are stored in the mat subfolders with the naming convention:

<MOTION_TYPE>_<BINNING_FACTOR>_sr<METHOD_INDEX>_f<NUMBER_OF_FRAMES>_win<SLIDING_WINDOW_INDEX>.

The corresponding quality measures are stored in quality_qm<MEASURE_INDEX> subfolders with the naming convention:

<MOTION_TYPE>_<BINNING_FACTOR>_sr<METHOD_INDEX>_f<NUMBER_OF_FRAMES>_win<SLIDING_WINDOW_INDEX>_qm<MEASURE_INDEX>. 
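The naming conventions can be sketched with `sprintf` and `regexp`. This sketch assumes that each placeholder is substituted by a plain string or integer without extra padding; the actual file names in the results folder may be formatted differently, and all variable names below are illustrative.

```matlab
% Example values for the placeholders of the naming convention.
motionType    = 'global_translation_z_inlier';
binningFactor = 2;
methodIndex   = 11;
numFrames     = 5;
windowIndex   = 3;
measureIndex  = 1;

% Compose the names of an SR result and of its quality measure.
srName = sprintf('%s_%d_sr%d_f%d_win%d', motionType, binningFactor, ...
    methodIndex, numFrames, windowIndex);
qmName = sprintf('%s_qm%d', srName, measureIndex);

% Parse the SR result name back into its components.
pattern = '^(.*)_(\d+)_sr(\d+)_f(\d+)_win(\d+)$';
tokens = regexp(srName, pattern, 'tokens', 'once');
parsedMotion = tokens{1};
parsedValues = str2double(tokens(2:end));   % [binning, method, frames, window]

disp(srName);
disp(qmName);
```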

You can call the getQualityMeasuresFromResults function in visualization to import specific benchmark results. For instance, to filter for the results from the global motion with translation in z-direction datasets without video coding considering all scenes, SR methods, binning factors and sliding windows, you can run:

inputDir = '../../data';
resultDir = '../../results';

motion = {'global_translation_z_inlier'};
compression = {'Uncoded'};
binningFactors = [2, 3, 4];
numberOfFrames = [5, 11, 17];
srMethods = 1:length(SRMethods);
slidingWindows = 1:10;

% Get SR results with quality measures.
srResults = getQualityMeasuresFromResults([resultDir, '/baselineDatasets'], {}, ...
	motion, compression, binningFactors, srMethods, numberOfFrames, slidingWindows);

The srResults structure comprises the quality measures for all imported datasets. You can plot these results using our visualization scripts (see below).

How to add new data

The input data for the benchmark is described in the data folder. For each evaluation, the respective input data is defined in a MAT-file with the following attributes:

To extend the benchmark, you can add new data in these MAT-files.

How to add new algorithms

The available SR algorithms are listed in algorithms/SRAlgorithms/SRMethods.m. Including a new algorithm can be done as follows:

The parameters required by each algorithm are 1) the magnification factor, and 2) a sliding window structure including the input frames as well as a reference frame. In case of multi-frame SR, all input frames are registered towards the reference using optical flow. In case of single-image SR, only the reference frame is processed (see our paper for details).
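The two inputs can be illustrated with a toy single-image method. This is only a sketch of the calling pattern: the field names `frames` and `referenceFrame` and the handle `nnUpsample` are hypothetical, and the real sliding window structure and function signature in the repository may differ.

```matlab
% Hypothetical sliding window structure (field names are assumptions).
slidingWindow.frames = rand(4, 4, 5);                         % five low-resolution frames
slidingWindow.referenceFrame = slidingWindow.frames(:, :, 3); % central frame as reference

% Toy single-image method: nearest-neighbour upsampling of the reference
% frame only. kron replicates each pixel into a magFactor-by-magFactor block.
nnUpsample = @(magFactor, window) kron(window.referenceFrame, ones(magFactor));

magFactor = 2;
srImage = nnUpsample(magFactor, slidingWindow);
disp(size(srImage));
```

A multi-frame method would follow the same pattern but use all input frames of the window instead of the reference frame alone.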

How to add new image quality measures

The image quality measures are listed in algorithms/qualityMeasures/qualityMeasures.m. Including a new measure can be done as follows:

Each quality measure takes a single SR image and the ground truth as inputs. In case of a no-reference measure, only the SR image is used while the ground truth is ignored.
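The shared two-input interface can be sketched with two toy measures. Both handles are hypothetical examples and not measures from the benchmark: `mseMeasure` uses the ground truth, while `contrastMeasure` accepts it only to keep the same signature and never reads it.

```matlab
% Full-reference toy measure: mean squared error against the ground truth.
mseMeasure = @(srImage, groundTruth) mean((srImage(:) - groundTruth(:)).^2);

% No-reference toy measure: standard deviation of the SR image.
% The second argument is accepted but ignored.
contrastMeasure = @(srImage, groundTruth) std(srImage(:));

srImage     = [0, 1; 1, 0];
groundTruth = zeros(2);

fullReferenceValue = mseMeasure(srImage, groundTruth);
noReferenceValue   = contrastMeasure(srImage, groundTruth);
fprintf('MSE: %.3f, contrast: %.3f\n', fullReferenceValue, noReferenceValue);
```

Keeping one signature for both kinds of measures lets an evaluation loop call every measure in the same way.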

Evaluation protocols for human observer study

All evaluation protocols of the human observer study are included in observerStudy. The study was performed for the uncoded global motion, local motion, mixed motion, and photometric outlier datasets with the binning factors 2, 3, and 4. All results are provided in results/observerStudy/voting.mat. The voting comprises a list of forced-choice pair-wise comparisons. Each entry comprises 50 image pairs (SR images computed by two algorithms) and corresponds to the voting obtained from one observer. A single pair has the following structure:

Fit Bradley-Terry model

To fit the Bradley-Terry (BT) model from the observer study, run the evaluateBTScores script. This calculates BT scores from the forced-choice pair-wise comparisons (voting.mat) for each scene, motion type, environmental condition, and binning factor. You can specify the maximum number of sanity checks an observer is allowed to fail (default: 1). Observers that fail more sanity checks are rejected. All remaining observers are used to fit the BT model. The calculated BT scores with meta information (name of the scene, motion and environment conditions, binning factor) are stored in results/observerStudy/btModel.mat.
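As a rough illustration of this procedure, the sketch below rejects observers by their number of failed sanity checks and then fits a BT model with the standard minorization-maximization (MM) iteration. It is not the code of evaluateBTScores: the vote counts are made up, all variable names are hypothetical, and whether the repository reports the strengths `p` or their logarithm is an assumption.

```matlab
% Made-up votes: winsPerObserver(i, j, k) counts how often observer k
% preferred algorithm i over algorithm j.
winsPerObserver = cat(3, ...
    [0 3 3; 0 0 2; 0 1 0], ...
    [0 2 3; 1 0 3; 0 0 0], ...
    [0 0 0; 3 0 1; 3 2 0]);
failedSanityChecks    = [0, 1, 4];
maxFailedSanityChecks = 1;

% Reject observers that failed too many sanity checks.
accepted = failedSanityChecks <= maxFailedSanityChecks;
wins = sum(winsPerObserver(:, :, accepted), 3);

% MM iteration for the BT strengths p (normalized to sum to one).
numItems    = size(wins, 1);
comparisons = wins + wins';
totalWins   = sum(wins, 2);
p = ones(numItems, 1) / numItems;
for iter = 1:200
    denom = zeros(numItems, 1);
    for i = 1:numItems
        for j = [1:i-1, i+1:numItems]
            denom(i) = denom(i) + comparisons(i, j) / (p(i) + p(j));
        end
    end
    p = totalWins ./ denom;
    p = p / sum(p);
end
btScores = log(p);   % assumption: scores reported on a log scale
disp(btScores');
```

The iteration requires every algorithm to win at least one comparison; otherwise its strength tends to zero and the log score diverges.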

The correlation analysis to our quantitative study can be performed by running evaluateQualityMeasureCorrelations. This uses the precomputed BT model and computes the weighted Kendall distance between these scores and the different image quality measures. Results are stored in results/observerStudy/qualityMeasureCorrelation.mat.
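The idea behind a weighted Kendall distance can be sketched as the weighted fraction of algorithm pairs that the BT scores and a quality measure rank in opposite order. The weighting scheme used in the paper is not reproduced here: `pairWeights` is a hypothetical placeholder set to uniform weights, which reduces the result to the normalized Kendall tau distance, and the example values are made up. The sketch also assumes that larger measure values mean better quality.

```matlab
% Made-up BT scores and quality measure values for four algorithms.
btScores      = [1.2, 0.4, -0.3, -1.3];
measureValues = [31.0, 29.5, 30.2, 27.8];

numItems    = numel(btScores);
pairWeights = ones(numItems);   % assumption: uniform weights

discordantWeight = 0;
totalWeight      = 0;
for i = 1:numItems-1
    for j = i+1:numItems
        % A pair is discordant if both rankings order it differently.
        isDiscordant = (btScores(i) - btScores(j)) * ...
            (measureValues(i) - measureValues(j)) < 0;
        discordantWeight = discordantWeight + pairWeights(i, j) * isDiscordant;
        totalWeight      = totalWeight + pairWeights(i, j);
    end
end
kendallDistance = discordantWeight / totalWeight;
fprintf('Kendall distance: %.4f\n', kendallDistance);
```

A distance of 0 means both rankings agree on every pair, while 1 means they disagree on every pair.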

The trade-off between perceptual image quality and SR computation time can be evaluated by running evaluateQualityTimeTradeoff. This relates the mean BT score achieved by the different SR algorithms to their respective computation times. Results are stored in results/observerStudy/qualityTimeTradeoff.mat.

Inter-observer variances, convergence, and correlation analysis

To analyze inter-observer variances from the observer study, run evaluateCoefficientOfAgreement. This calculates the Kendall coefficient of agreement from the BT model for all binning factors and different environmental conditions. Results are stored in results/observerStudy/coefficientsOfAgreement.mat.
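For reference, the classical Kendall coefficient of agreement for paired comparisons can be sketched as follows. The sketch assumes that every observer judged every pair exactly once and uses made-up preference counts; how evaluateCoefficientOfAgreement derives its inputs may differ, and all variable names are illustrative.

```matlab
% prefs(i, j): number of observers that preferred algorithm i over j
% (made-up data for 4 observers and 3 algorithms).
prefs = [0 4 3; ...
         0 0 4; ...
         1 0 0];
numObservers = 4;
numItems     = 3;

% Number of agreeing observer pairs, i.e. the sum of C(prefs(i, j), 2)
% over all ordered pairs; the zero diagonal contributes nothing.
agreeingPairs = sum(sum(prefs .* (prefs - 1) / 2));

% Coefficient of agreement u, which equals 1 for perfect agreement.
u = 2 * agreeingPairs / ...
    (nchoosek(numObservers, 2) * nchoosek(numItems, 2)) - 1;
fprintf('Coefficient of agreement: %.4f\n', u);
```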

To evaluate convergence, run evaluateObserverStudyConvergence. This evaluates inter-observer variances by randomly sampling subsets of the votes from all forced-choice pair-wise comparisons (voting.mat). The evaluation is performed for different numbers of sanity checks that an observer is allowed to fail. Results are saved in results/observerStudy/observerStudyConvergence.mat.

Visualization

We provide different scripts to visualize the benchmark results in visualization. All results should be precomputed as described above before running the visualization.

To plot all results from our quantitative study, run plotQualityMeasures. This reproduces Figures 5 to 11 from our paper. You can modify the visualization by adding your own SR algorithms or image quality measures.

To plot the results from the human observer study, run plotObserverStudy. This reproduces Figures 12 to 16 from the paper.

Citation

If you use our datasets and source code for your research, please cite our paper:

@article{Koehler2019,
    author  = {K{\"o}hler, Thomas and B{\"a}tz, Michel and Naderi, Farzad and Kaup, Andr{\'e} and Maier, Andreas and Riess, Christian},
    title   = {Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data}, 
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year    = {2019},
    volume  = {}, 
    number  = {}, 
    pages   = {1-1},
    doi     = {10.1109/TPAMI.2019.2917037}, 
    ISSN    = {0162-8828}
}

Acknowledgements

We would like to thank all authors who contributed source code for SR algorithms or image quality measures to this evaluation framework.