<div align="center"> <img src="assets/logo.png"/> <div align="center"> <b><font size="3">XPixel Homepage</font></b> <sup> <a href="http://xpixel.group/"> <i><font size="2">HOT</font></i> </a> </sup> </div> <div> </div> </div> <div align="center"> <!-- English | [简体中文](README_zh-CN.md) --> </div>

## Introduction
X-Low-Level-Interpretation is dedicated to presenting the research efforts of XPixel in interpreting the principles of neural networks in the low-level vision field. The interpretability of neural networks refers to the ability to understand and explain the decisions these networks make, which helps in understanding model behavior and improving model performance.
## Full list
- [Evaluating the Generalization Ability of Super-Resolution Networks](#generalization)
- [Discovering Distinctive "Semantics" in Super-Resolution Networks](#semantics)
- [Interpreting Super-Resolution Networks with Local Attribution Maps](#lam)
- [Rethinking Alignment in Video Super-Resolution Transformers](#rethinking)
- [Networks are Slacking Off: Understanding Generalization Problem in Image Deraining](#derain)
- [Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution](#faig)
## Papers
### <a name="generalization"></a>Evaluating the Generalization Ability of Super-Resolution Networks
Research on the generalization ability of Super-Resolution (SR) networks is currently absent. We make the first attempt to propose a Generalization Assessment Index for SR networks, namely SRGA. SRGA exploits the statistical characteristics of the internal features of deep networks, rather than the output images, to measure generalization ability. To better validate our method, we collect a patch-based image evaluation set (PIES) that includes both synthetic and real-world images, covering a wide range of degradations. With SRGA and the PIES dataset, we benchmark existing SR models on generalization ability.
<div align="center"> <img src="assets/generalization.png" width="700"/> </div>

- Authors: Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
- Links: :scroll: paper
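The key idea of judging generalization from the statistics of internal features rather than from output images can be illustrated with a toy score. The sketch below is purely illustrative and is not the SRGA formulation from the paper: `srga_like_score`, the diagonal-Gaussian KL statistic, and the random stand-in feature matrices are all hypothetical choices.

```python
import numpy as np

def srga_like_score(feats_ref, feats_test):
    """Toy generalization index: diagonal-Gaussian KL divergence between
    statistics of internal deep features on reference vs. test inputs.
    (Illustrative only; see the paper for the actual SRGA definition.)"""
    mu1, mu2 = feats_ref.mean(0), feats_test.mean(0)
    var1 = feats_ref.var(0) + 1e-8
    var2 = feats_test.var(0) + 1e-8
    # KL(ref || test) per channel, averaged over channels
    kl = 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    return float(kl.mean())

rng = np.random.default_rng(0)
in_dist = rng.normal(0.0, 1.0, (256, 64))    # stand-in features on seen degradations
out_dist = rng.normal(0.5, 1.5, (256, 64))   # stand-in features on unseen degradations

# A lower score indicates feature statistics closer to the reference,
# i.e. better generalization to that input distribution.
print(srga_like_score(in_dist, in_dist), srga_like_score(in_dist, out_dist))
```

The point of the design is that the score never touches the restored image: any feature-level distribution shift on unseen degradations is visible directly in the network's internal statistics.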
### <a name="semantics"></a>Discovering Distinctive "Semantics" in Super-Resolution Networks
In this paper, we make a first attempt to answer the following fundamental questions for image super-resolution (SR):
- Can SR networks learn semantic information?
- What hinders SR networks from generalizing to real-world data?
After comprehensively analyzing the feature representations (via dimensionality reduction and visualization), we successfully discover the distinctive "semantics" in SR networks. We show that a well-trained deep SR network is naturally a good descriptor of degradation information. Our experiments also reveal two key factors (adversarial learning and global residual) that influence the extraction of such semantics.
<div align="center"> <img src="assets/semantics.png" width="700"/> </div>

- Authors: Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong
- Links: :scroll: paper
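The analysis pipeline described above (dimensionality reduction plus visualization of deep features) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the two random clusters stand in for deep SR features extracted from patches with two different degradations (e.g. blur vs. noise), and the separation they show is built into the toy data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for deep SR features of image patches under two degradations;
# real features would come from an intermediate layer of a trained network.
blur_feats = rng.normal(0.0, 1.0, (100, 32)) + 3.0
noise_feats = rng.normal(0.0, 1.0, (100, 32)) - 3.0
feats = np.vstack([blur_feats, noise_feats])

# PCA via SVD: project every feature onto the top principal component
centered = feats - feats.mean(0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[0]

# If the network encodes degradation "semantics", features cluster by
# degradation type along the leading components.
print(proj[:100].mean(), proj[100:].mean())
```

The paper's finding is the non-obvious part: features from a well-trained SR network really do cluster by degradation type, which is why such a network acts as a degradation descriptor.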
### <a name="lam"></a>Interpreting Super-Resolution Networks with Local Attribution Maps
Image super-resolution (SR) techniques have been developing rapidly, benefiting from the invention of deep networks and their successive breakthroughs. In this paper, we propose a novel attribution approach called local attribution map (LAM), which performs attribution analysis of SR networks and aims at finding the input pixels that strongly influence the SR results. Based on LAM, we show that:
- SR networks with a wider range of involved input pixels could achieve better performance.
- Attention networks and non-local networks extract features from a wider range of input pixels.
- Compared with the range that actually contributes, the receptive field is large enough for most deep networks.
- For SR networks, textures with regular stripes or grids are more likely to be noticed, while complex semantics are difficult to utilize.
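The attribution idea behind LAM can be sketched with integrated-gradients-style finite differences on a toy model. Everything here is a hypothetical stand-in: `toy_sr` replaces a real SR network with a 3x3 local average, the zero baseline replaces LAM's blurred baseline, and gradients are estimated numerically rather than by backpropagation.

```python
import numpy as np

def toy_sr(x):
    # Stand-in for an SR network: each output pixel averages a 3x3 input
    # window, so only nearby input pixels should receive attribution.
    out = np.zeros_like(x)
    for i in range(1, x.shape[0] - 1):
        for j in range(1, x.shape[1] - 1):
            out[i, j] = x[i-1:i+2, j-1:j+2].mean()
    return out

def lam_like_attribution(x, target, steps=4):
    """Attribute one output location to input pixels by integrating
    finite-difference gradients along a path from a baseline to x."""
    baseline = np.zeros_like(x)  # LAM uses a blurred baseline; zeros for simplicity
    attrib = np.zeros_like(x)
    eps = 1e-4
    for a in np.linspace(0.0, 1.0, steps):
        xi = baseline + a * (x - baseline)
        base_val = toy_sr(xi)[target]
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                pert = xi.copy()
                pert[i, j] += eps
                grad = (toy_sr(pert)[target] - base_val) / eps
                attrib[i, j] += grad * (x[i, j] - baseline[i, j]) / steps
    return attrib

x = np.arange(49, dtype=float).reshape(7, 7)
attrib = lam_like_attribution(x, (3, 3))
# Attribution concentrates on pixels inside the output pixel's 3x3 window
print(attrib[3, 3], attrib[0, 0])
```

Running the same procedure on a real SR network is what reveals the findings above, e.g. that stronger networks draw on a wider range of input pixels.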
### <a name="rethinking"></a>Rethinking Alignment in Video Super-Resolution Transformers
The alignment of adjacent frames is considered an essential operation in video super-resolution (VSR). Advanced VSR models are generally equipped with well-designed alignment modules. In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations. Our experiments show that: (i) VSR Transformers can directly utilize multi-frame information from unaligned videos, and (ii) existing alignment methods are sometimes harmful to VSR Transformers. Based on these observations, we propose a new and efficient alignment method called patch alignment, which aligns image patches instead of pixels. VSR Transformers equipped with patch alignment could demonstrate state-of-the-art performance on multiple benchmarks.
<div align="center"> <img src="assets/rethinking.png" width="700"/> </div>

- Authors: Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, Chao Dong
- Accepted at NeurIPS'22
- Links: :scroll: paper :computer: code
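The contrast between aligning patches and aligning individual pixels can be sketched with simple block matching. This is an illustrative toy, not the paper's patch-alignment module: `patch_align`, the exhaustive integer-offset search, and the single-channel random frames are all assumptions made for the sketch.

```python
import numpy as np

def patch_align(ref, nbr, patch=4, search=2):
    """Align a neighboring frame to the reference by shifting whole
    patches (nearest-patch block matching) instead of warping pixels --
    a toy version of the patch-alignment idea."""
    h, w = ref.shape
    out = np.zeros_like(nbr)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            ref_p = ref[i:i+patch, j:j+patch]
            best, best_err = (i, j), np.inf
            # Search integer offsets around the patch location
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    si, sj = i + di, j + dj
                    if 0 <= si and si + patch <= h and 0 <= sj and sj + patch <= w:
                        err = np.abs(nbr[si:si+patch, sj:sj+patch] - ref_p).sum()
                        if err < best_err:
                            best_err, best = err, (si, sj)
            si, sj = best
            out[i:i+patch, j:j+patch] = nbr[si:si+patch, sj:sj+patch]
    return out

rng = np.random.default_rng(2)
frame = rng.normal(size=(16, 16))
shifted = np.roll(frame, shift=(1, 1), axis=(0, 1))  # neighbor frame moved by 1 px
aligned = patch_align(frame, shifted)
print(np.abs(aligned - frame).mean(), np.abs(shifted - frame).mean())
```

Moving patches instead of resampling pixels avoids the interpolation artifacts of flow-based warping, which is one intuition for why pixel-level alignment can hurt VSR Transformers.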
### <a name="derain"></a>Networks are Slacking Off: Understanding Generalization Problem in Image Deraining
A prevailing perspective in deep learning encourages the use of highly complex training data for overcoming the generalization problem. However, we discovered that:
- This strategy does not enhance the generalization capability of deraining networks. On the contrary, it exacerbates the tendency of networks to overfit to specific degradations.
- Better generalization in a deraining network can be achieved by simplifying the complexity of the training data.
This is because networks tend to learn the least complex elements that minimize the training loss. When the complexity of the background image is lower than that of the rain streaks, the network prioritizes reconstructing the background, thereby avoiding overfitting to the rain patterns and achieving improved generalization performance.
<div align="center"> <img src="assets/derain.png" width="700"/> </div>

- Authors: Jinjin Gu, Xianzheng Ma, Xiangtao Kong, Yu Qiao, Chao Dong
- Links: :scroll: paper
### <a name="faig"></a>Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Recent blind super-resolution (SR) methods typically consist of two branches, one for degradation prediction and the other for conditional restoration. However, our experiments show that a one-branch network can achieve comparable performance to the two-branch scheme.
Then we wonder: how can one-branch networks automatically learn to distinguish degradations? To find the answer, we propose Filter Attribution method based on Integral Gradient (FAIG), which aims at finding the most discriminative filters for degradation removal in blind SR networks. With the discovered filters, we further develop a method to predict the degradation of an input image. Based on FAIG, we show that, in one-branch blind SR networks:
- We are able to find a very small number (1%) of discriminative filters for each specific degradation.
- The weights, locations and connections of the discovered filters are all important to determine the specific network function.
- The task of degradation prediction can be implicitly realized by these discriminative filters without explicit supervised learning.
Our findings can not only help us better understand network behaviors inside one-branch blind SR networks, but also provide guidance on designing more efficient architectures and diagnosing networks for blind SR.
<div align="center"> <img src="assets/faig.png" width="700"/> </div>

- Authors: Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan
- Accepted at NeurIPS'21 (spotlight)
- Links: :scroll: paper :computer: code
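The integrated-gradient flavor of FAIG can be sketched on a toy model: integrate the gradient of a loss along the straight path from a baseline network's weights to the target network's weights, accumulating per-filter attribution. This is a hypothetical linear-model sketch, not FAIG itself: `faig_like_scores`, the quadratic loss, and treating each scalar weight as a "filter" are all simplifications.

```python
import numpy as np

def faig_like_scores(w_base, w_target, grad_fn, steps=16):
    """FAIG-style attribution: integrate |gradient * weight change| per
    "filter" along the path from baseline to target weights (toy sketch)."""
    delta = w_target - w_base
    scores = np.zeros(len(w_base))
    for a in np.linspace(0.0, 1.0, steps):
        scores += np.abs(grad_fn(w_base + a * delta) * delta) / steps
    return scores

rng = np.random.default_rng(3)
n = 100
w_base = rng.normal(size=n)
w_target = w_base.copy()
w_target[0] += 5.0           # only "filter" 0 changed to handle a new degradation

# Toy quadratic loss whose gradient is (w - w_target): filters far from
# their target values contribute most to removing the degradation.
grad_fn = lambda w: w - w_target

scores = faig_like_scores(w_base, w_target, grad_fn)
# The most discriminative 1% of "filters" is exactly the one that changed
print(int(np.argmax(scores)))
```

In the real method the path runs between a baseline blind SR network and one fine-tuned for a specific degradation, and the highest-scoring convolution filters turn out to be the ones responsible for removing that degradation.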
## License
This project is released under the Apache 2.0 license.
## Projects in Open-XSource
- X-Super Resolution: Algorithms in the realm of image super-resolution.
- X-Image Processing: Algorithms in the realm of image restoration and enhancement.
- X-Video Processing: Algorithms for processing videos.
- X-Low level Interpretation: Algorithms for interpreting the principles of neural networks in the low-level vision field.
- X-Evaluation and Benchmark: Datasets for training or evaluating state-of-the-art algorithms.