<div align="center"> <img src="assets/logo.png"/> <div align="center"> <b><font size="3">XPixel Homepage</font></b> <sup> <a href="http://xpixel.group/"> <i><font size="2">HOT</font></i> </a> </sup> </div> <div>&nbsp;</div> </div> <div align="center"> <!-- English | [简体中文](README_zh-CN.md) --> </div>

Introduction

X-Video-Processing presents XPixel's research on video processing, which comprises several subtasks such as video denoising, deblurring, super-resolution, colorization, and frame interpolation.

A key challenge in video processing is making use of temporal information. Unlike image restoration, video restoration involves an additional temporal dimension, so algorithms must account for motion and appearance changes across frames.

Full list

Papers

<a name="TCVC"></a>Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning

Existing video colorization methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization performance. We propose a novel temporally consistent video colorization framework (TCVC) that addresses this problem by jointly considering colorization and temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos but also achieves clearly better temporal consistency than state-of-the-art methods.

<div align="center"> <img src="assets/tcvc.png" width="700"/> </div>
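To make "temporal inconsistency" concrete, one crude way to quantify flicker is the mean absolute difference between consecutive frames. The NumPy sketch below is a hypothetical proxy for illustration only: it is not TCVC's method, nor necessarily the metric used in the paper, and because it applies no motion compensation it is only meaningful for (near-)static content. The function name `flicker_score` is an assumption.

```python
import numpy as np


def flicker_score(video: np.ndarray) -> float:
    """Mean absolute difference between consecutive frames.

    video: array of shape (T, H, W, C) with values in [0, 1].
    Hypothetical proxy: no motion compensation, so it only makes
    sense for (near-)static scenes.
    """
    if video.ndim != 4 or video.shape[0] < 2:
        raise ValueError("expected shape (T, H, W, C) with T >= 2")
    diffs = np.abs(np.diff(video, axis=0))  # (T-1, H, W, C)
    return float(diffs.mean())


rng = np.random.default_rng(0)
gray = rng.random((1, 8, 8, 1)).repeat(5, axis=0)  # static grayscale scene, 5 frames
# Consistent colorization: the same tint on every frame.
stable = np.repeat(gray, 3, axis=-1) * np.array([1.0, 0.8, 0.6])
# Flickering colorization: a different random tint on every frame.
tints = rng.random((5, 1, 1, 3))
flickering = np.repeat(gray, 3, axis=-1) * tints
print(flicker_score(stable), flicker_score(flickering))
```

A warping-based variant would first align frame t to frame t+1 with optical flow before differencing, so that genuine motion is not counted as flicker.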

<a name="mitigating"></a>Mitigating Artifacts in Real-World Video Super-Resolution Models

The recurrent structure is a prevalent framework for video super-resolution; it models the temporal dependency between frames via hidden states. When applied to real-world scenarios with unknown and complex degradations, these hidden states tend to contain unpleasant artifacts and propagate them to the restored frames. We propose a Hidden State Attention (HSA) module to mitigate such artifacts in real-world video super-resolution. Equipped with HSA, our method, FastRealVSR, achieves a 2x speedup over Real-BasicVSR while obtaining better performance.

<div align="center"> <img src="assets/mitigating.png" height="400"/> </div>
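The propagation problem can be seen in a toy setting. The sketch below replaces the learned recurrent cell with a hypothetical exponential blend, h_t = decay * h_{t-1} + (1 - decay) * x_t, and injects a single artifact into the first frame only; the artifact then leaks into every later hidden state. It does not implement HSA, only the failure mode HSA is designed to mitigate. The function name `propagate` and the `decay` parameter are assumptions.

```python
import numpy as np


def propagate(frames: np.ndarray, decay: float = 0.8) -> np.ndarray:
    """Toy recurrent propagation: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    frames: (T, H, W). Returns the hidden state after every frame, shape (T, H, W).
    A hypothetical stand-in for a learned recurrent cell, for illustration only.
    """
    hidden = np.zeros(frames.shape[1:], dtype=float)
    states = []
    for x in frames:
        hidden = decay * hidden + (1.0 - decay) * x
        states.append(hidden.copy())
    return np.stack(states)


T, H, W = 10, 4, 4
clean = np.zeros((T, H, W))
corrupted = clean.copy()
corrupted[0, 1, 1] = 1.0  # a single artifact, present in the first frame only

# Difference between the two runs = artifact carried by the hidden state.
leak = propagate(corrupted) - propagate(clean)
print(leak[:, 1, 1])  # nonzero at every time step, decaying geometrically
```

A learned cell does not decay this predictably, which is why artifacts in real-world VSR can persist or even be amplified over long sequences.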

<a name="basicvsr"></a>BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

Complex designs are not uncommon in video super-resolution (VSR) approaches, since they need to exploit the additional temporal dimension. In this study, we untangle these knots and reconsider some of the most essential components for VSR. By reusing existing components with minimal redesigns, we present a succinct pipeline, BasicVSR, that achieves appealing improvements in both speed and restoration quality.

<div align="center"> <img src="assets/basicvsr.png" width="700"/> </div>
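The BasicVSR paper groups VSR components into propagation, alignment, aggregation, and upsampling, and adopts bidirectional recurrent propagation so that every frame receives information from both past and future frames. Below is a minimal NumPy sketch of that propagation pattern only, with hypothetical simplifications: alignment is the identity instead of flow-based warping, the recurrent cell is a fixed blend instead of a learned network, and upsampling is omitted. The function name `bidirectional_propagate` is an assumption.

```python
import numpy as np


def bidirectional_propagate(feats: np.ndarray, decay: float = 0.5) -> np.ndarray:
    """feats: (T, C, H, W). Returns (T, 2C, H, W): backward and forward states per frame.

    Hypothetical simplifications: identity alignment (no optical flow) and a
    fixed blending cell instead of a learned residual network.
    """
    T = feats.shape[0]

    backward = [None] * T
    h = np.zeros_like(feats[0])
    for t in range(T - 1, -1, -1):  # future -> past
        h = decay * h + (1 - decay) * feats[t]
        backward[t] = h

    forward = [None] * T
    h = np.zeros_like(feats[0])
    for t in range(T):  # past -> future
        h = decay * h + (1 - decay) * feats[t]
        forward[t] = h

    # Aggregation: concatenate both directions along the channel axis per frame.
    return np.stack([np.concatenate([backward[t], forward[t]], axis=0) for t in range(T)])


feats = np.random.default_rng(0).random((5, 3, 8, 8))
out = bidirectional_propagate(feats)
print(out.shape)  # (5, 6, 8, 8)
```

Because the backward pass runs from the last frame to the first, the first frame's backward state already carries information from the end of the clip, which a single forward pass cannot provide.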

<a name="understanding"></a>Understanding Deformable Alignment in Video Super-Resolution

Deformable convolution has recently shown compelling performance in aligning multiple frames, but its underlying mechanism for alignment remains unclear. In this study, we show that deformable convolution can be decomposed into a combination of spatial warping and convolution, and that the increased diversity in deformable alignment significantly improves the quality of video super-resolution outputs. We further propose an offset-fidelity loss that guides offset learning with optical flow. Experiments show that this loss successfully avoids offset overflow and alleviates the instability of deformable alignment.

<div align="center"> <img src="assets/understanding.png" width="700"/> </div>
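For a single-channel 3x3 kernel, the decomposition can be checked numerically: a deformable convolution that samples x(p + p_k + Δp_k(p)) at each kernel tap k gives the same output as warping the input with the flow field p_k + Δp_k for each of the nine taps, followed by a 1x1 convolution across the nine warped maps. A minimal NumPy sketch, assuming zero padding, bilinear sampling, a single channel, and no bias or modulation mask; it is written for clarity rather than speed, and the function names are hypothetical, not the paper's code.

```python
import numpy as np

# Fixed grid positions p_k of a 3x3 kernel, as (dy, dx).
GRID = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]


def bilinear(img: np.ndarray, y: float, x: float) -> float:
    """Bilinearly sample a single-channel image at (y, x); outside pixels count as 0."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return value


def deform_conv(img, weights, offsets):
    """Direct 3x3 deformable conv: y(p) = sum_k w_k * x(p + p_k + offset_k(p)).

    weights: (9,); offsets: (9, H, W, 2) holding (dy, dx) per kernel tap and output pixel.
    """
    H, W = img.shape
    out = np.zeros((H, W))
    for py in range(H):
        for px in range(W):
            for k, (ky, kx) in enumerate(GRID):
                dy, dx = offsets[k, py, px]
                out[py, px] += weights[k] * bilinear(img, py + ky + dy, px + kx + dx)
    return out


def warp(img, flow):
    """Backward warp: out(p) = x(p + flow(p)), with flow of shape (H, W, 2)."""
    H, W = img.shape
    return np.array([[bilinear(img, py + flow[py, px, 0], px + flow[py, px, 1])
                      for px in range(W)] for py in range(H)])


def warp_then_1x1(img, weights, offsets):
    """Same output via 9 spatial warps followed by a 1x1 conv over the warped stack."""
    warped = np.stack([warp(img, offsets[k] + np.array([ky, kx], dtype=float))
                       for k, (ky, kx) in enumerate(GRID)])  # (9, H, W)
    return np.tensordot(weights, warped, axes=1)  # weighted sum over the 9 maps


rng = np.random.default_rng(0)
img = rng.random((6, 6))
weights = rng.standard_normal(9)
offsets = 0.7 * rng.standard_normal((9, 6, 6, 2))
print(np.allclose(deform_conv(img, weights, offsets), warp_then_1x1(img, weights, offsets)))  # True
```

Read this way, each kernel tap behaves like one flow-guided warp, which is why the offsets can be compared with, and guided by, optical flow.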

<a name="enhanced"></a>Enhanced Quadratic Video Interpolation

Quadratic video interpolation (QVI) has recently achieved appealing performance for video frame interpolation. However, the intermediate frames it produces still contain unsatisfactory artifacts, especially when large and complex motion occurs. In this work, we propose an enhanced quadratic video interpolation (EQVI) model, which won first place in the AIM2020 Video Temporal Super-Resolution Challenge.

<div align="center"> <img src="assets/eqvi.png" width="700"/> </div>
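QVI, the model EQVI builds on, replaces the linear-motion assumption with a constant-acceleration (quadratic) model estimated from the flows to the previous and next frames. The function below sketches that intermediate-flow formula in NumPy; the function and argument names are hypothetical, and EQVI's own enhancements to this step are not shown.

```python
import numpy as np


def quadratic_flow(f_0_to_1: np.ndarray, f_0_to_m1: np.ndarray, t: float) -> np.ndarray:
    """Flow from frame 0 to time t in (0, 1) under a constant-acceleration model.

    Fits x(t) = v0 * t + 0.5 * a * t**2 using the flows to frames 1 and -1:
      f_0_to_1  =  v0 + a / 2
      f_0_to_m1 = -v0 + a / 2
    """
    velocity = (f_0_to_1 - f_0_to_m1) / 2.0
    acceleration = f_0_to_1 + f_0_to_m1
    return velocity * t + 0.5 * acceleration * t**2


# An object accelerating from rest, with displacement t**2 along x.
f_0_to_1 = np.array([1.0, 0.0])   # displacement at t = 1
f_0_to_m1 = np.array([1.0, 0.0])  # displacement at t = -1
print(quadratic_flow(f_0_to_1, f_0_to_m1, 0.5))  # x-displacement 0.25
print(0.5 * f_0_to_1)                            # linear guess: x-displacement 0.5
```

Under constant velocity (f_0_to_m1 = -f_0_to_1) the acceleration term vanishes and the formula reduces to linear interpolation, so the quadratic model only departs from it when the motion actually accelerates.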

<a name="edvr"></a>EDVR: Video Restoration With Enhanced Deformable Convolutional Networks

REDS, a challenging benchmark for video restoration, was released in the NTIRE19 Challenge. It challenges existing methods in two respects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel video restoration framework with enhanced deformable convolutional networks, termed EDVR, to address these challenges. EDVR won first place in the challenge, outperforming the second-place entry by a large margin. It also outperforms state-of-the-art published methods on video super-resolution and deblurring.

<div align="center"> <img src="assets/edvr.png" width="700"/> </div>
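The EDVR paper addresses the two challenges with a Pyramid, Cascading and Deformable (PCD) alignment module and a Temporal and Spatial Attention (TSA) fusion module. In TSA's temporal attention, each frame is weighted per pixel by the sigmoid of the dot product between its embedding and the reference frame's embedding, so frames that disagree with the reference contribute less. The NumPy sketch below shows only that weighting, under hypothetical simplifications: the embeddings are the features themselves (EDVR learns them with convolutions), and the final fusion is a mean rather than a learned convolution. The function name is an assumption.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def temporal_attention_fusion(feats: np.ndarray, ref_index: int) -> np.ndarray:
    """Fuse aligned features (T, C, H, W) into (C, H, W) with per-pixel temporal attention.

    Hypothetical simplifications: identity embeddings and mean fusion.
    """
    ref = feats[ref_index]
    # Dot product over channels between each frame and the reference, per pixel: (T, H, W).
    similarity = np.einsum("tchw,chw->thw", feats, ref)
    weights = sigmoid(similarity)[:, None]  # (T, 1, H, W), broadcast over channels
    return (feats * weights).mean(axis=0)


ref = np.full((3, 4, 4), 2.0)
feats = np.stack([ref, ref, -ref])  # the third frame disagrees with the reference
fused = temporal_attention_fusion(feats, ref_index=1)
print(fused.shape)  # (3, 4, 4)
```

Without the attention weights, the plain mean here would be (2 + 2 - 2) / 3, about 0.67 per element; with them, the disagreeing frame is almost fully suppressed and the result is about 1.33.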

<a name="rethinking"></a>Rethinking Alignment in Video Super-Resolution Transformers

The alignment of adjacent frames is considered an essential operation in video super-resolution (VSR), and advanced VSR models are generally equipped with well-designed alignment modules. In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations. Our experiments show that (i) VSR Transformers can directly utilize multi-frame information from unaligned videos, and (ii) existing alignment methods are sometimes harmful to VSR Transformers. Based on these observations, we propose a new and efficient alignment method called patch alignment, which aligns image patches instead of pixels. VSR Transformers equipped with patch alignment demonstrate state-of-the-art performance on multiple benchmarks.

<div align="center"> <img src="assets/rethinking.png" width="700"/> </div>
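One way to read "aligns image patches instead of pixels": compute a single motion vector per patch and copy the whole patch from the supporting frame, rather than resampling every pixel with its own flow vector. In the NumPy sketch below, the per-patch rule (the rounded mean of the optical flow inside the patch), the border clamping, and the function name `patch_align` are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def patch_align(support: np.ndarray, flow: np.ndarray, patch: int) -> np.ndarray:
    """Align `support` (H, W) to the reference frame one patch at a time.

    flow: (H, W, 2) backward flow (dy, dx) from the reference to `support`.
    Each patch is moved by the rounded mean flow inside it and copied without
    interpolation, so pixels within a patch keep their relative layout.
    Assumes H and W are multiples of `patch`; out-of-frame samples are clamped.
    """
    H, W = support.shape
    aligned = np.empty_like(support)
    for y0 in range(0, H, patch):
        for x0 in range(0, W, patch):
            mean_flow = flow[y0:y0 + patch, x0:x0 + patch].reshape(-1, 2).mean(axis=0)
            dy, dx = np.rint(mean_flow).astype(int)
            ys = np.clip(np.arange(y0, y0 + patch) + dy, 0, H - 1)
            xs = np.clip(np.arange(x0, x0 + patch) + dx, 0, W - 1)
            aligned[y0:y0 + patch, x0:x0 + patch] = support[np.ix_(ys, xs)]
    return aligned


support = np.arange(64, dtype=float).reshape(8, 8)
flow = np.zeros((8, 8, 2))
flow[..., 1] = 1.0  # every pixel moved one column to the right
aligned = patch_align(support, flow, patch=4)
print(aligned[0])  # row 0 shifted by one column, last column clamped
```

Because each patch is copied as a block, small per-pixel errors in the flow do not distort the patch's internal structure, which leaves fine-grained correspondence to the Transformer's attention.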

License

This project is released under the Apache 2.0 license.

Projects in Open-XSource