Lightweight Video Denoising using Aggregated Shifted Window Attention

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

State-of-the-art attention-based denoising methods typically yield good results but require vast amounts of GPU memory and usually suffer from very long computation times. Especially for restoring high-resolution videos, these techniques are impractical. To overcome these issues, we introduce a lightweight video denoising network that combines efficient axial-coronal-sagittal (ACS) convolutions with a novel shifted window attention formulation (ASwin), which is based on the memory-efficient aggregation of self- and cross-attention across video frames. Our model can be used for general-purpose blind denoising of high-resolution real-world videos, since it is trained on realistic clean-noisy video pairs generated by an authentic noise synthesis pipeline.

<div align="center"> <img src="figures/denoise_result.png" width="800px"/> </div> <br />

Trained Weights and Test Data

After cloning this repository, follow these steps:

Requirements

Testing the Blind-Denoising Model for Real-World Noisy Videos

To test the model's performance under real-world noise conditions, run test_real_noise.py as shown below. If you run out of GPU memory, try either reducing --num_frame_testing or setting --patch_img to True, at the expense of slightly decreased denoising performance.

Command-line Arguments:

Usage:

Run the following command in your terminal; the denoised video frames will be saved in the folder 'results'.

```shell
python test_real_noise.py --num_frame_testing 24 --num_frame_overlapping 2 --file_list ./data/test_real_noise/files.csv
```
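The flags --num_frame_testing and --num_frame_overlapping control how many frames are processed per window and how much consecutive windows overlap. A minimal sketch of such a temporal tiling (a hypothetical helper for illustration, not code from this repository):

```python
def temporal_chunks(num_frames, chunk_size, overlap):
    """Split [0, num_frames) into overlapping windows of `chunk_size` frames.

    Consecutive windows overlap by `overlap` frames; the last window is
    shifted back so it ends flush with the final frame.
    """
    if num_frames <= chunk_size:
        return [(0, num_frames)]
    stride = chunk_size - overlap
    starts = list(range(0, num_frames - chunk_size, stride))
    starts.append(num_frames - chunk_size)  # final window flush with the end
    return [(s, s + chunk_size) for s in starts]

# E.g. a 50-frame clip with the flags above (24 frames per window, overlap 2):
print(temporal_chunks(50, 24, 2))  # → [(0, 24), (22, 46), (26, 50)]
```

Smaller windows reduce peak GPU memory because fewer frames are held in attention at once, which is why lowering --num_frame_testing helps on memory-constrained GPUs.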

Testing the Non-Blind Gaussian Denoising Model

To test the model's performance for additive Gaussian noise, run test_gauss_noise.py as shown below. If you run out of GPU memory, try either reducing --num_frame_testing or setting --patch_img to True, at the expense of slightly decreased denoising performance.

Command-line Arguments:

Usage:

Run the following command in your terminal; the denoised video frames will be saved in the folder 'results'.

```shell
python test_gauss_noise.py --noise_sigma 20 --num_frame_testing 24 --num_frame_overlapping 2 --file_list ./data/davis2017/files.csv
```
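Here --noise_sigma is the standard deviation of the additive Gaussian noise on the 8-bit intensity scale. A minimal sketch of how such noisy test input can be synthesized (a generic helper for illustration, not code from this repository):

```python
import numpy as np

def add_gaussian_noise(frame, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`
    to a frame with intensities in [0, 255], clipping to the valid range."""
    rng = np.random.default_rng(seed)
    noisy = frame.astype(np.float64) + rng.normal(0.0, sigma, frame.shape)
    return np.clip(noisy, 0.0, 255.0)

# Mid-gray test frame corrupted with sigma = 20, matching the command above.
clean = np.full((256, 256), 128.0)
noisy = add_gaussian_noise(clean, sigma=20)
```

Because the noise level is known, this setting is "non-blind": the model receives the true sigma as an input.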

Blind Denoising Results for Historic Real-World Videos

Denoising digitized analog videos is even more challenging than other real-world denoising tasks, due to the high spatial correlation of noise induced by the physical structure of analog film, plus additional digital noise introduced by the digitization process. To evaluate the different methods, we used 10 high-resolution sequences of real digitized analog film footage, exhibiting different unknown noise types of varying strength. We compare our approach to the state-of-the-art real-world denoising methods MF2F and UDVD, as well as two commercial denoising tools for high-end video restoration, namely NeatVideo and DarkEnergy.

<div align="center"> <img src="figures/realnoise.png" width="800px"/> </div> <br />

Since the actual ground truth is not available for real-world noisy videos, standard quality assessment metrics, such as PSNR, cannot be computed. Therefore, we perform a No-Reference Image Quality Assessment (NR-IQA) on the denoised real videos for a quantitative comparison. We use the state-of-the-art NR-IQA metric MUSIQ, which is computed by a multi-scale image quality transformer.

| | noisy | UDVD | MF2F | DarkEnergy | NeatVideo | ours |
|---|---|---|---|---|---|---|
| MUSIQ score | 25.11 | 25.77 | 35.29 | 31.05 | 33.14 | 38.16 |

Non-blind Denoising Results for Additive Gaussian Noise

Although not specifically designed for additive Gaussian denoising, we evaluate our approach on two commonly used datasets for synthetic denoising: Set8 and DAVIS2017. To obtain a quantitative comparison to state-of-the-art methods, we evaluate the denoising performance in terms of PSNR. Our model yields results close to the state-of-the-art method VRT and consistently outperforms all other competing methods. When considering both denoising performance and runtime, we observe that the better performance of VRT comes with a significantly increased runtime: VRT is roughly 20 times slower than our model.
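PSNR against the clean ground truth can be computed as in the following generic sketch (for illustration only, not the repository's evaluation code):

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two frames
    with intensities on the [0, max_val] scale."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Constant error of 25.5 → MSE = 650.25 → PSNR = 10·log10(255² / 650.25) = 20 dB
a = np.zeros((4, 4))
b = np.full((4, 4), 25.5)
print(round(psnr(a, b), 2))  # → 20.0
```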

<img src="figures/snowboard.png" width="1000px"/>

Gaussian denoising results on DAVIS test set

| | VBM4D | VNLB | DVDnet | VNLnet | FastDVD | PaCNet | VRT | ours |
|---|---|---|---|---|---|---|---|---|
| σ=10 | 37.58 | 38.85 | 38.13 | 35.83 | 38.71 | 39.97 | 40.82 | 40.15 |
| σ=20 | 33.88 | 35.68 | 35.70 | 34.49 | 35.77 | 36.82 | 38.15 | 37.12 |
| σ=30 | 31.65 | 33.73 | 34.08 | 32.86 | 34.04 | 34.79 | 36.52 | 35.37 |
| σ=40 | 30.05 | 32.32 | 32.86 | 32.32 | 32.82 | 33.34 | 35.32 | 34.13 |
| σ=50 | 28.80 | 31.13 | 31.90 | 31.43 | 31.86 | 32.20 | 34.36 | 33.17 |

Gaussian denoising results on Set8 test set

| | VBM4D | VNLB | DVDnet | VNLnet | FastDVD | PaCNet | VRT | ours |
|---|---|---|---|---|---|---|---|---|
| σ=10 | 36.05 | 37.26 | 36.08 | 37.10 | 36.44 | 37.06 | 37.88 | 36.99 |
| σ=20 | 32.19 | 33.72 | 33.49 | 33.88 | 33.43 | 33.94 | 35.02 | 34.06 |
| σ=30 | 30.00 | 31.74 | 31.68 | 31.59 | 31.68 | 32.05 | 33.35 | 32.41 |
| σ=40 | 28.48 | 30.39 | 30.46 | 30.55 | 30.46 | 30.70 | 32.15 | 31.22 |
| σ=50 | 27.33 | 29.24 | 29.53 | 29.47 | 29.53 | 29.66 | 31.22 | 30.31 |

Runtime comparison

| | VBM4D | VNLB | DVDnet | VNLnet | FastDVD | PaCNet | VRT | ours |
|---|---|---|---|---|---|---|---|---|
| runtime (s) | 156.0 | 420.0 | 4.91 | 1.87 | 0.08 | 24.64 | 7.86 | 0.37 |

License

Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0)

Citation

If you use this code for your research, please cite our paper:

@InProceedings{Lindner_2023_WACV,
    author    = {Lindner, Lydia and Effland, Alexander and Ilic, Filip and Pock, Thomas and Kobler, Erich},
    title     = {Lightweight Video Denoising Using Aggregated Shifted Window Attention},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {351-360}
}