[ECCV 2022] Ghost-free High Dynamic Range Imaging with Context-aware Transformer
By Zhen Liu<sup>1</sup>, Yinglong Wang<sup>2</sup>, Bing Zeng<sup>3</sup> and Shuaicheng Liu<sup>3,1*</sup>
<sup>1</sup>Megvii Technology, <sup>2</sup>Noah’s Ark Lab, Huawei Technologies, <sup>3</sup>University of Electronic Science and Technology of China
This is the official PyTorch implementation of our ECCV2022 paper: Ghost-free High Dynamic Range Imaging with Context-aware Transformer (HDR-Transformer). The MegEngine version is available at HDR-Transformer-MegEngine.
News
- 2022.08.26 The PyTorch implementation of our paper is now available.
- 2022.07.04 Our paper has been accepted by ECCV 2022.
Abstract
High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture, which can jointly capture both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations to solve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use the channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as basic components, we further build the HDR-Transformer, a hierarchical network to reconstruct high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets.
Pipeline
Illustration of the proposed CA-ViT. As shown in Fig. (a), the CA-ViT is designed as a dual-branch architecture, where the global branch models long-range dependencies among image contexts through a multi-head Transformer encoder, and the local branch explores both intra-frame local details and inter-frame feature relationships through a local context extractor. Fig. (b) depicts the key insight of our HDR deghosting approach with CA-ViT. To remove the residual ghosting artifacts caused by large motions of the hand (marked with blue), long-range contexts (marked with red), which are required to hallucinate reasonable content in the ghosting area, are modeled by the self-attention in the global branch. Meanwhile, the well-exposed, non-occluded local regions (marked with green) can be effectively extracted with convolutional layers and fused by the channel attention in the local branch.
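To make the dual-branch idea concrete, the snippet below is a minimal, unofficial PyTorch sketch of a CA-ViT-style block: a multi-head self-attention branch for global context plus a convolutional local context extractor with channel attention. The class names, the fusion by simple addition, the omitted window partitioning, and all hyper-parameters are illustrative assumptions; please refer to the code in this repository for the actual implementation.

```python
# Minimal, unofficial sketch of a dual-branch CA-ViT-style block (not the official code).
# The window partitioning of the global branch is omitted for brevity, and the fusion
# by simple addition is an assumption made for illustration.
import torch
import torch.nn as nn


class LocalContextExtractor(nn.Module):
    """Local branch: conv layers followed by channel attention over the features."""

    def __init__(self, dim, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):                                  # x: (B, C, H, W)
        feat = self.conv(x)
        w = self.fc(self.pool(feat).flatten(1))            # channel attention weights
        return feat * w[:, :, None, None]


class CAViTBlock(nn.Module):
    """Global self-attention branch + local context branch, fused by addition."""

    def __init__(self, dim, num_heads=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(), nn.Linear(mlp_ratio * dim, dim)
        )
        self.lce = LocalContextExtractor(dim)

    def forward(self, x, hw):                              # x: (B, N, C), N = H * W
        B, N, C = x.shape
        H, W = hw
        # Global branch: multi-head self-attention over all tokens.
        y = self.norm1(x)
        y, _ = self.attn(y, y, y)
        # Local branch: conv + channel attention on the spatial feature map.
        local = self.lce(x.transpose(1, 2).reshape(B, C, H, W))
        local = local.flatten(2).transpose(1, 2)
        x = x + y + local                                  # fuse global and local context
        return x + self.mlp(self.norm2(x))
```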
Usage
Requirements
- Python 3.7.13
- PyTorch 1.9.0
- Torchvision 0.10.0
- CUDA 10.2 on Ubuntu 18.04
Install the required dependencies:
conda create -n hdr_transformer_pytorch python=3.7
conda activate hdr_transformer_pytorch
pip install -r requirements.txt
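Optionally, a quick sanity check that the installed versions match the requirements listed above:

```python
# Verify that PyTorch, Torchvision, and CUDA are visible in the new environment.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 1.9.0 and 0.10.0
print(torch.cuda.is_available())                    # expect True with CUDA 10.2
```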
Dataset
- Download the dataset (including the training and test sets) from Kalantari17's dataset
- Move the dataset to `./data` and reorganize the directories as follows:
./data/Training
|--001
| |--262A0898.tif
| |--262A0899.tif
| |--262A0900.tif
| |--exposure.txt
| |--HDRImg.hdr
|--002
...
./data/Test (includes 15 scenes from `EXTRA` and `PAPER`)
|--001
| |--262A2615.tif
| |--262A2616.tif
| |--262A2617.tif
| |--exposure.txt
| |--HDRImg.hdr
...
|--BarbequeDay
| |--262A2943.tif
| |--262A2944.tif
| |--262A2945.tif
| |--exposure.txt
| |--HDRImg.hdr
...
- Prepare the cropped training set by running:
cd ./dataset
python gen_crop_data.py
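For reference, the snippet below sketches how a single scene in the layout above can be read and mapped into the HDR domain (gamma correction followed by division by the exposure time), following common practice on the Kalantari17 dataset. The function name, the use of OpenCV, and the 6-channel per-frame input are illustrative assumptions; the actual preprocessing lives in the dataset scripts of this repository.

```python
# Hedged sketch of reading one scene from the layout above; not the repository's
# own preprocessing code. Assumes 16-bit LDR tifs, EV stops in exposure.txt, and
# gamma = 2.2, as is common practice on the Kalantari17 dataset.
import glob
import os

import cv2
import numpy as np


def read_scene(scene_dir, gamma=2.2):
    ldr_paths = sorted(glob.glob(os.path.join(scene_dir, "*.tif")))
    exposures = np.loadtxt(os.path.join(scene_dir, "exposure.txt"))      # EV stops
    ldrs = [cv2.imread(p, cv2.IMREAD_UNCHANGED).astype(np.float32) / 65535.0
            for p in ldr_paths]                                          # 16-bit tif -> [0, 1]
    # Map each LDR frame into the linear HDR domain and stack it with the LDR itself,
    # giving the usual 6-channel per-frame input.
    inputs = [np.concatenate([ldr, (ldr ** gamma) / (2.0 ** ev)], axis=2)
              for ldr, ev in zip(ldrs, exposures)]
    gt_hdr = cv2.imread(os.path.join(scene_dir, "HDRImg.hdr"),
                        cv2.IMREAD_UNCHANGED).astype(np.float32)
    return inputs, gt_hdr
```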
Training & Evaluation
To train the model, run:
python train.py
For evaluation, we provide a script for testing with limited GPU memory: it splits each full-size image into several patches and merges the per-patch outputs into the final result.
python test.py --pretrained_model ./checkpoints/pretrained_model.pth --save_results --save_dir ./results/hdr_transformer
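The patch-based strategy is roughly the following (a generic sketch; the patch size, overlap, and the model's exact input interface in test.py are assumptions):

```python
# Generic sketch of limited-memory inference by tiling; not the exact test.py logic.
# `x` stands for whatever full-resolution input tensor the model consumes.
import torch


@torch.no_grad()
def tiled_forward(model, x, patch=256, overlap=32):
    _, _, H, W = x.shape
    out = torch.zeros(1, 3, H, W, device=x.device)
    weight = torch.zeros(1, 1, H, W, device=x.device)
    stride = patch - overlap
    for top in range(0, H, stride):
        for left in range(0, W, stride):
            bottom, right = min(top + patch, H), min(left + patch, W)
            t, l = max(bottom - patch, 0), max(right - patch, 0)
            out[:, :, t:bottom, l:right] += model(x[:, :, t:bottom, l:right])
            weight[:, :, t:bottom, l:right] += 1
    return out / weight        # average the overlapping regions
```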
Note: The pretrained weights were obtained with the reorganized code; their PSNR is slightly lower, and their SSIM slightly higher, than the values reported in our paper. Feel free to use either for comparison.
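For comparison, results on this benchmark are commonly reported as PSNR in the linear domain (PSNR-l) and after μ-law tonemapping (PSNR-μ, μ = 5000). The sketch below shows the generic computation and is not necessarily the exact evaluation code behind the reported numbers.

```python
# Generic PSNR-l / PSNR-mu computation as commonly used on this benchmark;
# an illustrative sketch, not necessarily the evaluation code behind the paper's numbers.
import numpy as np


def psnr(a, b, data_range=1.0):
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10((data_range ** 2) / mse)


def mu_tonemap(x, mu=5000.0):
    return np.log1p(mu * x) / np.log1p(mu)


# pred, gt: predicted and ground-truth HDR images normalized to [0, 1]
# psnr_l  = psnr(pred, gt)
# psnr_mu = psnr(mu_tonemap(pred), mu_tonemap(gt))
```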
Results
Acknowledgement
Our work is inspired by the following works and uses parts of their official implementations:
We thank the respective authors for open sourcing their methods.
Citation
@inproceedings{liu2022ghost,
  title={Ghost-free High Dynamic Range Imaging with Context-aware Transformer},
  author={Liu, Zhen and Wang, Yinglong and Zeng, Bing and Liu, Shuaicheng},
  booktitle={European Conference on Computer Vision},
  pages={344--360},
  year={2022},
  organization={Springer}
}
Contact
If you have any questions, feel free to contact Zhen Liu at liuzhen03@megvii.com.