[ECCV 2022] Ghost-free High Dynamic Range Imaging with Context-aware Transformer
By Zhen Liu<sup>1</sup>, Yinglong Wang<sup>2</sup>, Bing Zeng<sup>3</sup> and Shuaicheng Liu<sup>3,1*</sup>
<sup>1</sup>Megvii Technology, <sup>2</sup>Noah’s Ark Lab, Huawei Technologies, <sup>3</sup>University of Electronic Science and Technology of China
This is the official MegEngine implementation of our ECCV 2022 paper: Ghost-free High Dynamic Range Imaging with Context-aware Transformer (HDR-Transformer). The PyTorch version is available at HDR-Transformer-PyTorch.
News
- 2022.08.26 The PyTorch implementation is now available.
- 2022.08.11 The arXiv version of our paper is now available.
- 2022.07.19 The source code is now available.
- 2022.07.04 Our paper has been accepted by ECCV 2022.
Abstract
High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture, which can jointly capture both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations to solve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use the channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as basic components, we further build the HDR-Transformer, a hierarchical network to reconstruct high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets.
Pipeline
Illustration of the proposed CA-ViT. As shown in Fig. (a), the CA-ViT is designed as a dual-branch architecture where the global branch models long-range dependency among image contexts through a multi-head Transformer encoder, and the local branch explores both intra-frame local details and inner-frame feature relationship through a local context extractor. Fig. (b) depicts the key insight of our HDR deghosting approach with CA-ViT. To remove the residual ghosting artifacts caused by large motions of the hand (marked with blue), long-range contexts (marked with red), which are required to hallucinate reasonable content in the ghosting area, are modeled by the self-attention in the global branch. Meanwhile, the well-exposed non-occluded local regions (marked with green) can be effectively extracted with convolutional layers and fused by the channel attention in the local branch.
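For readers who find code easier to parse than prose, the sketch below illustrates the dual-branch idea in plain PyTorch (the official PyTorch version is linked above). The layer choices, dimensions, and fusion scheme here are simplified assumptions for exposition, not the exact modules of the released implementation.

```python
import torch
import torch.nn as nn

class CAViTBlockSketch(nn.Module):
    """Illustrative dual-branch block: global self-attention plus a convolutional
    local context extractor gated by channel attention. All layer names and sizes
    are assumptions for exposition, not the released HDR-Transformer code."""

    def __init__(self, dim=60, num_heads=6, reduction=4):
        super().__init__()
        # Global branch: multi-head self-attention over flattened feature tokens.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Local branch: a conv extracts short-range features; channel attention
        # then re-weights them to select informative local details.
        self.local_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W). The real model partitions features into (shifted)
        # local windows; here the whole map is flattened for brevity.
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))      # (B, H*W, C)
        global_out, _ = self.attn(tokens, tokens, tokens)
        global_out = global_out.transpose(1, 2).reshape(b, c, h, w)

        local_feat = self.local_conv(x)
        local_out = local_feat * self.channel_attn(local_feat)

        # The two branches are fused with a residual connection.
        return x + global_out + local_out
```

In HDR-Transformer, blocks of this kind are stacked hierarchically on top of a convolutional feature-extraction front end, as described in the paper.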
Usage
Requirements
- Python 3.7.0
- MegEngine 1.8.3+
- CUDA 10.0 on Ubuntu 18.04
Install the required dependencies:
conda create -n hdr_transformer python=3.7
conda activate hdr_transformer
pip install -r requirements.txt
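If the install succeeds, a quick sanity check (assuming the CUDA build of MegEngine) is:

```python
import megengine as mge

print(mge.__version__)          # expect 1.8.3 or later
print(mge.is_cuda_available())  # True if a CUDA-capable GPU is visible
```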
Dataset
- Download the dataset (including the training and test sets) from Kalantari17's dataset
- Move the dataset to `./data` and reorganize the directories as follows:
./data/Training
|--001
| |--262A0898.tif
| |--262A0899.tif
| |--262A0900.tif
| |--exposure.txt
| |--HDRImg.hdr
|--002
...
./data/Test (includes the 15 test scenes from `EXTRA` and `PAPER`)
|--001
| |--262A2615.tif
| |--262A2616.tif
| |--262A2617.tif
| |--exposure.txt
| |--HDRImg.hdr
...
|--BarbequeDay
| |--262A2943.tif
| |--262A2944.tif
| |--262A2945.tif
| |--exposure.txt
| |--HDRImg.hdr
...
- Prepare the cropped training set by running:
cd ./dataset
python gen_crop_data.py
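For reference, the snippet below shows one way to load a scene from this layout and map the LDR inputs into the linear HDR domain (gamma correction followed by division by exposure time), following common practice for the Kalantari dataset. The 16-bit .tif assumption, the gamma of 2.2, and the LDR/linear concatenation are conventions of prior work and may differ in detail from the released data loader.

```python
import os
import cv2
import numpy as np

def load_scene(scene_dir, gamma=2.2):
    """Load one scene: three LDR .tif inputs, their exposures, and the HDR label.
    Assumes 16-bit .tif images and exposure biases (in stops) in exposure.txt."""
    exposures = np.loadtxt(os.path.join(scene_dir, 'exposure.txt'))
    exposure_times = 2.0 ** exposures

    ldr_paths = sorted(p for p in os.listdir(scene_dir) if p.endswith('.tif'))
    inputs = []
    for name, t in zip(ldr_paths, exposure_times):
        ldr = cv2.imread(os.path.join(scene_dir, name), cv2.IMREAD_UNCHANGED)
        ldr = cv2.cvtColor(ldr, cv2.COLOR_BGR2RGB).astype(np.float32) / 65535.0
        linear = (ldr ** gamma) / t                            # map into the HDR domain
        inputs.append(np.concatenate([ldr, linear], axis=2))   # 6-channel input

    label = cv2.imread(os.path.join(scene_dir, 'HDRImg.hdr'), cv2.IMREAD_UNCHANGED)
    label = cv2.cvtColor(label, cv2.COLOR_BGR2RGB).astype(np.float32)
    return inputs, label
```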
Training & Evaluation
cd HDR-Transformer
To train the model, run:
python train.py --model_dir experiments
To evaluate, run:
python evaluate.py --model_dir experiments --restore_file experiments/val_model_best.pth
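The paper reports PSNR both in the linear HDR domain and after μ-law tonemapping with μ = 5000 (PSNR-μ). A minimal sketch of that metric, assuming HDR values normalized to [0, 1], is shown below; the released evaluation script reports its own metrics, so this is only for illustration.

```python
import numpy as np

def mu_tonemap(hdr, mu=5000.0):
    """Mu-law tonemapping used for PSNR-mu; expects values in [0, 1]."""
    return np.log(1.0 + mu * hdr) / np.log(1.0 + mu)

def psnr(pred, target, peak=1.0):
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# PSNR-mu compares the mu-law tonemapped prediction and ground truth:
# psnr_mu = psnr(mu_tonemap(pred_hdr), mu_tonemap(gt_hdr))
```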
Results
Acknowledgement
The MegEngine version of the Swin-Transformer is based on Swin-Transformer-MegEngine. Our work is inspired by the following works and uses parts of their official implementations:
We thank the respective authors for open-sourcing their methods.
Citation
@inproceedings{liu2022ghost,
title={Ghost-free High Dynamic Range Imaging with Context-aware Transformer},
author={Liu, Zhen and Wang, Yinglong and Zeng, Bing and Liu, Shuaicheng},
booktitle={European Conference on Computer Vision},
pages={344--360},
year={2022},
organization={Springer}
}
Contact
If you have any questions, feel free to contact Zhen Liu at liuzhen03@megvii.com.