[ECCV 2022] Ghost-free High Dynamic Range Imaging with Context-aware Transformer
By Zhen Liu<sup>1</sup>, Yinglong Wang<sup>2</sup>, Bing Zeng<sup>3</sup> and Shuaicheng Liu<sup>3,1*</sup>
<sup>1</sup>Megvii Technology, <sup>2</sup>Noah’s Ark Lab, Huawei Technologies, <sup>3</sup>University of Electronic Science and Technology of China
This is the official MegEngine implementation of our ECCV 2022 paper: Ghost-free High Dynamic Range Imaging with Context-aware Transformer (HDR-Transformer). The PyTorch version is available at HDR-Transformer-PyTorch.
News
- 2022.08.26 The PyTorch implementation is now available.
- 2022.08.11 The arXiv version of our paper is now available.
- 2022.07.19 The source code is now available.
- 2022.07.04 Our paper has been accepted by ECCV 2022.
Abstract
High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture, which can jointly capture both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations to solve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use the channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as basic components, we further build the HDR-Transformer, a hierarchical network to reconstruct high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets.
Pipeline
Illustration of the proposed CA-ViT. As shown in Fig. (a), the CA-ViT is designed as a dual-branch architecture where the global branch models long-range dependency among image contexts through a multi-head Transformer encoder, and the local branch explores both intra-frame local details and inner-frame feature relationship through a local context extractor. Fig. (b) depicts the key insight of our HDR deghosting approach with CA-ViT. To remove the residual ghosting artifacts caused by large motions of the hand (marked with blue), long-range contexts (marked with red), which are required to hallucinate reasonable content in the ghosting area, are modeled by the self-attention in the global branch. Meanwhile, the well-exposed non-occluded local regions (marked with green) can be effectively extracted with convolutional layers and fused by the channel attention in the local branch.
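For readers who find code easier to parse than prose, the sketch below illustrates the dual-branch idea in plain PyTorch (the official PyTorch version is linked above). The layer choices, dimensions, and fusion scheme here are simplified assumptions for exposition, not the exact modules of the released implementation.

```python
import torch
import torch.nn as nn

class CAViTBlockSketch(nn.Module):
    """Illustrative dual-branch block: global self-attention plus a convolutional
    local context extractor gated by channel attention. All layer names and sizes
    are assumptions for exposition, not the released HDR-Transformer code."""

    def __init__(self, dim=60, num_heads=6, reduction=4):
        super().__init__()
        # Global branch: multi-head self-attention over flattened feature tokens.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Local branch: a conv extracts short-range features; channel attention
        # then re-weights them to select informative local details.
        self.local_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W). The real model partitions features into (shifted)
        # local windows; here the whole map is flattened for brevity.
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))      # (B, H*W, C)
        global_out, _ = self.attn(tokens, tokens, tokens)
        global_out = global_out.transpose(1, 2).reshape(b, c, h, w)

        local_feat = self.local_conv(x)
        local_out = local_feat * self.channel_attn(local_feat)

        # The two branches are fused with a residual connection.
        return x + global_out + local_out
```

In HDR-Transformer, blocks of this kind are stacked hierarchically on top of a convolutional feature-extraction front end, as described in the paper.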
Usage
Requirements
- Python 3.7.0
- MegEngine 1.8.3+
- CUDA 10.0 on Ubuntu 18.04
Install the required dependencies:
conda create -n hdr_transformer python=3.7
conda activate hdr_transformer
pip install -r requirements.txt
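If the install succeeds, a quick sanity check (assuming the CUDA build of MegEngine) is:

```python
import megengine as mge

print(mge.__version__)          # expect 1.8.3 or later
print(mge.is_cuda_available())  # True if a CUDA-capable GPU is visible
```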
Dataset
- Download the dataset (including the training and test sets) from Kalantari17's dataset
- Move the dataset to `./data` and reorganize the directories as follows:
./data/Training
|--001
| |--262A0898.tif
| |--262A0899.tif
| |--262A0900.tif
| |--exposure.txt
| |--HDRImg.hdr
|--002
...
./data/Test (includes the 15 test scenes from `EXTRA` and `PAPER`)
|--001
| |--262A2615.tif
| |--262A2616.tif
| |--262A2617.tif
| |--exposure.txt
| |--HDRImg.hdr
...
|--BarbequeDay
| |--262A2943.tif
| |--262A2944.tif
| |--262A2945.tif
| |--exposure.txt
| |--HDRImg.hdr
...
- Prepare the cropped training set by running:
cd ./dataset
python gen_crop_data.py
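For reference, the snippet below shows one way to load a scene from this layout and map the LDR inputs into the linear HDR domain (gamma correction followed by division by exposure time), following common practice for the Kalantari dataset. The 16-bit .tif assumption, the gamma of 2.2, and the LDR/linear concatenation are conventions of prior work and may differ in detail from the released data loader.

```python
import os
import cv2
import numpy as np

def load_scene(scene_dir, gamma=2.2):
    """Load one scene: three LDR .tif inputs, their exposures, and the HDR label.
    Assumes 16-bit .tif images and exposure biases (in stops) in exposure.txt."""
    exposures = np.loadtxt(os.path.join(scene_dir, 'exposure.txt'))
    exposure_times = 2.0 ** exposures

    ldr_paths = sorted(p for p in os.listdir(scene_dir) if p.endswith('.tif'))
    inputs = []
    for name, t in zip(ldr_paths, exposure_times):
        ldr = cv2.imread(os.path.join(scene_dir, name), cv2.IMREAD_UNCHANGED)
        ldr = cv2.cvtColor(ldr, cv2.COLOR_BGR2RGB).astype(np.float32) / 65535.0
        linear = (ldr ** gamma) / t                            # map into the HDR domain
        inputs.append(np.concatenate([ldr, linear], axis=2))   # 6-channel input

    label = cv2.imread(os.path.join(scene_dir, 'HDRImg.hdr'), cv2.IMREAD_UNCHANGED)
    label = cv2.cvtColor(label, cv2.COLOR_BGR2RGB).astype(np.float32)
    return inputs, label
```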
Training & Evaluation
cd HDR-Transformer
To train the model, run:
python train.py --model_dir experiments
To evaluate, run:
python evaluate.py --model_dir experiments --restore_file experiments/val_model_best.pth
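The paper reports PSNR both in the linear HDR domain and after μ-law tonemapping with μ = 5000 (PSNR-μ). A minimal sketch of that metric, assuming HDR values normalized to [0, 1], is shown below; the released evaluation script reports its own metrics, so this is only for illustration.

```python
import numpy as np

def mu_tonemap(hdr, mu=5000.0):
    """Mu-law tonemapping used for PSNR-mu; expects values in [0, 1]."""
    return np.log(1.0 + mu * hdr) / np.log(1.0 + mu)

def psnr(pred, target, peak=1.0):
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# PSNR-mu compares the mu-law tonemapped prediction and ground truth:
# psnr_mu = psnr(mu_tonemap(pred_hdr), mu_tonemap(gt_hdr))
```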
Results
Acknowledgement
The MegEngine version of the Swin-Transformer is based on Swin-Transformer-MegEngine. Our work is inspired by the following works and uses parts of their official implementations:
We thank the respective authors for open-sourcing their methods.
Citation
@inproceedings{liu2022ghost,
title={Ghost-free High Dynamic Range Imaging with Context-aware Transformer},
author={Liu, Zhen and Wang, Yinglong and Zeng, Bing and Liu, Shuaicheng},
booktitle={European Conference on Computer Vision},
pages={344--360},
year={2022},
organization={Springer}
}
Contact
If you have any questions, feel free to contact Zhen Liu at liuzhen03@megvii.com.