ITMLUT
Official PyTorch implementation of "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display" (paper (arXiv), paper) in CVMP 2023 (website, proceedings).
1. A quick glance at all AI-3D-LUT algorithms
Here are all the AI-3D-LUT (look-up table) algorithms we know of (last updated 07/03/2024); please jump to them if interested.
You can cite our paper if you find this overview helpful.
```
@InProceedings{Guo_2023_CVMP,
  author    = {Guo, Cheng and Fan, Leidong and Zhang, Qian and Liu, Hanyuan and Liu, Kanglin and Jiang, Xiuhua},
  title     = {Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display},
  booktitle = {Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (CVMP)},
  month     = {November},
  year      = {2023},
  pages     = {1-10},
  doi       = {10.1145/3626495.3626503}
}
```
<table>
<thead>
<tr>
<th colspan="7">AI-3D-LUT algotithms</th>
<th colspan="3">Expressiveness of the trained LUT</th>
<th rowspan="2">Output of<br>neural network(s)<br></th>
<th rowspan="2">Nodes<br>(packing)<br></th>
</tr>
<tr>
<th>Idea</th>
<th>Task</th>
<th>Name<br></th>
<th>Publication</th>
<th>Paper<br></th>
<th>Code</th>
<th>Institution</th>
<th>#BasicLUT</th>
<th>LUT size each</th>
<th>(#) Extra dimension</th>
</tr>
</thead>
<tbody>
<tr>
<td>First AI-LUT</td>
<td rowspan="9">Image<br>enhancement<br>/retouching<br></td>
<td><b>A3DLUT</b></td>
<td>20-TPAMI</td>
<td><a href="https://ieeexplore.ieee.org/abstract/document/9206076" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/HuiZeng/Image-Adaptive-3DLUT" target="_blank" rel="noopener noreferrer">code</a></td>
<td>HK_PolyU & <a href="https://www.dji.com/" target="_blank" rel="noopener noreferrer">DJI Innovation</a></td>
<td>3×1</td>
<td>3×33<sup>3</sup></td>
<td>-</td>
<td>weights (of basic LUTs)</td>
<td rowspan="4">uniform</td>
</tr>
<tr>
<td>C</td>
<td><b>SA-LUT-Nets</b></td>
<td>ICCV'21</td>
<td><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_Real-Time_Image_Enhancer_via_Learnable_Spatial-Aware_3D_Lookup_Tables_ICCV_2021_paper.pdf" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>-</td>
<td><a href="https://www.noahlab.com.hk/" target="_blank" rel="noopener noreferrer">Huawei Noah's Ark Lab</a></td>
<td>3×<b>10</b></td>
<td>3×33<sup>3</sup></td>
<td>(<b>10</b>) category</td>
<td>weights & category map</td>
</tr>
<tr>
<td>E<br></td>
<td><b>CLUT-Net</b></td>
<td rowspan="2">MM'22</td>
<td><a href="https://dl.acm.org/doi/10.1145/3503161.3547879" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/Xian-Bei/CLUT-Net/" target="_blank" rel="noopener noreferrer">code</a></td>
<td>CN_TongjiU & <a href="https://ur.oppo.com/" target="_blank" rel="noopener noreferrer">OPPO Research</a></td>
<td>20×1<br></td>
<td>3×5×20 (compressed LUT representation)</td>
<td>-</td>
<td rowspan="2">weights</td>
</tr>
<tr>
<td>E</td>
<td><b>F2D-LUT</b></td>
<td><a href="https://dl.acm.org/doi/abs/10.1145/3503161.3548325" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/shedy-pub/I2VEnhance" target="_blank" rel="noopener noreferrer">code</a></td>
<td>CN_TsinghuaU</td>
<td>6×<b>3</b></td>
<td>2×33<sup>2</sup> (3D LUT decoupled to 2D LUTs) </td>
<td>(<b>3</b>) R-G/R-B/G-B channel order</td>
</tr>
<tr>
<td>N</td>
<td><b>AdaInt</b></td>
<td>CVPR'22</td>
<td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_AdaInt_Learning_Adaptive_Intervals_for_3D_Lookup_Tables_on_Real-Time_CVPR_2022_paper.pdf" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/ImCharlesY/AdaInt" target="_blank" rel="noopener noreferrer">code</a></td>
<td rowspan="2">CN_SJTU & Alibaba Group</td>
<td>3×1</td>
<td>3×33<sup>3</sup></td>
<td rowspan="3">-</td>
<td>weights & nodes</td>
<td>learned non-uniform</td>
</tr>
<tr>
<td>N</td>
<td><b>SepLUT</b></td>
<td>ECCV'22</td>
<td><a href="https://link.springer.com/content/pdf/10.1007/978-3-031-19797-0_12" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/ImCharlesY/SepLUT" target="_blank" rel="noopener noreferrer">code</a></td>
<td>1 (no self-adaptability)</td>
<td>3×9<sup>3</sup> or 3×17<sup>3</sup></td>
<td>directly 1D & 3D LUTs</td>
<td>learned non-linear by 1D LUT</td>
</tr>
<tr>
<td>C<br></td>
<td><b>DualBLN</b></td>
<td>ACCV'22</td>
<td><a href="https://openaccess.thecvf.com/content/ACCV2022/papers/Zhang_DualBLN_Dual_Branch_LUT-aware_Network_for_Real-time_Image_Retouching_ACCV_2022_paper.pdf" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/120326/DualBLN" target="_blank" rel="noopener noreferrer">code</a></td>
<td>CN_NorthwesternPolyU</td>
<td>5×1</td>
<td>3×36<sup>3</sup></td>
<td>LUT fusion map</td>
<td rowspan="7">uniform</td>
</tr>
<tr>
<td>C</td>
<td><b>4D-LUT</b></td>
<td>23-TIP</td>
<td><a href="https://ieeexplore.ieee.org/document/10226494" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>-</td>
<td>CN_XianJiaotongU & <a href="https://www.msra.cn" target="_blank" rel="noopener noreferrer">Microsoft Research Asia</a></td>
<td>3×1</td>
<td>3×33<sup><b>4</b></sup></td>
<td>(<b>33</b>) context</td>
<td>weights & context map</td>
</tr>
<tr>
<td>C & E</td>
<td><b>AttentionLUT</b></td>
<td>24-arXiv</td>
<td><a href="https://arxiv.org/pdf/2401.01569.pdf" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>-</td>
<td>CN_SJTU <a href="https://jhc.sjtu.edu.cn/" target="_blank" rel="noopener noreferrer">John Hopcroft Center</a></td>
<td>no (does not rely on basic LUTs for self-adaptability)</td>
<td>9×15×33 (represented by Canonical Polyadic decomposition)</td>
<td>-</td>
<td>feature (to encode Q,K,V tensors)</td>
</tr>
<tr>
<td>E</td>
<td>Photorealistic<br>Style Transfer</td>
<td><b>NLUT</b></td>
<td>23-arXiv</td>
<td><a href="https://arxiv.org/pdf/2303.09170" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/semchan/NLUT/" target="_blank" rel="noopener noreferrer">code</a></td>
<td><a href="http://international.sobey.com/index.php" target="_blank" rel="noopener noreferrer">Sobey Digital Technology</a> & Peng Cheng Lab</td>
<td>2048×1</td>
<td>3×32×32 (compressed LUT representation)</td>
<td>-</td>
<td>weights</td>
</tr>
<tr>
<td>C</td>
<td>Video Low-light<br>enhancement<br></td>
<td><b>IA-LUT</b></td>
<td>MM'23</td>
<td><a href="https://dl.acm.org/doi/10.1145/3581783.3611933" target="_blank" rel="noopener noreferrer">paper</a></td>
<td><a href="https://github.com/Wenhao-Li-%20777/FastLLVE" target="_blank" rel="noopener noreferrer">code</a></td>
<td>CN_SJTU & <a href="https://damo.alibaba.com/" target="_blank" rel="noopener noreferrer">Alibaba Damo Academy</a></td>
<td>3×1</td>
<td>3×33<sup><b>4</b></sup></td>
<td>(<b>33</b>) intensity</td>
<td>weights & intensity map</td>
</tr>
<tr>
<td>No</td>
<td>Underwater Image Enhancement</td>
<td><b>INAM-LUT</b></td>
<td>23-Sensors</td>
<td><a href="https://www.mdpi.com/1424-8220/23/4/2169" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>-</td>
<td>CN_XidianU </td>
<td>3×1</td>
<td>3×33(?)<sup>3</sup></td>
<td>-</td>
<td>weights</td>
</tr>
<tr>
<td>C</td>
<td>Tone-mapping</td>
<td><b>LapLUT</b></td>
<td>NeurIPS'23</td>
<td><a href="https://proceedings.neurips.cc/paper_files/paper/2023/file/b3a08d179347e33414badadf100e4e8d-Paper-Conference.pdf" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>-</td>
<td>CN_HUST & <a href="https://www.dji.com/" target="_blank" rel="noopener noreferrer">DJI Innovation</a></td>
<td>3×1</td>
<td>3×33<sup>3</sup></td>
<td>-</td>
<td>weight map (of each interpolated image)</td>
</tr>
<tr>
<td>Ours</td>
<td>HDR/WCG Inverse<br>Tone-mapping</td>
<td><b>ITM-LUT</b><br></td>
<td>CVMP'23</td>
<td><a href="https://dl.acm.org/doi/abs/10.1145/3626495.3626503" target="_blank" rel="noopener noreferrer">paper</a></td>
<td>see below</td>
<td><a href="https://en.cuc.edu.cn/" target="_blank" rel="noopener noreferrer">CN_CUC</a> & Peng Cheng Lab</td>
<td>5×<b>3</b></td>
<td>3×17<sup>3</sup></td>
<td>(<b>3</b>) luminance probability<br>(contribution)<br></td>
<td>weights</td>
<td>explicitly defined<br>non-uniform<br></td>
</tr>
</tbody>
</table>
In the Idea column:
- C stands for improving the expressiveness of the LUT content (by a new way of generating the image-adaptive LUT, or by introducing a new dimension);
- E stands for making the LUT more efficient (by a special representation of the LUT's elements);
- N stands for setting non-uniform nodes (to reduce the LUT's interpolation error on images with a specific numerical distribution).
Note that we only list AI-3D-LUTs for image-to-image low-level vision tasks; the following AI-LUTs are not included:
- Non-3D AI-LUTs for other CV tasks: e.g. SR-LUT, MuLUT (paper1, paper2 (extended to image restoration)), VA-LUT, SPLUT (super-resolution, non-3D-LUT), MEFLUT (multi-exposure fusion, 1D-LUT), SA-LuT-Nets (medical imaging), etc. (Such LUTs may not even involve an interpolation process.)
- Methods that claim to be AI-LUTs but use another mechanism for the image-to-image transform: e.g. NILUT (represents the LUT transform with an MLP (multi-layer perceptron)), etc.
2. Our algorithm ITM-LUT
Our AI-3D-LUT algorithm, named ITM-LUT, conducts inverse tone-mapping (ITM) from a standard dynamic range (SDR) image/frame to its high dynamic range and wide color gamut (HDR/WCG) version.
2.1 Key features
- Self-adaptability: the LUT content alters with the input SDR's statistics, by merging basic LUTs with weights that a neural network generates from the input SDR (a minimal sketch is given after this list).
- AI learning: rather than a 'top-down designed' static LUT, our LUT can be learned from any dataset in a 'bottom-up' manner, enabling reverse engineering of any technical and artistic intent between SDR and HDR/WCG.
- HDR/WCG optimization: a LUT processing higher-bit-depth HDR/WCG content would normally require a larger LUT size N. Instead, we use 3 LUTs with different non-uniform nodes; each has lower interpolation error in a different range, so a pixel-wise contribution map blends their best ranges. In this way, 3 smaller LUTs (e.g. N=17) reach the same error level as a single bigger LUT (e.g. N=33) while occupying fewer elements (3×3×17³ = 44,217 < 3×33³ = 107,811).
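As an illustration of the self-adaptability above, here is a minimal PyTorch sketch of blending basic LUTs with image-dependent weights. All names and shapes are assumptions chosen for illustration, not the exact ones used in this repo:

```python
import torch

# Hypothetical shapes for illustration: 5 basic LUTs, each a 3-channel
# 17x17x17 table, packed as (5, 3, 17, 17, 17).
basic_luts = torch.randn(5, 3, 17, 17, 17)

# In practice a small backbone would predict one weight per basic LUT from the
# (downsampled) input SDR frame; here we fake it with softmax-normalized noise.
weights = torch.softmax(torch.randn(1, 5), dim=1)       # (batch, #basic LUTs)

# The image-adaptive LUT is the weighted sum of the basic LUTs.
adaptive_lut = torch.einsum('bk,kcxyz->bcxyz', weights, basic_luts)
print(adaptive_lut.shape)                                # torch.Size([1, 3, 17, 17, 17])
```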
2.2 Prerequisites
- Python
- PyTorch
- OpenCV
- ImageIO
- NumPy
- GCC/G++
2.3 Usage (how to test)
First, install the CUDA & C++ implementation of trilinear interpolation with non-uniform vertices (requires GCC/G++):

```
python3 ./ailut/setup.py install
```

After that, the `ailut` package is available in your Python environment.
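This extension implements trilinear interpolation whose sampling nodes are not equally spaced. Purely as a conceptual illustration of that computation (this is not the actual `ailut` API; the helper below is our own hypothetical sketch), the lookup can be written in plain PyTorch with `torch.searchsorted`:

```python
import torch

def nonuniform_trilinear(img, lut, nodes):
    """Hypothetical helper: trilinear LUT lookup with non-uniform nodes.

    img   : (3, H, W) SDR pixels in [0, 1]
    lut   : (3, N, N, N) LUT entries, indexed by (r_idx, g_idx, b_idx)
    nodes : (3, N) increasing sampling positions in [0, 1], one row per channel
    """
    _, H, W = img.shape
    N = nodes.shape[1]
    x = img.reshape(3, -1)                                  # (3, H*W)

    # Per channel: locate the enclosing interval among the non-uniform nodes
    # and the fractional position inside it.
    idx = torch.searchsorted(nodes.contiguous(), x.contiguous()) - 1
    idx = idx.clamp(0, N - 2)                               # (3, H*W)
    left = torch.gather(nodes, 1, idx)
    right = torch.gather(nodes, 1, idx + 1)
    frac = (x - left) / (right - left + 1e-12)

    r0, g0, b0 = idx[0], idx[1], idx[2]
    fr, fg, fb = frac[0], frac[1], frac[2]

    def corner(dr, dg, db):                                 # LUT entries at one cube corner
        return lut[:, r0 + dr, g0 + dg, b0 + db]            # (3, H*W)

    out = (corner(0, 0, 0) * (1 - fr) * (1 - fg) * (1 - fb)
           + corner(1, 0, 0) * fr * (1 - fg) * (1 - fb)
           + corner(0, 1, 0) * (1 - fr) * fg * (1 - fb)
           + corner(0, 0, 1) * (1 - fr) * (1 - fg) * fb
           + corner(1, 1, 0) * fr * fg * (1 - fb)
           + corner(1, 0, 1) * fr * (1 - fg) * fb
           + corner(0, 1, 1) * (1 - fr) * fg * fb
           + corner(1, 1, 1) * fr * fg * fb)
    return out.reshape(3, H, W)

# Smoke test with random data (N=17 as in this repo's LUTs)
img = torch.rand(3, 4, 4)
nodes = torch.sort(torch.rand(3, 17), dim=1).values
nodes[:, 0], nodes[:, -1] = 0.0, 1.0                        # make the nodes cover [0, 1]
lut = torch.rand(3, 17, 17, 17)
print(nonuniform_trilinear(img, lut, nodes).shape)          # torch.Size([3, 4, 4])
```

The installed CUDA/C++ operator performs the same kind of computation far faster and with proper gradient support; the sketch only shows the idea.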
Run `test.py` with the configuration(s) below:

```
python3 test.py frameName.jpg
```

For batch processing, use the wildcard `*`:

```
python3 test.py framesPath/*.png
```

or:

```
python3 test.py framesPath/footageName_*.png
```
Add the configuration(s) below for a specific purpose:

Purpose | Configuration |
---|---|
Specifying the output path | `-out resultDir/` (default is the input directory) |
Resizing the image before inference | `-resize True -height newH -width newW` |
Adding a filename tag | `-tag yourTag` |
Forcing CPU processing | `-use_gpu False` |
Using an input SDR with bit depth != 8 | e.g. `-in_bitdepth 16` |
Saving the result HDR in another format<br/>(default is an uncompressed<br/>16-bit .tif per frame) | `-out_format suffix`<br>`png` for 16-bit .png<br>`exr` requires the extra package OpenEXR |
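As a quick sanity check on a saved result (the file name below is hypothetical, and the encoding comment is an assumption based on the BT.2020 test setting reported further down), the 16-bit .tif/.png can be read back like this:

```python
import cv2
import numpy as np

# Read the 16-bit result without automatic bit-depth conversion;
# OpenCV returns BGR channel order, so convert to RGB for further processing.
hdr = cv2.imread('resultDir/frameName_yourTag.tif', cv2.IMREAD_UNCHANGED)
hdr = cv2.cvtColor(hdr, cv2.COLOR_BGR2RGB).astype(np.float32) / 65535.0

# Presumably PQ/BT.2020-encoded HDR code values in [0, 1].
print(hdr.shape, hdr.dtype, float(hdr.min()), float(hdr.max()))
```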
Change line 104 in `test.py` to use other parameters/checkpoints:

- The current `params.pth` is trained on our own HDRTV4K dataset and DaVinci degradation model (available here); it scores 35.14 dB PSNR, 0.9605 SSIM, 14.330 $\Delta$E<sub>itp</sub> and 9.1181 VDP3 ('task'='side-by-side', 'color_encoding'='rgb-bt.2020', 'pixel_per_degree'=60 on 1920×1080 images) on the HDRTV4K-DaVinci test set (a plain PSNR sketch follows this list).
- Checkpoint `params_TV1K.pth` is trained on the popular HDRTV1K dataset and YouTube degradation model; it scores 36.69 dB PSNR, 0.9811 SSIM, 10.194 $\Delta$E<sub>itp</sub> and 8.9122 VDP3 ('task'='side-by-side', 'color_encoding'='rgb-bt.2020', 'pixel_per_degree'=60 on 1920×1080 images) on the HDRTV1K test set.
- We will release more interesting checkpoint(s) later.
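For a rough reproduction of the distortion numbers above, a plain PSNR between a result and its ground truth can be computed as below. This is a generic sketch under the assumption of [0, 1]-normalized code values; the official evaluation protocol, especially for $\Delta$E<sub>itp</sub> and VDP3, is more involved.

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """Plain PSNR between two [0, peak]-ranged arrays of identical shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```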
2.4 Training code
First, download the training code from BaiduNetDisk (code: qgs2) or GoogleDrive. This package contains the 5 essential real ITM LUTs used in our own LUT initialization, plus 13 other real ITM LUTs (each in N=17/33/65); you can use any combination of them to try a new LUT initialization.
Then:
```
cd ITMLUT_train/codes
python3 train.py -opt options/test/test_Net.yml
```
- You can modify the training configuration, e.g. the number of basic LUTs and the LUT size, in `codes/options/test/test_Net.yml`.
- Rename any other LUT from `codes/real_luts/other_luts` to e.g. `2_17.cube` in `codes/real_luts` to try a new initialization; remember to delete the first row (a string) when using other commercial LUT(s). A minimal `.cube` reading sketch is given below.
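For reference, here is a minimal sketch of reading a `.cube` file into an array. This is our own illustrative helper, not the loader shipped with the training code; commercial LUTs often carry an extra title line, which is why the first row sometimes has to be removed.

```python
import numpy as np

def read_cube(path):
    """Minimal .cube reader: returns an (N, N, N, 3) float32 array of LUT entries."""
    size, entries = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue                                   # skip comments / empty lines
            if line.upper().startswith('LUT_3D_SIZE'):
                size = int(line.split()[-1])               # number of nodes per axis
            elif line[0].isdigit() or line[0] in '-.':
                entries.append([float(v) for v in line.split()[:3]])
    lut = np.asarray(entries, dtype=np.float32)
    # .cube data lists RGB triplets with the red index varying fastest,
    # so the first reshaped axis is the (slowest-varying) blue index.
    return lut.reshape(size, size, size, 3)

print(read_cube('codes/real_luts/2_17.cube').shape)        # hypothetical path
```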
2.5 Changelog
Date | log |
---|---|
29 Feb 2024 | Since most SoTA methods are still trained and tested on the HDRTV1K dataset, we add a checkpoint `params_TV1K.pth` trained on it, so the results have a similar look to those SoTAs. |
3 Mar 2024 | Training code (along with 18 real ITM LUTs in N=17/33/65) is now released. |
Contact
Guo Cheng (Andre Guo) guocheng@cuc.edu.cn
- State Key Laboratory of Media Convergence and Communication (MCC), Communication University of China (CUC), Beijing, China.
- Peng Cheng Laboratory (PCL), Shenzhen, China.