# 📖 Recommendations of Document Image Processing

This repository contains a collection of papers on document image processing, including registration, appearance enhancement, deshadowing, dewarping, deblurring, and binarization.

## 🔄 Contents

1. [Registration](#1-registration)
2. [Appearance Enhancement](#2-appearance-enhancement)
3. [Deshadow](#3-deshadow)
4. [Dewarping](#4-dewarping)
5. [Deblur](#5-deblur)
6. [Binarization](#6-binarization)

## 1. Registration

Document registration (also known as document alignment) aims to densely map two document images with the same content (such as a scanned and photographed version of the same document). It has important applications in automated data annotation and template-based dewarping tasks.
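
For intuition, a minimal classical baseline for the coarse stage of this task is feature matching plus a single homography, which only models a flat page; the papers below instead estimate dense pixel-level flow. A sketch using OpenCV (the 0.75 ratio threshold and RANSAC settings are standard SIFT-matching conventions, not taken from any listed method):

```python
import cv2
import numpy as np

def coarse_align(src_path: str, ref_path: str) -> np.ndarray:
    """Roughly register a photographed document onto its scanned reference."""
    src = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE)

    # Detect and match local features between the two versions of the page.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(src, None)
    kp2, des2 = sift.detectAndCompute(ref, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test

    # Fit a homography with RANSAC and warp the photo into the reference frame.
    pts_src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts_ref = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts_src, pts_ref, cv2.RANSAC, 5.0)
    return cv2.warpPerspective(src, H, (ref.shape[1], ref.shape[0]))
```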

### 1.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2023 | IJDAR | Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping | Code |
| 2023 | Arxiv | DocAligner: Annotating real-world photographic document images by simply taking pictures | Code |
| 2024 | ACM MM | Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents | |
| 2024 | ICDAR | Coarse-to-Fine Document Image Registration for Dewarping | Code |

### 1.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| DocAlign12K | 12K (10K/2K) | Synth | Example | Link |

### 1.3 SOTA

<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="2">DocUNet (130)</th> </tr> <tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2306.05749">DocAligner</a></td> <td class="tg-c3ow">0.8232</td> <td class="tg-c3ow">0.0445</td> </tr> </tbody> </table>

## 2. Appearance Enhancement

Appearance enhancement (also known as illumination correction) is not limited to a specific degradation type; it aims to restore a clean appearance similar to that of a scan or a digital-born PDF.
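
As a rough illustration of the goal (not any listed method), a training-free baseline estimates the shaded page background with a large median filter and divides it out; the learned approaches below handle shadows, stains, and color far more robustly:

```python
import cv2
import numpy as np

def flatten_illumination(image_path: str, ksize: int = 51) -> np.ndarray:
    """Divide out a smooth background estimate so the page reads as white."""
    img = cv2.imread(image_path).astype(np.float32)
    # A large median blur erases text strokes but keeps slow shading changes.
    background = cv2.medianBlur(img.astype(np.uint8), ksize).astype(np.float32)
    flat = img / np.maximum(background, 1.0)  # background pixels -> ~1.0
    return np.clip(flat * 255.0, 0, 255).astype(np.uint8)
```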

### 2.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | ACM TOG | Document Rectification and Illumination Correction using a Patch-based CNN | Code |
| 2019 | ICCV | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Code |
| 2020 | BMVC | Intrinsic Decomposition of Document Images In-the-wild | Code |
| 2021 | ACM MM | DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction | Code |
| 2022 | CVPR | Fourier Document Restoration for Robust Document Dewarping and Recognition | |
| 2022 | ACM MM | UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior | Code |
| 2023 | TAI | Appearance Enhancement for Camera-captured Document Images in the Wild | Code |
| 2023 | ICCVW | Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction | Code |
| 2023 | Arxiv | DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF Versions | |
| 2024 | ICASSP | Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images | |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |

### 2.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| Doc3DShade | 90K | Synth | Example | Link |
| DocProj | 2450 | Synth | Example | Link |
| DocUNet from DocAligner | 130 | Real | Example | Link |
| RealDAE | 600 (450/150) | Real | Example | Link |
| Inv3D | 25K | Synth | Example | Link |

### 2.3 Apps

### 2.4 SOTA

<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Methods</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="2">DocUNet from DocAligner (130)</th> <th class="tg-c3ow" colspan="2">RealDAE (150)</th> </tr> <tr> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">0.7195</td> <td class="tg-c3ow">13.09</td> <td class="tg-c3ow">0.8264</td> <td class="tg-c3ow">12.26</td> </tr> <tr> <td class="tg-c3ow">TOG'19</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/1909.09470.pdf">DocProj</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7098</td> <td class="tg-c3ow">14.71</td> <td class="tg-c3ow">0.8684</td> <td class="tg-c3ow">19.35</td> </tr> <tr> <td class="tg-c3ow">BMVC'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2011.14447">Das et al.</a></td> <td class="tg-c3ow">Doc3DShade</td> <td class="tg-c3ow">0.7276</td> <td class="tg-c3ow">16.42</td> <td class="tg-c3ow">0.8633</td> <td class="tg-c3ow">19.87</td> </tr> <tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7067</td> <td class="tg-c3ow">15.78</td> <td class="tg-c3ow">0.7925</td> <td class="tg-c3ow">18.62</td> </tr> <tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/abs/10.1145/3503161.3547916">UDoc-GAN</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.6833</td> <td class="tg-c3ow">14.29</td> <td class="tg-c3ow">0.7558</td> <td class="tg-c3ow">16.43</td> </tr> <tr> <td class="tg-c3ow">TAI'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10268585/">GCDRNet</a></td> <td class="tg-c3ow">RealDAE</td> <td class="tg-c3ow"><b>0.7658</b></td> <td class="tg-c3ow">17.09</td> <td class="tg-c3ow"><b>0.9423</b></td> <td class="tg-c3ow">24.42</td> </tr> <tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.7598</td> <td class="tg-c3ow"><b>17.60</b></td> <td class="tg-c3ow">0.9219</td> <td class="tg-c3ow"><b>24.65</b></td> </tr> </tbody> </table>
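
The SSIM and PSNR values above follow the standard full-reference definitions, though exact protocols (color space, resizing) differ across papers. A plausible evaluation sketch using scikit-image ≥ 0.19; the resize step is an assumption for handling mismatched resolutions:

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_pair(pred_path: str, gt_path: str) -> tuple:
    """Return (SSIM, PSNR) of an enhanced image against its ground truth."""
    gt = cv2.imread(gt_path)
    pred = cv2.imread(pred_path)
    pred = cv2.resize(pred, (gt.shape[1], gt.shape[0]))  # match resolutions
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    return ssim, psnr
```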

## 3. Deshadow

Deshadowing aims to eliminate shadows that are mainly caused by occlusion to obtain shadow-free document images.
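
In the classical background-estimation spirit of several entries below (a rough sketch, not any paper's exact algorithm), one can dilate away the dark text strokes to recover the shaded page background, then apply a per-pixel gain toward white:

```python
import cv2
import numpy as np

def remove_shadow(image_path: str) -> np.ndarray:
    """Estimate the shaded page background and normalize it toward white."""
    img = cv2.imread(image_path).astype(np.float32)
    # Grayscale dilation (per-channel max filter) erases dark strokes;
    # a median blur then smooths the blocky dilation artifacts.
    bg = cv2.dilate(img.astype(np.uint8), np.ones((25, 25), np.uint8))
    bg = cv2.medianBlur(bg, 21).astype(np.float32)
    gain = 255.0 / np.maximum(bg, 1.0)  # stronger gain where shadows darken the page
    return np.clip(img * gain, 0, 255).astype(np.uint8)
```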

### 3.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2018 | CVPR | Document Enhancement Using Visibility Detection | Code |
| 2020 | CVPR | BEDSR-Net: A Deep Shadow Removal Network from a Single Document Image | Code* |
| 2022 | ICIP | Document Shadow Removal with Foreground Detection Learning From Fully Synthetic Images | Code |
| 2022 | MERCon | Shadow Removal for Documents with Reflective Textured Surface | |
| 2023 | ICASSP | ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal | Code |
| 2023 | ICASSP | Shadow Removal of Text Document Images Using Background Estimation and Adaptive Text Enhancement | |
| 2023 | ICASSP | LP-IOANet: Efficient High Resolution Document Shadow Removal | |
| 2023 | Optical Review | Shadow removal from document image based on background estimation employing selective median filter and black-top-hat transform | |
| 2023 | CVPR | Document Image Shadow Removal Guided by Color-Aware Background | Code |
| 2023 | Arxiv | ShaDocFormer: A Shadow-Attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal | |
| 2023 | ICCV | High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net | Code |
| 2023 | Sensors | Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks | Code |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | IJDAR | Am I readable? Transfer learning based document image rectification | |

* indicates that the implementation is unofficial.

### 3.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| RDD | 4916 (4371/545) | Real | Example | Link |
| Kligler et al. | 300 | Real | Example | Link |
| FSDSRD | 14200 | Synth | Example | Link |
| Jung et al. | 87 | Real | Example | Link |
| OSR | 237 | Real | Example | Link |
| WEZUT OCR | 176 | Real | Example | Link |
| SD7K | 7620 (6479/760) | Real | Example | Link |
| SynDocDS | 50K (40K/5K) | Synth | | Link |

### 3.3 SOTA

<table class="tg"> <thead>
<tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content_cvpr_2018/html/Kligler_Document_Enhancement_Using_CVPR_2018_paper.html">Kligler et al. (300)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://link.springer.com/chapter/10.1007/978-3-030-20887-5_25">Jung et al. (87)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://www.mdpi.com/1424-8220/20/23/6929">OSR (237)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.html">RDD (545)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content/ICCV2023/html/Li_High-Resolution_Document_Shadow_Removal_via_A_Large-Scale_Real-World_Dataset_and_ICCV_2023_paper.html">SD7K (760)</a></th> </tr>
<tr> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> </tr>
</thead> <tbody>
<tr> <td class="tg-c3ow">CVPR'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.html'>BGShadowNet</a></td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow">5.377</td> <td class="tg-c3ow">29.17</td> <td class="tg-c3ow">0.948</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">2.219</td> <td class="tg-c3ow">37.58</td> <td class="tg-c3ow">0.983</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/ICCV2023/html/Li_High-Resolution_Document_Shadow_Removal_via_A_Large-Scale_Real-World_Dataset_and_ICCV_2023_paper.html'>FSENet</a></td> <td class="tg-c3ow">SD7K</td> <td class="tg-c3ow">10.60</td> <td class="tg-c3ow">28.98</td> <td class="tg-c3ow">0.93</td> <td class="tg-c3ow">17.56</td> <td class="tg-c3ow">23.60</td> <td class="tg-c3ow">0.85</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">10.00</td> <td class="tg-c3ow">28.67</td> <td class="tg-c3ow">0.96</td> </tr>
<tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href='https://arxiv.org/pdf/2405.04408'>DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">27.14</td> <td class="tg-c3ow">0.900</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">23.02</td> <td class="tg-c3ow">0.908</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">21.64</td> <td class="tg-c3ow">0.937</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
</tbody> </table>

## 4. Dewarping

Dewarping, also referred to as geometric rectification, aims to rectify document images that suffer from curves, folds, crumples, perspective/affine deformation and other geometric distortions.
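
Most learned methods in this section predict a backward map: for each pixel of the flat output, the coordinate at which to sample the warped input. A sketch of applying such a map with PyTorch's grid_sample (the map itself would come from a model such as those in 4.3; normalizing coordinates to [-1, 1] is the grid_sample convention):

```python
import torch
import torch.nn.functional as F

def unwarp(warped: torch.Tensor, backward_map: torch.Tensor) -> torch.Tensor:
    """warped: (N, C, H, W) image; backward_map: (N, H, W, 2) sampling grid
    with x/y coordinates normalized to [-1, 1]."""
    return F.grid_sample(warped, backward_map, mode="bilinear",
                         align_corners=True)
```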

### 4.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2018 | CVPR | DocUNet: Document Image Unwarping via A Stacked U-Net | |
| 2019 | ACM TOG | Document Rectification and Illumination Correction using a Patch-based CNN | Code |
| 2019 | ICCV | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Code |
| 2020 | PR | Geometric Rectification of Document Images using Adversarial Gated Unwarping Network | |
| 2020 | ECCV | Can You Read Me Now? Content Aware Rectification using Angle Supervision | |
| 2020 | DAS | Dewarping Document Image by Displacement Flow Estimation with Fully Convolutional Network | Code |
| 2021 | ACM MM | DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction | Code |
| 2021 | ICCV | End-to-end Piece-wise Unwarping of Document Images | Code |
| 2021 | ICDAR | Document Dewarping with Control Points | Code |
| 2022 | CVPR | Fourier Document Restoration for Robust Document Dewarping and Recognition | |
| 2022 | CVPR | Revisiting Document Image Dewarping by Grid Regularization | |
| 2022 | ACM MM | Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild | |
| 2022 | SIGGRAPH | Learning From Documents in the Wild to Improve Document Unwarping | Code |
| 2022 | ECCV | Geometric Representation Learning for Document Image Rectification | Code |
| 2022 | ECCV | Learning an Isometric Surface Parameterization for Texture Unwrapping | Code |
| 2022 | Arxiv | DocScanner: Robust Document Image Rectification with Progressive Learning | Code |
| 2022 | ICPR | Document Image Rectification in Complex Scene Using Stacked Siamese Networks | |
| 2023 | Arxiv | Geometric Rectification of Creased Document Images based on Isometric Mapping | |
| 2023 | IJDAR | Adaptive Dewarping of Severely Warped Camera-captured Document Images Based on Document Map Generation | |
| 2023 | TMM | Deep Unrestricted Document Image Rectification | Code |
| 2023 | Arxiv | Neural Document Unwarping using Coupled Grids | |
| 2023 | IJDAR | Inv3D: A High-resolution 3D Invoice Dataset for Template-guided Single-image Document Unwarping | Code |
| 2023 | Arxiv | MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | |
| 2023 | ICCVW | Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction | Code |
| 2023 | ICCV | Foreground and Text-lines Aware Document Image Rectification | Code |
| 2023 | ACM TOG | Layout-Aware Single-Image Document Flattening | Code |
| 2023 | TCSVT | Rethinking Supervision in Document Unwarping: A Self-consistent Flow-free Approach | |
| 2023 | SIGGRAPH | UVDoc: Neural Grid-based Document Unwarping | Code |
| 2023 | Arxiv | Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation | |
| 2024 | WACV | DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction | Code |
| 2024 | ICASSP | Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images | |
| 2024 | ICDAR | Coarse-to-Fine Document Image Registration for Dewarping | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | IJDAR | Am I readable? Transfer learning based document image rectification | |
| 2024 | ACM MM | Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents | |

### 4.2 Datasets

| Dataset | Num. | Type | Example | Download/Codes |
| --- | --- | --- | --- | --- |
| DocUNet | 130 | Real | Example | Link |
| Doc3D | 100K | Synth | - | Link |
| DIW | 5K | Real | Example | Link |
| WarpDoc | 1020 | Real | Example | Link |
| DIR300 | 300 | Real | Example | Link |
| Inv3D | 25K | Synth | Example | Link |
| Inv3DReal | 360 | Real | Example | Link |
| DICP | - | Synth | - | Link |
| DIF | - | Synth | - | Link |
| Simulated Paper | 90K | Synth | - | Link |
| DocReal | 200 | Real | Example | Link |
| UVDoc | 20K | Synth | Example | Link |
| WarpDoc-R | 840 | Real | | |

### 4.3 SOTA

<table class="tg"> <thead>
<tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="3">DocUNet (130)</th> <th class="tg-c3ow" colspan="3">DIR300 (300)</th> <th class="tg-c3ow" colspan="2">DocReal (200)</th> <th class="tg-c3ow" colspan="2">UVDoc (50)</th> </tr>
<tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr>
</thead> <tbody>
<tr> <td class="tg-c3ow">ICCV'19</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content_ICCV_2019/html/Das_DewarpNet_Single-Image_Document_Unwarping_With_Stacked_3D_and_2D_Regression_ICCV_2019_paper.html">DewarpNet</a></td> <td class="tg-c3ow">0.474</td> <td class="tg-c3ow">8.39</td> <td class="tg-c3ow">0.426</td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">13.94</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.589</td> <td class="tg-c3ow">0.193</td> </tr>
<tr> <td class="tg-c3ow">DAS'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2104.06815.pdf">FCN-based</a></td> <td class="tg-c3ow">0.448</td> <td class="tg-c3ow">7.84</td> <td class="tg-c3ow">0.434</td> <td class="tg-c3ow">0.503</td> <td class="tg-c3ow">9.75</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'21</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Das_End-to-End_Piece-Wise_Unwarping_of_Document_Images_ICCV_2021_paper.pdf">Piece-Wise</a></td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">8.64</td> <td class="tg-c3ow">0.468</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICDAR'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2203.10543.pdf">DDCP</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">8.99</td> <td class="tg-c3ow">0.453</td> <td class="tg-c3ow">0.552</td> <td class="tg-c3ow">10.95</td> <td class="tg-c3ow">0.357</td> <td class="tg-c3ow">0.46</td> <td class="tg-c3ow">16.04</td> <td class="tg-c3ow">0.585</td> <td class="tg-c3ow">0.290</td> </tr>
<tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">0.511</td> <td class="tg-c3ow">7.76</td> <td class="tg-c3ow">0.396</td> <td class="tg-c3ow">0.616</td> <td class="tg-c3ow">7.21</td> <td class="tg-c3ow">0.254</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.66</td> <td class="tg-c3ow">0.697</td> <td class="tg-c3ow">0.160</td> </tr>
<tr> <td class="tg-c3ow">CVPR'22</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Jiang_Revisiting_Document_Image_Dewarping_by_Grid_Regularization_CVPR_2022_paper.pdf">RDGR</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.51</td> <td class="tg-c3ow">0.461</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.610</td> <td class="tg-c3ow">0.280</td> </tr>
<tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2207.11515.pdf">Marior</a></td> <td class="tg-c3ow">0.478</td> <td class="tg-c3ow">7.27</td> <td class="tg-c3ow">0.403</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ECCV'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2210.08161.pdf">DocGeoNet</a></td> <td class="tg-c3ow">0.504</td> <td class="tg-c3ow">7.71</td> <td class="tg-c3ow">0.380</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">6.40</td> <td class="tg-c3ow">0.242</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.22</td> <td class="tg-c3ow">0.706</td> <td class="tg-c3ow">0.168</td> </tr>
<tr> <td class="tg-c3ow">SIGGRAPH'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3528233.3530756">PaperEdge</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">7.81</td> <td class="tg-c3ow">0.392</td> <td class="tg-c3ow">0.583</td> <td class="tg-c3ow">8.00</td> <td class="tg-c3ow">0.255</td> <td class="tg-c3ow">0.52</td> <td class="tg-c3ow">11.46</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2110.14968">DocScanner-L</a></td> <td class="tg-c3ow">0.518</td> <td class="tg-c3ow">7.45</td> <td class="tg-c3ow">0.334</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Foreground_and_Text-lines_Aware_Document_Image_Rectification_ICCV_2023_paper.pdf">Li et al.</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.43</td> <td class="tg-c3ow">0.376</td> <td class="tg-c3ow">0.607</td> <td class="tg-c3ow">7.68</td> <td class="tg-c3ow">0.244</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">WACV'24</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/WACV2024/papers/Yu_DocReal_Robust_Document_Dewarping_of_Real-Life_Images_via_Attention-Enhanced_Control_WACV_2024_paper.pdf">DocReal</a></td> <td class="tg-c3ow">0.50</td> <td class="tg-c3ow">7.03</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.56</b></td> <td class="tg-c3ow"><b>9.83</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">TCSVT'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10327775">DRNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">TMM'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2304.08796">DocTr++</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.54</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.45</td> <td class="tg-c3ow">19.88</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2312.07925">Polar-Doc</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.605</td> <td class="tg-c3ow">7.17</td> <td class="tg-c3ow">0.206</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2307.12571.pdf">MataDoc</a></td> <td class="tg-c3ow">0.502</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"><b>0.178</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">SIGGRAPH'23</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/fullHtml/10.1145/3610548.3618174">UVDoc</a></td> <td class="tg-c3ow"><b>0.544</b></td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.785</b></td> <td class="tg-c3ow"><b>0.119</b></td> </tr>
<tr> <td class="tg-c3ow">ACM TOG'23</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3627818">LA-DocFlatten</a></td> <td class="tg-c3ow">0.526</td> <td class="tg-c3ow"><b>6.72</b></td> <td class="tg-c3ow"><b>0.300</b></td> <td class="tg-c3ow">0.651</td> <td class="tg-c3ow"><b>5.70</b></td> <td class="tg-c3ow">0.195</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.626</td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.241</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">IJDAR'24</td> <td class="tg-c3ow"><a href="https://link.springer.com/article/10.1007/s10032-024-00476-9">DocTLNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">6.70</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.658</b></td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
</tbody> </table>

## 5. Deblur

### 5.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | NIPS | SVDocNet: Spatially Variant U-Net for Blind Document Deblurring | |
| 2019 | MTA | DeepDeblur: text image recovery from blur to sharp | Code |
| 2020 | TPAMI | DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement | Code |
| 2021 | ICCV | End-to-End Unsupervised Document Image Blind Denoising | |
| 2023 | ACM MM | DocDiff: Document Enhancement via Residual Diffusion Models | Code |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | Arxiv | NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Code |

### 5.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| TDD (text deblur dataset) | 67.6K (66K/1.6K) | Synth | Example | Link |

### 5.3 SOTA

Coming Soon ...

## 6. Binarization

### 6.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | PR | DeepOtsu: Document enhancement and binarization using iterative deep learning | Code |
| 2021 | PR | Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net modules | Code |
| 2022 | PR | Two-Stage Generative Adversarial Networks for Binarization of Color Document Images | Code |
| 2023 | PR | GDB: Gated Convolutions-based Document Binarization | Code |
| 2023 | ACM MM | DocDiff: Document Enhancement via Residual Diffusion Models | Code |
| 2023 | ICDAR | ColDBin: Cold Diffusion for Document Image Binarization | Code |
| 2023 | Information Fusion | A Novel Degraded Document Binarization Model through Vision Transformer Network | |
| 2023 | Arxiv | DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization | |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | Arxiv | NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Code |
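
For context, the classical reference point for these methods is global Otsu thresholding, which DeepOtsu (above) extends with iterative deep refinement. A minimal OpenCV sketch:

```python
import cv2

def binarize_otsu(image_path: str):
    """Binarize a document image with a single global Otsu threshold."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu picks the threshold that minimizes intra-class intensity variance.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```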

### 6.2 Datasets

| Dataset | Num. | Type | Example | Download |
| --- | --- | --- | --- | --- |
| DocEng 2019 | 15 | Real | Example | Link |
| DocEng 2020 | 32 | Real | Example | Link |
| DocEng 2021 | 222 | Real | Example | Link |
| DocEng 2022 | 80 | Real | Example | Link |
| DIBCO 2009 | 10 | Real | Example | Link |
| H-DIBCO 2010 | 10 | Real | Example | Link |
| DIBCO 2011 | 16 | Real | Example | Link |
| H-DIBCO 2012 | 14 | Real | Example | Link |
| DIBCO 2013 | 16 | Real | Example | Link |
| H-DIBCO 2014 | 10 | Real | Example | Link |
| H-DIBCO 2016 | 10 | Real | Example | Link |
| DIBCO 2017 | 20 | Real | Example | Link |
| DIBCO 2018 | 10 | Real | Example | Link |
| DIBCO 2019 | 10 | Real | Example | Link |
| Bickley diary | 7 | Real | Example | Link |
| Synchromedia Multispectral (MSI) | 240 | Real | Example | Link |
| Persian Heritage Image Binarization (PHIBD) | 15 | Real | Example | Link |
| Palm Leaf | 50 | Real | Example | Link |
| NoiseOffice | 216 | Synth | Example | Link |
| LRDE Document Binarization Dataset | 125 | Real | - | Link |
| Shipping label dataset | 1082 | Real | Example | Link |

### 6.3 SOTA

Coming Soon ...

## ⭐ Star Rising

Star Rising