# 📖 Recommendations of Document Image Processing

This repository contains a collection of papers on document image processing, including registration, appearance enhancement, deshadowing, dewarping, deblurring, and binarization.

## 🔄 Contents

1. [Registration](#1-registration)
2. [Appearance Enhancement](#2-appearance-enhancement)
3. [Deshadow](#3-deshadow)
4. [Dewarping](#4-dewarping)
5. [Deblur](#5-deblur)
6. [Binarization](#6-binarization)

## 1. Registration

Document registration (also known as document alignment) aims to densely map two document images with the same content (such as a scanned and photographed version of the same document). It has important applications in automated data annotation and template-based dewarping tasks.
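
For intuition, a minimal classical baseline for the coarse stage of this task is feature matching plus a single homography, which only models a flat page; the papers below instead estimate dense pixel-level flow. A sketch using OpenCV (the 0.75 ratio threshold and RANSAC settings are standard SIFT-matching conventions, not taken from any listed method):

```python
import cv2
import numpy as np

def coarse_align(src_path: str, ref_path: str) -> np.ndarray:
    """Roughly register a photographed document onto its scanned reference."""
    src = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE)

    # Detect and match local features between the two versions of the page.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(src, None)
    kp2, des2 = sift.detectAndCompute(ref, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test

    # Fit a homography with RANSAC and warp the photo into the reference frame.
    pts_src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts_ref = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts_src, pts_ref, cv2.RANSAC, 5.0)
    return cv2.warpPerspective(src, H, (ref.shape[1], ref.shape[0]))
```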

### 1.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2023 | IJDAR | Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping | Code |
| 2023 | Arxiv | DocAligner: Annotating real-world photographic document images by simply taking pictures | Code |
| 2024 | ACM MM | Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents | |
| 2024 | ICDAR | Coarse-to-Fine Document Image Registration for Dewarping | Code |

### 1.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| DocAlign12K | 12K (10K/2K) | Synth | Example | Link |

### 1.3 SOTA

<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="2">DocUNet (130)</th> </tr> <tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2306.05749">DocAligner</a></td> <td class="tg-c3ow">0.8232</td> <td class="tg-c3ow">0.0445</td> </tr> </tbody> </table>

## 2. Appearance Enhancement

Appearance enhancement (also known as illumination correction) is not limited to a specific degradation type; it aims to restore a clean appearance similar to that of a scan or a digital-born PDF.
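
As a rough illustration of the goal (not any listed method), a training-free baseline estimates the shaded page background with a large median filter and divides it out; the learned approaches below handle shadows, stains, and color far more robustly:

```python
import cv2
import numpy as np

def flatten_illumination(image_path: str, ksize: int = 51) -> np.ndarray:
    """Divide out a smooth background estimate so the page reads as white."""
    img = cv2.imread(image_path).astype(np.float32)
    # A large median blur erases text strokes but keeps slow shading changes.
    background = cv2.medianBlur(img.astype(np.uint8), ksize).astype(np.float32)
    flat = img / np.maximum(background, 1.0)  # background pixels -> ~1.0
    return np.clip(flat * 255.0, 0, 255).astype(np.uint8)
```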

### 2.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | ACM TOG | Document Rectification and Illumination Correction using a Patch-based CNN | Code |
| 2019 | ICCV | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Code |
| 2020 | BMVC | Intrinsic Decomposition of Document Images In-the-wild | Code |
| 2021 | ACM MM | DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction | Code |
| 2022 | CVPR | Fourier Document Restoration for Robust Document Dewarping and Recognition | |
| 2022 | ACM MM | UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior | Code |
| 2023 | TAI | Appearance Enhancement for Camera-captured Document Images in the Wild | Code |
| 2023 | ICCVW | Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction | Code |
| 2023 | Arxiv | DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF Versions | |
| 2024 | ICASSP | Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images | |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |

### 2.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| Doc3DShade | 90K | Synth | Example | Link |
| DocProj | 2450 | Synth | Example | Link |
| DocUNet from DocAligner | 130 | Real | Example | Link |
| RealDAE | 600 (450/150) | Real | Example | Link |
| Inv3D | 25K | Synth | Example | Link |

### 2.3 Apps

### 2.4 SOTA

<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Methods</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="2">DocUNet from DocAligner (130)</th> <th class="tg-c3ow" colspan="2">RealDAE (150)</th> </tr> <tr> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">0.7195</td> <td class="tg-c3ow">13.09</td> <td class="tg-c3ow">0.8264</td> <td class="tg-c3ow">12.26</td> </tr> <tr> <td class="tg-c3ow">TOG'19</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/1909.09470.pdf">DocProj</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7098</td> <td class="tg-c3ow">14.71</td> <td class="tg-c3ow">0.8684</td> <td class="tg-c3ow">19.35</td> </tr> <tr> <td class="tg-c3ow">BMVC'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2011.14447">Das et al.</a></td> <td class="tg-c3ow">Doc3DShade</td> <td class="tg-c3ow">0.7276</td> <td class="tg-c3ow">16.42</td> <td class="tg-c3ow">0.8633</td> <td class="tg-c3ow">19.87</td> </tr> <tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7067</td> <td class="tg-c3ow">15.78</td> <td class="tg-c3ow">0.7925</td> <td class="tg-c3ow">18.62</td> </tr> <tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/abs/10.1145/3503161.3547916">UDoc-GAN</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.6833</td> <td class="tg-c3ow">14.29</td> <td class="tg-c3ow">0.7558</td> <td class="tg-c3ow">16.43</td> </tr> <tr> <td class="tg-c3ow">TAI'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10268585/">GCDRNet</a></td> <td class="tg-c3ow">RealDAE</td> <td class="tg-c3ow"><b>0.7658</b></td> <td class="tg-c3ow">17.09</td> <td class="tg-c3ow"><b>0.9423</b></td> <td class="tg-c3ow">24.42</td> </tr> <tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.7598</td> <td class="tg-c3ow"><b>17.60</b></td> <td class="tg-c3ow">0.9219</td> <td class="tg-c3ow"><b>24.65</b></td> </tr> </tbody> </table>
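
The SSIM and PSNR values above follow the standard full-reference definitions, though exact protocols (color space, resizing) differ across papers. A plausible evaluation sketch using scikit-image ≥ 0.19; the resize step is an assumption for handling mismatched resolutions:

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_pair(pred_path: str, gt_path: str) -> tuple:
    """Return (SSIM, PSNR) of an enhanced image against its ground truth."""
    gt = cv2.imread(gt_path)
    pred = cv2.imread(pred_path)
    pred = cv2.resize(pred, (gt.shape[1], gt.shape[0]))  # match resolutions
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    return ssim, psnr
```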

## 3. Deshadow

Deshadowing aims to eliminate shadows that are mainly caused by occlusion to obtain shadow-free document images.
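
In the classical background-estimation spirit of several entries below (a rough sketch, not any paper's exact algorithm), one can dilate away the dark text strokes to recover the shaded page background, then apply a per-pixel gain toward white:

```python
import cv2
import numpy as np

def remove_shadow(image_path: str) -> np.ndarray:
    """Estimate the shaded page background and normalize it toward white."""
    img = cv2.imread(image_path).astype(np.float32)
    # Grayscale dilation (per-channel max filter) erases dark strokes;
    # a median blur then smooths the blocky dilation artifacts.
    bg = cv2.dilate(img.astype(np.uint8), np.ones((25, 25), np.uint8))
    bg = cv2.medianBlur(bg, 21).astype(np.float32)
    gain = 255.0 / np.maximum(bg, 1.0)  # stronger gain where shadows darken the page
    return np.clip(img * gain, 0, 255).astype(np.uint8)
```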

### 3.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2018 | CVPR | Document Enhancement Using Visibility Detection | Code |
| 2020 | CVPR | BEDSR-Net: A Deep Shadow Removal Network from a Single Document Image | Code* |
| 2022 | ICIP | Document Shadow Removal with Foreground Detection Learning From Fully Synthetic Images | Code |
| 2022 | MERCon | Shadow Removal for Documents with Reflective Textured Surface | |
| 2023 | ICASSP | ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal | Code |
| 2023 | ICASSP | Shadow Removal of Text Document Images Using Background Estimation and Adaptive Text Enhancement | |
| 2023 | ICASSP | LP-IOANet: Efficient High Resolution Document Shadow Removal | |
| 2023 | Optical Review | Shadow removal from document image based on background estimation employing selective median filter and black-top-hat transform | |
| 2023 | CVPR | Document Image Shadow Removal Guided by Color-Aware Background | Code |
| 2023 | Arxiv | ShaDocFormer: A Shadow-Attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal | |
| 2023 | ICCV | High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net | Code |
| 2023 | Sensors | Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks | Code |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | IJDAR | Am I readable? Transfer learning based document image rectification | |

* indicates that the implementation is unofficial.

### 3.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| RDD | 4916 (4371/545) | Real | Example | Link |
| Kligler et al. | 300 | Real | Example | Link |
| FSDSRD | 14200 | Synth | Example | Link |
| Jung et al. | 87 | Real | Example | Link |
| OSR | 237 | Real | Example | Link |
| WEZUT OCR | 176 | Real | Example | Link |
| SD7K | 7620 (6479/760) | Real | Example | Link |
| SynDocDS | 50K (40K/5K) | Synth | | Link |

### 3.3 SOTA

<table class="tg"> <thead>
<tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content_cvpr_2018/html/Kligler_Document_Enhancement_Using_CVPR_2018_paper.html">Kligler et al. (300)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://link.springer.com/chapter/10.1007/978-3-030-20887-5_25">Jung et al. (87)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://www.mdpi.com/1424-8220/20/23/6929">OSR (237)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.html">RDD (545)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content/ICCV2023/html/Li_High-Resolution_Document_Shadow_Removal_via_A_Large-Scale_Real-World_Dataset_and_ICCV_2023_paper.html">SD7K (760)</a></th> </tr>
<tr> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> </tr>
</thead> <tbody>
<tr> <td class="tg-c3ow">CVPR'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.html'>BGShadowNet</a></td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow">5.377</td> <td class="tg-c3ow">29.17</td> <td class="tg-c3ow">0.948</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">2.219</td> <td class="tg-c3ow">37.58</td> <td class="tg-c3ow">0.983</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/ICCV2023/html/Li_High-Resolution_Document_Shadow_Removal_via_A_Large-Scale_Real-World_Dataset_and_ICCV_2023_paper.html'>FSENet</a></td> <td class="tg-c3ow">SD7K</td> <td class="tg-c3ow">10.60</td> <td class="tg-c3ow">28.98</td> <td class="tg-c3ow">0.93</td> <td class="tg-c3ow">17.56</td> <td class="tg-c3ow">23.60</td> <td class="tg-c3ow">0.85</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">10.00</td> <td class="tg-c3ow">28.67</td> <td class="tg-c3ow">0.96</td> </tr>
<tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href='https://arxiv.org/pdf/2405.04408'>DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">27.14</td> <td class="tg-c3ow">0.900</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">23.02</td> <td class="tg-c3ow">0.908</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">21.64</td> <td class="tg-c3ow">0.937</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
</tbody> </table>

## 4. Dewarping

Dewarping, also referred to as geometric rectification, aims to rectify document images that suffer from curves, folds, crumples, perspective/affine deformation and other geometric distortions.
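
Most learned methods in this section predict a backward map: for each pixel of the flat output, the coordinate at which to sample the warped input. A sketch of applying such a map with PyTorch's grid_sample (the map itself would come from a model such as those in 4.3; normalizing coordinates to [-1, 1] is the grid_sample convention):

```python
import torch
import torch.nn.functional as F

def unwarp(warped: torch.Tensor, backward_map: torch.Tensor) -> torch.Tensor:
    """warped: (N, C, H, W) image; backward_map: (N, H, W, 2) sampling grid
    with x/y coordinates normalized to [-1, 1]."""
    return F.grid_sample(warped, backward_map, mode="bilinear",
                         align_corners=True)
```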

### 4.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2018 | CVPR | DocUNet: Document Image Unwarping via A Stacked U-Net | |
| 2019 | ACM TOG | Document Rectification and Illumination Correction using a Patch-based CNN | Code |
| 2019 | ICCV | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Code |
| 2020 | PR | Geometric Rectification of Document Images using Adversarial Gated Unwarping Network | |
| 2020 | ECCV | Can You Read Me Now? Content Aware Rectification using Angle Supervision | |
| 2020 | DAS | Dewarping Document Image by Displacement Flow Estimation with Fully Convolutional Network | Code |
| 2021 | ACM MM | DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction | Code |
| 2021 | ICCV | End-to-end Piece-wise Unwarping of Document Images | Code |
| 2021 | ICDAR | Document Dewarping with Control Points | Code |
| 2022 | CVPR | Fourier Document Restoration for Robust Document Dewarping and Recognition | |
| 2022 | CVPR | Revisiting Document Image Dewarping by Grid Regularization | |
| 2022 | ACM MM | Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild | |
| 2022 | SIGGRAPH | Learning From Documents in the Wild to Improve Document Unwarping | Code |
| 2022 | ECCV | Geometric Representation Learning for Document Image Rectification | Code |
| 2022 | ECCV | Learning an Isometric Surface Parameterization for Texture Unwrapping | Code |
| 2022 | Arxiv | DocScanner: Robust Document Image Rectification with Progressive Learning | Code |
| 2022 | ICPR | Document Image Rectification in Complex Scene Using Stacked Siamese Networks | |
| 2023 | Arxiv | Geometric Rectification of Creased Document Images based on Isometric Mapping | |
| 2023 | IJDAR | Adaptive Dewarping of Severely Warped Camera-captured Document Images Based on Document Map Generation | |
| 2023 | TMM | Deep Unrestricted Document Image Rectification | Code |
| 2023 | Arxiv | Neural Document Unwarping using Coupled Grids | |
| 2023 | IJDAR | Inv3D: A High-resolution 3D Invoice Dataset for Template-guided Single-image Document Unwarping | Code |
| 2023 | Arxiv | MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | |
| 2023 | ICCVW | Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction | Code |
| 2023 | ICCV | Foreground and Text-lines Aware Document Image Rectification | Code |
| 2023 | ACM TOG | Layout-Aware Single-Image Document Flattening | Code |
| 2023 | TCSVT | Rethinking Supervision in Document Unwarping: A Self-consistent Flow-free Approach | |
| 2023 | SIGGRAPH | UVDoc: Neural Grid-based Document Unwarping | Code |
| 2023 | Arxiv | Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation | |
| 2024 | WACV | DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction | Code |
| 2024 | ICASSP | Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images | |
| 2024 | ICDAR | Coarse-to-Fine Document Image Registration for Dewarping | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | IJDAR | Am I readable? Transfer learning based document image rectification | |
| 2024 | ACM MM | Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents | |

### 4.2 Datasets

| Dataset | Num. | Type | Example | Download/Codes |
| --- | --- | --- | --- | --- |
| DocUNet | 130 | Real | Example | Link |
| Doc3D | 100K | Synth | - | Link |
| DIW | 5K | Real | Example | Link |
| WarpDoc | 1020 | Real | Example | Link |
| DIR300 | 300 | Real | Example | Link |
| Inv3D | 25K | Synth | Example | Link |
| Inv3DReal | 360 | Real | Example | Link |
| DICP | - | Synth | - | Link |
| DIF | - | Synth | - | Link |
| Simulated Paper | 90K | Synth | - | Link |
| DocReal | 200 | Real | Example | Link |
| UVDoc | 20K | Synth | Example | Link |
| WarpDoc-R | 840 | Real | | |

### 4.3 SOTA

<table class="tg"> <thead>
<tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="3">DocUNet (130)</th> <th class="tg-c3ow" colspan="3">DIR300 (300)</th> <th class="tg-c3ow" colspan="2">DocReal (200)</th> <th class="tg-c3ow" colspan="2">UVDoc (50)</th> </tr>
<tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr>
</thead> <tbody>
<tr> <td class="tg-c3ow">ICCV'19</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content_ICCV_2019/html/Das_DewarpNet_Single-Image_Document_Unwarping_With_Stacked_3D_and_2D_Regression_ICCV_2019_paper.html">DewarpNet</a></td> <td class="tg-c3ow">0.474</td> <td class="tg-c3ow">8.39</td> <td class="tg-c3ow">0.426</td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">13.94</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.589</td> <td class="tg-c3ow">0.193</td> </tr>
<tr> <td class="tg-c3ow">DAS'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2104.06815.pdf">FCN-based</a></td> <td class="tg-c3ow">0.448</td> <td class="tg-c3ow">7.84</td> <td class="tg-c3ow">0.434</td> <td class="tg-c3ow">0.503</td> <td class="tg-c3ow">9.75</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'21</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Das_End-to-End_Piece-Wise_Unwarping_of_Document_Images_ICCV_2021_paper.pdf">Piece-Wise</a></td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">8.64</td> <td class="tg-c3ow">0.468</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICDAR'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2203.10543.pdf">DDCP</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">8.99</td> <td class="tg-c3ow">0.453</td> <td class="tg-c3ow">0.552</td> <td class="tg-c3ow">10.95</td> <td class="tg-c3ow">0.357</td> <td class="tg-c3ow">0.46</td> <td class="tg-c3ow">16.04</td> <td class="tg-c3ow">0.585</td> <td class="tg-c3ow">0.290</td> </tr>
<tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">0.511</td> <td class="tg-c3ow">7.76</td> <td class="tg-c3ow">0.396</td> <td class="tg-c3ow">0.616</td> <td class="tg-c3ow">7.21</td> <td class="tg-c3ow">0.254</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.66</td> <td class="tg-c3ow">0.697</td> <td class="tg-c3ow">0.160</td> </tr>
<tr> <td class="tg-c3ow">CVPR'22</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Jiang_Revisiting_Document_Image_Dewarping_by_Grid_Regularization_CVPR_2022_paper.pdf">RDGR</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.51</td> <td class="tg-c3ow">0.461</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.610</td> <td class="tg-c3ow">0.280</td> </tr>
<tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2207.11515.pdf">Marior</a></td> <td class="tg-c3ow">0.478</td> <td class="tg-c3ow">7.27</td> <td class="tg-c3ow">0.403</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ECCV'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2210.08161.pdf">DocGeoNet</a></td> <td class="tg-c3ow">0.504</td> <td class="tg-c3ow">7.71</td> <td class="tg-c3ow">0.380</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">6.40</td> <td class="tg-c3ow">0.242</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.22</td> <td class="tg-c3ow">0.706</td> <td class="tg-c3ow">0.168</td> </tr>
<tr> <td class="tg-c3ow">SIGGRAPH'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3528233.3530756">PaperEdge</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">7.81</td> <td class="tg-c3ow">0.392</td> <td class="tg-c3ow">0.583</td> <td class="tg-c3ow">8.00</td> <td class="tg-c3ow">0.255</td> <td class="tg-c3ow">0.52</td> <td class="tg-c3ow">11.46</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2110.14968">DocScanner-L</a></td> <td class="tg-c3ow">0.518</td> <td class="tg-c3ow">7.45</td> <td class="tg-c3ow">0.334</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Foreground_and_Text-lines_Aware_Document_Image_Rectification_ICCV_2023_paper.pdf">Li et al.</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.43</td> <td class="tg-c3ow">0.376</td> <td class="tg-c3ow">0.607</td> <td class="tg-c3ow">7.68</td> <td class="tg-c3ow">0.244</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">WACV'24</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/WACV2024/papers/Yu_DocReal_Robust_Document_Dewarping_of_Real-Life_Images_via_Attention-Enhanced_Control_WACV_2024_paper.pdf">DocReal</a></td> <td class="tg-c3ow">0.50</td> <td class="tg-c3ow">7.03</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.56</b></td> <td class="tg-c3ow"><b>9.83</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">TCSVT'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10327775">DRNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">TMM'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2304.08796">DocTr++</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.54</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.45</td> <td class="tg-c3ow">19.88</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2312.07925">Polar-Doc</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.605</td> <td class="tg-c3ow">7.17</td> <td class="tg-c3ow">0.206</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2307.12571.pdf">MataDoc</a></td> <td class="tg-c3ow">0.502</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"><b>0.178</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">SIGGRAPH'23</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/fullHtml/10.1145/3610548.3618174">UVDoc</a></td> <td class="tg-c3ow"><b>0.544</b></td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.785</b></td> <td class="tg-c3ow"><b>0.119</b></td> </tr>
<tr> <td class="tg-c3ow">ACM TOG'23</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3627818">LA-DocFlatten</a></td> <td class="tg-c3ow">0.526</td> <td class="tg-c3ow"><b>6.72</b></td> <td class="tg-c3ow"><b>0.300</b></td> <td class="tg-c3ow">0.651</td> <td class="tg-c3ow"><b>5.70</b></td> <td class="tg-c3ow">0.195</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.626</td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.241</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
<tr> <td class="tg-c3ow">IJDAR'24</td> <td class="tg-c3ow"><a href="https://link.springer.com/article/10.1007/s10032-024-00476-9">DocTLNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">6.70</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.658</b></td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr>
</tbody> </table>

## 5. Deblur

### 5.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | NIPS | SVDocNet: Spatially Variant U-Net for Blind Document Deblurring | |
| 2019 | MTA | DeepDeblur: text image recovery from blur to sharp | Code |
| 2020 | TPAMI | DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement | Code |
| 2021 | ICCV | End-to-End Unsupervised Document Image Blind Denoising | |
| 2023 | ACM MM | DocDiff: Document Enhancement via Residual Diffusion Models | Code |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | Arxiv | NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Code |

### 5.2 Datasets

| Dataset | Num. (train/test) | Type | Example | Download |
| --- | --- | --- | --- | --- |
| TDD (text deblur dataset) | 67.6K (66K/1.6K) | Synth | Example | Link |

### 5.3 SOTA

Coming Soon ...

## 6. Binarization

### 6.1 Papers

| Year | Venue | Title | Repo |
| --- | --- | --- | --- |
| 2019 | PR | DeepOtsu: Document enhancement and binarization using iterative deep learning | Code |
| 2021 | PR | Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net modules | Code |
| 2022 | PR | Two-Stage Generative Adversarial Networks for Binarization of Color Document Images | Code |
| 2023 | PR | GDB: Gated Convolutions-based Document Binarization | Code |
| 2023 | ACM MM | DocDiff: Document Enhancement via Residual Diffusion Models | Code |
| 2023 | ICDAR | ColDBin: Cold Diffusion for Document Image Binarization | Code |
| 2023 | Information Fusion | A Novel Degraded Document Binarization Model through Vision Transformer Network | |
| 2023 | Arxiv | DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization | |
| 2024 | AAAI | DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations | Code |
| 2024 | CVPR | DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks | Code |
| 2024 | Arxiv | NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Code |
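
For context, the classical reference point for these methods is global Otsu thresholding, which DeepOtsu (above) extends with iterative deep refinement. A minimal OpenCV sketch:

```python
import cv2

def binarize_otsu(image_path: str):
    """Binarize a document image with a single global Otsu threshold."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu picks the threshold that minimizes intra-class intensity variance.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```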

### 6.2 Datasets

| Dataset | Num. | Type | Example | Download |
| --- | --- | --- | --- | --- |
| DocEng 2019 | 15 | Real | Example | Link |
| DocEng 2020 | 32 | Real | Example | Link |
| DocEng 2021 | 222 | Real | Example | Link |
| DocEng 2022 | 80 | Real | Example | Link |
| DIBCO 2009 | 10 | Real | Example | Link |
| H-DIBCO 2010 | 10 | Real | Example | Link |
| DIBCO 2011 | 16 | Real | Example | Link |
| H-DIBCO 2012 | 14 | Real | Example | Link |
| DIBCO 2013 | 16 | Real | Example | Link |
| H-DIBCO 2014 | 10 | Real | Example | Link |
| H-DIBCO 2016 | 10 | Real | Example | Link |
| DIBCO 2017 | 20 | Real | Example | Link |
| DIBCO 2018 | 10 | Real | Example | Link |
| DIBCO 2019 | 10 | Real | Example | Link |
| Bickley diary | 7 | Real | Example | Link |
| Synchromedia Multispectral (MSI) | 240 | Real | Example | Link |
| Persian Heritage Image Binarization (PHIBD) | 15 | Real | Example | Link |
| Palm Leaf | 50 | Real | Example | Link |
| NoiseOffice | 216 | Synth | Example | Link |
| LRDE Document Binarization Dataset | 125 | Real | - | Link |
| Shipping label dataset | 1082 | Real | Example | Link |

### 6.3 SOTA

Coming Soon ...

## ⭐ Star Rising

Star Rising