<!-- omit in toc -->Recommendations of Document Image Processing
This repository collects papers, datasets, and benchmark results for document image processing, covering registration, appearance enhancement, deshadowing, dewarping, deblurring, and binarization.
<!-- omit in toc -->Contents
- 1. Registration
- 2. Appearance Enhancement
- 3. Deshadow
- 4. Dewarping
- 5. Deblur
- 6. Binarization
- Star Rising
1. Registration
Document registration (also known as document alignment) aims to densely map two document images with the same content (such as a scanned and photographed version of the same document). It has important applications in automated data annotation and template-based dewarping tasks.
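As a rough illustration of what a dense mapping between two document images means, the sketch below backward-warps a source image with a per-pixel coordinate map. It is not taken from any listed paper: the function name `warp_with_dense_map`, the map layout, and the nearest-neighbour sampling are assumptions chosen to keep the example dependency-free. Real pipelines would typically use an array library with bilinear sampling.

```python
def warp_with_dense_map(source, map_x, map_y):
    """Backward-warp `source` using a dense map with nearest-neighbour sampling.

    source: 2D list of pixel values (rows of equal length).
    map_x[i][j], map_y[i][j]: the (x, y) position in `source` that
    corresponds to pixel (i, j) of the output image (assumed layout).
    """
    height, width = len(source), len(source[0])
    output = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for x, y in zip(row_x, row_y):
            # Round to the nearest pixel and clamp to the image bounds.
            xi = min(max(int(round(x)), 0), width - 1)
            yi = min(max(int(round(y)), 0), height - 1)
            out_row.append(source[yi][xi])
        output.append(out_row)
    return output


# Toy example: every output pixel looks one pixel to its right in the source,
# so the content appears shifted left by one pixel.
source = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11]]
map_x = [[j + 1 for j in range(4)] for _ in range(3)]
map_y = [[i for _ in range(4)] for i in range(3)]
aligned = warp_with_dense_map(source, map_x, map_y)
```

In a registration setting, the map would be predicted by a model so that the warped photograph lines up pixel by pixel with its scanned counterpart.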
1.1 Papers
1.2 Datasets
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
DocAlign12K | 12K (10K/2K) | Synth | Example | Link |
1.3 SOTA
<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="2">DocUNet (130)</th> </tr> <tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2306.05749">DocAligner</a></td> <td class="tg-c3ow">0.8232</td> <td class="tg-c3ow">0.0445</td> </tr> </tbody> </table>
2. Appearance Enhancement
Appearance enhancement (also known as illumination correction) is not limited to a specific degradation type; it aims to restore a clean appearance similar to that of a scanned image or a born-digital PDF file.
2.1 Papers
2.2 Datasets
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
Doc3DShade | 90K | Synth | Example | Link |
DocProj | 2450 | Synth | Example | Link |
DocUNet from DocAligner | 130 | Real | Example | Link |
RealDAE | 600 (450/150) | Real | Example | Link |
Inv3D | 25K | Synth | Example | Link |
2.3 Apps
2.4 SOTA
<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Methods</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="2">DocUNet from DocAligner (130)</th> <th class="tg-c3ow" colspan="2">RealDAE (150)</th> </tr> <tr> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> <th class="tg-c3ow">SSIM</th> <th class="tg-c3ow">PSNR</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">0.7195</td> <td class="tg-c3ow">13.09</td> <td class="tg-c3ow">0.8264</td> <td class="tg-c3ow">12.26</td> </tr> <tr> <td class="tg-c3ow">TOG'19</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/1909.09470.pdf">DocProj</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7098</td> <td class="tg-c3ow">14.71</td> <td class="tg-c3ow">0.8684</td> <td class="tg-c3ow">19.35</td> </tr> <tr> <td class="tg-c3ow">BMVC'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2011.14447">Das et al.</a></td> <td class="tg-c3ow">Doc3DShade</td> <td class="tg-c3ow">0.7276</td> <td class="tg-c3ow">16.42</td> <td class="tg-c3ow">0.8633</td> <td class="tg-c3ow">19.87</td> </tr> <tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.7067</td> <td class="tg-c3ow">15.78</td> <td class="tg-c3ow">0.7925</td> <td class="tg-c3ow">18.62</td> </tr> <tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/abs/10.1145/3503161.3547916">UDoc-GAN</a></td> <td class="tg-c3ow">DocProj</td> <td class="tg-c3ow">0.6833</td> <td class="tg-c3ow">14.29</td> <td class="tg-c3ow">0.7558</td> <td class="tg-c3ow">16.43</td> </tr> <tr> <td class="tg-c3ow">TAI'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10268585/">GCDRNet</a></td> <td class="tg-c3ow">RealDAE</td> <td 
class="tg-c3ow"><b>0.7658</b></td> <td class="tg-c3ow">17.09</td> <td class="tg-c3ow"><b>0.9423</b></td> <td class="tg-c3ow">24.42</td> </tr> <tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.7598</td> <td class="tg-c3ow"><b>17.60</b></td> <td class="tg-c3ow">0.9219</td> <td class="tg-c3ow"><b>24.65</b></td> </tr> </tbody> </table>
3. Deshadow
Deshadowing aims to eliminate shadows that are mainly caused by occlusion to obtain shadow-free document images.
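The benchmark table in this section reports RMSE, PSNR, and SSIM between the predicted shadow-free image and the ground truth. The sketch below shows one common way RMSE and PSNR can be computed. It assumes 8-bit single-channel images stored as nested lists and a peak value of 255; individual papers may evaluate in other color spaces or value ranges, so numbers produced this way are not guaranteed to match the table.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized 2D images."""
    squared = [(p - t) ** 2
               for row_p, row_t in zip(pred, target)
               for p, t in zip(row_p, row_t)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB (assumes a peak value of 255)."""
    error = rmse(pred, target)
    if error == 0:
        return math.inf  # identical images
    return 20.0 * math.log10(max_value / error)


# Toy 2x2 example: a prediction that is slightly off from the ground truth.
pred = [[250, 250], [0, 10]]
target = [[255, 255], [0, 0]]
print(f"RMSE = {rmse(pred, target):.3f}, PSNR = {psnr(pred, target):.2f} dB")
```

Lower RMSE and higher PSNR both indicate a prediction closer to the ground truth.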
3.1 Papers
* indicates that the implementation is unofficial.
3.2 Datasets
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
RDD | 4916 (4371/545) | Real | Example | Link |
Kligler et al. | 300 | Real | Example | Link |
FSDSRD | 14200 | Synth | Example | Link |
Jung et al. | 87 | Real | Example | Link |
OSR | 237 | Real | Example | Link |
WEZUT OCR | 176 | Real | Example | Link |
SD7K | 7620 (6479/760) | Real | Example | Link |
SynDocDS | 50K (40K/5K) | Synth | - | Link |
3.3 SOTA
<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" rowspan="2">Training data</th> <th class="tg-c3ow" colspan="3"><a href="https://openaccess.thecvf.com/content_cvpr_2018/html/Kligler_Document_Enhancement_Using_CVPR_2018_paper.html">Kligler et al. (300)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://link.springer.com/chapter/10.1007/978-3-030-20887-5_25">Jung et al. (87)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://www.mdpi.com/1424-8220/20/23/6929">OSR (237)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://www.mdpi.com/1424-8220/20/23/6929">RDD (545)</a></th> <th class="tg-c3ow" colspan="3"><a href="https://www.mdpi.com/1424-8220/20/23/6929">SD7K (760)</a></th> </tr> <tr> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> <th class="tg-c3ow">RMSE↓</th> <th class="tg-c3ow">PSNR↑</th> <th class="tg-c3ow">SSIM↑</th> </tr> </thead> <tbody> </tbody> <tbody> <!-- <tr> <td class="tg-c3ow">CVPR'18</td> <td class="tg-c3ow">Kligler et al.</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">22.81</td> <td class="tg-c3ow">21.21</td> <td class="tg-c3ow">0.8058</td> <td class="tg-c3ow">29.06</td> <td class="tg-c3ow">19.05</td> <td class="tg-c3ow">0.8274</td> <td class="tg-c3ow">33.50</td> <td class="tg-c3ow">17.84</td> <td class="tg-c3ow">0.8451</td> <td class="tg-c3ow">37.67</td> <td class="tg-c3ow">16.84</td> <td class="tg-c3ow">0.7668</td> </tr> --> <!-- <tr> <td class="tg-c3ow">CVPR'20</td> <td class="tg-c3ow">BEDSR-Net</td> <td class="tg-c3ow">SDSRD</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">32.90</td> <td
class="tg-c3ow">0.9354</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">27.23</td> <td class="tg-c3ow">0.9115</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">CVPR'20</td> <td class="tg-c3ow">BEDSR-Net</td> <td class="tg-c3ow">FSDSRD</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">22.36</td> <td class="tg-c3ow">0.9286</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">22.38</td> <td class="tg-c3ow">0.9464</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">CVPR'20</td> <td class="tg-c3ow">BEDSR-Net</td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow">6.533</td> <td class="tg-c3ow">28.12</td> <td class="tg-c3ow">0.9320</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">2.937</td> <td class="tg-c3ow">34.928</td> <td class="tg-c3ow">0.973</td> </tr> <td class="tg-c3ow">CVPR'20</td> <td class="tg-c3ow">BEDSR-Net</td> <td class="tg-c3ow">Jung</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">ICIP'22</td> <td class="tg-c3ow">DSRFGD</td> <td class="tg-c3ow">FSDSRD</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">23.02</td> <td class="tg-c3ow">0.9302</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">21.62</td> <td 
class="tg-c3ow">0.9525</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">ArXiv'23</td> <td class="tg-c3ow">ShaDocFormer</td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow">13.17</td> <td class="tg-c3ow">26.36</td> <td class="tg-c3ow">0.90</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">8.9</td> <td class="tg-c3ow">29.46</td> <td class="tg-c3ow">0.92</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">CVPR'23</td> <td class="tg-c3ow">BGShadeNet_retest</td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">35.36</td> <td class="tg-c3ow">17.34</td> <td class="tg-c3ow">0.9040</td> <td class="tg-c3ow">19.72</td> <td class="tg-c3ow">22.64</td> <td class="tg-c3ow">0.9388</td> <td class="tg-c3ow">6.02</td> <td class="tg-c3ow">33.33</td> <td class="tg-c3ow">0.9520</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">ICASSP'23</td> <td class="tg-c3ow"><a href='https://arxiv.org/abs/2211.16675'>ShaDocNet</a></td> <td class="tg-c3ow">kilger</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> </tr> </tbody> <tbody> <tr> <td class="tg-c3ow">ICASSP'23</td> <td class="tg-c3ow">ShaDocNet</td> <td class="tg-c3ow">Jung</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> <td class="tg-c3ow">-</td> 
<td class="tg-c3ow">-</td> </tr> --> <tr> <td class="tg-c3ow">CVPR'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.html'>BGShadowNet</a></td> <td class="tg-c3ow">RDD</td> <td class="tg-c3ow">5.377</td> <td class="tg-c3ow">29.17</td> <td class="tg-c3ow">0.948</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">2.219</td> <td class="tg-c3ow">37.58</td> <td class="tg-c3ow">0.983</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href='https://openaccess.thecvf.com/content/ICCV2023/html/Li_High-Resolution_Document_Shadow_Removal_via_A_Large-Scale_Real-World_Dataset_and_ICCV_2023_paper.html'>FSENet</a></td> <td class="tg-c3ow">SD7K</td> <td class="tg-c3ow">10.60</td> <td class="tg-c3ow">28.98</td> <td class="tg-c3ow">0.93</td> <td class="tg-c3ow">17.56</td> <td class="tg-c3ow">23.60</td> <td class="tg-c3ow">0.85</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">10.00</td> <td class="tg-c3ow">28.67</td> <td class="tg-c3ow">0.96</td> </tr> <tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href='https://arxiv.org/pdf/2405.04408'>DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">27.14</td> <td class="tg-c3ow">0.900</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">23.02</td> <td class="tg-c3ow">0.908</td> <td class="tg-c3ow"></td> <td class="tg-c3ow">21.64</td> <td class="tg-c3ow">0.937</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td
class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> </tbody> </table>
4. Dewarping
Dewarping, also referred to as geometric rectification, aims to rectify document images that suffer from curves, folds, crumples, perspective/affine deformation and other geometric distortions.
4.1 Papers
4.2 Datasets
Dataset | Num. | Type | Example | Download/Codes |
---|---|---|---|---|
DocUNet | 130 | Real | Example | Link |
Doc3D | 100K | Synth | - | Link |
DIW | 5K | Real | Example | Link |
WarpDoc | 1020 | Real | Example | Link |
DIR300 | 300 | Real | Example | Link |
Inv3D | 25K | Synth | Example | Link |
Inv3DReal | 360 | Real | Example | Link |
DICP | - | Synth | - | Link |
DIF | - | Synth | - | Link |
Simulated Paper | 90K | Synth | - | Link |
DocReal | 200 | Real | Example | Link |
UVDoc | 20K | Synth | Example | Link |
WarpDoc-R | 840 | Real | - | - |
4.3 SOTA
<table class="tg"> <thead> <tr> <th class="tg-c3ow" rowspan="2">Venue</th> <th class="tg-c3ow" rowspan="2">Method</th> <th class="tg-c3ow" colspan="3">DocUNet (130)</th> <th class="tg-c3ow" colspan="3">DIR300 (300)</th> <th class="tg-c3ow" colspan="2">DocReal (200)</th> <th class="tg-c3ow" colspan="2">UVDoc (50)</th> </tr> <tr> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">AD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">LD↓</th> <th class="tg-c3ow">MS-SSIM↑</th> <th class="tg-c3ow">AD↓</th> </tr> </thead> <tbody> <tr> <td class="tg-c3ow">ICCV'19</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content_ICCV_2019/html/Das_DewarpNet_Single-Image_Document_Unwarping_With_Stacked_3D_and_2D_Regression_ICCV_2019_paper.html">DewarpNet</a></td> <td class="tg-c3ow">0.474</td> <td class="tg-c3ow">8.39</td> <td class="tg-c3ow">0.426</td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">13.94</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.589</td> <td class="tg-c3ow">0.193</td> </tr> <tr> <td class="tg-c3ow">DAS'20</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2104.06815.pdf">FCN-based</a></td> <td class="tg-c3ow">0.448</td> <td class="tg-c3ow">7.84</td> <td class="tg-c3ow">0.434</td> <td class="tg-c3ow">0.503</td> <td class="tg-c3ow">9.75</td> <td class="tg-c3ow">0.331</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">ICCV'21</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Das_End-to-End_Piece-Wise_Unwarping_of_Document_Images_ICCV_2021_paper.pdf">Piece-Wise</a></td> <td class="tg-c3ow">0.492</td> <td class="tg-c3ow">8.64</td> <td class="tg-c3ow">0.468</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td
class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">ICDAR'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2203.10543.pdf">DDCP</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">8.99</td> <td class="tg-c3ow">0.453</td> <td class="tg-c3ow">0.552</td> <td class="tg-c3ow">10.95</td> <td class="tg-c3ow">0.357</td> <td class="tg-c3ow">0.46</td> <td class="tg-c3ow">16.04</td> <td class="tg-c3ow">0.585</td> <td class="tg-c3ow">0.290</td> </tr> <tr> <td class="tg-c3ow">MM'21</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2110.12942.pdf">DocTr</a></td> <td class="tg-c3ow">0.511</td> <td class="tg-c3ow">7.76</td> <td class="tg-c3ow">0.396</td> <td class="tg-c3ow">0.616</td> <td class="tg-c3ow">7.21</td> <td class="tg-c3ow">0.254</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.66</td> <td class="tg-c3ow">0.697</td> <td class="tg-c3ow">0.160</td> </tr> <tr> <td class="tg-c3ow">CVPR'22</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Jiang_Revisiting_Document_Image_Dewarping_by_Grid_Regularization_CVPR_2022_paper.pdf">RDGR</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.51</td> <td class="tg-c3ow">0.461</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.610</td> <td class="tg-c3ow">0.280</td> </tr> <tr> <td class="tg-c3ow">MM'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2207.11515.pdf">Marior</a></td> <td class="tg-c3ow">0.478</td> <td class="tg-c3ow">7.27</td> <td class="tg-c3ow">0.403</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">ECCV'22</td> <td class="tg-c3ow"><a 
href="https://arxiv.org/pdf/2210.08161.pdf">DocGeoNet</a></td> <td class="tg-c3ow">0.504</td> <td class="tg-c3ow">7.71</td> <td class="tg-c3ow">0.380</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">6.40</td> <td class="tg-c3ow">0.242</td> <td class="tg-c3ow">0.55</td> <td class="tg-c3ow">12.22</td> <td class="tg-c3ow">0.706</td> <td class="tg-c3ow">0.168</td> </tr> <tr> <td class="tg-c3ow">SIGGRAPH'22</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3528233.3530756">PaperEdge</a></td> <td class="tg-c3ow">0.473</td> <td class="tg-c3ow">7.81</td> <td class="tg-c3ow">0.392</td> <td class="tg-c3ow">0.583</td> <td class="tg-c3ow">8.00</td> <td class="tg-c3ow">0.255</td> <td class="tg-c3ow">0.52</td> <td class="tg-c3ow">11.46</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">Arxiv'22</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2110.14968">DocScanner-L</a></td> <td class="tg-c3ow">0.518</td> <td class="tg-c3ow">7.45</td> <td class="tg-c3ow">0.334</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">ICCV'23</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Foreground_and_Text-lines_Aware_Document_Image_Rectification_ICCV_2023_paper.pdf">Li et al.</a></td> <td class="tg-c3ow">0.497</td> <td class="tg-c3ow">8.43</td> <td class="tg-c3ow">0.376</td> <td class="tg-c3ow">0.607</td> <td class="tg-c3ow">7.68</td> <td class="tg-c3ow">0.244</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">WACV'24</td> <td class="tg-c3ow"><a href="https://openaccess.thecvf.com/content/WACV2024/papers/Yu_DocReal_Robust_Document_Dewarping_of_Real-Life_Images_via_Attention-Enhanced_Control_WACV_2024_paper.pdf">DocReal</a></td> <td
class="tg-c3ow">0.50</td> <td class="tg-c3ow">7.03</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.56</b></td> <td class="tg-c3ow"><b>9.83</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">TCSVT'23</td> <td class="tg-c3ow"><a href="https://ieeexplore.ieee.org/abstract/document/10327775">DRNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">TMM'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2304.08796">DocTr++</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">7.54</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.45</td> <td class="tg-c3ow">19.88</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/abs/2312.07925">Polar-Doc</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.605</td> <td class="tg-c3ow">7.17</td> <td class="tg-c3ow">0.206</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">Arxiv'23</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2307.12571.pdf">MetaDoc</a></td> <td class="tg-c3ow">0.502</td> <td class="tg-c3ow">7.42</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow">0.638</td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"><b>0.178</b></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">SIGGRAPH'23</td> <td class="tg-c3ow"><a 
href="https://dl.acm.org/doi/fullHtml/10.1145/3610548.3618174">UVDoc</a></td> <td class="tg-c3ow"><b>0.544</b></td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.315</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.785</b></td> <td class="tg-c3ow"><b>0.119</b></td> </tr> <tr> <td class="tg-c3ow">ACM TOG'23</td> <td class="tg-c3ow"><a href="https://dl.acm.org/doi/pdf/10.1145/3627818">LA-DocFlatten</a></td> <td class="tg-c3ow">0.526</td> <td class="tg-c3ow"><b>6.72</b></td> <td class="tg-c3ow"><b>0.300</b></td> <td class="tg-c3ow">0.651</td> <td class="tg-c3ow"><b>5.70</b></td> <td class="tg-c3ow">0.195</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">CVPR'24</td> <td class="tg-c3ow"><a href="https://arxiv.org/pdf/2405.04408">DocRes</a></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow">0.626</td> <td class="tg-c3ow">6.83</td> <td class="tg-c3ow">0.241</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> <tr> <td class="tg-c3ow">IJDAR'24</td> <td class="tg-c3ow"><a href="https://link.springer.com/article/10.1007/s10032-024-00476-9">DocTLNet</a></td> <td class="tg-c3ow">0.51</td> <td class="tg-c3ow">6.70</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"><b>0.658</b></td> <td class="tg-c3ow">5.75</td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> <td class="tg-c3ow"></td> </tr> </tbody> </table>
- Note that the 127th and 128th distorted images in the DocUNet benchmark are rotated by 180 degrees and therefore do not match their ground-truth documents. The performance reported here is based on the corrected data.
- Note that the UVDoc results reported in this repository are based on the full UVDoc benchmark dataset (as reported on the official GitHub page). The results in the paper used only half of the UVDoc benchmark.
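The 180-degree correction described in the first note can be sketched as follows. This is a hypothetical helper, not the script used to produce the table: it assumes the benchmark has been loaded into a dict keyed by 1-based image index, with each image stored as a list of pixel rows. How the 127th and 128th images map to file names in a given copy of DocUNet is left to the reader.

```python
# Indices of the distorted images that are upside down, per the note above.
ROTATED_INDICES = {127, 128}


def rotate_180(image):
    """Rotate a 2D image (list of rows) by 180 degrees."""
    # Reversing the row order and each row's pixels equals a 180-degree turn.
    return [row[::-1] for row in image[::-1]]


def correct_benchmark(images):
    """Return a copy of `images` with the known upside-down entries fixed.

    images: dict mapping a 1-based benchmark index to an image
    (hypothetical layout, assumed for this sketch).
    """
    return {
        index: rotate_180(image) if index in ROTATED_INDICES else image
        for index, image in images.items()
    }


# Toy usage with 2x2 stand-in images.
toy = {126: [[1, 2], [3, 4]], 127: [[1, 2], [3, 4]]}
fixed = correct_benchmark(toy)
```

Applying the rotation before evaluation keeps the distorted inputs consistent with their ground-truth scans.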
5. Deblur
5.1 Papers
5.2 Datasets
Dataset | Num. (train/test) | Type | Example | Download |
---|---|---|---|---|
TDD (text deblur dataset) | 67.6K (66K/1.6K) | Synth | Example | Link1, Link2 |
5.3 SOTA
Coming Soon ...
6. Binarization
6.1 Papers
6.2 Datasets
Dataset | Num. | Type | Example | Download |
---|---|---|---|---|
DocEng 2019 | 15 | Real | Example | Link |
DocEng 2020 | 32 | Real | Example | Link |
DocEng 2021 | 222 | Real | Example | Link |
DocEng 2022 | 80 | Real | Example | Link |
DIBCO 2009 | 10 | Real | Example | Link |
H-DIBCO 2010 | 10 | Real | Example | Link |
DIBCO 2011 | 16 | Real | Example | Link |
H-DIBCO 2012 | 14 | Real | Example | Link |
DIBCO 2013 | 16 | Real | Example | Link |
H-DIBCO 2014 | 10 | Real | Example | Link |
H-DIBCO 2016 | 10 | Real | Example | Link |
DIBCO 2017 | 20 | Real | Example | Link |
DIBCO 2018 | 10 | Real | Example | Link |
DIBCO 2019 | 10 | Real | Example | Link |
Bickley Diary | 7 | Real | Example | Link |
Synchromedia Multispectral (MSI) | 240 | Real | Example | Link |
Persian Heritage Image Binarization (PHIBD) | 15 | Real | Example | Link |
Palm Leaf | 50 | Real | Example | Link |
NoisyOffice | 216 | Synth | Example | Link |
LRDE Document Binarization Dataset | 125 | Real | - | Link |
Shipping label dataset | 1082 | Real | Example | Link |
6.3 SOTA
Coming Soon ...