Awesome
DCDM (Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution)
Abstract. Severe blurring of scene text images, which results in the loss of critical strokes and textual information, profoundly impairs text readability and recognizability. Scene text image super-resolution, which aims to enhance the resolution and legibility of text in low-resolution images, is therefore a crucial task. In this paper, we introduce a novel generative model for scene text super-resolution called the diffusion-conditioned-diffusion model (DCDM). The model is designed to learn the distribution of high-resolution images under two conditions: 1) the low-resolution image and 2) the character-level text embedding generated by a latent diffusion text model. The latent diffusion text module is specifically designed to generate the character-level text embedding space from the latent space of low-resolution images. Additionally, a character-level CLIP module aligns the high-resolution character-level text embeddings with the low-resolution embeddings, which keeps the visual features aligned with the semantics of the scene text image characters. Our experiments on the TextZoom dataset demonstrate the superiority of the proposed method over state-of-the-art methods.
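The two training signals the abstract describes can be sketched in generic form. The sketch below is NOT the DCDM authors' code. It assumes a standard DDPM epsilon-prediction objective in which the denoiser simply receives both conditions (the low-resolution image and a character-level text embedding) as extra inputs, and a CLIP-style symmetric InfoNCE loss as a stand-in for the character-level CLIP alignment. The names `q_sample`, `conditional_eps_loss` and `clip_alignment_loss`, the denoiser signature, and the temperature of 0.07 are all hypothetical. The latent diffusion text module is not modeled: `text_emb` is taken as given. Vectors are plain Python lists so the example runs with the standard library alone.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical denoiser signature (an assumption, not DCDM's API):
# (noisy HR image x_t, timestep t, LR-image condition, text embedding) -> predicted noise.
Denoiser = Callable[[Vector, int, Vector, Vector], Vector]


def q_sample(x0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> Vector:
    """Forward diffusion step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def conditional_eps_loss(denoiser: Denoiser, x0_hr: Vector, lr_cond: Vector,
                         text_emb: Vector, t: int, alpha_bars: Sequence[float],
                         rng: random.Random) -> float:
    """MSE between the true noise and the noise predicted under BOTH conditions."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0_hr]          # sample Gaussian noise
    x_t = q_sample(x0_hr, eps, alpha_bars[t])           # noise the HR target
    eps_hat = denoiser(x_t, t, lr_cond, text_emb)       # conditioned prediction
    return sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat)) / len(eps)


def _normalize(v: Sequence[float]) -> Vector:
    """L2-normalize a vector (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def _diag_cross_entropy(logits: List[Vector]) -> float:
    """Mean cross-entropy where row i's correct class is column i (matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)                                     # for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_alignment_loss(hr_text_embs: List[Vector], lr_embs: List[Vector],
                        temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i of HR text embeddings should match pair i of LR embeddings."""
    a = [_normalize(v) for v in hr_text_embs]
    b = [_normalize(v) for v in lr_embs]
    # Cosine-similarity matrix scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    logits_t = [list(col) for col in zip(*logits)]       # LR -> text direction
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))
```

In a real model each list would be a tensor and the denoiser a network (for example a U-Net) that fuses the conditions; the point here is only that the LR image and the text embedding enter as inputs to the noise predictor, while the contrastive term pulls matched HR-text/LR pairs together and pushes mismatched pairs apart.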