Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks

October 13, 2022 · Declared Dead · 🏛 IEEE International Joint Conference on Neural Network

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Rui Qin, Bin Wang, Yu-Wing Tai arXiv ID 2210.06924 Category cs.CV: Computer Vision Citations 14 Venue IEEE International Joint Conference on Neural Network Last Checked 3 months ago

Abstract

Text image super-resolution is a unique and important task to enhance readability of text images to humans. It is widely used as pre-processing in scene text recognition. However, due to the complex degradation in natural scenes, recovering high-resolution texts from the low-resolution inputs is ambiguous and challenging. Existing methods mainly leverage deep neural networks trained with pixel-wise losses designed for natural image reconstruction, which ignore the unique character characteristics of texts. A few works proposed content-based losses. However, they only focus on text recognizers' accuracy, while the reconstructed images may still be ambiguous to humans. Further, they often have weak generalizability to handle cross languages. To this end, we present TATSR, a Text-Aware Text Super-Resolution framework, which effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss. The CCTB extracts vertical and horizontal content information from text images by two orthogonal transformers, respectively. The CP Loss supervises the text reconstruction with content semantics by multi-scale text recognition features, which effectively incorporates content awareness into the framework. Extensive experiments on various language datasets demonstrate that TATSR outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.