Unpaired Document Image Denoising for OCR using BiLSTM enhanced CycleGAN
Abstract
The recognition performance of Optical Character Recognition (OCR) models can be sub-optimal when document images suffer from various degradations. Supervised learning-based methods for image enhancement can generate high-quality enhanced images, but they require corresponding clean images or ground truth text for training. Moreover, the paired data used to train these models is usually generated by adding different types of synthetic noise to clean images, whereas real-world noise is more complex and more challenging than synthetic noise. To enhance real-world noisy images effectively, the models must be trained on real noisy images. However, it is infeasible to obtain corresponding clean images for real-world noisy images, and creating ground truth text requires manual effort. Unsupervised methods explored in recent years have focused on enhancing natural scene images. For document images, preserving the readability of text in the enhanced images is of utmost importance for improved OCR performance. In this thesis, we explore the possibility of enhancing document images in an unsupervised setting using unpaired training samples. To this end, we propose a modified architecture for the standard CycleGAN model that improves its performance in enhancing document images with better text preservation. The results indicate that the proposed model preserves text better and improves OCR performance compared to the standard CycleGAN model and classical unsupervised image preprocessing techniques such as Sauvola and Otsu binarization.
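
For readers unfamiliar with the classical baselines named above, the following is a minimal sketch of Otsu's global thresholding, which picks the threshold that maximizes the between-class variance of the grayscale histogram. It illustrates only the baseline, not the proposed model. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat sequence of 8-bit grayscale values. Sauvola's method, which computes a local threshold per window, is not covered here.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class; pixels > t form the bright class.
    Assumes every value in `pixels` is an integer in [0, 255].
    """
    # Build the 256-bin grayscale histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class for the current t
    sum0 = 0  # intensity sum of the dark class for the current t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break  # bright class empty; larger t cannot help
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "image": dark ink pixels at intensity 10, bright paper pixels at 200.
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    print(t, binarize(image, t)[:3], binarize(image, t)[-3:])
```

Because a single global threshold is applied to the whole page, a baseline of this kind has no way to adapt to uneven illumination or locally varying noise, which is part of the motivation for learned enhancement.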
