Unknown-box Approximation to Improve Optical Character Recognition Performance
Abstract
Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. Several feature-rich commercial and open-source OCR solutions are available to consumers, offering moderate to excellent accuracy. These solutions are general-purpose by design so that they can serve a wide community. However, their accuracy can degrade on difficult and uncommon document domains. Preprocessing document images can reduce the effect of such domain shift. In this thesis, we investigate whether feedback from an OCR engine can be used to train a preprocessor, and what effect doing so has. The main obstacle in this approach is propagating the error signal through an opaque OCR engine. To circumvent this obstacle, we propose a novel preprocessor trained using gradient approximation. Unlike previous OCR-agnostic preprocessing techniques, the proposed training approach approximates the gradient of a particular OCR engine and uses it to train the preprocessor module, eliminating the need for intermediate labels. We compare our proposed approach with two alternative methods to establish a better training pipeline. Experiments with two datasets and two OCR engines show that the presented preprocessor improves the OCR engine's accuracy over the baseline by applying pixel-level manipulations to the document image.
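The core difficulty named above is training a preprocessor when the OCR engine exposes no gradients. A generic zeroth-order estimator can illustrate that difficulty. The sketch below uses simultaneous perturbation stochastic approximation (SPSA), chosen here purely for illustration. The abstract does not say which gradient approximation the thesis uses, so this is not presented as its method.

Several pieces are hypothetical stand-ins:
- the additive per-pixel preprocessor (`preprocess`),
- the helper names,
- the quadratic `toy_ocr_loss`.

A real pipeline would instead score the OCR engine's text output, for example with a character error rate, and would use a learned network as the preprocessor.

```python
import random
from typing import Callable, List

Vector = List[float]


def spsa_gradient(black_box: Callable[[Vector], float], x: Vector,
                  rng: random.Random, c: float = 0.01, samples: int = 8) -> Vector:
    """Estimate d black_box / d x using only function evaluations.

    ASSUMPTION: SPSA is an illustrative stand-in, not necessarily the
    gradient approximation used in the thesis.
    """
    n = len(x)
    grad = [0.0] * n
    for _ in range(samples):
        # Rademacher perturbation: every coordinate is +1 or -1.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        x_plus = [xi + c * di for xi, di in zip(x, delta)]
        x_minus = [xi - c * di for xi, di in zip(x, delta)]
        # Two calls to the opaque function, whatever the image size.
        diff = (black_box(x_plus) - black_box(x_minus)) / (2.0 * c)
        for i in range(n):
            # For +/-1 entries, dividing by delta[i] equals multiplying by it.
            grad[i] += diff * delta[i] / samples
    return grad


def preprocess(image: Vector, theta: Vector) -> Vector:
    """Hypothetical pixel-level preprocessor: add a learned offset per pixel."""
    return [p + t for p, t in zip(image, theta)]


def train_preprocessor(black_box: Callable[[Vector], float], image: Vector,
                       steps: int = 200, lr: float = 0.05, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    theta = [0.0] * len(image)
    for _ in range(steps):
        processed = preprocess(image, theta)
        # Approximate the gradient of the opaque loss w.r.t. the engine's input...
        g_image = spsa_gradient(black_box, processed, rng)
        # ...then apply the chain rule; d(processed)/d(theta) is the identity here.
        theta = [t - lr * g for t, g in zip(theta, g_image)]
    return theta


# Toy stand-in for an OCR engine's error signal (NOT a real OCR engine):
# the loss is smallest when the image matches a hidden "ideal" image.
IDEAL = [0.0, 1.0, 1.0, 0.0]


def toy_ocr_loss(image: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(image, IDEAL))


noisy = [0.3, 0.6, 0.8, 0.2]
theta = train_preprocessor(toy_ocr_loss, noisy)
print(toy_ocr_loss(noisy), toy_ocr_loss(preprocess(noisy, theta)))
```

Because the toy preprocessor has an identity Jacobian, the estimated image gradient is also the parameter gradient. A neural preprocessor would instead receive the same estimate as the upstream gradient during backpropagation. Each SPSA sample costs two engine calls regardless of image size, which makes such estimators attractive for opaque engines. The trade-off is that their variance grows with the number of pixels.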
