
Unknown-box Approximation to Improve Optical Character Recognition Performance

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Abstract

Optical character recognition (OCR) is a pattern recognition application used widely across numerous domains. Several feature-rich commercial and open-source OCR solutions are available to consumers and provide moderate to excellent accuracy. These solutions are general-purpose by design so that they can serve a wide community. However, their accuracy can diminish on difficult and uncommon document domains. Preprocessing of document images can be used to minimize the effect of this domain shift. In this thesis, we investigate the feasibility and the effect of using OCR engine feedback to train a preprocessor. The main obstacle in this approach is propagating the error signal through an opaque OCR engine. To circumvent this obstacle, we propose a novel preprocessor trained using gradient approximation. Unlike previous OCR-agnostic preprocessing techniques, the proposed training approach approximates a particular OCR engine's gradient and trains the preprocessor module without the need for intermediate labels. We compare two different methods with our proposed approach to establish a better training pipeline. Experiments with two different datasets and two OCR engines show that the presented preprocessor improves the accuracy of the OCR engine over its baseline by applying pixel-level manipulations to the document image.
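The general idea of training a preprocessor through an opaque component can be illustrated with a small sketch. The abstract does not specify which gradient estimator the thesis uses, so the code below substitutes a generic antithetic Gaussian-perturbation estimate; it should not be read as the thesis's method. Every name in it (`opaque_engine_loss`, `estimate_gradient`, `train_preprocessor`, `IDEAL`) is hypothetical, the "engine" is a toy stand-in rather than a real OCR engine, and the "preprocessor" is only a global gain and bias applied to the pixels.

```python
import numpy as np

# Hypothetical target image that the stand-in "engine" prefers (8x8 gradient ramp).
IDEAL = np.linspace(0.0, 1.0, 64).reshape(8, 8)


def opaque_engine_loss(image):
    """Toy stand-in for an OCR engine's error (assumption, not a real engine).

    The training code below only calls this function; it never differentiates it.
    """
    return float(np.mean((image - IDEAL) ** 2))


def estimate_gradient(loss_fn, image, rng, num_samples=64, sigma=0.05):
    """Estimate d loss / d image from antithetic Gaussian perturbations."""
    grad = np.zeros_like(image)
    for _ in range(num_samples):
        noise = rng.standard_normal(image.shape)
        # Two black-box queries per sample: one at +noise, one at -noise.
        delta = loss_fn(image + sigma * noise) - loss_fn(image - sigma * noise)
        grad += (delta / (2.0 * sigma)) * noise
    return grad / num_samples


def train_preprocessor(raw, steps=200, lr=0.5, seed=0):
    """Fit a gain/bias preprocessor using only the estimated image gradient."""
    rng = np.random.default_rng(seed)
    gain, bias = 1.0, 0.0
    for _ in range(steps):
        processed = gain * raw + bias
        grad_image = estimate_gradient(opaque_engine_loss, processed, rng)
        # Chain rule through the (differentiable) preprocessor:
        # d processed / d gain = raw, d processed / d bias = 1.
        gain -= lr * float(np.sum(grad_image * raw))
        bias -= lr * float(np.sum(grad_image))
    return gain, bias


# A low-contrast, brightness-shifted version of the ideal image.
raw = 0.5 * IDEAL + 0.2
before = opaque_engine_loss(raw)
gain, bias = train_preprocessor(raw)
after = opaque_engine_loss(gain * raw + bias)
print("loss before:", before, "loss after:", after)
```

In this sketch no intermediate "clean image" labels are used; the only training signal is the scalar returned by the black-box function, which mirrors the motivation stated in the abstract. A real setup would replace the toy loss with an error measure computed from an actual OCR engine's output and the gain/bias pair with a learned pixel-level model.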

Item Type

http://purl.org/coar/resource_type/c_46ec

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
