Perceptually Motivated Algorithms for Multimedia

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Doctoral

Degree

Doctor of Philosophy

Department

Department of Computing Science

Abstract

Perceptual factors in vision can facilitate the development of more effective multimedia algorithms. In particular, the wide dynamic range of the human visual system motivates the development of image lighting enhancement algorithms. Lighting enhancement can be achieved by capturing multiple images with different exposure settings and then reconstructing a final image. However, this approach cannot reveal or predict details in images that have already been captured. Single-image lighting enhancement is desirable in that scenario, but many challenges remain, including over-enhancement, noise, and color artifacts caused by a lack of understanding of the image content. Image and video compression is another area that can benefit from perceptual factors, such as the foveation mechanism and perceptual quality. As the resolution and image quality of modern cameras have increased, the amount of data produced by computational photography has surged. This has created a demand for better image/video compression methods that reduce the data size without compromising image quality.

In this thesis, four perceptually motivated methods are proposed to address the challenges in single-image lighting enhancement and image/video compression. First, we propose an image lighting enhancement method based on a fusion pyramid, a traditional contrast-based fusion approach. Second, we propose a self-attention-based learning strategy to reconstruct a properly exposed image from a single input image. We use the self-attention mechanism to model the interdependencies between different image locations, and design a generative adversarial network (GAN) with a custom HDR loss function to improve the image quality. Third, we propose a novel video compression method that integrates visual saliency information with foveation to reduce perceptual redundancy. The method subsamples and restores the input image using saliency data, allocating more space to salient regions and less to non-salient ones. Finally, based on the assumption that a group of images can be decomposed into several shared feature matrices, we propose a novel principal component approximation network (PCANet) for image compression. This is the first learning-based method that achieves promising performance while including the size of the network in the bitrate calculation.
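
The assumption behind the final contribution, that a group of images can be decomposed into shared feature matrices, can be illustrated with classical PCA. The sketch below is not the thesis's PCANet and makes no claim about its architecture or bitrate accounting. It is a minimal NumPy example on a synthetic group of images built from three shared patterns; the function names `make_demo_group`, `fit_shared_basis`, and `reconstruct` are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_demo_group(n_images=20, height=16, width=16, n_patterns=3, seed=0):
    """Build a synthetic image group (an assumption for this demo):
    every image is a random mix of the same few shared patterns."""
    rng = np.random.default_rng(seed)
    patterns = rng.random((n_patterns, height * width))
    weights = rng.random((n_images, n_patterns))
    return (weights @ patterns).reshape(n_images, height, width)


def fit_shared_basis(images, k):
    """Classical PCA over a group of images.

    images: array of shape (n, h, w)
    Returns the mean image (flattened), a shared basis of shape (k, h*w),
    and per-image coefficients of shape (n, k).
    """
    n = images.shape[0]
    flat = images.reshape(n, -1).astype(np.float64)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    coeffs = centered @ basis.T
    return mean, basis, coeffs


def reconstruct(mean, basis, coeffs, image_shape):
    """Rebuild the image group from the shared basis and coefficients."""
    flat = coeffs @ basis + mean
    return flat.reshape((-1,) + tuple(image_shape))


def rmse(a, b):
    """Root-mean-square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


images = make_demo_group()
for k in (1, 3):
    mean, basis, coeffs = fit_shared_basis(images, k)
    approx = reconstruct(mean, basis, coeffs, images.shape[1:])
    stored = mean.size + basis.size + coeffs.size
    print(f"k={k}: stored values={stored} of {images.size}, "
          f"rmse={rmse(images, approx):.6f}")
```

In this toy setting the shared basis and the per-image coefficients together hold far fewer values than the raw pixels, which is the intuition behind compressing a group of images through common components. A real codec would also need to quantize and entropy-code these values, which the sketch omits.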

Item Type

http://purl.org/coar/resource_type/c_46ec

Other License Text / Link

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
