Multiple Kernel Learning with Many Kernels

Institution

University of Alberta (http://id.loc.gov/authorities/names/n79058482)

Degree Level

Doctoral

Degree

Doctor of Philosophy

Department

Department of Computing Science

Supervisor / Co-Supervisor and Their Department(s)

Examining Committee Member(s) and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

Multiple kernel learning (MKL) addresses the problem of learning the kernel function from data. Since a kernel function is associated with an underlying feature space, MKL can be considered a systematic approach to feature selection. Many existing MKL algorithms learn the kernel by combining a given set of base kernels. While efficient MKL algorithms are capable of handling large training sets, their computational complexity grows linearly with the number of base kernels. Hence, these algorithms do not scale to problems with many kernels.
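
As a point of reference for the combination step described above, the following is a minimal NumPy sketch of the standard convex combination of base kernels, K(mu) = sum_m mu_m K_m. The Gaussian bandwidths and uniform weights are illustrative choices, not values from the thesis; note that the loop over base kernels is precisely what makes the cost linear in the number of kernels.

    import numpy as np

    def rbf_kernel(X, gamma):
        # Pairwise squared distances, then the Gaussian kernel matrix.
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-gamma * d2)

    def combined_kernel(X, gammas, mu):
        # K(mu) = sum_m mu_m * K_m, with the weights mu on the simplex.
        # The loop over base kernels makes the cost linear in their number.
        K = np.zeros((X.shape[0], X.shape[0]))
        for gamma, weight in zip(gammas, mu):
            K += weight * rbf_kernel(X, gamma)
        return K

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    gammas = [0.01, 0.1, 1.0, 10.0]          # illustrative bandwidths
    mu = np.ones(len(gammas)) / len(gammas)  # uniform simplex weights
    K = combined_kernel(X, gammas, mu)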

In this thesis, we investigate MKL when the number of kernels is very large. We aim to exploit the structure of the kernel space to design efficient MKL algorithms. Such structure exists for some important families of kernels, such as polynomial kernels and Gaussian kernels. We propose efficient gradient-based algorithms, with convergence guarantees, for the two prominent problem formulations of MKL, i.e., one-stage and two-stage MKL. We show that stochastic gradient descent and greedy coordinate descent are suitable algorithms for dealing with very large kernel sets.
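
To make the one-stage/two-stage distinction concrete: one-stage methods learn the kernel weights and the predictor jointly, while two-stage methods first choose the weights by optimizing a kernel-quality criterion and then train a standard kernel machine on the learned kernel. The sketch below uses centered kernel-target alignment, a common two-stage criterion in the MKL literature; it illustrates the setting and is not necessarily the objective used in this thesis.

    import numpy as np

    def center(K):
        # Double-center the kernel matrix: Kc = H K H, with H = I - 11^T / n.
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H

    def alignment(K, y):
        # Centered alignment between K and the "ideal" kernel y y^T; larger
        # values indicate a kernel better matched to the labels y.
        Kc = center(K)
        yy = np.outer(y, y)
        return np.sum(Kc * yy) / (np.linalg.norm(Kc) * np.linalg.norm(yy))

A two-stage method would score base kernels (or their combinations) with such a criterion in the first stage, before training the final predictor in the second.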

We propose an efficient stochastic gradient descent method for one-stage MKL. We show that sampling a coordinate with probability proportional to the magnitude of its gradient yields a low-variance estimate of the gradient. For the case of learning polynomial kernels, we show that the computational complexity of our algorithm depends only logarithmically on the number of kernels. We also show how greedy coordinate descent can be applied to one-stage and two-stage MKL. Greedy coordinate descent is particularly useful when the goal is to combine continuously-parameterized kernels, such as Gaussian kernels.
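
The sampling rule in this paragraph can be stated in a few lines. Below is a minimal sketch, on a synthetic gradient vector, of drawing a coordinate with probability proportional to its gradient magnitude and reweighting so that the single-coordinate estimate stays unbiased; the MKL-specific gradient computation and update step are omitted. The greedy alternative simply takes the steepest coordinate.

    import numpy as np

    def sample_coordinate(g, rng):
        # Draw coordinate i with p_i = |g_i| / sum_j |g_j|, then return the
        # importance-weighted estimate (g_i / p_i) e_i, whose expectation is
        # the full gradient g. Putting more mass on large coordinates lowers
        # the variance of this estimate relative to uniform sampling.
        p = np.abs(g) / np.sum(np.abs(g))
        i = rng.choice(len(g), p=p)
        est = np.zeros_like(g)
        est[i] = g[i] / p[i]
        return i, est

    def greedy_coordinate(g):
        # Greedy coordinate descent deterministically picks the coordinate
        # with the largest gradient magnitude.
        return int(np.argmax(np.abs(g)))

    rng = np.random.default_rng(0)
    g = rng.standard_normal(100_000)  # stand-in gradient over many kernels
    i, est = sample_coordinate(g, rng)
    j = greedy_coordinate(g)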

We perform a thorough empirical study on synthetic and real data, comparing the new gradient-based algorithms against several existing MKL algorithms. Our experiments demonstrate that the new methods are competitive in terms of generalization performance, while their computational cost is significantly lower than that of other methods with similarly good generalization performance.

Item Type

Thesis (http://purl.org/coar/resource_type/c_46ec)

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source