Learning and Planning with the Average-Reward Formulation

dc.contributor.advisor: Sutton, Richard (Computing Science)
dc.contributor.author: Wan, Yi
dc.date.accessioned: 2025-05-28T18:48:31Z
dc.date.available: 2025-05-28T18:48:31Z
dc.date.issued: 2023-11
dc.description.abstract: The average-reward formulation is a natural and important formulation of learning and planning problems, yet it has received much less attention than the episodic and discounted formulations. This dissertation contributes algorithms and supporting theory for the average-reward formulation in three areas, primarily through the lens of reinforcement learning. The first area is a family of tabular average-reward learning and planning algorithms and their convergence theories. The second area is a complete extension of the options framework (Sutton, Precup, and Singh 1999) for temporal abstraction from the discounted formulation to the average-reward formulation. The extension includes general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, incremental planning variants of the learning algorithms, an option-interrupting algorithm, and convergence theories for these algorithms. The third area is an average-reward prediction algorithm with function approximation, its convergence analysis, and an error bound for the convergence point.
dc.identifier.doi: https://doi.org/10.7939/r3-6qf9-8826
dc.language.iso: en
dc.rights: This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
dc.subject: Average-Reward Formulation
dc.subject: Reinforcement Learning
dc.subject: Markov Decision Process
dc.subject: Average-Reward Markov Decision Processes
dc.title: Learning and Planning with the Average-Reward Formulation
dc.type: http://purl.org/coar/resource_type/c_46ec
thesis.degree.grantor: http://id.loc.gov/authorities/names/n79058482
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
ual.date.graduation: Fall 2023
ual.department: Department of Computing Science
ual.jupiterAccess: http://terms.library.ualberta.ca/public

Files

Original bundle

Name: Wan_Yi_202308_PhD.pdf
Size: 3.44 MB
Format: Adobe Portable Document Format