Learning and Planning with the Average-Reward Formulation
Abstract
The average-reward formulation is a natural and important formulation of learning and planning problems, yet it has received much less attention than the episodic and discounted formulations. This dissertation contributes to algorithms for the average-reward formulation, and to their theory, in three areas, primarily through the lens of reinforcement learning. The first area of contributions is a family of tabular average-reward learning and planning algorithms, together with their convergence theory. The second is a complete extension of the options framework (Sutton, Precup, and Singh 1999) for temporal abstraction from the discounted formulation to the average-reward formulation; the extension includes general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, incremental planning variants of the learning algorithms, an option-interrupting algorithm, and convergence theory for these algorithms. The third is an average-reward prediction algorithm with function approximation, its convergence analysis, and an error bound on the convergence point.
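To make the average-reward setting concrete, the following is a minimal, hypothetical sketch of a tabular average-reward control algorithm in the style of differential Q-learning: instead of discounting, the agent maintains an estimate of the reward rate and subtracts it from each reward inside the TD error. The two-state MDP, step sizes, and function names here are illustrative assumptions, not the dissertation's specific algorithms.

```python
import random

# Toy two-state MDP (hypothetical, for illustration only):
# transitions[state][action] = (next_state, reward)
transitions = {
    0: {0: (0, 1.0), 1: (1, 0.0)},
    1: {0: (0, 3.0), 1: (1, 0.0)},
}
# Cycling 0 -> 1 -> 0 earns (0 + 3) / 2 = 1.5 per step,
# beating the 1.0 per step from staying in state 0.

def differential_q_learning(steps=200_000, alpha=0.05, eta=1.0,
                            epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in transitions for a in (0, 1)}
    avg_reward = 0.0  # learned estimate of the reward rate
    s = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda b: q[(s, b)])
        s_next, r = transitions[s][a]
        # TD error uses the reward-rate estimate in place of discounting
        delta = (r - avg_reward
                 + max(q[(s_next, b)] for b in (0, 1)) - q[(s, a)])
        q[(s, a)] += alpha * delta
        avg_reward += eta * alpha * delta  # rate estimate driven by the same error
        s = s_next
    return avg_reward, q
```

On this toy MDP, the learned reward-rate estimate approaches the optimal rate of 1.5 per step; note that only differences between action values are meaningful here, since the average-reward fixed point is determined only up to an additive constant.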
