Explorations in the Foundations of Value-based Reinforcement Learning
Abstract
Value-based reinforcement learning is an approach to sequential decision making in which decisions are informed by learned, long-horizon predictions of future reward. This dissertation aims to understand the issues that value-based methods face and to develop algorithmic ideas that address them. It details three areas of contribution toward improving value-based methods. The first area of contribution extends temporal difference methods to fixed-horizon predictions. Regardless of problem setting, using fixed-horizon approximations of the return avoids the well-documented stability issues that plague off-policy temporal difference methods with function approximation. The second area of contribution introduces a framework of value-aware importance weights for off-policy learning and derives a minimum-variance instance of such weights. This alleviates the variance concerns associated with importance-sampling-based off-policy corrections. The third area of contribution identifies a discrepancy between the discrete-time and continuous-time returns when one is viewed as an approximation of the other, and proposes a modification that better aligns the two objectives. This yields improved prediction targets and, under variable time discretization, improves control performance as measured by an underlying integral return.
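As an illustrative sketch of the setting behind the first contribution (written in the standard fixed-horizon notation rather than quoted from the dissertation), the $h$-step fixed-horizon return truncates the usual discounted return, and its value function bootstraps only from the next-shorter horizon:
\[
G_t^{(h)} = \sum_{k=0}^{h-1} \gamma^k R_{t+k+1},
\qquad
v^{(h)}(s) = \mathbb{E}\!\left[ R_{t+1} + \gamma\, v^{(h-1)}(S_{t+1}) \,\middle|\, S_t = s \right],
\qquad
v^{(0)} \equiv 0.
\]
Because each horizon's target depends on a strictly shorter horizon rather than on the function currently being updated, this recursion is plausibly what underlies the stability property noted in the abstract.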
