Strengths, Weaknesses, and Combinations of Model-based and Model-free Reinforcement Learning
Abstract
Reinforcement learning algorithms are conventionally divided into two approaches: a model-based approach that builds a model of the environment and then computes a value function from the model, and a model-free approach that directly estimates the value function. The first contribution of this thesis is to demonstrate that, given similar computational resources, neither approach dominates the other. Specifically, the model-based approach achieves better performance with fewer environmental interactions, while the model-free approach reaches a more accurate asymptotic solution by using a larger representation or eligibility traces. Because each of these strengths matters for a reinforcement learning agent, it is desirable to combine the two approaches and obtain the strengths of both. The main contribution of this thesis is a new architecture in which a model-based algorithm forms an initial value function estimate and a model-free algorithm adds on to, and improves, that initial estimate. Experiments show that our architecture, called the Cascade Architecture, preserves the data efficiency of the model-based algorithm. Moreover, we prove that the Cascade Architecture converges to the original model-free solution, so an imperfect model cannot impair asymptotic performance. These results strengthen the case for combining model-based and model-free reinforcement learning.
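To make the combination described above concrete, the following is a minimal tabular sketch of the cascade idea: a model-based component forms an initial action-value estimate, and a model-free learner stores an additive correction, with the agent acting on the sum of the two. This is an illustration of the general scheme, not the thesis's algorithm; all names (CascadeAgent, q_mb, q_mf, alpha, the choice of Sarsa for the model-free layer and value iteration for planning) are assumptions introduced here for exposition.

    import numpy as np

    class CascadeAgent:
        """Tabular sketch: act on q_mb + q_mf; TD updates go into q_mf."""

        def __init__(self, n_states, n_actions, gamma=0.95, alpha=0.1):
            self.gamma, self.alpha = gamma, alpha
            self.counts = np.zeros((n_states, n_actions, n_states))  # transition counts
            self.reward_sum = np.zeros((n_states, n_actions))        # summed rewards
            self.q_mb = np.zeros((n_states, n_actions))  # model-based estimate
            self.q_mf = np.zeros((n_states, n_actions))  # model-free correction

        def q(self, s):
            # The agent's value estimate is the sum of the two components.
            return self.q_mb[s] + self.q_mf[s]

        def update(self, s, a, r, s2, a2):
            # Model-free layer: one-step Sarsa on the combined value, with
            # the change credited to the correction term. Since the TD error
            # is computed from the sum, the sum is driven toward the ordinary
            # model-free fixed point regardless of error in q_mb.
            td_error = r + self.gamma * self.q(s2)[a2] - self.q(s)[a]
            self.q_mf[s, a] += self.alpha * td_error
            # Model-based layer: refine the learned model, then re-plan.
            self.counts[s, a, s2] += 1
            self.reward_sum[s, a] += r
            self._plan()

        def _plan(self, sweeps=5):
            # Estimate the model from counts, then run a few Bellman backups
            # (value iteration) restricted to visited state-action pairs.
            n = self.counts.sum(axis=2)
            r_hat = np.divide(self.reward_sum, n,
                              out=np.zeros_like(self.reward_sum), where=n > 0)
            p_hat = np.divide(self.counts, n[:, :, None],
                              out=np.zeros_like(self.counts), where=n[:, :, None] > 0)
            for _ in range(sweeps):
                v = self.q_mb.max(axis=1)                    # greedy state values
                backup = r_hat + self.gamma * (p_hat @ v)    # one Bellman backup
                self.q_mb = np.where(n > 0, backup, self.q_mb)

The design point this sketch tries to capture is the one the abstract states: early on, q_mb supplies most of the value estimate, giving the data efficiency of planning with a model, while in the limit the TD updates on the combined value pull q_mb + q_mf toward the model-free solution, so model error washes out asymptotically. The thesis's convergence proof applies to its own specific algorithms, which this sketch only approximates.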
