Monte Carlo Tree Search and Model Uncertainty

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Abstract

Monte Carlo Tree Search (MCTS) is a popular tree search framework for choosing actions in decision-making problems. MCTS is traditionally applied to problems in which a perfect simulation model is available. However, when the model is imperfect, the performance of MCTS drops sharply. In this work, we introduce the Uncertainty Adapted MCTS (UA-MCTS) framework, an adaptation of MCTS to model uncertainty. We define model uncertainty as the difference between the actual environment and the imperfect model. UA-MCTS modifies each of the four steps of MCTS (selection, expansion, simulation, and backpropagation) so that they account for this uncertainty. Although we provide a method for learning the uncertainty of the model, UA-MCTS is not restricted to this particular learning method. In the Reinforcement Learning (RL) setting, we propose the DQ-MCTS framework. DQ-MCTS uses the values learned by DQN, a state-of-the-art model-free RL method, to improve MCTS performance. Because DQN is model-free, errors in the model do not affect its learned values. DQ-MCTS uses these DQN values to initialize newly added nodes in the expansion step and to evaluate the final states reached in the simulation step. We experimentally evaluate UA-MCTS and DQ-MCTS on the deterministic domains of the MinAtar test suite. Our results demonstrate that UA-MCTS strongly improves MCTS in the presence of model error, and that DQ-MCTS can perform better than MCTS, though not better than DQN.
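The DQ-MCTS idea described in the abstract (seed newly expanded nodes with DQN values, and score the end of each simulation with DQN values) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the five-step chain environment, the hand-coded `dqn_q_values` standing in for a trained DQN, UCT selection, the pseudo-count of one given to seeded nodes, and the rollout depth are all hypothetical choices made for this example.

```python
import math
import random

# Hypothetical toy problem: a deterministic chain of states 0..GOAL.
# Action 0 moves left (clamped at 0), action 1 moves right; reaching GOAL pays 1 and ends the episode.
GAMMA = 0.95
GOAL = 5
ACTIONS = (0, 1)


def model_step(state, action):
    """Simulation model used by MCTS (assumed perfect here): returns (next_state, reward, done)."""
    nxt = max(0, state + (1 if action == 1 else -1))
    return (nxt, 1.0, True) if nxt == GOAL else (nxt, 0.0, False)


def dqn_q_values(state):
    """Hypothetical stand-in for a trained DQN: [Q(s, left), Q(s, right)].
    Here these are simply the exact optimal Q-values of the toy chain."""
    left = max(0, state - 1)
    return [GAMMA ** (GOAL - left), GAMMA ** (GOAL - state - 1)]


class Node:
    def __init__(self, state, reward=0.0, done=False):
        self.state = state
        self.reward = reward    # reward on the edge from the parent into this node
        self.done = done
        self.children = {}      # action -> Node
        self.visits = 0
        self.value_sum = 0.0    # sum of returns for this edge, i.e. an estimate of Q(parent, action)

    def q(self):
        return self.value_sum / self.visits


def uct_select(node, c):
    log_n = math.log(node.visits)
    return max(node.children.items(),
               key=lambda item: item[1].q() + c * math.sqrt(log_n / item[1].visits))


def rollout_value(state, rng, depth):
    """Random rollout in the model, then bootstrap the last state with max_a Q(s, a)."""
    g, discount = 0.0, 1.0
    for _ in range(depth):
        state, r, done = model_step(state, rng.choice(ACTIONS))
        g += discount * r
        discount *= GAMMA
        if done:
            return g
    return g + discount * max(dqn_q_values(state))


def dq_mcts(root_state, iterations=200, c=1.0, rollout_depth=3, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, []
        # 1. Selection: follow UCT through already-expanded nodes.
        while node.children and not node.done:
            _, node = uct_select(node, c)
            path.append(node)
        # 2. Expansion: add every child and seed its edge value with Q(s, a) from the DQN.
        if not node.done:
            q = dqn_q_values(node.state)
            for a in ACTIONS:
                nxt, r, done = model_step(node.state, a)
                child = Node(nxt, r, done)
                child.visits, child.value_sum = 1, q[a]  # pseudo-count of 1 (assumption)
                node.children[a] = child
            node = node.children[rng.choice(ACTIONS)]
            path.append(node)
        # 3. Simulation: evaluate the leaf with a short rollout plus a DQN bootstrap.
        g = 0.0 if node.done else rollout_value(node.state, rng, rollout_depth)
        # 4. Backpropagation: turn g into each edge's discounted return on the way up.
        for n in reversed(path):
            g = n.reward + GAMMA * g
            n.visits += 1
            n.value_sum += g
        root.visits += 1
    # Act greedily with respect to visit counts at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

For example, `dq_mcts(4)` should return action 1 (step right into the goal), because that edge is seeded with Q = 1 and every backup through it also returns exactly 1. Giving the seeded value a pseudo-count lets the DQN estimate act as a prior that real simulated returns gradually override; how strongly the thesis weights the DQN values is not specified here.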

Item Type

http://purl.org/coar/resource_type/c_46ec

License

Other License Text / Link

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
