Sub-Neural Policies: Option Discovery via Neural Decomposition

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Supervisor / Co-Supervisor and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

In reinforcement learning, agents solve problems through interactions with the environment. However, when faced with intricate environmental dynamics, learning can become challenging, resulting in sub-optimal policies. A potential remedy lies in transferring knowledge from previously solved tasks to make the agent more efficient. In this dissertation, we investigate this approach, focusing on the decomposition of neural network policies for Markov Decision Processes into reusable sub-policies, which can be "helpful" for unforeseen tasks. We consider neural networks with piecewise linear activation functions, since they can be transformed into oblique decision trees. Each sub-tree within an oblique decision tree corresponds to a sub-policy associated with the primary task. We hypothesize that some of these sub-policies can be helpful in downstream tasks. Given that the number of these sub-policies grows exponentially with the neural network's size, we select a subset of such sub-policies while minimizing the Levin loss. We transform the selected sub-policies into temporally extended actions, or options. To validate the algorithm's ability to discover helpful options, we present empirical findings on two challenging grid-world domains, each characterized by distinct dynamics. The experimental results show that options can occur "naturally" within neural networks encoding policies. Our results suggest that decomposing neural networks is a promising avenue for option discovery.
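
The correspondence between piecewise linear networks and oblique decision trees mentioned in the abstract can be illustrated with a small sketch. It is not taken from the thesis: it assumes a single hidden layer of ReLU units, and the weights and the names network_logits, tree_logits, leaf_model, and fixed are hypothetical, chosen only for illustration. Each hidden unit becomes one oblique test of the form w·x + b > 0, and each leaf holds the linear model that the network reduces to once every test outcome is known. Forcing the outcome of some tests is one possible reading of selecting a sub-tree, that is, a sub-policy.

```python
from itertools import product

# Hypothetical weights for a tiny policy network:
# 2 inputs, 2 ReLU hidden units, 2 action logits.
W1 = [[1.0, -1.0], [0.5, 2.0]]   # hidden weights, one row per hidden unit
B1 = [0.0, -1.0]                 # hidden biases
W2 = [[1.0, -2.0], [-1.0, 1.0]]  # output weights, one row per action
B2 = [0.0, 0.5]                  # output biases


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def network_logits(x):
    """Ordinary forward pass through the ReLU network."""
    hidden = [max(0.0, dot(w, x) + b) for w, b in zip(W1, B1)]
    return [dot(w, hidden) + b for w, b in zip(W2, B2)]


def leaf_model(pattern):
    """Linear model that the network equals when the hidden units'
    on/off states match `pattern` (True means the unit is active)."""
    n_hidden, n_inputs = len(W1), len(W1[0])
    weights, biases = [], []
    for w_out, b_out in zip(W2, B2):
        # Inactive units contribute nothing; active units pass their
        # pre-activation straight through, so the composition is linear.
        row = [sum(w_out[j] * W1[j][i] for j in range(n_hidden) if pattern[j])
               for i in range(n_inputs)]
        bias = b_out + sum(w_out[j] * B1[j] for j in range(n_hidden) if pattern[j])
        weights.append(row)
        biases.append(bias)
    return weights, biases


# One leaf per activation pattern: 2 ** (number of hidden units) leaves.
LEAVES = {p: leaf_model(p) for p in product((False, True), repeat=len(W1))}


def tree_logits(x, fixed=None):
    """Evaluate the tree: one oblique test per hidden unit, then the leaf's
    linear model. `fixed` maps a hidden unit index to a forced test outcome,
    which restricts evaluation to a sub-tree."""
    fixed = fixed or {}
    pattern = tuple(fixed.get(j, dot(W1[j], x) + B1[j] > 0)
                    for j in range(len(W1)))
    weights, biases = LEAVES[pattern]
    return [dot(w, x) + b for w, b in zip(weights, biases)]


def act(logits):
    """Greedy action: index of the largest logit."""
    return max(range(len(logits)), key=lambda a: logits[a])
```

With no tests forced, tree_logits(x) agrees with network_logits(x) up to floating-point rounding, so the tree encodes the same policy as the network. A call such as tree_logits(x, fixed={0: True}) evaluates the sub-tree in which the first unit is treated as active, which can prescribe a different action than the full policy. The number of leaves doubles with each hidden unit, which is consistent with the abstract's remark that the number of sub-policies grows exponentially with network size.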

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source