Sub-Neural Policies: Option Discovery via Neural Decomposition
Abstract
In reinforcement learning, agents solve problems by interacting with the environment. When the environment's dynamics are intricate, however, learning becomes difficult and can result in sub-optimal policies. A potential remedy is to transfer knowledge from previously solved tasks to make the agent more efficient. In this dissertation, we investigate this approach, focusing on the decomposition of neural network policies for Markov Decision Processes into reusable sub-policies, which can be "helpful" for unforeseen tasks. We consider neural networks with piecewise linear activation functions, since they can be transformed into oblique decision trees. Each sub-tree within an oblique decision tree corresponds to a sub-policy associated with the primary task. We hypothesize that some of these sub-policies can be helpful in downstream tasks. Because the number of these sub-policies grows exponentially with the neural network's size, we select a subset of them that minimizes the Levin Loss. We transform the selected sub-policies into temporally extended actions, or options. To validate the algorithm's ability to discover helpful options, we present empirical findings on two challenging grid-world domains, each characterized by distinct dynamics. The experimental results show that options can "occur naturally" within neural networks that encode policies. Our results suggest that decomposing neural networks is a promising avenue for option discovery.
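The correspondence between a piecewise linear network and an oblique decision tree can be illustrated with a minimal sketch. It is not the dissertation's implementation: the weights, the two-unit hidden layer, and the names `forward`, `tree_forward`, `forced`, and `act` are all hypothetical and chosen only for illustration. Each ReLU unit is treated as an internal node testing whether `w.x + b > 0`, and fixing the outcome of some of those tests selects a sub-tree, which acts as a sub-policy.

```python
# Hypothetical tiny policy network: 2 inputs, 2 ReLU hidden units, 2 action logits.
# All weights are made up for illustration.
W1 = [[1.0, -1.0], [0.5, 1.0]]   # hidden-layer weights, one row per unit
B1 = [0.0, -1.0]                 # hidden-layer biases
W2 = [[1.0, -2.0], [-1.0, 1.0]]  # output-layer weights, one row per action
B2 = [0.1, 0.0]                  # output-layer biases


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def forward(x):
    """Ordinary forward pass: ReLU hidden layer, then linear action logits."""
    hidden = [max(0.0, dot(w, x) + b) for w, b in zip(W1, B1)]
    return [dot(w, hidden) + b for w, b in zip(W2, B2)]


def tree_forward(x, forced=()):
    """Evaluate the same network as an oblique decision tree.

    Each hidden unit is an internal node with the oblique test w.x + b > 0.
    `forced` fixes the outcome of the first len(forced) tests, which
    restricts evaluation to one sub-tree (a sub-policy in this sketch).
    """
    pattern = []
    for i, (w, b) in enumerate(zip(W1, B1)):
        if i < len(forced):
            pattern.append(forced[i])          # branch chosen in advance
        else:
            pattern.append(dot(w, x) + b > 0)  # branch decided by the input
    # At a leaf the network is affine: inactive units contribute zero.
    hidden = [dot(w, x) + b if on else 0.0
              for w, b, on in zip(W1, B1, pattern)]
    return [dot(w, hidden) + b for w, b in zip(W2, B2)]


def act(logits):
    """Greedy action: index of the largest logit."""
    return max(range(len(logits)), key=lambda a: logits[a])
```

With no forced branches, `tree_forward` reproduces `forward` on every input. Forcing branches can change the chosen action, which is what makes a sub-tree a distinct sub-policy. With `n` hidden units there are up to `2**n` complete activation patterns, which is consistent with the exponential growth noted above and with the need to select only a subset.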
