
Learning Programmatic Policies from ReLU Neural Networks


Institution

University of Alberta (http://id.loc.gov/authorities/names/n79058482)

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science


Abstract

Oblique decision trees use linear combinations of features in their decision nodes. Because of the non-smooth structure of decision trees, training oblique decision trees is difficult: the parameters must either be tuned with expensive non-differentiable optimization techniques or found by an extensive search of a discretized space. Recent work showed that one can induce oblique decision trees from the derivatives of ReLU neural networks. To learn with gradient descent, however, the derivative-based model requires annealing from a smooth approximation of the ReLU activation function to exact ReLUs during training, and a dynamic programming algorithm to compute the gradients efficiently.

In this thesis we show that regular ReLU neural networks trained with backpropagation can be written as oblique decision trees. We also show that hidden units from ReLU networks can be used to implicitly train oblique decision trees using computationally efficient algorithms for axis-aligned trees. Our result provides not only simple and efficient ways to induce oblique decision trees, but also effective methods for synthesizing programmatic policies. This is because oblique decision trees can be seen as programs written in a domain-specific language commonly used in the programmatically interpretable reinforcement learning literature. All one needs to do is encode the policy as a ReLU neural network and learn it with any policy gradient algorithm. Our methods can then map the policy learned with gradient descent to a program. Empirical results show that our approaches for synthesizing programmatic policies are competitive with the current state of the art when the synthesized programs are small; our methods outperform the state of the art in almost all control problems evaluated when allowed to synthesize longer programs.
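The equivalence the abstract describes can be illustrated with a toy example (a sketch under assumed notation, not the thesis's actual construction): for a one-hidden-layer ReLU network, each hidden unit's pre-activation sign defines an oblique split, and the activation pattern selected by an input picks out a region where the network is affine, exactly as a leaf of an oblique decision tree would.

```python
import numpy as np

# Toy one-hidden-layer ReLU network f(x) = w2 . relu(W1 x + b1) + b2.
# Weights here are arbitrary; the point is the structural equivalence.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # 2 hidden units, 3 input features
b1 = rng.normal(size=2)
w2 = rng.normal(size=2)
b2 = 0.5

def relu_net(x):
    """Standard forward pass through the ReLU network."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def oblique_tree(x):
    """Read the same network as an oblique decision tree:
    one oblique split 'W1[i] . x + b1[i] > 0' per hidden unit;
    the set of active units identifies the leaf, where f is affine."""
    active = (W1 @ x + b1) > 0
    return sum(w2[i] * (W1[i] @ x + b1[i])
               for i in range(len(active)) if active[i]) + b2

x = rng.normal(size=3)
print(np.isclose(relu_net(x), oblique_tree(x)))  # the two views agree
```

The agreement holds because ReLU zeroes out exactly the units whose split condition fails, so summing the affine contributions of the active units reproduces the forward pass.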

Item Type

Thesis (http://purl.org/coar/resource_type/c_46ec)

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
