Non-restricted Winter 2026 convocation theses and dissertations will be discoverable in ERA on March 16. Congratulations to all our graduates!

Learning Policies and Heuristics for Bidirectional Search

Loading...
Thumbnail Image

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Specialization

Statistical Machine Learning

Supervisor / Co-Supervisor and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

This thesis empirically investigates the comparative ease of learning policies and heuristics for bidirectional versus unidirectional search in satisficing classical planning. Our research explores the potential advantages of bidirectional search in terms of learnability and efficiency of the resulting guidance mechanisms. We employ a learning framework where neural networks parameterize policies and heuristics, trained on solutions generated by bidirectional and unidirectional search methods across three classical planning domains: Sliding Tile Puzzle, Witness Puzzles (Triangles and Colours), and Pancake Puzzle. We compare base algorithms that span a spectrum from pure heuristic-based to pure policy-based approaches. As models improve, they solve more problems and find different solutions, potentially enhancing the quality of training data. Experimental results suggest that learning satisficing policies and heuristics for bidirectional search is often easier and leads to more efficient search guidance. Consistently, during training, bidirectional searches achieved greater or equal solve rates after seeing fewer problems and using fewer cumulative expansions. The resulting learned bidirectional guidance mechanisms were efficient in terms of expansions and generalized well to test sets. These findings were particularly pronounced in the Witness and Pancake domains. We propose several possible explanations for these observations, including the potential for bidirectional search to yield different solution distributions, allow for complementary interaction between the two directions predictions, and reduce the effective search depth. This research contributes to the ongoing exploration of learning in planning, raising new questions about the relationship between search strategies and the ease of learning effective guidance.

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source