Stable Dynamic Programming and Reinforcement Learning with Dual Representations
Abstract
Technical report TR07-05. We investigate novel dual algorithms for dynamic programming and reinforcement learning, based on maintaining explicit representations of stationary distributions instead of value functions. In particular, we analyze the convergence properties of standard dynamic programming and reinforcement learning algorithms when they are converted to their natural dual form. Here we uncover an advantage of the dual approach: because dual update algorithms estimate normalized probability distributions rather than unbounded value functions, they avoid divergence even in the presence of function approximation and off-policy updates, remaining stable in situations where standard value function estimation diverges.
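As a rough illustration of the primal/dual distinction described above (an illustrative sketch with made-up numbers, not code from the report): primal policy evaluation iterates the value function via v ← r + γPv, while a dual method can instead maintain a row-stochastic matrix M via M ← (1−γ)I + γPM, whose rows remain normalized probability distributions throughout; the value function is then recoverable as v = Mr/(1−γ). Assuming NumPy and an arbitrary 3-state Markov chain under a fixed policy:

```python
import numpy as np

# Hypothetical 3-state transition matrix under a fixed policy (illustrative numbers).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])  # rewards per state
gamma = 0.9

# Primal policy evaluation: iterate the (unbounded) value function.
v = np.zeros(3)
for _ in range(2000):
    v = r + gamma * P @ v

# Dual policy evaluation: iterate M = (1 - gamma) * sum_t gamma^t P^t.
# Each update is a convex combination of row-stochastic matrices, so
# every row of M stays a normalized probability distribution.
M = np.eye(3)
for _ in range(2000):
    M = (1 - gamma) * np.eye(3) + gamma * P @ M

# The value function is recovered from the dual representation.
v_dual = M @ r / (1 - gamma)

print(np.allclose(M.sum(axis=1), 1.0))  # rows remain normalized
print(np.allclose(v, v_dual, atol=1e-6))
```

The boundedness visible here (rows of M always sum to one) is what the abstract credits for the dual updates' stability under function approximation, in contrast to the value iterates, which are unbounded in general.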
Item Type: http://purl.org/coar/resource_type/c_93fc
Language: en