Stable Dynamic Programming and Reinforcement Learning with Dual Representations

Abstract

Technical report TR07-05. We investigate novel dual algorithms for dynamic programming and reinforcement learning, based on maintaining explicit representations of stationary distributions instead of value functions. In particular, we investigate the convergence properties of standard dynamic programming and reinforcement learning algorithms when they are converted to their natural dual form. Here we uncover an advantage of the dual approach: because dual update algorithms estimate normalized probability distributions rather than unbounded value functions, they avoid divergence even in the presence of function approximation and off-policy updates, remaining stable in situations where standard value function estimation diverges.
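The boundedness property the abstract credits for stability can be illustrated with a small tabular sketch. This is our own illustration of the general idea, not the report's algorithm: the transition matrix `P`, reward vector `r`, and discount `gamma` are invented for the example. Instead of iterating on the value function, the dual iteration maintains a row-stochastic matrix `M` of normalized discounted state-visit distributions, whose update preserves the property that every row sums to one.

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy (invented data).
gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # row-stochastic transition matrix
r = np.array([0.0, 1.0, 2.0])    # per-state rewards

# Primal iteration on the (unbounded) value function:
#   V <- r + gamma * P @ V
V = np.zeros(3)

# Dual iteration on normalized visit distributions:
#   M <- (1 - gamma) * I + gamma * P @ M
# Each row of M is a convex combination of probability rows, so M stays
# row-stochastic at every step; this is the normalization that keeps the
# dual updates bounded.
M = np.eye(3)

for _ in range(500):
    V = r + gamma * P @ V
    M = (1.0 - gamma) * np.eye(3) + gamma * P @ M

print(M.sum(axis=1))              # rows remain probability distributions
print(M @ r / (1.0 - gamma), V)   # dual recovers the primal values
```

At the fixed point, M = (1 - gamma) * (I - gamma * P)^{-1}, so M @ r / (1 - gamma) equals the usual value function; the dual iterate stays inside the probability simplex throughout, whereas the primal iterate is a priori unbounded.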

Item Type

http://purl.org/coar/resource_type/c_93fc

Language

en
