Stable Dynamic Programming and Reinforcement Learning with Dual Representations

dc.contributor.author: Wang, Tao
dc.contributor.author: Schuurmans, Dale
dc.contributor.author: Bowling, Michael
dc.contributor.author: Lizotte, Daniel
dc.date.accessioned: 2025-05-01T21:12:55Z
dc.date.available: 2025-05-01T21:12:55Z
dc.date.issued: 2007
dc.description: Technical report TR07-05. We investigate novel dual algorithms for dynamic programming and reinforcement learning, based on maintaining explicit representations of stationary distributions instead of value functions. In particular, we investigate the convergence properties of standard dynamic programming and reinforcement learning algorithms when they are converted to their natural dual form. Here we uncover an advantage of the dual approach: because dual update algorithms estimate normalized probability distributions rather than unbounded value functions, they avoid divergence even in the presence of function approximation and off-policy updates. Moreover, dual update algorithms remain stable in situations where standard value function estimation diverges. | TRID-ID TR07-05
dc.identifier.doi: https://doi.org/10.7939/R33G0M
dc.language.iso: en
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/
dc.subject: Reinforcement learning
dc.subject: Dual representations
dc.title: Stable Dynamic Programming and Reinforcement Learning with Dual Representations
dc.type: http://purl.org/coar/resource_type/c_93fc
ual.jupiterAccess: http://terms.library.ualberta.ca/public
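The dual update described in the abstract can be illustrated with a minimal sketch. Instead of iterating the Bellman backup on a value vector, the dual form maintains a row-stochastic matrix M of discounted future-state distributions and updates it with a backup that preserves normalization, so the iterates stay bounded by construction. The 3-state chain, rewards, and discount below are illustrative assumptions, not taken from the report:

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy (illustrative only).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# Primal update: v <- r + gamma * P v. Values are unbounded a priori.
v = np.zeros(3)
for _ in range(500):
    v = r + gamma * P @ v

# Dual update: M <- (1 - gamma) I + gamma * P M. Each row of M is a
# normalized probability distribution over future states, and the update
# preserves that normalization at every step.
M = np.eye(3)
for _ in range(500):
    M = (1 - gamma) * np.eye(3) + gamma * P @ M

# Values are recovered from the dual representation.
v_dual = M @ r / (1 - gamma)

assert np.allclose(M.sum(axis=1), 1.0)  # rows remain distributions
assert np.allclose(v, v_dual, atol=1e-6)
```

Both iterations converge to the same evaluation v = (I - gamma P)^{-1} r in this exact tabular setting; the report's point is that the dual iterates remain bounded (rows of M always sum to one) even when function approximation and off-policy sampling would make the primal value iterates diverge.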

Files

TR07-05.pdf (178.75 KB, Adobe Portable Document Format)