Investigating Two Policy Gradient Methods Under Different Time Discretizations

dc.contributor.advisor: Mahmood, A. Rupam (Computing Science)
dc.contributor.author: Farrahi, Homayoon
dc.date.accessioned: 2025-05-06T19:02:50Z
dc.date.available: 2025-05-06T19:02:50Z
dc.date.issued: 2021-11
dc.description.abstract: Continuous-time reinforcement learning tasks commonly use discrete time steps of fixed cycle times for actions. Choosing a small action-cycle time in such tasks gives reinforcement learning agents faster reactions and a more temporally detailed perception of the environment. The learning performance of both policy gradient and action-value methods, however, may deteriorate as the cycle time is reduced, which necessitates tuning the cycle time as a hyper-parameter. Since tuning an additional hyper-parameter is time-consuming, especially for real-world robots, existing algorithms can benefit from having hyper-parameters that are robust to the choice of cycle time. In this thesis, we study how changing the action-cycle time affects the performance of two prominent policy gradient algorithms, PPO and SAC, and investigate the efficacy of their widely used hyper-parameter values across different cycle times. We explore how changing some of these hyper-parameters based on the cycle time can help or hinder the performance of these algorithms, and we examine the relationships between these hyper-parameters and the cycle time. These relationships are put forward as new hyper-parameters that can be adjusted based on the cycle time, and their effectiveness is examined and validated on simulated and real-world robotic tasks. We show that the new hyper-parameters, unlike the existing ones, can be more robust to different environments and cycle times and can enable hyper-parameter values tuned to a cycle time on a specific problem to be transferred to a different cycle time.
dc.identifier.doi: https://doi.org/10.7939/r3-sttb-hb65
dc.language.iso: en
dc.rights: This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
dc.subject: Reinforcement Learning
dc.subject: Policy Gradient
dc.subject: Real-Time Learning
dc.subject: Time Discretization
dc.title: Investigating Two Policy Gradient Methods Under Different Time Discretizations
dc.type: http://purl.org/coar/resource_type/c_46ec
thesis.degree.grantor: University of Alberta
thesis.degree.level: Master's
thesis.degree.name: Master of Science
ual.date.graduation: Fall 2021
ual.department: Department of Computing Science
ual.jupiterAccess: http://terms.library.ualberta.ca/public

Files

Original bundle

Name: Farrahi_Homayoon_202108_MSc.pdf
Size: 2.61 MB
Format: Adobe Portable Document Format