Black History Month is here! Discover ERA research focused on Black experiences in Canada and worldwide. Use our general search below to get started!

Using Response Functions for Strategy Training and Evaluation

Loading...
Thumbnail Image

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Supervisor / Co-Supervisor and Their Department(s)

Examining Committee Member(s) and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

Extensive-form games are a powerful framework for modeling sequential multi-agent interactions. In extensive-form games with imperfect information, Nash equilibria are generally used as a solution concept, but computing a Nash equilibrium can be intractable in large games. Instead, a variety of techniques are used to find strategies that approximate Nash equilibria. Traditionally, an approximate Nash equilibrium strategy is evaluated by measuring the strategy's worst-case performance, or exploitability. However, because exploitability fails to capture how likely the worst-case is to be realized, it provides only a limited picture of strategy strength, and there is extensive empirical evidence showing that exploitability can correlate poorly with one-on-one performance against a variety of opponents. In this thesis, we introduce a class of adaptive opponents called pretty-good responses that exploit a strategy but only have limited exploitative power. By playing a strategy against a variety of counter-strategies created with pretty-good responses, we get a more complete picture of strategy strength than that offered by exploitability alone. In addition, we show how standard no-regret algorithms can me modified to learn strategies that are strong against adaptive opponents. We prove that this technique can produce optimal strategies for playing against pretty-good responses. We empirically demonstrate the effectiveness of the technique by finding static strategies that are strong against Monte Carlo opponents who learn by sampling our strategy, including the UCT Monte Carlo tree search algorithm.

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source