Regret Minimization with Function Approximation in Extensive-Form Games

Institution

University of Alberta (http://id.loc.gov/authorities/names/n79058482)

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Specialization

Statistical Machine Learning

Citation for Previous Publication

Link to Related Item

Abstract

Computing a Nash equilibrium in zero-sum games, or more generally solving a saddle point optimization problem, is a fundamental problem in game theory and machine learning, with applications spanning a wide variety of domains, from generative modeling and computer vision to superhuman AI in imperfect information games like poker. Despite the broad applicability of Nash equilibria, traditional methods from optimization and machine learning are not directly applicable. In zero-sum games, however, an effective and simple method exists -- self-play with online learning. In this setup, an equilibrium is computed by pitting two online learning algorithms against each other to play out a game repeatedly. Online learning with self-play via Counterfactual Regret Minimization (CFR) is the leading approach for saddle point computation in large games with sequential decision making and imperfect information. For very large games, CFR can be scaled along various dimensions, such as sampling, subgame decomposition, and function approximation. Despite the growing interest in scaling algorithms with function approximation in areas such as reinforcement learning, current theoretical guarantees for CFR with function approximation are minimal. In this thesis we extend theoretical results for CFR when using function approximation, and complement these worst-case guarantees with experiments on several common benchmark games with sequential decision making and imperfect information.
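The self-play idea described in the abstract can be sketched on a small normal-form game: two regret-matching learners repeatedly play rock-paper-scissors, and their average strategies converge toward the uniform Nash equilibrium. This is an illustrative sketch only, not code from the thesis; the payoff matrix, function names, and iteration count are assumptions for the example (the thesis itself works with sequential, imperfect information games).

```python
import numpy as np

# Payoff matrix for rock-paper-scissors from the row player's perspective
# (zero-sum, so the column player's payoffs are the negation).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy: play actions in proportion to
    positive regret, or uniformly if no action has positive regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    n = len(cum_regret)
    return pos / total if total > 0 else np.full(n, 1.0 / n)

def self_play(T=10000):
    """Run T rounds of self-play; return each player's average strategy."""
    r1 = np.zeros(3); r2 = np.zeros(3)   # cumulative regrets
    s1 = np.zeros(3); s2 = np.zeros(3)   # cumulative strategies
    for _ in range(T):
        p1 = regret_matching(r1)
        p2 = regret_matching(r2)
        s1 += p1; s2 += p2
        # Expected payoff of each pure action against the opponent's mix.
        u1 = A @ p2
        u2 = -A.T @ p1
        # Accumulate regret: action payoff minus realized expected payoff.
        r1 += u1 - p1 @ u1
        r2 += u2 - p2 @ u2
    return s1 / T, s2 / T

avg1, avg2 = self_play()
```

Because each learner's average regret shrinks at a rate of O(1/sqrt(T)), the pair of average strategies forms an approximate equilibrium, which in rock-paper-scissors is the uniform strategy.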

The thesis is outlined as follows. First, relevant background is given by defining external regret -- a quantity used to evaluate online learning algorithms. Then the connection between regret, self-play, and general concave-convex saddle point problems is described. A generalization of external regret specific to the online decision problem is also reviewed. The main theoretical contributions are then presented, generalizing previous work to different types of regret in the online decision problem and to different algorithms. The new theoretical guarantees with function approximation give rise to two new families of algorithms, f-RCFR and f-RCFR+, which combine function approximation with CFR-like algorithms. Both families are then compared across different games, with f-RCFR+ demonstrating superior performance and better management of function approximator capacity.
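As a concrete illustration of the external regret quantity mentioned in the outline, the sketch below computes the standard definition: the cumulative utility of the best fixed action in hindsight minus the utility actually accumulated by the played sequence of mixed strategies. The function and variable names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def external_regret(strategies, utilities):
    """External regret of a strategy sequence:
    R_T = max_a sum_t u_t[a] - sum_t <x_t, u_t>,
    where x_t is the mixed strategy played at round t and u_t is the
    vector of per-action utilities at round t."""
    strategies = np.asarray(strategies)   # shape (T, num_actions)
    utilities = np.asarray(utilities)     # shape (T, num_actions)
    realized = np.einsum('ta,ta->t', strategies, utilities).sum()
    best_fixed = np.max(utilities.sum(axis=0))
    return best_fixed - realized

# Two rounds, three actions: only the first action ever pays off,
# but the player mixes uniformly every round.
u = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
x = [[1/3, 1/3, 1/3]] * 2
# Best fixed action earns 2; uniform play earns 2/3, so regret is 4/3.
```

An algorithm is a no-regret learner when this quantity grows sublinearly in T, so that the average regret R_T / T vanishes.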

Item Type

Thesis (http://purl.org/coar/resource_type/c_46ec)

Alternative

License

Other License Text / Link

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

en

Location

Time Period

Source