Guarantees for Self-Play via Polymatrix Decomposability
Abstract
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that the agents the learner faces post-training may behave dramatically differently from what the learner came to expect by interacting with itself. For two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that cannot lose utility, relative to their equilibrium value, against any post-training opponent; however, no such guarantee exists for multi-player games.
We show that in games that approximately decompose into a set of two-player constant-sum games (called polymatrix games), where global $\epsilon$-Nash equilibria are boundedly far from Nash equilibria in each subgame (a property called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy whose loss of utility against new agents, which we call its vulnerability, is bounded. In approximately subgame-stable constant-sum polymatrix (SS-CSP) games, the strategies produced by self-play are also exchangeable and have values that fall into a bounded range. We extend these results to extensive-form games and give an efficient representation and algorithm for such a decomposition. We demonstrate our findings through experiments on Kuhn and Leduc poker. Finally, we extend our results to games that are strategically equivalent to SS-CSP games. For the first time, our results identify a structural property of multi-player games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms.
