I am a terrible poker player, losing every time even when the other players are also novices. (Bridge is my favorite card game.) I have long felt that I lose at poker because I lack a ‘poker face’ (one that reveals nothing to the other players) and am also bad at reading the faces, mannerisms, and body language of other players that reveal something about the strength of their hands. In the language of the game, I think I have many ‘tells’ that other players pick up on, while I fail miserably at detecting any tells that they may have. I was under the impression that these factors play an even more important role than knowledge of the odds, such as the likelihood of drawing to an inside straight or filling a full house.
So I was a bit confused when I saw this item about an AI bot that beat professional poker players in six-player games of Texas Hold ’em. I was confused because, while the AI bot would have a much better grasp of the odds than any human player and would have no revealing tells, it also would not be able to read the other players’ tells.
The paper in which these results are reported appeared in the journal Science and can be read here. The aim of the research was not to play better poker but to use poker as a test case for understanding something known as the ‘Nash equilibrium’. (John Nash was the mathematician and economics Nobel prize winner featured in the book and film A Beautiful Mind.) The authors write:
Poker has served as a challenge problem for the fields of artificial intelligence (AI) and game theory for decades (1). In fact, the foundational papers on game theory used poker to illustrate their concepts (2, 3). The reason for this choice is simple: no other popular recreational game captures the challenges of hidden information as effectively and as elegantly as poker. Although poker has been useful as a benchmark for new AI and game-theoretic techniques, the challenge of hidden information in strategic settings is not limited to recreational games. The equilibrium concepts of von Neumann and Nash have been applied to many real-world challenges such as auctions, cybersecurity, and pricing.
The past two decades have witnessed rapid progress in the ability of AI systems to play increasingly complex forms of poker (4–6). However, all prior breakthroughs have been limited to settings involving only two players. Developing a superhuman AI for multiplayer poker was the widely-recognized main remaining milestone. In this paper we describe Pluribus, an AI capable of defeating elite human professionals in six-player no-limit Texas hold’em poker, the most commonly played poker format in the world.
AI systems have reached superhuman performance in games such as checkers (7), chess (8), two-player limit poker (4), Go (9), and two-player no-limit poker (6). All of these involve only two players and are zero-sum games (meaning that whatever one player wins, the other player loses). Every one of those superhuman AI systems was generated by attempting to approximate a Nash equilibrium strategy rather than by, for example, trying to detect and exploit weaknesses in the opponent. A Nash equilibrium is a list of strategies, one for each player, in which no player can improve by deviating to a different strategy. Nash equilibria have been proven to exist in all finite games, and many infinite games, though finding an equilibrium may be difficult.
Two-player zero-sum games are a special class of games in which Nash equilibria also have an extremely useful additional property: any player who chooses to use a Nash equilibrium is guaranteed to not lose in expectation no matter what the opponent does (as long as one side does not have an intrinsic advantage under the game rules, or the players alternate sides). In other words, a Nash equilibrium strategy is unbeatable in two-player zero-sum games that satisfy the above criteria. For this reason, to “solve” a two-player zero-sum game means to find an exact Nash equilibrium. For example, the Nash equilibrium strategy for Rock-Paper-Scissors is to randomly pick Rock, Paper, or Scissors with equal probability. Against such a strategy, the best that an opponent can do in expectation is tie (10). In this simple case, playing the Nash equilibrium also guarantees that the player will not win in expectation. However, in more complex games even determining how to tie against a Nash equilibrium may be difficult; if the opponent ever chooses suboptimal actions, then playing the Nash equilibrium will indeed result in victory in expectation.
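The Rock-Paper-Scissors claim in that passage is easy to check numerically. What follows is my own minimal sketch in Python, not anything taken from the paper: the payoff table and the names `expected_payoff`, `best_response_value`, and `rock_heavy` are assumptions made purely for illustration. It computes the best an opponent can do in expectation against a given mixed strategy, and shows that the answer is a tie against the equal-probability strategy but a profit against a lopsided one.

```python
from fractions import Fraction

# Payoff to the row player for each (row move, column move) pair.
# Move order is rock, paper, scissors. This table is an illustrative
# assumption for the sketch: +1 for a win, -1 for a loss, 0 for a tie.
PAYOFF = (
    (0, -1, 1),   # rock     vs rock, paper, scissors
    (1, 0, -1),   # paper    vs rock, paper, scissors
    (-1, 1, 0),   # scissors vs rock, paper, scissors
)

# The three pure strategies, written as probability vectors.
PURE = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def expected_payoff(row_strategy, col_strategy):
    """Expected payoff to the row player when both sides play mixed strategies."""
    return sum(
        p * q * PAYOFF[i][j]
        for i, p in enumerate(row_strategy)
        for j, q in enumerate(col_strategy)
    )


def best_response_value(col_strategy):
    """Best expected payoff any row strategy can get against col_strategy.

    Expected payoff is linear in the row player's probabilities, so
    checking the three pure strategies is enough.
    """
    return max(expected_payoff(pure, col_strategy) for pure in PURE)


uniform = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
rock_heavy = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))  # hypothetical biased player

print(best_response_value(uniform))      # best an opponent can do against equal probabilities
print(best_response_value(rock_heavy))   # best an opponent can do against the biased player
print(expected_payoff(uniform, rock_heavy))  # what equal-probability play earns against the biased player
```

The last line illustrates the caveat in the quoted passage: in this simple game the equal-probability strategy cannot be beaten, but it also does not profit from an opponent's bias. Exploiting the biased player requires leaving the equilibrium, for instance by playing paper more often.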
So this result may be revealing that, to win at poker, knowing the odds and having no tells of one’s own matter more than being able to read other players’ tells.