This is a small programming project I worked on in 2013-2014. Although I wrote a blog series about it at the time, this is not a repost of that series. Instead, it is a repost of the explanation I wrote earlier this year, when I uploaded the project to GitHub. If you like this article, you might also enjoy this interactive game, although I had nothing to do with that one.
The Prisoner’s Dilemma is an important concept in game theory, which captures the problem of altruism. Each of the two players chooses to either cooperate or defect. Cooperating incurs a personal cost, but benefits the other player. If both players cooperate, then they are better off than if they had both defected. In a single Prisoner’s Dilemma, it seems best to defect, since defecting avoids the cost no matter what the other player does. However, if multiple games are played in succession, players can punish defectors in subsequent games. This repeated version is called the Iterated Prisoner’s Dilemma (IPD).
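To make the cost/benefit structure concrete, here is a minimal sketch of a single game. The benefit `B = 3` and cost `C = 1` are illustrative assumptions rather than values from the project; any benefit larger than the cost gives the same ordering of outcomes.

```python
# Hypothetical benefit and cost of cooperation (assumed values, B > C > 0).
B, C = 3, 1

def payoff(me_cooperates: bool, other_cooperates: bool) -> int:
    """Score for one player in a single Prisoner's Dilemma."""
    gained = B if other_cooperates else 0  # benefit received from the other player
    paid = C if me_cooperates else 0       # personal cost of cooperating
    return gained - paid

# Whatever the other player does, defecting scores higher than cooperating,
# yet mutual cooperation beats mutual defection.
for other in (True, False):
    print(f"other cooperates={other}: "
          f"cooperate -> {payoff(True, other)}, defect -> {payoff(False, other)}")
```

With these numbers the four outcomes are ordered as in the usual Prisoner's Dilemma: defecting against a cooperator scores best, then mutual cooperation, then mutual defection, then cooperating against a defector.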
The best approach to the IPD is highly nontrivial. In 2012, William Press and Freeman Dyson showed that there is a class of “zero-determinant” strategies that appear to be dominant, and which would lead to mostly defection. However, Christoph Adami and Arend Hintze showed that the zero-determinant strategies are not dominant in the context of evolution. Understanding this issue could elucidate why humans and other creatures appear to be altruistic.
How the simulation works
1. We have a population of 40 individuals. Each individual has 4 parameters that govern how they play IPD.
2. Each individual plays IPD against 2 other individuals in the population, and their fitness is calculated from their average score.
3. One individual dies, and another reproduces. The probability of reproduction increases with fitness, and the probability of death decreases with fitness.
4. All the parameters of the individuals are mutated by small amounts.
5. Steps 2-4 are repeated a million times. Each repetition is called a “generation”.
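The steps above can be sketched roughly as follows. This is not the project's actual code, and several details are assumptions made only for illustration: the 4 parameters are taken to be the probabilities of cooperating after each of the four possible previous outcomes, a match lasts `ROUNDS` rounds, payoffs use a benefit `B` and cost `C`, selection weights are linear in fitness, and mutation is Gaussian noise clamped to the range 0 to 1.

```python
import random

POP_SIZE = 40        # from the description above
OPPONENTS = 2        # from the description above
B, C = 3.0, 1.0      # assumed benefit and cost of cooperation
ROUNDS = 50          # assumed number of rounds per IPD match
MUTATION_SD = 0.01   # assumed size of a mutation

def outcome_index(me: bool, other: bool) -> int:
    # Previous outcome from one player's view: CC -> 0, CD -> 1, DC -> 2, DD -> 3.
    return (0 if me else 2) + (0 if other else 1)

def play_ipd(a, b, rng) -> float:
    """Return a's average per-round score in one match against b."""
    a_prev, b_prev = True, True  # assumption: start as if both had just cooperated
    total = 0.0
    for _ in range(ROUNDS):
        a_move = rng.random() < a[outcome_index(a_prev, b_prev)]
        b_move = rng.random() < b[outcome_index(b_prev, a_prev)]
        total += (B if b_move else 0.0) - (C if a_move else 0.0)
        a_prev, b_prev = a_move, b_move
    return total / ROUNDS

def step(pop, rng) -> None:
    """One generation: score everyone, replace one individual, mutate all."""
    n = len(pop)
    fitness = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], OPPONENTS)
        fitness.append(sum(play_ipd(pop[i], pop[j], rng) for j in others) / OPPONENTS)
    eps = 1e-6  # keeps every weight strictly positive
    # Fitness lies in [-C, B], so both weight lists are non-negative.
    parent = rng.choices(range(n), weights=[f + C + eps for f in fitness])[0]
    dead = rng.choices(range(n), weights=[B - f + eps for f in fitness])[0]
    pop[dead] = list(pop[parent])  # the offspring takes the dead individual's slot
    for ind in pop:
        for k in range(4):
            ind[k] = min(1.0, max(0.0, ind[k] + rng.gauss(0.0, MUTATION_SD)))

def simulate(generations: int, seed: int = 0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(4)] for _ in range(POP_SIZE)]
    for _ in range(generations):
        step(pop, rng)
    return pop
```

Calling `simulate(generations=200)` returns the final population as a list of 40 parameter lists. A million generations, as in the original project, would take far longer in pure Python than this sketch suggests.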