The IPD is an entertaining game that can be studied mathematically to gain some insight about certain complex systems. I've worked on it in several separate projects at various levels of sophistication ; this page is a formulation of the problem and a brief retelling of my experiments with it.
Description of the Prisoner's Dilemma
The classic formulation of the Prisoner's Dilemma involves two would-be prisoners trying to get themselves out of a possible jail term. For a clear explanation of this version of the Prisoner's Dilemma, try Leon Felkins' web site listed below.
I personally don't like the classic formulation of the dilemma because the
semantics of sing
and keep mum
from the classic version don't match well
with the semantics of cooperate
and defect,
at least in my mind. So my
preference is the following variation, which describes the same situation as the
original Prisoner's Dilemma but in more profitable James Bond–ish terms :
Suppose you're a spy on contract for a local candy company, Florence's Fabulous Treats, home of the legendary Fun Florence Taffy. Florence's competition in town is Carine's Confections, whose closely guarded specialty is Carine's Original Peanut Brittleuppose you're a spy on contract for a local candy company, Florence's Fabulous Treats, home of the legendary Fun Florence Taffy. Florence's competition in town is Carine's Confections, whose closely guarded specialty is Carine's Original Peanut Brittle.
One day, Florence approaches you and says that she's arranged to swap recipes with Carine : taffy for peanut brittle. She hands you a manila envelope with the recipe card inside and gives you your instructions.
Starting at noon the next day, you are to go to the southeastern corner of Central Park. You will walk counterclockwise along the perimeter of the park and place the envelope under the fifth trash can. (The park is a rectangle with trash cans spaced out evenly along the perimeter.) At the same time, Carine's spy will start at the northwest corner of the park and walk counterclockwise along the perimeter, placing her manila envelope, containing the precious Peanut Brittle recipe, under the fifth trash can.
You will both continue walking around the perimeter of the park until each of you arrives at the other spy's trash can, whereupon you will retrieve the envelopes and take them back to your bosses. Being a spy, you are not allowed to meet in person with Carine's spy, thus the complicated park-circling behavior.
Now, you know that if you somehow manage to trick Carine's spy and walk away with the peanut brittle recipe in hand without having divulged the taffy recipe, Florence will give you a huge bonus. But if the opposite happens, you'll get nothing for your incompetence. If you decide to hang on to the taffy recipe but Carine's spy doesn't leave the peanut brittle recipe for you, Florence will probably not be pleased but will have to pay you a little anyway. And last, if everything goes according to plan, you'll please your boss for having done the job well.
Thus, your payoff matrix (i.e. the pay that you will get for the job based on the actions during the exchange) will look like this :
| Carine's spy |
+--------+-----------+
| defect | cooperate |
----+-----------+--------+-----------+
| defect | $ 1000 | $ 5000 |
you +-----------+--------+-----------+
| cooperate | $ 0 | $ 3000 |
----+-----------+--------+-----------+
You have no reason to believe that Carine's spy will have a payoff matrix any different from your own. (Previous work experience in the spy business has shown you that employers are equally stingy when it comes to the spy budget.) So you sit down and start to think things out a little for yourself.
If Carine's spy leaves an empty envelope tomorrow, I had better leave an empty
envelope also. If I don't, Florence is sure to fire me, and I'll never get a spy
job in this town again. But if Carine's spy leaves an envelope with the recipe
in it, I would be better off leaving an empty envelope, because Carine and her
spy will get nothing, but Florence will get the Peanut Brittle recipe, and I'll
get a huge bonus.
So, thus reasoned, you decide it's in your best interest to leave an empty envelope for the exchange. Sadly enough, Carine's spy will probably come to the same conclusion based on a similar line of reasoning, neither recipe will be exchanged, and the sum of your scores will be lower than it would have been if both of you had cooperated. As Hofstadter points out, selfish rationality on both sides of this situation leads to an irrational outcome ! '
The assumption is that Carine's spy will have the same payoff matrix, and thus
you are both involved in a Prisoner's Dilemma situation. The game can be seen in
many other contexts as well, including nuclear disarmament (where dismantling a
nuclear weapon would be seen as cooperating and building more weapons as
defecting). In the case of nuclear disarmament, the rational
one-time move
of mutual defection is depressing enough.
So the game, while simple as far as rules and payoffs, is capable of modeling some rather complex social situations. This leads us into the next section, because often social situations are repeated several times with the same players over a given time span (e.g. nuclear disarmament strategies during the USSR–US cold war).
The Iterated Prisoner's Dilemma (IPD)
As Hofstadter and Axelrod note, the game leads immediately to mutual defection when played once. In the case of the candy spies, cooperating leads to the possibility of getting $ 0 in the worst case or a $ 3000 in the best case, while defecting leads to $ 1000 in the worst case versus $ 5000 in the best case. This result clearly depends on the structure of the payoff matrix shown above.
Things become much more interesting when people play multiple times, as in
Robert Axelrod's two Iterated Prisoner's Dilemma tournaments discussed below.
For example, suppose that you signed a contract with Florence to work for the
next 24 weeks doing weekly recipe exchanges with Carine. Then you and Carine's
spy would eventually accumulate a mutual history of interactions, so that on a
given week you might think, Hey, I've been cooperating for the last 3 weeks
and Carine hasn't given me squat. I'm going to keep Florence's recipe this time
to punish Carine.
And so on ...
Thus one starts playing the Iterated Prisoner's Dilemma (IPD), in which cooperation can emerge from a game in which any rational person would defect when playing once.
History of the Dilemma
The Prisoner's Dilemma was created in the late 1940s by Flood and Drescher, workers at the RAND corporation (a thinktank that mostly did work for the US military). Supposedly the game was created to help find strategies for a nuclear cold war.
Albert W. Tucker, one of three game theorist winners of the 1994 Nobel Memorial Prize, was the first person to publish a paper on the Prisoner's Dilemma, after he used the situation as an explanation in a psychology class he was giving at Stanford in 1950. In his paper, Tucker mentioned applications of the dilemma to many disciplines.
Axelrod and the Iterated Prisoner's Dilemma tournaments
Later on, Robert Axelrod, a political scientist at the University of Michigan, created a computer simulation of the Iterated Prisoner's Dilemma. Axelrod announced that he would hold a tournament to see what the best strategy for the IPD was.
The results of Axelrod's tournament were surprising in that one of the smallest and simplest strategies submitted, called tit–for–tat, won the tournament. Tit–for–tat, submitted by Anatole Rapoport, cooperates with its opponent on the first round of a faceoff, then on each consecutive round it repeats what its opponent did the previous time. One of the most interesting things about tit–for–tat is that it often loses individual faceoffs, but at the end of a tournament it ends up the overall winner because (1) tit–for–tat wins a lot of points when it plays itself, and (2) the other strategies in a given population generally do not win a lot of points when playing amongst themselves.
Axelrod, surprised by the results of his first competition, announced a second competition and solicited solutions from several top computer scientists and experts in game theory. In addition, he submitted two strategies, tit-for-two-tats (which cooperates the first two rounds, then defects iff the opponent defected both of the previous two rounds) and revised downing (description to come), both of which beat tit–for–tat when he added them to his original tournament situation. Again, however, the little tit–for–tat won.
Since the first tournaments
Tit–for–tat's robustness has led to a whole heap of theories and speculations in psychology, game theory, nuclear arms strategies, &c. Hofstadter mentions a few rules that apply in general to the IPD, but it is always a little dangerous to extrapolate too far into other domains. In general, say most folks, a strategy must (1) start off being nice and (2) be willing to punish the opponent's defections, but (3) not be too harsh in punishing the opponent.
Also, it should be mentioned that the optimum strategy in any given IPD depends on the general population makeup. For example, the best possible strategy when playing against a random strategy (one that picks a random response at each interaction) is to defect all the time, because among nonresponsive strategies (like random, all–cooperate, or all–defect) you'll get no reward for cooperating. On the other hand, if there are any responsive strategies present in the population, it is a bad idea to play all–defect because the responsive strategies will pick up on your lack of responsiveness and punish you accordingly. Thus, the optimum strategy depends a great deal on the environment that you use to evaluate a strategy.
A lot of research has since gone into the IPD and related game theoretical situations. I cannot really do much justice to it all, since I am really pretty ignorant when it comes to the newer developments ... all I can honestly provide is the bibliography that comes at the end of this page.
My experiences with the IPD
During the summer of 1994, I was lucky enough to go to Duke University for three weeks as part of their wonderful Talent Identification Program (TIP). I stayed in Pegram, had a roommate for the first time ever, and learned to program in Scheme with the help of Dave Reed. Go TIP LISP '94 !
In class one day (it was a few days after Dave's story time), Dave presented something called the Prisoner's Dilemma, in which two guys were supposed to be trying to get the shortest jail terms. Dave talked about doing a computer tournament and made everybody in the class submit a strategy. I think I submitted all–defect, but all I remember about it was that Dave was mad at me because my ultra–simple strategy had a bug and almost crashed the tournament. Oops. I wasn't interested in Dave's IPD tournament because Zack and I were working on a Mastermind program.
Even though I wasn't interested in the IPD at the time, thanks, Dave, for Dr. Seuss, recursion, and the Iterated Prisoner's Dilemma !
Many moons passed, and I went on to the North Carolina School of Science and Mathematics. Students at the school get one week each spring to conduct a special project of some kind, or attend some special lectures of some kind, or go on a trip of some kind, &c. When I was there, this was called Special Projects Week (SPW) ; now I think it's called Miniterm.
Whatever the name of the week, I suddenly got interested in the IPD again. I think this sudden interest coincided with my reading of Hofstadter's article (referenced below). I wrote Dave an email and asked him for the Prisoner's Dilemma tournament code he used for our Scheme class. Luckily, he responded and attached his source file. During SPW I translated Dave's Scheme code into C and tested the resulting tournament simulator with some basic strategies. Unfortunately nothing super interesting came as a result, but this project got me started down the dark path of game theory simulations.
A couple years later I got to reprogramming the tournament simulator I had worked on at Science and Math. I had just taken Funderlic's Operating Systems class at NC State and had just built my first dual processor refrigerator ... er, computer, so I was aching to multithread the Prisoner's Dilemma to try and speed it up.
I finally got the simulator to work around the end of the summer, at which point I took off for a year of study abroad in Grenoble, France. At the end of the year, I got the chance to work in the Leibniz Research Laboratory in Grenoble. During the summer I resurrected the multithreaded simulator code and started to play around with it once again, this time in the context of genetic algorithms and a little bit of neural networks.
References
Axelrod, Robert. 1984. The Evolution of Cooperation. New York: Basic Books.
Hofstadter, Douglas R. 1983.
Metamagical Themas: Computer Tournaments of the Prisoner's Dilemma Suggest How Cooperation Evolves.
Scientific American 248 (no.5): 16-26.