This project uses recurrent-neural-network-based reinforcement learning to play the Iterated Prisoner's Dilemma. Multiple experiments are carried out with varying strategy compositions: RL agent vs. stationary strategies, RL agent vs. RL agent, more than two RL agents, and a mix of strategies. Cooperative behavior emerged in some RL-agent-vs.-RL-agent scenarios, and non-uniform strategies evolved in some tournaments with more than two RL agents. Q-tables and Markov matrices are used to analyze the agents' learned strategies. High variance is observed across trials when Q-learning is used without experience replay.
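As a point of reference for the Q-table analysis mentioned above, the basic setup can be sketched with a minimal tabular Q-learning agent (no recurrent network, no experience replay) playing against the stationary Tit-for-Tat strategy. All hyperparameters and the state encoding here are illustrative assumptions, not the project's actual configuration:

```python
import random

# Payoff to the row player; actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}
START = (2, 2)  # sentinel state: no previous round yet

def tit_for_tat(opp_history):
    # Stationary strategy: cooperate first, then mirror the opponent's last move.
    return opp_history[-1] if opp_history else 0

def train(episodes=3000, rounds=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning over memory-one states (my last move, opponent's last move)."""
    rng = random.Random(seed)
    states = [START, (0, 0), (0, 1), (1, 0), (1, 1)]
    q = {(s, a): 0.0 for s in states for a in (0, 1)}
    for _ in range(episodes):
        state, my_moves = START, []
        for _ in range(rounds):
            # Epsilon-greedy action selection over the two moves.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[(state, x)])
            opp = tit_for_tat(my_moves)  # opponent reacts to our history
            my_moves.append(a)
            r = PAYOFF[(a, opp)]
            nxt = (a, opp)  # next state records both players' latest moves
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(q[(nxt, 0)], q[(nxt, 1)])
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q
```

Inspecting the resulting Q-table per state, as the analysis above does, shows which memory-one policy the agent has converged to; because each trial's Q-values depend on the exploration trajectory, running `train` with several seeds also illustrates the cross-trial variance noted for Q-learning without experience replay.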

Iterated Prisoner’s Dilemma with Reinforcement Learning