Connectionists: New Paper: MARL: Spiking and Nonspiking Agents in the Iterated Prisoner's Dilemma

Chris Christodoulou cchrist at cs.ucy.ac.cy
Thu Apr 7 11:20:30 EDT 2011


Dear Colleagues,

We would like to draw your attention to our recently published paper:

Vassiliades, V., Cleanthous, A. and Christodoulou, C. (2011). Multiagent 
Reinforcement Learning: Spiking and Nonspiking Agents in the Iterated 
Prisoner's Dilemma. IEEE Transactions on Neural Networks, 22(4), 639-653.

Available at:
http://dx.doi.org/10.1109/TNN.2011.2111384

Abstract
--------
This paper investigates multiagent reinforcement learning (MARL) in a 
general-sum game where the payoffs' structure is such that the agents are 
required to exploit each other in a way that benefits all agents. The 
contradictory nature of these games makes their study in multiagent 
systems quite challenging. In particular, we investigate MARL with spiking 
and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the 
conditions required to enhance its cooperative outcome. The spiking agents 
are neural networks with leaky integrate-and-fire neurons trained with two 
different learning algorithms: 1) reinforcement of stochastic synaptic 
transmission, or 2) reward-modulated spike-timing-dependent plasticity 
with eligibility trace. The nonspiking agents use a tabular representation 
and are trained with Q- and SARSA learning algorithms, with a novel reward 
transformation process also being applied to the Q-learning agents. 
According to the results, the cooperative outcome is enhanced by: 1) 
transformed internal reinforcement signals and a combination of a high 
learning rate and a low discount factor with an appropriate exploration 
schedule in the case of nonspiking agents, and 2) a longer 
eligibility trace time constant in the case of spiking agents. Moreover, 
it is shown that spiking and nonspiking agents have similar behavior and 
can therefore be used equally well in a multiagent interaction 
setting. For training the spiking agents in the case where more than one 
output neuron competes for reinforcement, a novel and necessary 
modification that enhances competition is applied to the two learning 
algorithms utilized, in order to avoid possible synaptic saturation. 
This is done by administering to the networks additional global 
reinforcement signals for every spike of the output neurons that were not 
"responsible" for the preceding decision.
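For readers unfamiliar with the nonspiking setup, a minimal illustrative sketch follows: two tabular Q-learning agents playing the Iterated Prisoner's Dilemma, each conditioning on the opponent's previous move. The payoff values, hyperparameters, and state representation here are assumptions for illustration, not the paper's actual configuration (the paper additionally applies a reward transformation and a tuned exploration schedule, which are omitted here).

```python
import random

# Standard Prisoner's Dilemma payoffs (temptation > reward > punishment > sucker).
# Illustrative values only; not taken from the paper.
PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ACTIONS = ['C', 'D']


class QAgent:
    """Tabular Q-learning agent; its state is the opponent's previous move."""

    def __init__(self, alpha=0.8, gamma=0.1, epsilon=0.1):
        # High learning rate and low discount factor, echoing the kind of
        # settings the paper associates with enhanced cooperation.
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {(s, a): 0.0 for s in ACTIONS + ['start'] for a in ACTIONS}

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update with a max over next-state actions.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


random.seed(0)
agent_a, agent_b = QAgent(), QAgent()
state_a = state_b = 'start'
for _ in range(5000):
    move_a, move_b = agent_a.act(state_a), agent_b.act(state_b)
    r_a, r_b = PAYOFFS[(move_a, move_b)]
    # Each agent's next state is the opponent's latest move.
    agent_a.update(state_a, move_a, r_a, move_b)
    agent_b.update(state_b, move_b, r_b, move_a)
    state_a, state_b = move_b, move_a
```

Whether such agents settle into mutual cooperation or defection depends on the learning rate, discount factor, and exploration schedule, which is exactly the sensitivity the paper investigates.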

--------

Please contact me if you would like a personal reprint.

Kind regards,

Chris Christodoulou

* * *
Dr Chris Christodoulou          cchrist at cs.ucy.ac.cy
Department of Computer Science, University of Cyprus
75 Kallipoleos Ave, P.O. Box 20537, 1678 Nicosia, Cyprus
Tel. (+357) 22 892752, Fax (+357) 22 892701
