Connectionists: New Paper: MARL: Spiking and Nonspiking Agents in the Iterated Prisoner's Dilemma
Chris Christodoulou
cchrist at cs.ucy.ac.cy
Thu Apr 7 11:20:30 EDT 2011
Dear Colleagues,
We would like to draw your attention to our recently published paper:
Vassiliades, V., Cleanthous, A. and Christodoulou, C. (2011). Multiagent
Reinforcement Learning: Spiking and Nonspiking Agents in the Iterated
Prisoner's Dilemma. IEEE Transactions on Neural Networks, 22(4), 639-653.
Available at:
http://dx.doi.org/10.1109/TNN.2011.2111384
Abstract
--------
This paper investigates multiagent reinforcement learning (MARL) in a
general-sum game where the payoff structure is such that the agents are
required to exploit each other in a way that benefits all agents. The
contradictory nature of these games makes their study in multiagent
systems quite challenging. In particular, we investigate MARL with spiking
and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the
conditions required to enhance its cooperative outcome. The spiking agents
are neural networks with leaky integrate-and-fire neurons trained with two
different learning algorithms: 1) reinforcement of stochastic synaptic
transmission, or 2) reward-modulated spike-timing-dependent plasticity
with eligibility trace. The nonspiking agents use a tabular representation
and are trained with Q- and SARSA learning algorithms, with a novel reward
transformation process also being applied to the Q-learning agents.
According to the results, the cooperative outcome is enhanced by: 1)
transformed internal reinforcement signals, together with a combination
of a high learning rate, a low discount factor, and an appropriate
exploration schedule, in the case of nonspiking agents, and 2) a longer
eligibility trace time constant in the case of spiking agents. Moreover,
it is shown that spiking and nonspiking agents behave similarly and can
therefore be used equally well in a multiagent interaction
setting. For training the spiking agents when more than one output
neuron competes for reinforcement, a novel and necessary
competition-enhancing modification is applied to both learning
algorithms in order to avoid possible synaptic saturation: the networks
receive additional global reinforcement signals for every spike of the
output neurons that were not "responsible" for the preceding decision.
--------
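To make the nonspiking setting above concrete, here is a minimal sketch of two tabular Q-learning agents playing the Iterated Prisoner's Dilemma. The state encoding (previous joint action), the fixed epsilon-greedy exploration, and the specific parameter values are our own illustrative assumptions, not the paper's setup; the high learning rate and low discount factor merely echo the combination the abstract reports as favoring cooperation.

```python
import random

# Standard IPD payoffs (T=5, R=3, P=1, S=0); key = (my action, opponent action).
# Actions: 0 = cooperate (C), 1 = defect (D).
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

class QAgent:
    """Tabular Q-learning agent whose state is the previous joint action."""
    def __init__(self, alpha=0.9, gamma=0.1, epsilon=0.2, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # One row per previous joint action, plus the start state (None).
        self.q = {s: [0.0, 0.0] for s in [None] + list(PAYOFF)}

    def act(self, state):
        # Epsilon-greedy action selection over the two IPD actions.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        qs = self.q[state]
        return 0 if qs[0] >= qs[1] else 1

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup: bootstrap from the best next action.
        best_next = max(self.q[next_state])
        td_target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])

def play(episodes=5000, seed=0):
    """Run repeated IPD rounds and return the mutual-cooperation frequency."""
    rng = random.Random(seed)
    a = QAgent(rng=random.Random(rng.random()))
    b = QAgent(rng=random.Random(rng.random()))
    state, coop = None, 0
    for _ in range(episodes):
        ua, ub = a.act(state), b.act(state)
        ra, rb = PAYOFF[(ua, ub)]
        next_state = (ua, ub)  # both agents observe the same joint action
        a.update(state, ua, ra, next_state)
        b.update(state, ub, rb, next_state)
        state = next_state
        coop += (ua == 0 and ub == 0)
    return coop / episodes

if __name__ == "__main__":
    print(f"mutual-cooperation frequency: {play():.2f}")
```

This sketch only illustrates the tabular multiagent setup; it omits the paper's reward transformation and exploration schedule, so it should not be expected to reproduce the reported cooperative outcomes.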
Please contact me if you would like a personal reprint.
Kind regards,
Chris Christodoulou
* * *
Dr Chris Christodoulou cchrist at cs.ucy.ac.cy
Department of Computer Science, University of Cyprus
75 Kallipoleos Ave, P.O. Box 20537, 1678 Nicosia, Cyprus
Tel. (+357) 22 892752, Fax (+357) 22 892701