thesis: Reinforcement learning models of the dopamine system and their behavioral implications

Nathaniel Daw daw at cs.cmu.edu
Wed Sep 3 16:15:09 EDT 2003


Dear Connectionists,

I thought that some of you might be interested in my recently completed
PhD thesis, "Reinforcement learning models of the dopamine system and
their behavioral implications," which is available as a (rather large,
4MB) pdf download at

http://www.cs.cmu.edu/~daw/thesis.pdf

An abstract follows.

best,

Nathaniel Daw


ABSTRACT

This thesis aims to improve theories of how the brain functions and to
provide a framework to guide future neuroscientific experiments by making
use of theoretical and algorithmic ideas from computer science. The work
centers around the detailed understanding of the dopamine system, an
important and phylogenetically venerable brain system that is implicated
in such general functions as motivation, decision-making and motor
control, and whose dysfunction is associated with disorders such as
schizophrenia, addiction, and Parkinson's disease. A series of influential
models have proposed that the responses of dopamine neurons recorded from
behaving monkeys can be identified with the error signal from temporal
difference (TD) learning, a reinforcement learning algorithm for learning
to predict rewards in order to guide decision-making.
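
For readers less familiar with the formalism: in the standard notation
(used here for illustration; the notation in the thesis itself may
differ), the TD error at time t is

    \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),

where r_t is the reward received, V is the learned reward prediction for
each state, and \gamma is a discount factor. These models identify the
phasic responses of dopamine neurons with \delta_t.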

Here I propose extensions to these theories that improve them along a
number of dimensions simultaneously. The new models that result eliminate
several unrealistic simplifying assumptions from the original accounts;
explain many sorts of dopamine responses that had previously seemed
anomalous; flesh out nascent suggestions that these neurophysiological
mechanisms can also explain animal behavior in conditioning experiments;
and extend the theories' reach to incorporate proposals about the
computational function of several other brain systems that interact with
the dopamine neurons.

Chapter 3 relaxes the assumption from previous models that the system
tracks only short-term predictions about rewards expected within a single
experimental trial. It introduces a new model based on average-reward TD
learning that suggests that long-run reward predictions affect the
slow-timescale, tonic behavior of dopamine neurons. This account resolves
a seemingly paradoxical finding that the dopamine system is excited by
aversive events such as electric shock, which had fueled several published
attacks on the TD theories. These investigations also provide a basis for
proposals about the functional role of interactions between the dopamine
and serotonin systems, and about behavioral data on animal decision-making.
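
Concretely, average-reward TD learning replaces the discount factor with
subtraction of an estimate \bar{\rho} of the long-run reward rate. In
its standard form (the exact formulation used in the thesis may differ),
the error becomes

    \delta_t = r_t - \bar{\rho} + V(s_{t+1}) - V(s_t),

so that the slowly varying term \bar{\rho} is a natural candidate for
the tonic signal discussed above, while \delta_t retains its phasic
interpretation.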

Chapter 4 further revises the theory to account for animals' uncertainty
about the timing of events and about the moment-to-moment state of an
experimental task. These issues are handled in the context of a TD
algorithm incorporating partial observability and semi-Markov dynamics; a
number of other new or extant models are shown to follow from this one in
various limits. The new theory is able to explain a number of previously
puzzling results about dopamine responses to events whose timing is
variable, and provides an appropriate framework for investigating
behavioral results concerning variability in animals' temporal judgments
and timescale invariance properties in animal learning.
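
One standard way to extend the average-reward error above to semi-Markov
dynamics, where the interval \tau_t between events is variable, is to
charge the reward-rate estimate for the elapsed time (sketched here for
the fully observable case; under partial observability, V would instead
be evaluated over a belief distribution about the hidden state):

    \delta_t = r_t - \tau_t \bar{\rho} + V(s_{t+1}) - V(s_t).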

Chapter 5 departs from the thesis's primary methodology of computational
modeling to present a complementary attempt to address the same issues
empirically. The chapter reports the results of an experiment recording
from the striatum (a brain area that is one of the major inputs to and
outputs of the dopamine system) of behaving rats, during a task designed
to probe the functional organization of decision-making in the brain. The
results broadly support the contention of most versions of the TD models
that the functions of action selection and reward prediction are
segregated in the brain, as in "actor/critic" reinforcement learning
systems.
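
As a concrete illustration of that division of labor, here is a minimal
tabular actor/critic sketch in Python. It is purely illustrative: the
environment, state space, and learning rates are placeholders, not the
task or model from the thesis. The critic learns reward predictions V,
the actor learns action preferences H, and both are trained from the
same TD error:

    import numpy as np

    n_states, n_actions = 5, 2
    V = np.zeros(n_states)               # critic: reward predictions
    H = np.zeros((n_states, n_actions))  # actor: action preferences
    alpha_v, alpha_h, gamma = 0.1, 0.1, 0.95

    def softmax(h):
        e = np.exp(h - h.max())
        return e / e.sum()

    def step(s, a):
        # Hypothetical environment, used only to make the sketch run.
        s_next = (s + 1) % n_states
        r = 1.0 if (s_next == 0 and a == 1) else 0.0
        return s_next, r

    s = 0
    for t in range(1000):
        a = np.random.choice(n_actions, p=softmax(H[s]))
        s_next, r = step(s, a)
        delta = r + gamma * V[s_next] - V[s]  # TD error ("dopamine" signal)
        V[s] += alpha_v * delta               # critic update: prediction
        H[s, a] += alpha_h * delta            # actor update: selection
        s = s_next

The point of the architecture is that a single dopamine-like error
signal can train two separate modules, one that predicts reward and one
that selects actions, which is the kind of segregation the recordings
speak to.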





