Ph.D. thesis available
Michael Duff
duff at envy.cs.umass.edu
Tue May 7 11:13:54 EDT 2002
Dear Connectionists,
The following Ph.D. thesis has been made available:
Optimal Learning: Computational procedures for
Bayes-adaptive Markov decision processes
Michael O. Duff
Department of Computer Science
University of Massachusetts, Amherst
The thesis may be retrieved from:
http://envy.cs.umass.edu/People/duff/diss.html
-----------------------------------------------
Abstract
In broad terms, this dissertation is about decision making under
uncertainty. At each stage, a decision-making agent operating in
an uncertain world takes an action that elicits a reinforcement
signal and causes the state of the world (or agent) to change.
The agent's goal is to maximize the total reward it derives over
its entire duration of operation---an interval that may require
the agent to strike a delicate balance between two sometimes
conflicting impulses: (1) greedy exploitation of its current world
model, and (2) exploration of its world to gain information that
can refine the world model and improve the agent's policy.
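The tension between exploitation and exploration can be made concrete with a small, purely illustrative sketch (not from the thesis): a two-armed Bernoulli bandit in which the agent keeps a Beta posterior over each arm's unknown success probability. A greedy agent always pulls the arm with the highest posterior mean; a Thompson-sampling agent explores by drawing one sample from each posterior and pulling the arm whose sample is largest.

```python
import random

class BetaArm:
    """Beta posterior over one arm's unknown Bernoulli success probability."""

    def __init__(self):
        self.alpha, self.beta = 1, 1  # uniform Beta(1, 1) prior

    def mean(self):
        # posterior mean of the success probability
        return self.alpha / (self.alpha + self.beta)

    def sample(self):
        # one draw from the posterior (used for exploration)
        return random.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Bernoulli likelihood: conjugate Beta update
        if reward:
            self.alpha += 1
        else:
            self.beta += 1

def greedy_choice(arms):
    """Pure exploitation: pick the arm with the best posterior mean."""
    return max(range(len(arms)), key=lambda i: arms[i].mean())

def thompson_choice(arms):
    """Posterior sampling: explore in proportion to posterior uncertainty."""
    return max(range(len(arms)), key=lambda i: arms[i].sample())

if __name__ == "__main__":
    random.seed(0)
    true_p = [0.3, 0.7]            # hidden success probabilities
    arms = [BetaArm(), BetaArm()]
    for _ in range(1000):
        i = thompson_choice(arms)
        reward = 1 if random.random() < true_p[i] else 0
        arms[i].update(reward)
    # after enough pulls the posterior concentrates on the better arm
    print("greedy pick:", greedy_choice(arms))
```

This toy setting has no state dynamics, so it only illustrates the information-gathering side of the problem; the thesis addresses the full Bayes-adaptive MDP case, where actions also change the world's state.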
Over the years, a number of researchers have formulated this problem
mathematically---"adaptive control processes," "dual control," "value
of information," and "optimal learning" all address essentially the
same issue and share a basic Bayesian framework that is well-suited
for modeling the role of information and for defining what a solution
is. Unfortunately, classical procedures for computing policies that
optimally balance exploitation with exploration are intractable and
have only been able to address problems that have a very small number
of physical states and short planning horizons.
This dissertation proposes computational procedures that retain the
Bayesian formulation, but sidestep intractability by employing Monte
Carlo simulation, function approximation, and diffusion modeling of
information-state dynamics.
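To give a flavor of the Monte Carlo idea (a hedged sketch, not the thesis's actual algorithm): rather than integrating over all possible worlds analytically, one can sample candidate models from the agent's posterior and estimate a policy's Bayes-expected return by simulating the policy in each sampled model. All names and parameters below are illustrative.

```python
import random

def bayes_expected_return(policy, posterior, horizon=50, n_samples=200, seed=1):
    """Monte Carlo estimate of a policy's Bayes-expected total reward.

    posterior: list of (alpha, beta) Beta parameters, one per bandit arm.
    policy:    a function returning an arm index to pull.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # sample one candidate world (arm success probabilities)
        # from the Beta posterior
        model = [rng.betavariate(a, b) for (a, b) in posterior]
        # simulate the policy for `horizon` steps in that sampled world
        for _ in range(horizon):
            arm = policy()
            total += 1 if rng.random() < model[arm] else 0
    return total / n_samples

if __name__ == "__main__":
    # posterior strongly favors arm 0 (mean 0.75) over arm 1 (mean ~0.17)
    posterior = [(9, 3), (2, 10)]
    v0 = bayes_expected_return(lambda: 0, posterior)
    v1 = bayes_expected_return(lambda: 1, posterior)
    print(v0, v1)
```

The estimate converges to the true Bayes-expected return as `n_samples` grows; the thesis combines such sampling with function approximation so that value estimates generalize across information states rather than being computed one at a time.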