Ph.D. thesis available
Michael Duff
duff at envy.cs.umass.edu
Tue May 7 11:13:54 EDT 2002
Dear Connectionists,
The following Ph.D. thesis has been made available:
Optimal Learning: Computational procedures for
Bayes-adaptive Markov decision processes
Michael O. Duff
Department of Computer Science
University of Massachusetts, Amherst
The thesis may be retrieved from:
http://envy.cs.umass.edu/People/duff/diss.html
-----------------------------------------------
Abstract
In broad terms, this dissertation is about decision making under
uncertainty. At each stage, a decision-making agent operating in
an uncertain world takes an action that elicits a reinforcement
signal and causes the state of the world (or agent) to change.
The agent's goal is to maximize the total reward it derives over
its entire duration of operation---an interval that may require
the agent to strike a delicate balance between two sometimes
conflicting impulses: (1) greedy exploitation of its current world
model, and (2) exploration of its world to gain information that
can refine the world model and improve the agent's policy.
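The tension between exploitation and exploration can be made concrete with a small, purely illustrative sketch (not from the thesis): a two-armed Bernoulli bandit in which the agent keeps a Beta posterior over each arm's unknown success probability. A greedy agent always pulls the arm with the highest posterior mean; a Thompson-sampling agent explores by drawing one sample from each posterior and pulling the arm whose sample is largest.

```python
import random

class BetaArm:
    """Beta posterior over one arm's unknown Bernoulli success probability."""

    def __init__(self):
        self.alpha, self.beta = 1, 1  # uniform Beta(1, 1) prior

    def mean(self):
        # posterior mean of the success probability
        return self.alpha / (self.alpha + self.beta)

    def sample(self):
        # one draw from the posterior (used for exploration)
        return random.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Bernoulli likelihood: conjugate Beta update
        if reward:
            self.alpha += 1
        else:
            self.beta += 1

def greedy_choice(arms):
    """Pure exploitation: pick the arm with the best posterior mean."""
    return max(range(len(arms)), key=lambda i: arms[i].mean())

def thompson_choice(arms):
    """Posterior sampling: explore in proportion to posterior uncertainty."""
    return max(range(len(arms)), key=lambda i: arms[i].sample())

if __name__ == "__main__":
    random.seed(0)
    true_p = [0.3, 0.7]            # hidden success probabilities
    arms = [BetaArm(), BetaArm()]
    for _ in range(1000):
        i = thompson_choice(arms)
        reward = 1 if random.random() < true_p[i] else 0
        arms[i].update(reward)
    # after enough pulls the posterior concentrates on the better arm
    print("greedy pick:", greedy_choice(arms))
```

This toy setting has no state dynamics, so it only illustrates the information-gathering side of the problem; the thesis addresses the full Bayes-adaptive MDP case, where actions also change the world's state.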
Over the years, a number of researchers have formulated this problem
mathematically---"adaptive control processes," "dual control," "value
of information," and "optimal learning" all address essentially the
same issue and share a basic Bayesian framework that is well-suited
for modeling the role of information and for defining what a solution
is. Unfortunately, classical procedures for computing policies that
optimally balance exploitation with exploration are intractable and
have only been able to address problems that have a very small number
of physical states and short planning horizons.
This dissertation proposes computational procedures that retain the
Bayesian formulation, but sidestep intractability by employing Monte
Carlo simulation, function approximation, and diffusion modeling of
information-state dynamics.
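To give a flavor of the Monte Carlo idea (a hedged sketch, not the thesis's actual algorithm): rather than integrating over all possible worlds analytically, one can sample candidate models from the agent's posterior and estimate a policy's Bayes-expected return by simulating the policy in each sampled model. All names and parameters below are illustrative.

```python
import random

def bayes_expected_return(policy, posterior, horizon=50, n_samples=200, seed=1):
    """Monte Carlo estimate of a policy's Bayes-expected total reward.

    posterior: list of (alpha, beta) Beta parameters, one per bandit arm.
    policy:    a function returning an arm index to pull.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # sample one candidate world (arm success probabilities)
        # from the Beta posterior
        model = [rng.betavariate(a, b) for (a, b) in posterior]
        # simulate the policy for `horizon` steps in that sampled world
        for _ in range(horizon):
            arm = policy()
            total += 1 if rng.random() < model[arm] else 0
    return total / n_samples

if __name__ == "__main__":
    # posterior strongly favors arm 0 (mean 0.75) over arm 1 (mean ~0.17)
    posterior = [(9, 3), (2, 10)]
    v0 = bayes_expected_return(lambda: 0, posterior)
    v1 = bayes_expected_return(lambda: 1, posterior)
    print(v0, v1)
```

The estimate converges to the true Bayes-expected return as `n_samples` grows; the thesis combines such sampling with function approximation so that value estimates generalize across information states rather than being computed one at a time.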