PhD Thesis Available

Bob Crites crites at hope.cs.umass.edu
Mon Oct 28 01:06:18 EST 1996


My PhD thesis is now available for download:

LARGE-SCALE DYNAMIC OPTIMIZATION USING TEAMS OF REINFORCEMENT LEARNING AGENTS

Robert Harry Crites

ftp://ftp.cs.umass.edu/pub/anw/pub/crites/root.ps.Z   (202517 bytes)

or from my homepage at:

http://www-anw.cs.umass.edu/People/crites/crites.html

Abstract:

Recent algorithmic and theoretical advances in reinforcement learning (RL)
are attracting widespread interest.  RL algorithms have appeared that
approximate dynamic programming (DP) on an incremental basis.  Unlike
traditional DP algorithms, these algorithms do not require knowledge of the
state transition probabilities or reward structure of a system.  This
allows them to be trained on real or simulated experience and lets them
focus computation on the areas of state space actually visited during
control, making them computationally tractable on very large problems.
RL algorithms can be used as components of multi-agent
algorithms.  If each member of a team of agents employs one of these
algorithms, a new collective learning algorithm emerges for the
team as a whole.  In this dissertation we demonstrate that such collective
RL algorithms can be powerful heuristic methods for addressing large-scale
control problems.
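As an illustrative sketch (not code from the thesis itself), tabular
Q-learning is one such incremental, model-free approximation of DP: each
update uses only a single sampled transition (s, a, r, s'), so no
transition probabilities or reward model are required, and only visited
states are ever touched.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One incremental Q-learning step from a sampled transition.

    Q is a dict mapping (state, action) -> value; unseen pairs default
    to 0.0, so only states actually visited consume memory.
    """
    # Bootstrap from the best estimated value of the successor state.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Move the estimate a step of size alpha toward the one-step
    # TD target r + gamma * max_a' Q(s', a').
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q
```

The parameter names (`alpha` for the learning rate, `gamma` for the
discount factor) are conventional choices here, not notation taken from
the thesis.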

Elevator group control serves as our primary testbed.  The elevator domain
poses a combination of challenges not seen in most RL research to date.
Elevator systems operate in continuous state spaces and in continuous time
as discrete event dynamic systems.  Their states are not fully observable
and they are non-stationary due to changing passenger arrival rates.  As a
way of streamlining the search through policy space, we use a team of RL
agents, each of which is responsible for controlling one elevator car.  The
team receives a global reinforcement signal, which appears noisy to each
agent due to the actions of the other agents, the random nature of the
arrivals, and the incomplete observability of the state.  In
spite of these complications, we present simulation results that surpass
the best heuristic elevator control algorithms of which we are aware.
These results demonstrate the power of RL on a very large-scale
stochastic dynamic optimization problem of practical utility.
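To illustrate the team architecture described above, here is a minimal
sketch (my own toy construction, not the thesis's elevator simulator):
each agent keeps its own value estimates and chooses its own action, but
all agents update from the single global reward, so each agent's reward
signal implicitly reflects the others' actions.  The `env_step` hook and
all parameter names are hypothetical.

```python
import random

class TeamQAgent:
    """One member of a team; learns from a shared (global) reward signal."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = {}                     # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        # Epsilon-greedy action selection over this agent's own estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q.get((s, a), 0.0))

    def learn(self, s, a, global_r, s_next):
        # The reward is the team's global signal; from this agent's
        # viewpoint it is noisy, since it also reflects what the
        # other agents did on the same step.
        best = max(self.Q.get((s_next, a2), 0.0) for a2 in self.actions)
        old = self.Q.get((s, a), 0.0)
        self.Q[(s, a)] = old + self.alpha * (global_r + self.gamma * best - old)

def team_step(agents, states, env_step):
    """One synchronous step: every agent acts, the environment returns
    successor states and ONE global reward, and all agents update."""
    actions = [ag.act(s) for ag, s in zip(agents, states)]
    next_states, global_r = env_step(actions)   # hypothetical environment hook
    for ag, s, a, s2 in zip(agents, states, actions, next_states):
        ag.learn(s, a, global_r, s2)
    return next_states
```

In the elevator setting each agent would control one car; here the design
point is only that no per-agent credit assignment is computed — the
global signal is handed to every learner unchanged.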




More information about the Connectionists mailing list