Technical Reports Available

Tue Sep 12 17:52:57 EDT 1989

**********DO NOT FORWARD TO OTHER BBOARDS**************

Two new technical reports are available:

CONNECTIONIST LEARNING FOR CONTROL: AN OVERVIEW
Andrew G. Barto
Department of Computer and Information Science
University of Massachusetts, Amherst MA 01003

COINS Technical Report 89-89
September 1989

Abstract: This report is an introductory overview of learning by
connectionist networks, also called artificial neural networks, with
a focus on the ideas and methods most relevant to the control of dynamical
systems. It is intended both to provide an overview of connectionist ideas
for control theorists and to provide connectionist researchers with an
introduction to certain issues in control. The perspective taken emphasizes
the continuity of the current connectionist research with more traditional
research in control, signal processing, and pattern classification. Control
theory is a well-developed field with a large
literature, and many of the learning methods being described by connectionists
are closely related to methods that already have been intensively studied by
adaptive control theorists. On the other hand, the directions in which connectionists
are taking these methods have characteristics that are absent in the traditional
engineering approaches. This report describes these characteristics and discusses
their positive and negative aspects. It is argued that connectionist approaches
to control are special cases of memory-intensive approaches, provided a
sufficiently generalized view of memory is adopted. Because adaptive
connectionist networks can cover the range between structureless lookup tables and
highly constrained model-based parameter estimation, they seem well-suited for
the acquisition and storage of control information. Adaptive networks can strike
a balance between the tradeoffs associated with the extremes of the
memory/model continuum.

LEARNING AND SEQUENTIAL DECISION MAKING

A. G. Barto
Department of Computer and Information Science
University of Massachusetts, Amherst MA 01003

R. S. Sutton
GTE Laboratories Incorporated
Waltham, MA 02254

C. J. C. H. Watkins
Philips Research Laboratories
Cross Oak Lane, Redhill Surrey RH1 5HA, England

COINS Technical Report 89-95
September 1989

Abstract: In this report we show how the class of adaptive prediction
methods that Sutton called "temporal difference," or TD, methods is related
to the theory of sequential decision making. TD methods have been
used as "adaptive critics" in connectionist learning systems, and have been
proposed as models of animal learning in classical conditioning experiments.
Here we relate TD methods to decision tasks formulated in terms of a stochastic
dynamical system whose behavior unfolds over time under the influence of a 
decision maker's actions. Strategies are sought for selecting actions
so as to maximize a measure of long-term payoff gain. Mathematically,
tasks such as this can be formulated as Markovian decision problems, and
numerous methods have been proposed for learning how to solve such problems.
We show how a TD method can be understood as a novel synthesis of concepts from
the theory of stochastic dynamic programming, which comprises the standard method
for solving such tasks when a model of the dynamical system is available, and the
theory of parameter estimation, which provides the appropriate context for studying
learning rules in the form of equations for updating associative strengths in
behavioral models, or connection weights in connectionist networks. Because this
report is oriented primarily toward the non-engineer interested in animal learning,
it presents tutorials on stochastic sequential decision tasks, stochastic dynamic
programming, and parameter estimation.

You can get these reports in several ways. I have followed Jordan Pollack's
very good suggestion and placed postscript files in the account kindly provided
at Ohio State for this purpose. Here is the version of Jordan's
instructions appropriate for getting them:

ftp cheops.cis.ohio-state.edu  (or, ftp 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> get
(remote-file) barto.control.ps
(local-file) foo.ps
587591 bytes sent in ?? seconds (?? Kbytes/s)
ftp> get
(remote-file) barto.sequential_decisions.ps
(local-file) bar.ps
904574 bytes sent in ?? seconds (?? Kbytes/s)
ftp> quit
unix> lpr *.ps

(note: these are rather large files: 38 and 51 pages respectively
when printed)
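
If you would rather script the transfer than type it, the ftp session above
can be approximated with Python's standard ftplib module. This is only a
rough sketch: the host, login, directory, and file names are copied from the
session above, while the function names fetch_reports and main are made up
for illustration, and it assumes the server is reachable and still accepts
anonymous logins.

```python
import os
from ftplib import FTP

# Host, directory, and file names are copied from the ftp session above.
# The local names foo.ps and bar.ps are the ones used in that session.
HOST = "cheops.cis.ohio-state.edu"
REMOTE_DIR = "pub/neuroprose"
REPORTS = {
    "barto.control.ps": "foo.ps",
    "barto.sequential_decisions.ps": "bar.ps",
}


def fetch_reports(ftp, reports=REPORTS, remote_dir=REMOTE_DIR):
    """Download each remote file in binary mode.

    `ftp` is an already logged-in ftplib.FTP object (hypothetical helper,
    not part of the original instructions). Returns a dict mapping each
    local file name to the number of bytes written.
    """
    ftp.cwd(remote_dir)
    sizes = {}
    for remote_name, local_name in reports.items():
        with open(local_name, "wb") as out:
            # retrbinary calls out.write once per received block.
            ftp.retrbinary("RETR " + remote_name, out.write)
        sizes[local_name] = os.path.getsize(local_name)
    return sizes


def main():
    # Assumption: the server is up and accepts the anonymous login
    # with the password given in the session above.
    with FTP(HOST) as ftp:
        ftp.login(user="anonymous", passwd="neuron")
        for local_name, size in fetch_reports(ftp).items():
            print(local_name, size, "bytes")
```

Calling main() performs the same two transfers as the session; the byte
counts it prints should match the sizes reported there.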

Alternatively, you can send requests for printed copies via e-mail to
Ms. Connie Smith using the address: Smith at cs.umass.EDU

or write to
Ms. Connie Smith
Department of Computer and Information Science
University of Massachusetts
Amherst, MA 01003

(but I would prefer that you use the ftp option if possible!)

Andy Barto

**********DO NOT FORWARD TO OTHER BBOARDS**************