Pankaj Mehra
mehra at aquinas.csl.uiuc.edu
Fri Mar 10 05:43:16 EST 1989
I have recently explored several connectionist models for learning
under _realistic_ learning scenarios. The class of problems whose
solutions we are trying to acquire by learning consists of decision
problems with the following characteristics:
(i) large number of continuous-valued PARAMETERS, each of which
(ia) takes on values from a finite range with a nonstationary
distribution
(ib) costs more to measure more accurately
{however, accuracy can be controlled by focused sampling}
(ic) is not known to follow any particular parametric distribution
(ii) the optimization CRITERION (energy, if you will) is ill-defined
{much like the _blackbox_ in David Ackley's thesis}
(iii) a set of OPERATORS is available, and these are the _only_ instruments
for manipulating the problem state.
(iiia) the _causal_ relationships between the states before and
after the application of the operator are not known
(iiib) the _persistence_ model is incomplete - i.e., it is not
known a priori when the effect of an action will
be felt or how long it will persist
(iv) the TRAINING ENVIRONMENT is _slow reactive_ : it can be assumed to
produce reinforcement (prescriptive feedback) rather than an
error (evaluative feedback); however, the delays between an action
and subsequent reinforcement follow an _unknown_ distribution.
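For concreteness, characteristic (iv) can be sketched as a toy environment
in which each action's reinforcement is queued behind a random delay, so the
learner cannot tell which action produced which signal. This is my own
illustrative sketch {all names and the delay distribution are hypothetical,
not from the paper}:

```python
import random

class SlowReactiveEnvironment:
    """Toy sketch of a _slow reactive_ environment: reinforcement
    (prescriptive feedback) arrives after a random, unknown delay
    rather than immediately after the action that caused it."""

    def __init__(self, max_delay=5, seed=0):
        self.rng = random.Random(seed)
        self.max_delay = max_delay
        self.pending = []  # list of (steps_remaining, reinforcement)

    def act(self, action):
        # Queue this action's reinforcement behind a random delay;
        # the learner never observes the pairing of action and signal.
        delay = self.rng.randint(1, self.max_delay)
        reinforcement = 1.0 if action == "good" else -1.0
        self.pending.append((delay, reinforcement))

    def tick(self):
        # Advance one time step; deliver reinforcements whose delay expired.
        arrived, still_pending = [], []
        for delay, r in self.pending:
            if delay - 1 <= 0:
                arrived.append(r)
            else:
                still_pending.append((delay - 1, r))
        self.pending = still_pending
        return arrived

env = SlowReactiveEnvironment(max_delay=3, seed=1)
env.act("good")
env.act("bad")
received = []
for _ in range(10):
    received.extend(env.tick())
```

Note that the two reinforcements may arrive in either order, interleaved with
later actions - exactly the temporal credit-assignment difficulty at issue.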
-------
These have been called Dynamic Decision Problems, and have been shown to be a
rich class, in the following publication [available upon request from the first author]:
Mehra, P. and B. W. Wah, "Architectures for Strategy Learning,"
in Computer Architectures for Artificial Intelligence Appli-
cations, ed. B. Wah and C. Ramamoorthy, Wiley, New York, NY,
1989 (in press).
{send e-mail to: mehra at cs.uiuc.edu}
-------
The above publication also examines the applicability of other well-known
learning techniques {empirical, probabilistic, decision-theoretic, EBL,
hybrid techniques, learning to plan, etc.} and suggests why ANSs might be
preferred over the others. As part of this comparison, several contemporary
connectionist models were found lacking in certain respects. I shall
summarize the criticisms here, and would like to have feedback from
those who have supported the use of these techniques.
BACK-PROPAGATION:
positive aspects:
Simplicity of programming the learning algorithm
An effective procedure for tuning large parameter
sets representable as _band matrices_ (layered networks)
problematic assumptions:
Immediate feedback
Corrective {as against prescriptive} feedback
[I am aware of Ron Williams' work, though]
weakness as a learning approach:
Requires tweaking of features (normalization biases) to the
extent that the degree of generalization varies drastically
as the degree of coarse coding changes. A great part of the
success in particular applications could therefore be attributed
to the intelligence of the researcher who codes those features
{rather than to the _learning_ algorithm}
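The dependence on coarse coding can be made concrete with a small sketch
{my own illustration; the encoding scheme and parameter names are
hypothetical}: the same pair of inputs looks similar or dissimilar to the
network depending solely on a coding width chosen by the researcher, before
any learning takes place.

```python
import math

def coarse_code(x, centers, width):
    """Encode scalar x as overlapping Gaussian receptive-field activations.
    `width` is the designer-chosen coding parameter: it directly controls
    how much nearby inputs share representation (i.e., generalize)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def overlap(a, b):
    """Normalized dot product: how strongly two codes generalize to each other."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

centers = [i / 10 for i in range(11)]  # receptive-field centers on [0, 1]

# The very same input pair (0.30 vs 0.35) is nearly distinct under fine
# coding but nearly identical under broad coding - the "learned"
# generalization was fixed by the researcher's choice of width.
fine = overlap(coarse_code(0.30, centers, 0.05),
               coarse_code(0.35, centers, 0.05))
broad = overlap(coarse_code(0.30, centers, 0.30),
                coarse_code(0.35, centers, 0.30))
```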
REINFORCEMENT LEARNING:
positive aspects:
Can handle prescriptive feedback
Has been shown {Rich Sutton, Chuck Anderson} to work with delayed
feedback
problematic assumptions:
The implementations known to this author assume
: persistence of effects decays _exponentially_ with time
: heuristic assumptions such as "recency" (the more
recent an action, the more responsible it is for the
feedback) and "frequency" (the more frequently an
action occurs preceding the feedback, the more likely it
is to have caused the feedback) are _hardwired_ into the
learning algorithms
All the knowledge needed for learning is implicit, as if the learning
critter were born with algorithms assuming exponential decay,
and as if all actions in the world caused similar delay patterns
The nodes of the network compute functions much more complex than
in the case of classical back-propagation.
weakness as a learning paradigm:
All actions that occur at the same time and with the same frequency
are assumed equally likely to have caused the feedback (i.e., these
algorithms have an implicitly coded causal model).
No scope for using the same network to choose between actions having
different causal and persistence assumptions.
The learning algorithm amounts to a procedural encoding of environmental
knowledge. Any success of these algorithms in realistic applications is
in large part due to the intelligence of the designer and the effort
they put in (for example, to find just the right lambda for the
exponential decay factor).
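The hardwired recency and frequency heuristics can be made explicit in a
small sketch of eligibility-trace-style credit assignment {my own
illustration, in the spirit of the Sutton/Anderson work cited above; class
and parameter names are hypothetical}. Note how lambda is a fixed constant
baked into the update rule, independent of the true causal or persistence
structure of the environment:

```python
class EligibilityTraceCredit:
    """Sketch of the criticized credit-assignment heuristics: each action's
    trace decays exponentially over time (recency) and accumulates with
    repetition (frequency). `lam` is the fixed decay constant the
    designer must tune by hand."""

    def __init__(self, actions, lam=0.8):
        self.lam = lam
        self.trace = {a: 0.0 for a in actions}
        self.credit = {a: 0.0 for a in actions}

    def step(self, action_taken, reinforcement):
        # Recency: older actions are assumed exponentially less responsible.
        for a in self.trace:
            self.trace[a] *= self.lam
        # Frequency: repeating an action raises its trace.
        self.trace[action_taken] += 1.0
        # Credit is apportioned by trace magnitude at feedback time,
        # regardless of which action actually caused the reinforcement.
        for a in self.trace:
            self.credit[a] += reinforcement * self.trace[a]

ca = EligibilityTraceCredit(["a", "b"], lam=0.5)
ca.step("a", 0.0)   # action "a", no feedback yet
ca.step("b", 1.0)   # action "b", then reinforcement arrives
```

After these two steps, "b" receives more credit than "a" purely because it
is more recent - even if "a" was in fact the cause of the feedback.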
-------
See my paper for details of Dynamic Decision Problems and an extensive study of
how the basic learning model underlying _most_ existing learning
algorithms (in either AI or connectionism) is at odds with the requirements
of training in the real world.
Comments welcome from those who read the paper, as well as from those
who just want to discuss the material of this basenote.
- Pankaj {Mehra at cs.uiuc.edu}