Pankaj Mehra
mehra at aquinas.csl.uiuc.edu
Fri Mar 10 05:43:16 EST 1989
I have recently explored several connectionist models for learning
under _realistic_ learning scenarios. The class of problems whose
solutions we are trying to acquire by learning consists of decision
problems with the following characteristics:
(i) large number of continuous-valued PARAMETERS, each of which
(ia) takes on values from a finite range with a nonstationary
distribution
(ib) costs more to measure more accurately
{however, accuracy can be controlled by focused sampling}
(ic) is not known to follow any particular parametric distribution
(ii) the optimization CRITERION (energy, if you will) is ill-defined
{much like the _blackbox_ in David Ackley's thesis}
(iii) a set of OPERATORS is available, and these are the _only_ instruments
for manipulating the problem state.
(iiia) the _causal_ relationships between the states before and
after the application of the operator are not known
(iiib) the _persistence_ model is incomplete - i.e., it is not
known a priori when the effect of an action will
be felt or how long it will persist
(iv) the TRAINING ENVIRONMENT is _slow reactive_ : it can be assumed to
produce reinforcement (prescriptive feedback) rather than an
error (evaluative feedback); however, the delays between an action
and subsequent reinforcement follow an _unknown_ distribution.
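For concreteness, characteristic (iv) can be sketched as a toy environment
in which each action's reinforcement is queued behind a random delay, so the
learner cannot tell which action produced which signal. This is my own
illustrative sketch {all names and the delay distribution are hypothetical,
not from the paper}:

```python
import random

class SlowReactiveEnvironment:
    """Toy sketch of a _slow reactive_ environment: reinforcement
    (prescriptive feedback) arrives after a random, unknown delay
    rather than immediately after the action that caused it."""

    def __init__(self, max_delay=5, seed=0):
        self.rng = random.Random(seed)
        self.max_delay = max_delay
        self.pending = []  # list of (steps_remaining, reinforcement)

    def act(self, action):
        # Queue this action's reinforcement behind a random delay;
        # the learner never observes the pairing of action and signal.
        delay = self.rng.randint(1, self.max_delay)
        reinforcement = 1.0 if action == "good" else -1.0
        self.pending.append((delay, reinforcement))

    def tick(self):
        # Advance one time step; deliver reinforcements whose delay expired.
        arrived, still_pending = [], []
        for delay, r in self.pending:
            if delay - 1 <= 0:
                arrived.append(r)
            else:
                still_pending.append((delay - 1, r))
        self.pending = still_pending
        return arrived

env = SlowReactiveEnvironment(max_delay=3, seed=1)
env.act("good")
env.act("bad")
received = []
for _ in range(10):
    received.extend(env.tick())
```

Note that the two reinforcements may arrive in either order, interleaved with
later actions - exactly the temporal credit-assignment difficulty at issue.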
-------
These have been called Dynamic Decision Problems, and have been shown to be a
rich class, in the following publication [available upon request from the first author]:
Mehra, P. and B. W. Wah, "Architectures for Strategy Learning,"
in Computer Architectures for Artificial Intelligence Appli-
cations, ed. B. Wah and C. Ramamoorthy, Wiley, New York, NY,
1989 (in press).
{send e-mail to: mehra at cs.uiuc.edu}
-------
The above publication also examines the applicability of other well-known
learning techniques {empirical, probabilistic, decision-theoretic, EBL,
hybrid techniques, learning to plan, etc.} and suggests why ANSs might be
preferred over the others. As part of this comparison, several contemporary
connectionist models were found lacking in certain respects. I shall
summarize the criticisms here, and would like to have feedback from
those who have supported the use of these techniques.
BACK-PROPAGATION:
positive aspects:
Simplicity of programming the learning algorithm
An effective procedure for tuning large parameter
sets representable as _band matrices_ (layered networks)
problematic assumptions:
Immediate feedback
Corrective {as against prescriptive} feedback
[I am aware of Ron Williams' work, though]
weakness as a learning approach:
Requires tweaking of features (normalization biases) to the
extent that the degree of generalization varies drastically
as the degree of coarse coding changes. A great part of the
success in particular applications could therefore be attributed
to the intelligence of the researcher who codes those features
{rather than to the _learning_ algorithm}
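The dependence on coarse coding can be made concrete with a small sketch
{my own illustration; the encoding scheme and parameter names are
hypothetical}: the same pair of inputs looks similar or dissimilar to the
network depending solely on a coding width chosen by the researcher, before
any learning takes place.

```python
import math

def coarse_code(x, centers, width):
    """Encode scalar x as overlapping Gaussian receptive-field activations.
    `width` is the designer-chosen coding parameter: it directly controls
    how much nearby inputs share representation (i.e., generalize)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def overlap(a, b):
    """Normalized dot product: how strongly two codes generalize to each other."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

centers = [i / 10 for i in range(11)]  # receptive-field centers on [0, 1]

# The very same input pair (0.30 vs 0.35) is nearly distinct under fine
# coding but nearly identical under broad coding - the "learned"
# generalization was fixed by the researcher's choice of width.
fine = overlap(coarse_code(0.30, centers, 0.05),
               coarse_code(0.35, centers, 0.05))
broad = overlap(coarse_code(0.30, centers, 0.30),
                coarse_code(0.35, centers, 0.30))
```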
REINFORCEMENT LEARNING:
positive aspects:
Can handle prescriptive feedback
Has been shown {Rich Sutton, Chuck Anderson} to work with delayed
feedback
problematic assumptions:
The implementations known to this author assume
: persistence of effects decays _exponentially_ with time
: heuristic assumptions such as "recency" (the more
recent an action, the more responsible it is for the
feedback) and "frequency" (the more frequently an
action occurs preceding the feedback, the more likely it
is to have caused the feedback) are _hardwired_ into the
learning algorithms
All the knowledge needed for learning is implicit, as if the learning
critter were born with algorithms assuming exponential decay,
and as if all actions in the world caused similar delay patterns
The nodes of the network compute functions much more complex than
in the case of classical back-propagation.
weakness as a learning paradigm:
All actions that occur at the same time and with the same frequency
are assumed equally likely to have caused the feedback (i.e., these
algorithms have an implicitly coded causal model).
No scope for using the same network to choose between actions having
different causal and persistence assumptions.
The learning algorithm amounts to a procedural encoding of environmental
knowledge. Any success of these algorithms in realistic applications is
in large part due to the intelligence of the designer and the effort
they put in (for example, to find just the right lambda for the
exponential decay factor).
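The hardwired recency and frequency heuristics can be made explicit in a
small sketch of eligibility-trace-style credit assignment {my own
illustration, in the spirit of the Sutton/Anderson work cited above; class
and parameter names are hypothetical}. Note how lambda is a fixed constant
baked into the update rule, independent of the true causal or persistence
structure of the environment:

```python
class EligibilityTraceCredit:
    """Sketch of the criticized credit-assignment heuristics: each action's
    trace decays exponentially over time (recency) and accumulates with
    repetition (frequency). `lam` is the fixed decay constant the
    designer must tune by hand."""

    def __init__(self, actions, lam=0.8):
        self.lam = lam
        self.trace = {a: 0.0 for a in actions}
        self.credit = {a: 0.0 for a in actions}

    def step(self, action_taken, reinforcement):
        # Recency: older actions are assumed exponentially less responsible.
        for a in self.trace:
            self.trace[a] *= self.lam
        # Frequency: repeating an action raises its trace.
        self.trace[action_taken] += 1.0
        # Credit is apportioned by trace magnitude at feedback time,
        # regardless of which action actually caused the reinforcement.
        for a in self.trace:
            self.credit[a] += reinforcement * self.trace[a]

ca = EligibilityTraceCredit(["a", "b"], lam=0.5)
ca.step("a", 0.0)   # action "a", no feedback yet
ca.step("b", 1.0)   # action "b", then reinforcement arrives
```

After these two steps, "b" receives more credit than "a" purely because it
is more recent - even if "a" was in fact the cause of the feedback.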
-------
See my paper for details of Dynamic Decision Problems and an extensive study of
how the basic learning model underlying _most_ existing learning
algorithms (in either AI or connectionism) is at odds with the requirements
of training in the real world.
Comments welcome from those who read the paper, as well as from those
who just want to discuss the material of this basenote.
- Pankaj {Mehra at cs.uiuc.edu}