Convergence of constructive algorithms.

Marcus Frean marcus at cns.edinburgh.ac.uk
Fri Aug 10 16:37:13 EDT 1990


Jordan Pollack writes:

> In a related paper (which I cannot find), I'm quite sure that someone
> proved by construction that any (n input, 1 output) boolean function
> could be accomplished by a layering of TLU's, where each additional
> unit is guaranteed to decrease the number of mis-classified inputs.
> Perhaps this approach would help lead to some convergence proof for CC
> networks.


There are several papers that show convergence via guaranteeing each
unit reduces the output's errors by at least one. 

[NB: They all use linear threshold units, and require for convergence
that the training set be composed of binary patterns (or at least
convex: every pattern must be separable from all the others), since
then the worst case is always that a new unit captures a single
pattern and hence is able to correct the output unit by one.]


These include 
The "Tower algorithm":
	Gallant,S.I. 1986a. Three Constructive Algorithms for Network
	Learning.  Proc. 8th Annual Conf. of Cognitive Science Soc.
	p652-660. 
also discussed in 
	Nadal,J. 1989. Study of a Growth Algorithm for Neural Networks
	International J. of Neural Systems, 1,1:55-59 
The performance of this method closely matches that of the "Tiling"
Algorithm of Mezard and Nadal, although the proof there is for
reduction of at least one error per layer rather than per unit.



The "neural decision tree" approach is shown to converge by 
	M. Golea and M. Marchand, A Growth Algorithm for Neural
	Network Decision Trees, EuroPhys.Lett. 12, 205 (1990).
and also
	J.A. Sirat and J.P. Nadal, Neural Trees: A New Tool for
	Classification, preprint, submitted to "Network", April 90.



The "Upstart" algorithm (my favourite....)
	Frean,M.R. 1990. The Upstart Algorithm: A Method for
	Constructing and Training Feedforward Neural Networks.
	Neural Computation. 2:2, 198-209.  
in which new units are devoted to correcting errors made by existing
units (in this sense it has bears some resemblance to Cascade
Correlation).  A binary tree of units is constructed, but it is not a
decision tree: "daughter" units correct their "parent", with the most
senior parent being the output unit.

Marcus.
---------------------------------------------------------------------



More information about the Connectionists mailing list