Combining connectionism with the GA
Rik Belew
rik%cs at ucsd.edu
Thu Nov 9 15:53:09 EST 1989
There's been too much Email ink spilled lately about potential interactions
between evolutionary algorithms and connectionist networks for me not to
throw my two cents in. I announced a new tech report, "Evolution, learning
and culture: Computational metaphors for adaptive algorithms" on this list a
couple of weeks ago, and I don't know whether the recent storm of messages
has anything to do with that report or not. Anyway, some of the things I'll
say below are better said there.
Let me begin with Hammerstrom's analysis, related in [rudnick at cse.ogc.edu
26 Oct 89]. His basic point seems to be that connectionist algorithms (he
uses NetTalk as his example) take a long time, and putting an evolutionary
outer loop around them can only make matters worse. And if we are
satisfied with the results of a single backpropagation (BP) search and the
evolutionary loop is doing nothing more than randomly repeating this
experiment, he may be right.
But only if. First, who today is satisfied with vanilla BP in feed-forward
networks? (And I'm not just BP bashing; almost all of what I say applies
equally to other gradient descent connectionist learning algorithms.) A
great deal of current research is concerned both with simplifying such nets
to more minimal structures and with elaborating them (e.g., with recurrence)
to solve more difficult problems. Also, BP performance is known to be highly
sensitive to stochastic variation such as the choice of random initial
weights. Consequently, most investigators do use an 'outer-loop' iteration,
making multiple restarts to improve their confidence in the solution found.
Genetic Algorithms (GAs) can help with these connectionist problems. (To
clarify, when I talk of 'evolutionary
algorithms' I have some variant of John Holland's Genetic Algorithm in mind
because I believe it to be the most powerful. But there are other evolution-
like algorithms (e.g., Ackley, Fogel Jr. & Sr.), and these may also prove useful
to connectionists.)
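A minimal sketch of the restart practice, assuming nothing more than a toy
one-dimensional, multimodal loss in place of a real BP error surface (all
names and constants here are illustrative, not drawn from any of the
systems mentioned):

    import math, random

    def loss(w):
        # toy multimodal 'error surface' standing in for a BP error function
        return math.sin(3.0 * w) + 0.1 * w * w

    def descend(w, steps=200, rate=0.05, eps=1e-5):
        # plain gradient descent, with a numeric estimate of the gradient
        for _ in range(steps):
            w -= rate * (loss(w + eps) - loss(w - eps)) / (2.0 * eps)
        return w

    # the 'outer loop': ten restarts from random initial weights,
    # keeping whichever local minimum turns out best
    best = min((descend(random.uniform(-5.0, 5.0)) for _ in range(10)),
               key=loss)
    print("best restart: w = %.3f, loss = %.3f" % (best, loss(best)))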
Second, there is a considerable body of work that shows that evolutionary
search is far from random. GAs are extremely effective sampling
procedures, at least on some kinds of problems. (See Goldberg's book or the
most recent GA proceedings for characterizations of what makes a problem
'GA-hard'.) Further, there are reasons to believe that connectionist nets are
a good problem for the GA: the GLOBAL SAMPLING performed by the GA is
very complementary to the LOCAL SEARCH performed by gradient descent
procedures like BP. Bridges complains that we are compounding ignorance
when we try to consider hybrids of connectionist and GA algorithms
[clay at cs.cmu.edu 7 Nov 89]. But we are beginning to understand the basic
features of connectionist search (as function approximators, via analysis of
internal structure, etc.), and there are substantial things known about the
GA, too (e.g., Holland's Schema Theorem and its progeny). These foundations
do suggest deliberate research strategies and immediately eliminate others
(e.g., some of the naive ways in which we might encode a network onto a GA
string).
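To picture the complementarity: in the sketch below the population provides
the GLOBAL SAMPLING, and a short burst of gradient descent provides the
LOCAL SEARCH that scores each sample. It reuses the toy loss from the
sketch above, and the real-valued individuals with arithmetic crossover are
one simplification among many; Holland's GA proper works on bit strings.

    import math, random

    def loss(w):                      # same toy surface as the earlier sketch
        return math.sin(3.0 * w) + 0.1 * w * w

    def refine(w, steps=20, rate=0.05, eps=1e-5):
        for _ in range(steps):        # brief LOCAL search by gradient descent
            w -= rate * (loss(w + eps) - loss(w - eps)) / (2.0 * eps)
        return w

    def fitness(w):
        return -loss(refine(w))       # fitness = quality after refinement

    pop = [random.uniform(-5.0, 5.0) for _ in range(20)]   # GLOBAL sample
    for generation in range(30):
        pop.sort(key=fitness, reverse=True)
        parents, children = pop[:10], []
        while len(children) < 10:
            a, b = random.sample(parents, 2)
            # arithmetic 'crossover' of two parents, then Gaussian mutation
            children.append((a + b) / 2.0 + random.gauss(0.0, 0.5))
        pop = parents + children
    print("best point found: %.3f" % refine(max(pop, key=fitness)))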
There are a tremendous number of ways the techniques can be combined,
with the GA as simply an outer loop around a conventional BP simulation
being one of the least imaginative. For example, when BP is used in
conjunction with the GA, there is a very good question as to how long each
BP training cycle must run in order to provide a useful fitness measure for
the GA. Preliminary results of ours suggest much shorter training times are
possible. Similarly, use of the GA seems to tolerate quicker search (i.e.,
higher learning rate and momentum values) than is typical. Another rich
dimension is just how a GA bit string is related to a net, from a literal
encoding of every real number weight at one extreme to complex
developmental programs that build nets at the other. One feature of these
hybrids that should not be underestimated is that the GA offers a very
natural mechanism for introducing PARALLELISM into connectionist
algorithms, since each individual in a population can be evaluated
independently. We have had some success exploiting this parallelism in two
implementations, one in a SIMD (Connection Machine) environment and one
using a heterogeneous mess of distributed computers.
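As a sketch of two of these points together, under illustrative
assumptions: a bit string literally encoding six weights (the simple
extreme of the encoding question), and a population whose members are
scored in separate processes. The decode() and fitness() functions below
are hypothetical stand-ins; in particular, fitness() takes the place of a
brief BP training run on the decoded net.

    import random
    from multiprocessing import Pool

    BITS_PER_WEIGHT, N_WEIGHTS = 8, 6

    def decode(bits):
        # literal encoding: each 8-bit group maps to one weight in [-2, 2]
        weights = []
        for i in range(0, len(bits), BITS_PER_WEIGHT):
            n = int("".join(map(str, bits[i:i + BITS_PER_WEIGHT])), 2)
            weights.append(-2.0 + 4.0 * n / (2 ** BITS_PER_WEIGHT - 1))
        return weights

    def fitness(bits):
        # stand-in for a short BP training run on the decoded net: here,
        # just closeness of the decoded weights to a fixed target vector
        target = [0.5, -1.0, 1.5, 0.0, -0.5, 1.0]
        return -sum((a - b) ** 2 for a, b in zip(decode(bits), target))

    if __name__ == "__main__":
        pop = [[random.randint(0, 1)
                for _ in range(BITS_PER_WEIGHT * N_WEIGHTS)]
               for _ in range(16)]
        with Pool() as workers:       # each individual scored independently
            scores = workers.map(fitness, pop)
        print("best fitness this generation: %.3f" % max(scores))

Whether the worker processes land on a SIMD machine or on a heterogeneous
collection of workstations, the essential property is the same: nothing in
one individual's evaluation depends on another's.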
Finally, we shouldn't let computer science drive the entire agenda.
Theoretical biology and evolutionary neurophysiology have some extremely
important and unresolved questions that models like these can help to
address, for example concerning complex, Lamarckian-like interactions
between the learning of individuals and the evolution of species. (I think
the Harp et al. simulation may be particularly useful to evolutionary
neurophysiologists.) One of the things that makes connectionism most
exciting is that the same class of systems that interests (some)
mathematicians as new statistical techniques also interests neuroscientists
as a theory to coordinate their data collection. I think GA/connectionist
hybrids are important for similar reasons: they make sense as algorithms
AND they make sense as models of natural phenomena.
This note has been long on enthusiasm and short on specifics. But some
results have already been reported, by me and others. And I expect there to
be many new results reported at the upcoming sessions (at IJCNN in
Washington and at the NIPS workshop) devoted to this topic. So watch this
space.
Richard K. Belew
rik%cs at ucsd.edu
Assistant Professor
CSE Department (C-014)
UCSD
San Diego, CA 92093
619 / 534-2601 or 534-5288 (messages)