XOR and BP

Fri Mar 19 11:42:34 EST 1993

David Glaser writes:

>> ..................., but isn't back-prop with a learning rate of 1
>> (see Luis B. Almeida's posting of 15.3.93) doing something quite a lot
>> like random walk ?

Probably not really.
I ran a couple of simulations using the 2-2-1 (+ true unit)
architecture, but doing random search in weight space instead
of backprop. On average, I had to generate 1500 random weight
initializations before hitting the first XOR solution (with
each weight drawn uniformly between -10.0 and +10.0).
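
For anyone who wants to repeat the experiment, here is a minimal
sketch of such a random search. Several details are assumptions,
not the settings of the simulations above: logistic units, a
success criterion of "output on the correct side of 0.5", and all
function names. With a different criterion the average trial
count will differ.

```python
import math
import random

# The four XOR patterns: ((input1, input2), target)
XOR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# 2-2-1 net with a "true" (bias) unit: 2 * (2 inputs + bias) weights
# into the hidden layer, plus (2 hidden + bias) weights into the output.
N_WEIGHTS = 9


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(w, x1, x2):
    """Output of the 2-2-1 net for one input pattern."""
    h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
    h2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])


def solves_xor(w, threshold=0.5):
    """Assumed criterion: every output lies on the correct side of 0.5."""
    return all((forward(w, a, b) > threshold) == bool(t)
               for (a, b), t in XOR_PATTERNS)


def trials_until_solution(rng, low=-10.0, high=10.0, max_trials=1_000_000):
    """Draw weight vectors uniformly from [low, high] until one solves XOR.

    Returns the number of draws needed, or None if max_trials is exhausted.
    """
    for trial in range(1, max_trials + 1):
        w = [rng.uniform(low, high) for _ in range(N_WEIGHTS)]
        if solves_xor(w):
            return trial
    return None
```

Averaging trials_until_solution(random.Random(seed)) over many
seeds gives an estimate of the mean number of random
initializations needed under these assumptions.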

Different architectures and different initialization conditions
influence the average number of trials, of course. Since there
are only 16 mappings from the set of 4 input patterns to a single
binary output, a hypothetical bias-free architecture that
produced each of these mappings with equal probability would hit
XOR with probability 1/16 per trial, i.e. after about 16 random
search trials on average. The 1500 trials above therefore seem to
imply that Luis' backprop procedure had to fight against a
`negative' architectural bias.
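
The count of 16 can be checked by brute force. The sketch below
enumerates every mapping from the 4 input patterns to a binary
output and assumes (this is the hypothetical part) that a
bias-free searcher picks one of them uniformly at random per
trial, so the number of trials is geometrically distributed with
mean 1/p. Variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# The 4 input patterns of a two-input Boolean function.
inputs = list(product((0, 1), repeat=2))

# Every possible assignment of a binary output to those 4 patterns.
mappings = list(product((0, 1), repeat=len(inputs)))

# XOR written as one such output assignment.
xor = tuple(a ^ b for a, b in inputs)

# Probability that one uniformly drawn mapping is XOR.
p_hit = Fraction(mappings.count(xor), len(mappings))

# Mean of a geometric distribution with success probability p_hit.
expected_trials = 1 / p_hit
```

Here len(mappings) is 16 and expected_trials is 16, which is the
baseline the 1500 trials are being compared against.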

The success of any learning system depends heavily on having the
right bias. Of course, there are architectures and corresponding
learning algorithms that solve XOR in a single `epoch'.

Juergen Schmidhuber
Institut fuer Informatik
Technische Universitaet Muenchen
Arcisstr. 21,  8000 Muenchen 2,  Germany 
schmidhu at informatik.tu-muenchen.de