combining generalizers' guesses

Michael P. Perrone mpp at cns.brown.edu
Thu Jul 29 16:50:53 EDT 1993


Barak Pearlmutter writes:
>For instance, if you run backpropagation on the same data twice, with
>the same architecture and all the other parameters held the same, it
>will still typically come up with different answers.  Eg due to
>differences in the random initial weights.
[...]
>  Averaging out this effect is a guaranteed win.
>                                       --Barak.

I agree.  I think the surprising issue here is that the local minima
that people have been trying like crazy to avoid for the past few years
can actually be used to improve performance!
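
To make Barak's point concrete, here is a minimal sketch (mine, not from
his post) in Python/numpy: train the same one-hidden-layer architecture
several times from different random seeds on a toy regression problem,
then average the predictions.  The toy task, network size, and training
schedule are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) + noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X) + 0.1 * rng.standard_normal((200, 1))
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_test = np.sin(3 * X_test)

def train_net(seed, hidden=10, epochs=2000, lr=0.05):
    """One-hidden-layer net trained by plain batch backprop on MSE."""
    r = np.random.default_rng(seed)
    W1 = r.standard_normal((1, hidden)); b1 = np.zeros(hidden)
    W2 = r.standard_normal((hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # forward pass
        err = (H @ W2 + b2) - y               # residuals
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)      # backprop through tanh
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

nets = [train_net(seed) for seed in range(10)]
individual = [np.mean((f(X_test) - y_test) ** 2) for f in nets]
ensemble = np.mean([f(X_test) for f in nets], axis=0)
print("mean individual test MSE:", np.mean(individual))
print("averaged ensemble test MSE:", np.mean((ensemble - y_test) ** 2))

Each run lands in a different local minimum, and by convexity of squared
error (Jensen's inequality) the averaged predictor's MSE can never exceed
the members' average MSE -- hence the "guaranteed win".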

I think that one direction to take is to stop trying to find the global
optimum and instead try to find "complementary" or "orthogonal" local optima.
Reilly's multi-resolution architectures [1], Schapire's Boosting algorithm [2],
and Breiman's Stacked Regression [3] are good examples.  Of course, there are
many other approaches one could take, some of which are proposed in my PhD
thesis.
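
For the combination step itself, here is a rough sketch of the weight
fitting in Breiman's stacked regression [3], continuing from the snippet
above: fit non-negative least-squares weights for the members against
held-out targets.  Breiman fits the weights to cross-validated
predictions; a fresh validation draw stands in for them here, and the
variable names are my own.

from scipy.optimize import nnls

# Held-out data for fitting the combination weights.
X_val = rng.uniform(-1, 1, size=(100, 1))
y_val = np.sin(3 * X_val) + 0.1 * rng.standard_normal((100, 1))

P = np.hstack([f(X_val) for f in nets])   # (n_val, n_members) predictions
w, _ = nnls(P, y_val.ravel())             # non-negative stacking weights
stacked = np.hstack([f(X_test) for f in nets]) @ w
print("stacked ensemble test MSE:", np.mean((stacked - y_test.ravel()) ** 2))

The non-negativity constraint is what keeps the fitted weights from
exploiting collinearity among highly correlated ensemble members.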

I think there is a lot of work to be done in this area.  I'd be glad to
hear from people who are experimenting with related algorithms or who are
interested in discussing the details further.

Michael
--------------------------------------------------------------------------------
Michael P. Perrone                                      Email: mpp at cns.brown.edu
Institute for Brain and Neural Systems                  Tel:   401-863-3920
Brown University                                        Fax:   401-863-3934
Providence, RI 02912


[1]
@incollection{ReillyEtAl87,
   AUTHOR    = {D. L. Reilly and C. L. Scofield and C. Elbaum and L. N. Cooper},
   TITLE     = {Learning System Architectures Composed of Multiple Learning Modules},
   BOOKTITLE = {Proc. IEEE First Int. Conf. on Neural Networks},
   YEAR      = {1987},
   PUBLISHER = {IEEE},
   VOLUME    = {2},
   PAGES     = {495--503}
}

[2]
@article{Schapire90,
   AUTHOR    = {R. Schapire},
   TITLE     = {The strength of weak learnability},
   JOURNAL   = {Machine Learning},
   YEAR      = {1990},
   NUMBER    = {2},
   PAGES     = {197--227},
   VOLUME    = {5}
}

[3]
@techreport{Breiman92,
   AUTHOR    = {Leo Breiman},
   TITLE     = {Stacked regression},
   YEAR      = {1992},
   INSTITUTION = {Department of Statistics, University of California, Berkeley},
   MONTH     = {August},
   NUMBER    = {{TR}-367},
   TYPE      = {Technical Report}
}

