"Orthogonality" of the generalizers being combined

Anders Krogh krogh at nordita.dk
Mon Jul 3 10:15:53 EDT 1995


David Wolpert wrote

> Getting back to the precise subject of Nathan's posting: Those
> interested in a formal analysis touching on how the generalizers being
> combined should differ from one another should read the Anders Krogh
> paper (to come out in NIPS7) that I mentioned in my previous
> posting.

The reference is (it is not terribly formal):

  "Neural Network Ensembles, Cross Validation, and Active Learning"
  by Anders Krogh and Jesper Vedelsby
  To appear in NIPS 7.

It is in Neuroprose (see below for details) or at
http://www.nordita.dk/~krogh/papers.html.

Peter Sollich and I are finishing up an analysis of an ensemble of
linear networks.  It may sound trivial, but it actually isn't.  We'll
post it when we're done.  Among other things, we find that averaging
under-regularized (i.e., over-fitting) networks trained on slightly
different training sets can give a great improvement over a single
network trained on all the data.  This doesn't sound too surprising,
but I think it is why ensembles work in a lot of applications: people
use identical over-parametrized networks, and the ensemble then
averages the over-fitting away.  I've seen some neural network
predictors for protein secondary structure where that is the case.  It
means that an ensemble can sometimes replace regularization (a toy
numerical sketch of this effect follows the reference below).  We
discuss it a bit in

  "Improving Prediction of Protein Secondary Structure using Structured
     Neural Networks and Multiple Sequence Alignments"
  by Søren K. Riis and Anders Krogh
  NORDITA preprint 95/34 S (try http://www.nordita.dk/~krogh/papers.html)
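
To make the point about averaging away over-fitting concrete, here is a
toy numerical sketch (plain Python/numpy, written just for this post;
the model, data, and parameter values are made up for illustration and
are not taken from our analysis).  An over-parametrized linear model is
fitted with essentially no regularization, once on all the data and
once on several bootstrap resamples whose predictions are then
averaged; the script prints the test error of both.

  # Illustrative only: averaging under-regularized fits trained on
  # slightly different (bootstrap) training sets, versus a single
  # under-regularized fit on all the data.
  import numpy as np

  rng = np.random.default_rng(0)

  def features(x, degree=12):
      # High-degree polynomial features -> an over-parametrized linear "network".
      return np.vander(x, degree + 1, increasing=True)

  def fit(x, y, ridge=1e-8):
      # Under-regularized least squares (the ridge term is essentially negligible).
      phi = features(x)
      A = phi.T @ phi + ridge * np.eye(phi.shape[1])
      return np.linalg.solve(A, phi.T @ y)

  def predict(w, x):
      return features(x) @ w

  def target(x):
      # Smooth target function.
      return np.sin(2 * np.pi * x)

  # Noisy samples of the target.
  x_train = rng.uniform(0, 1, 40)
  y_train = target(x_train) + 0.3 * rng.normal(size=x_train.size)
  x_test = np.linspace(0, 1, 500)
  y_test = target(x_test)

  # A single over-fitting model trained on all the data.
  w_single = fit(x_train, y_train)
  err_single = np.mean((predict(w_single, x_test) - y_test) ** 2)

  # The same model trained on slightly different (bootstrap) training sets,
  # with the predictions averaged afterwards.
  n_members = 20
  preds = []
  for _ in range(n_members):
      idx = rng.integers(0, x_train.size, x_train.size)
      preds.append(predict(fit(x_train[idx], y_train[idx]), x_test))
  err_ensemble = np.mean((np.mean(preds, axis=0) - y_test) ** 2)

  print(f"single fit MSE:       {err_single:.4f}")
  print(f"ensemble average MSE: {err_ensemble:.4f}")

The bootstrap resampling here just stands in for "slightly different
training sets"; in practice the differences could equally come from
different random subsets or different initial weights.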


It's hot today.

 - Anders


---------------------------------------------------------------------

FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/krogh.ensemble.ps.Z

The file krogh.ensemble.ps.Z can now be copied from Neuroprose.
The paper is 8 pages long.

Hardcopies are NOT available.



       Neural Network Ensembles, Cross Validation, and Active Learning

                    by Anders Krogh and Jesper Vedelsby



Abstract:
Learning of continuous-valued functions using neural network ensembles
(committees) can give improved accuracy, reliable estimation of the
generalization error, and active learning.  The ambiguity is defined as
the variation of the output of ensemble members averaged over unlabeled
data, so it quantifies the disagreement among the networks.  It is
discussed how to use the ambiguity in combination with cross-validation
to give a reliable estimate of the ensemble generalization error, and
how this type of ensemble cross-validation can sometimes improve
performance.  It is shown how to estimate the optimal weights of the
ensemble members using unlabeled data.  By a generalization of query by
committee, it is finally shown how the ambiguity can be used to select
new training data to be labeled in an active learning scheme.
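
For readers who want the ambiguity made explicit, here is a small
numerical check (plain numpy, with random numbers standing in for
trained networks; not code from the paper) of the decomposition behind
the abstract: at every input, the squared error of the weighted
ensemble output equals the weighted average of the members' squared
errors minus the weighted average of their squared deviations from the
ensemble output (the ambiguity).  The ambiguity term involves no
targets, which is why it can be estimated from unlabeled data.

  # Ensemble error = average member error - average ambiguity (sketch).
  import numpy as np

  rng = np.random.default_rng(1)

  n_members, n_points = 5, 1000
  w = np.full(n_members, 1.0 / n_members)       # ensemble weights, sum to 1
  y = rng.normal(size=n_points)                 # targets (labeled data)
  f = y + rng.normal(scale=0.5, size=(n_members, n_points))  # member outputs

  F = w @ f                                     # weighted ensemble output
  ensemble_err = np.mean((F - y) ** 2)          # ensemble error
  avg_member_err = np.mean(w @ (f - y) ** 2)    # weighted mean of member errors
  avg_ambiguity = np.mean(w @ (f - F) ** 2)     # ambiguity: no targets needed

  # The two printed numbers agree up to rounding.
  print(ensemble_err, avg_member_err - avg_ambiguity)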


The paper will appear in
G. Tesauro, D. S. Touretzky and T. K. Leen, eds.,
"Advances in Neural Information Processing Systems 7",
MIT Press, Cambridge MA, 1995.


________________________________________

Anders Krogh

Nordita
Blegdamsvej 17, 2100 Copenhagen, Denmark

email: krogh at nordita.dk
Phone: +45 3532 5503
Fax:   +45 3138 9157
W.W.Web: http://www.nordita.dk/~krogh/
________________________________________

