committee's

Fri Jul 30 04:33:31 EDT 1993

It is great that attention is focussed on the effective use
of solution space samples for non-linear models.

Allow me to promote our pre-historic work on network voting:

NEURAL NETWORK ENSEMBLES  by  L.K. Hansen and P. Salamon
IEEE Trans. Pattern Analysis and Machine Intell. {\bf 12}, 993-1001, (1990)

Besides finding experimentally that the ensemble consensus is often
'better than the best', expressions were derived
for the ensemble error rate based on different assumptions
about error correlations. The key invention is to describe the ensemble by the
'difficulty distribution'. This description was inspired by earlier work on
so-called 'N-version programming' by Eckhardt and Lee:

A THEORETICAL BASIS FOR THE ANALYSIS OF MULTIVERSION SOFTWARE
SUBJECT TO COINCIDENT ERRORS  by  D.E. Eckhardt and L.D. Lee
IEEE Trans. Software Eng. {\bf 11} 1511-1517 (1985)
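
To make the role of the error-correlation assumption concrete, here is a
minimal Python sketch. It is not taken from the papers above. The first
function is the textbook binomial expression for the chance that a strict
majority of N voters is wrong when each errs independently with the same
probability. The second function is only my assumed reading of a
'difficulty' description: each test pattern gets its own per-voter error
probability (the hypothetical list `difficulties`), and the majority error
is averaged over patterns. All names are illustrative.

```python
from math import comb


def majority_error(n_voters: int, p_err: float) -> float:
    """Probability that a strict majority of n_voters is wrong, assuming
    each voter errs independently with probability p_err.
    For even n_voters a tie is not counted as an error here."""
    k_min = n_voters // 2 + 1  # smallest number of wrong votes forming a majority
    return sum(
        comb(n_voters, k) * p_err**k * (1.0 - p_err) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )


def ensemble_error_from_difficulties(difficulties, n_voters: int) -> float:
    """Average majority error over patterns, where difficulties[i] is the
    (assumed) probability that a single voter misclassifies pattern i.
    Voters are taken to be independent *given* the pattern."""
    return sum(majority_error(n_voters, d) for d in difficulties) / len(difficulties)


# Same mean single-voter error (0.3), two different difficulty profiles.
uniform = [0.3, 0.3]   # every pattern equally hard
mixed = [0.0, 0.6]     # one easy pattern, one hard pattern
err_uniform = ensemble_error_from_difficulties(uniform, 3)
err_mixed = ensemble_error_from_difficulties(mixed, 3)
```

With three voters, the uniform profile gives a lower ensemble error than the
mixed one although the average individual error is the same, which is the
sense in which coincident errors on hard patterns limit the gain from voting.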

In a feasibility study on handwritten digits the viability of
voting among small ensembles was confirmed (the consensus outperformed
the best individual by 25%) and the theoretical
estimate of ensemble performance was found to fit the
observed performance well. Further, the work of Schwartz et al.
[Neural Computation {\bf 2}, 371-382 (1990)] was applied to estimate the
learning curve based on the distribution of generalizations of
a small ensemble:

ENSEMBLE METHODS FOR HANDWRITTEN DIGIT RECOGNITION  by
L.K. Hansen, Chr. Liisberg, and P. Salamon
In Proceedings of The Second IEEE Workshop on Neural
Networks for Signal Processing: NNSP'92 Eds. S.Y. Kung et al., 
IEEE Service Center Piscataway, 333-342, (1992)

While I refer to these methods as *ensemble* methods
(to emphasize the statistical relation and to invoke
associations with artistic ensembles), I note that theorists have
reserved the term *committee machine* for a special, constrained
network architecture (see e.g. Schwarze and Hertz
[Europhys. Lett. {\bf 20}, 375-380, (1992)]).
In the theorists' committee (TC) all weights from the hidden units to the output are fixed
to unity during training. This is very different from voting
among independently trained networks: while the TC explores
the function space of a large set of parameters (hence needs
very many training examples), a voting system based on independently
trained nets only explores the function space of the individual
network. The voting system can improve generalization by reducing
'random' errors due to training algorithms etc.
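
For readers who want to try voting among independently trained nets, a
minimal Python sketch of plurality voting follows. It assumes each network
has already produced one class label per test pattern; the function name and
the toy labels are hypothetical and not taken from our experiments.

```python
from collections import Counter


def plurality_vote(predictions):
    """Combine label predictions from several independently trained networks.

    predictions: list with one entry per network, each a list of predicted
    labels (all of the same length, one label per test pattern).
    Returns the most frequent label per pattern. Ties go to the label that
    was seen first, i.e. the one voted by the earliest network in the list.
    """
    consensus = []
    for votes in zip(*predictions):  # votes = labels from all nets for one pattern
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


# Toy example: three nets, each wrong on a different pattern.
truth = [3, 1, 4, 1, 5]
net_a = [3, 1, 4, 1, 9]  # wrong on the last pattern
net_b = [3, 7, 4, 1, 5]  # wrong on the second pattern
net_c = [3, 1, 4, 2, 5]  # wrong on the fourth pattern
consensus = plurality_vote([net_a, net_b, net_c])
```

In this toy case each net makes one error, but since no two nets err on the
same pattern the consensus agrees with `truth` everywhere. When the nets'
errors coincide on the same patterns, voting cannot remove them.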

---------------------

 Lars Kai Hansen,                                Tel:   (+45) 4593 1222 (tone) 3889
 CONNECT, Electronics Institute B349             Fax:   (+45) 4288 0117
 Technical University of Denmark                 email: lars at eiffel.ei.dth.dk
 DK-2800 Lyngby DENMARK