Multiple Models, Committee of nets etc...
Michael P. Perrone
mpp at cns.brown.edu
Thu Jul 29 02:43:58 EDT 1993
For those interested in the recent discussion of Multiple Models, Committees,
etc., the following references may be of interest. The first three references
deal exactly with the issues that have recently been discussed on Connectionists.
The salient contributions from these papers are:
1) A very general result which proves that averaging ALWAYS improves optimization
performance for a broad class of (convex) optimization problems including MSE, MLE,
Maximum Entropy, Maximum Mutual Information, Splines, HMMs, etc. This is a result
about the topology of the optimization measure and is independent of the underlying
data distribution, learning algorithm or network architecture. (The inequality behind
this result, together with the bias/variance reading in point 5, is sketched after
this list.)
2) A closed-form solution for the optimal weighted average of a set of regression
estimates (here I regard density estimation and classification as special cases of
regression) for a given cross-validation set and MSE optimization. It should be
noted that the solution may suffer from over-fitting when the CV set is not
representative of the true underlying distribution; however, the solution is
amenable to ridge regression and a wide variety of heuristic robustification
techniques. (A small code sketch of this combination follows the list.)
3) Experiments on real-world datasets (NIST OCR data, human face data and time-series
data) which demonstrate the improvement due to averaging. The improvement is so
dramatic that in most cases the average estimator performs significantly better than
the best individual estimator. (It is important to note that the CV performance of
a network is not a guaranteed predictor of performance on an independent test set,
so the network with the best performance on the CV set may not have the best
performance on the test set; however, in practice, even when CV performance is a
good predictor of test-set performance, the average estimator usually performs
better.)
4) Numerous extensions, including bootstrapped and jackknifed neural net generation,
and averaging over "hyperparameters" such as architectures, priors and/or regularizers.
5) An interpretation of averaging, in the case of MSE optimization, as a
regularizer which performs smoothing by variance reduction. This implies that
averaging has no effect on the bias of the estimators: for a given population
of estimators, the bias of the average estimator is the same as the expected
bias of any estimator in the population.
6) A very natural definition of the number of "distinct" estimators in a population
which emphasizes two points: (a) Local minima are not necessarily a bad thing!
We can actually USE LOCAL MINIMA TO IMPROVE PERFORMANCE; and (b) There is an
important distinction between the number of local minima in parameter space and
the number of local minima in function space. Function space is what we really
care about, and empirically, averaging suggests that there are not many "distinct"
local minima in trained populations. One direction for future work, therefore, is
to devise ways of generating as many "distinct" estimators as possible.
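
In my own notation (not taken from the papers), the inequality behind point 1 is
just Jensen's inequality for a convex cost functional E applied to the simple
average of estimators f_1,...,f_N, and the MSE reading in point 5 is the usual
bias/variance split under the simplifying assumption of a common bias and
uncorrelated, equal-variance noise. Roughly, in LaTeX:

% Jensen's inequality: the cost of the averaged estimator is never greater than
% the average cost of the individual estimators, for any convex cost E.
\[
  E\!\Big[\tfrac{1}{N}\textstyle\sum_{i=1}^{N} f_i\Big]
    \;\le\; \tfrac{1}{N}\textstyle\sum_{i=1}^{N} E[f_i].
\]
% For MSE: writing f_i(x) = t(x) + b(x) + e_i(x), with target t, common bias b,
% and zero-mean noise e_i of variance sigma^2 uncorrelated across estimators,
% the simple average keeps the bias but divides the noise variance by N:
\[
  \mathbb{E}\big[(\bar f(x) - t(x))^2\big] \;=\; b(x)^2 + \frac{\sigma^2(x)}{N},
  \qquad \bar f = \tfrac{1}{N}\textstyle\sum_{i=1}^{N} f_i .
\]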
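For point 2, here is a minimal sketch (mine, not code from the papers) of one
standard way to get such weights: minimize the CV error of the weighted average
subject to the weights summing to one, which reduces to inverting the matrix of
CV error correlations. The optional ridge term is one example of the
robustification mentioned above; all names below are illustrative only.

import numpy as np

def optimal_ensemble_weights(cv_predictions, cv_targets, ridge=0.0):
    """cv_predictions: (K, N) array, row i = estimator i's outputs on the CV set.
    cv_targets: (N,) array of CV targets.
    ridge: optional diagonal loading to guard against over-fitting a small or
           unrepresentative CV set.
    Returns (K,) combination weights that sum to one."""
    errors = cv_predictions - cv_targets            # misfit of each estimator
    C = errors @ errors.T / errors.shape[1]         # CV error correlation matrix
    C = C + ridge * np.eye(C.shape[0])              # ridge-style regularization
    cinv_ones = np.linalg.solve(C, np.ones(C.shape[0]))
    return cinv_ones / cinv_ones.sum()

def ensemble_predict(weights, predictions):
    """predictions: (K, M) array of each estimator's outputs on new inputs."""
    return weights @ predictions

# Toy usage: three noisy estimators of the same target.
rng = np.random.default_rng(0)
y_cv = rng.normal(size=200)
preds_cv = np.stack([y_cv + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 0.8)])
w = optimal_ensemble_weights(preds_cv, y_cv, ridge=1e-3)
print(w)  # the lower-variance estimators receive the larger weights

As noted above, these weights can over-fit the CV set, so in practice one would
check them on independent held-out data or fall back to the simple average.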
The other three references deal with what I consider to be the flip side of the
same coin: on one side is the problem of combining networks, on the other is the
problem of generating networks. These three references explore neural-net-motivated
divide-and-conquer heuristics within the CART framework.
Enjoy!
Michael
--------------------------------------------------------------------------------
Michael P. Perrone Email: mpp at cns.brown.edu
Institute for Brain and Neural Systems Tel: 401-863-3920
Brown University Fax: 401-863-3934
Providence, RI 02912
--------------------------------------------------------------------------------
@phdthesis{Perrone93,
AUTHOR = {Michael P. Perrone},
TITLE = {Improving Regression Estimation: Averaging Methods for Variance Reduction
with Extensions to General Convex Measure Optimization},
YEAR = {1993},
SCHOOL = {Brown University, Institute for Brain and Neural Systems; Dr. Leon N Cooper, Thesis Supervisor},
MONTH = {May}
}
@inproceedings{PerroneCooper93CAIP,
AUTHOR = {Michael P. Perrone and Leon N Cooper},
TITLE = {When Networks Disagree: Ensemble Method for Neural Networks},
BOOKTITLE = {Neural Networks for Speech and Image Processing},
YEAR = {1993},
PUBLISHER = {Chapman-Hall},
EDITOR = {R. J. Mammone},
NOTE = {[To Appear]},
ADDRESS = {London}
}
@inproceedings{PerroneCooper93WCNN,
AUTHOR = {Michael P. Perrone and Leon N Cooper},
TITLE = {Learning from What's Been Learned: Supervised Learning in Multi-Neural Network Systems},
BOOKTITLE = {Proceedings of the World Conference on Neural Networks},
YEAR = {1993},
PUBLISHER = {INNS}
}
---------------------
@inproceedings{Perrone91,
AUTHOR = {M. P. Perrone},
TITLE = {A Novel Recursive Partitioning Criterion},
BOOKTITLE = {Proceedings of the International Joint Conference on Neural Networks},
YEAR = {1991},
PUBLISHER = {IEEE},
PAGES = {989},
VOLUME = {II}
}
@inproceedings{Perrone92,
AUTHOR = {M. P. Perrone},
TITLE = {A Soft-Competitive Splitting Rule for Adaptive Tree-Structured Neural Networks},
BOOKTITLE = {Proceedings of the International Joint Conference on Neural Networks},
YEAR = {1992},
PUBLISHER = {IEEE},
PAGES = {689--693},
VOLUME = {IV}
}
@inproceedings{PerroneIntrator92,
AUTHOR = {M. P. Perrone and N. Intrator},
TITLE = {Unsupervised Splitting Rules for Neural Tree Classifiers},
BOOKTITLE = {Proceedings of the International Joint Conference on Neural Networks},
YEAR = {1992},
PUBLISHER = {IEEE},
PAGES = {820--825},
VOLUME = {III}
}