Flat Minima
Josef Hochreiter
hochreit at informatik.tu-muenchen.de
Thu Apr 4 11:45:52 EST 1996
FTP-host: flop.informatik.tu-muenchen.de (131.159.8.35)
FTP-filename: /pub/articles-etc/hochreiter.fm.ps.gz
FLAT MINIMA
Sepp Hochreiter Juergen Schmidhuber
To appear in Neural Computation (accepted 1996)
38 pages, 154 K compressed, 463 K uncompressed
We present a new algorithm for finding low-complexity neural
networks with high generalization capability. The algorithm
searches for a ``flat'' minimum of the error function. A flat
minimum is a large connected region in weight-space where the
error remains approximately constant. An MDL-based, Bayesian
argument suggests that flat minima correspond to ``simple''
networks and low expected overfitting. The argument is based
on a Gibbs algorithm variant and a novel way of splitting
generalization error into underfitting and overfitting error.
Unlike many previous approaches, ours does not require Gaussian
assumptions and does not depend on a ``good'' weight prior.
Instead, we have a prior over input/output functions, thus taking
into account net architecture and training set. Although our
algorithm requires the computation of second order derivatives,
it has backprop's order of complexity. It automatically and
effectively prunes units, weights, and input lines. Experiments
with feedforward and recurrent nets are described. In applications
to stock market prediction, flat minimum search outperforms
conventional backprop, weight decay, and ``optimal brain
surgeon''/``optimal brain damage''. We also provide pseudo code
of the algorithm (omitted from the NC version).
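For a rough feel for the idea before fetching the paper, here is a
small illustrative sketch (Python/JAX assumed; the penalty below is
a generic squared output/weight sensitivity term used as a stand-in
for the paper's actual flat-minimum regularizer, not the algorithm
itself):

    # Illustrative sketch only: penalize weight-space sharpness by adding
    # a sum of squared output/weight derivatives to the training error.
    # This is NOT the exact flat-minimum-search regularizer from the paper.
    import jax
    import jax.numpy as jnp

    def net(params, x):
        W1, b1, W2, b2 = params
        h = jnp.tanh(x @ W1 + b1)
        return h @ W2 + b2

    def mse(params, X, Y):
        return jnp.mean((net(params, X) - Y) ** 2)

    def sharpness(params, X):
        # Large output/weight derivatives mean the error changes quickly
        # around the current weights (a "sharp" region); small ones
        # suggest a flat region.
        jac = jax.jacobian(lambda p: net(p, X))(params)
        return sum(jnp.sum(j ** 2) for j in jax.tree_util.tree_leaves(jac))

    def loss(params, X, Y, lam=1e-3):
        return mse(params, X, Y) + lam * sharpness(params, X)

    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    params = [0.1 * jax.random.normal(k1, (3, 8)), jnp.zeros(8),
              0.1 * jax.random.normal(k2, (8, 1)), jnp.zeros(1)]
    X = jax.random.normal(k3, (32, 3))
    Y = jnp.sin(X[:, :1])

    grad_fn = jax.jit(jax.grad(loss))
    for step in range(200):
        grads = grad_fn(params, X, Y)
        params = [p - 0.05 * g for p, g in zip(params, grads)]
    print(float(mse(params, X, Y)))

Note that computing a full Jacobian as above costs more than plain
backprop; the algorithm described in the paper keeps backprop's order
of complexity even though it needs second order derivatives.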
To obtain a copy, cut and paste one of these:
netscape http://www7.informatik.tu-muenchen.de/~hochreit/pub.html
netscape http://www.idsia.ch/~juergen/onlinepub.html
Sepp Hochreiter, TUM
Juergen Schmidhuber, IDSIA
P.S.: Info on recent IDSIA postdoc job opening:
http://www.idsia.ch/~juergen/postdoc.html