some questions on training neural nets...
Wray Buntine
wray at ptolemy-ethernet.arc.nasa.gov
Fri Feb 4 15:47:25 EST 1994
Tom Dietterich and William Finnoff covered a lot of issues.
I'd just like to highlight two points:
* this is a contentious area
* there are several opposing factors at play that
confuse our understanding of this
================ detail
Basically, this comment below is SO true.
> There are many ways to manage the bias/variance tradeoff. I would say
> that there is nothing approaching complete agreement on the best
> approaches (and more fundamentally, the best approach varies from one
> application to another, since this is really a form of prior). The
> approaches can be summarized as
The bias/variance tradeoff lies at the heart of almost all disagreements
between different learning philosophies such as classical, Bayesian, minimum
description length, resampling schemes (now often viewed as empirical
Bayesian), statistical physics approaches, and the various
"implementation" schemes.
One thing to note is that there are several quite separate forces
in operation here:
computational and search issues:
(e.g. maybe early stopping works better
because it's a more efficient way of
searching the space of smaller networks?
see the early-stopping sketch after this list)
prior issues:
(e.g. have you thrown in 20 attributes you
happen to think might apply, though probably
15 are irrelevant; OR did a medical
specialist carefully pick all 10 attributes
and assure you every one is important;
OR is a medical specialist able to solve the
task blind, just by reading the 20 attribute
values (without seeing the patient), etc.)
(e.g. are 30 hidden units adequate for the
structure of the task?)
asking the right question:
(e.g. sometimes the question "what's the 'best' network?"
is a bit silly when you have a small amount of
data; perhaps you should be trying to find
10 reasonable alternative networks and pool their
results, a la Michael Perrone's NIPS'93 workshop;
a pooling sketch also follows this list)
understanding your representation:
(e.g. with rule-based systems, each rule has a clear
interpretation, so the question of how to
prune, etc., is something you can understand
well, BUT with a large feed-forward network,
understanding the structure of the space is more
involved, e.g. if I set these 2 weights to zero,
what the hell happens to my proposed solution?)
(e.g. this confuses the problem of designing
good regularizers/priors/network-encodings).
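On the early-stopping point above, here is a minimal sketch of
validation-based stopping on an overparameterised least-squares
problem; the data, step size, and patience are assumptions chosen
only to show the mechanism:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 40, 100                                 # fewer samples than weights
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[:5] = 1.0                               # only 5 relevant weights
    y = X @ w_true + 0.5 * rng.normal(size=n)
    X_tr, y_tr, X_va, y_va = X[:30], y[:30], X[30:], y[30:]

    w = np.zeros(d)
    best_va, best_w, patience = np.inf, w.copy(), 0
    for step in range(5000):
        w -= 0.01 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient step
        va = np.mean((X_va @ w - y_va) ** 2)
        if va < best_va:
            best_va, best_w, patience = va, w.copy(), 0      # remember best weights
        else:
            patience += 1
            if patience >= 100:                    # validation has stalled
                break
    print(f"stopped at step {step}, best validation MSE {best_va:.3f}")

Stopping when validation error stalls keeps the weights near their
small initial values, which is one reason it can act like a cheap way
of searching the space of smaller, more constrained networks.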
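And on the pooling point: a sketch of averaging several alternative
fits, in the spirit of (though not taken from) Perrone's work; the
bootstrap resamples and degree-7 polynomials stand in for "10
reasonable alternative networks":

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0.0, 1.0, 25)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=25)
    x_test = np.linspace(0.0, 1.0, 100)

    members = []
    for _ in range(10):                            # 10 alternative fits
        idx = rng.integers(0, len(x), len(x))      # bootstrap resample
        members.append(np.polyfit(x[idx], y[idx], 7))

    member_preds = np.array([np.polyval(c, x_test) for c in members])
    pooled = member_preds.mean(axis=0)             # pool by simple averaging
    print(f"mean spread across members: {member_preds.std(axis=0).mean():.3f}")

Averaging leaves the shared bias alone but cuts the variance
contributed by any single fit, which is exactly the point of asking
for 10 reasonable networks rather than one "best" one.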
The problem is that theory people tend to focus on one,
maybe two, of these, whereas application people tend to
conflate them all.
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 269-2 fax: (415) 604 3594
Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov