Subtractive methods / Cross validation (includes summary)
pratt@cs.rutgers.edu
Wed Nov 27 15:33:37 EST 1991
Hi,
FYI, I've summarized the recent discussion on subtractive methods below.
A couple of comments:
o [Ramachandran and Pratt, 1992] presents a new subtractive method, called
Information Measure Based Skeletonisation (IMBS). IMBS induces a decision
tree over the hidden unit hyperplanes in a learned network in order to
detect which units are superfluous. Single train/test holdout experiments
on three real-world problems (Deterding vowel recognition, Peterson-Barney
vowel recognition, heart disease diagnosis) indicate that this method does
not degrade generalization scores while substantially reducing hidden unit
counts. It's also very intuitive. (A rough sketch of the general idea
appears after these comments.)
o There seems to be some confusion between the very different goals of:
(1) Evaluating the generalization ability of a network,
and
(2) Creating a network with the best possible generalization performance.
Cross-validation is used for (1). However, as P. Refenes points out, once
the generalization score has been estimated, you should use *all* training
data to build the best network possible.
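To make the flavor of IMBS concrete, here is the rough sketch promised
above. This is only an illustration of the general idea (a small decision
tree grown over the hidden-unit activations of a trained network, with
units the tree never consults treated as candidates for removal), not the
IMBS procedure from the paper; the function and parameter names, and the
use of scikit-learn's DecisionTreeClassifier, are placeholders of my own.

  from sklearn.tree import DecisionTreeClassifier

  def candidate_superfluous_units(hidden_acts, targets, max_depth=5):
      """Indices of hidden units a small decision tree never splits on.

      hidden_acts : (n_examples, n_hidden) array of hidden-unit outputs
                    from an already-trained network
      targets     : (n_examples,) array of class labels
      """
      tree = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth)
      tree.fit(hidden_acts, targets)
      used = {f for f in tree.tree_.feature if f >= 0}   # -2 marks leaf nodes
      return [u for u in range(hidden_acts.shape[1]) if u not in used]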
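Along the same lines, here is a similarly rough sketch of the (1)-versus-(2)
point: cross-validation is used only to *estimate* the generalization score,
and the network you actually keep is then trained on all of the data.
`make_net', `train', and `accuracy' are placeholders for whatever simulator
you happen to use, not a particular package's interface.

  import numpy as np

  def estimate_then_retrain(X, y, make_net, train, accuracy, k=5, seed=0):
      """k-fold estimate of generalization (goal 1), then a net on all data (goal 2)."""
      folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
      scores = []
      for i in range(k):
          held_out = folds[i]
          used = np.concatenate([folds[j] for j in range(k) if j != i])
          net = train(make_net(), X[used], y[used])
          scores.append(accuracy(net, X[held_out], y[held_out]))
      estimate = float(np.mean(scores))    # report this as the generalization estimate
      final_net = train(make_net(), X, y)  # but build the deliverable net on *all* the data
      return estimate, final_net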
--Lori
@incollection{ ramachandran-92,
MYKEY = " ramachandran-92 : .con .bap",
EDITOR = "D. S. Touretzky",
BOOKTITLE = "{Advances in Neural Information Processing Systems 4}",
AUTHOR = "Sowmya Ramachandran and Lorien Pratt",
TITLE = "Discriminability Based Skeletonisation",
ADDRESS = "San Mateo, CA",
PUBLISHER = "Morgan Kaufmann",
YEAR = 1992,
NOTE = "(To appear)"
}
Summary of discussion so far:
hht: Hans Henrik Thodberg <thodberg at nn.meatre.dk>
sf: Scott_Fahlman at sef-pmax.slisp.cs.cmu.edu
jkk: John K. Kruschke <KRUSCHKE at ucs.indiana.edu>
rs: R Srikanth <srikanth at cs.tulane.edu>
pr: P.Refenes at cs.ucl.ac.uk
gh: Geoffrey Hinton <hinton at ai.toronto.edu>
kl: Ken Laws <LAWS at ai.sri.com>
js: Jude Shavlik <shavlik at cs.wisc.edu>
hht~~: Request for discussion. Goal is good generalisation: achievable
hht~~: if nets are of minimal size. Advocates subtractive methods
hht~~: over additive ones. Gives Thodberg, Lecun, Weigend
hht~~: references.
sf~~: restricting complexity ==> better generalization only when
sf~~: ``signal components are larger and more coherent than the noise''
sf~~: Describes what cascade correlation does.
sf~~: Questions why a subtractive method should be superior to this.
sf~~: Gives reasons to believe that subtractive methods might be slower
sf~~: (because you have to train, chop, train, instead of just train)
jkk~~: Distinguishes between removing a node and just removing its
jkk~~: participation (by zeroing weights, for example). When nodes
jkk~~: are indeed removed, subtractive schemes can be more expensive,
jkk~~: since we are training nodes which will later be removed.
jkk~~: Cites his work (w/Movellan) on schemes which are both additive
jkk~~: and subtractive. (A small sketch of the remove-vs-zero
jkk~~: distinction follows this summary.)
rs~~: Says that overgeneralization is bad: distinguishes best fit from
rs~~: most general fit as potentially competing criteria.
pr~~: Points out that pruning techniques are able to remove redundant
pr~~: parts of the network. Also points out that using a cross-validation
pr~~: set without a third set is ``training on the testing data''.
gh~~: Points out that, though you might be doing some training on the testing
gh~~: set, since you only get a single number as feedback from it, you aren't
gh~~: really fully training on this set.
gh~~: Also points out that techniques such as his work on soft-weight sharing
gh~~: seem to work noticeably better than using a validation set to decide
gh~~: when to stop training.
hht~~: Agrees that comparative studies between subtractive and additive
hht~~: methods would be a good thing. Describes a brute-force subtractive
hht~~: scheme. Argues, by analogy to automobile construction and idea
hht~~: generation, why subtractive methods are more appealing than
hht~~: additive ones.
~~pr: Argues that you'd get better generalization if you used more
~~pr: examples for training; in particular not just a subset of all
~~pr: training examples present.
~~kl: Points out the similarity between the additive/subtractive debate
~~kl: and stepwise-inclusion vs stepwise-deletion issues in multiple
~~kl: regression.
~~js: Points out that when reporting the number of examples used for
~~js: training, it's important to include the cross-validation examples
~~js: as well.
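Finally, a tiny illustration of jkk's remove-versus-zero distinction,
assuming a plain two-layer net with weight matrices W1 (input-to-hidden,
one row per hidden unit) and W2 (hidden-to-output, one column per hidden
unit). The layout and the names are assumptions of mine, not anyone's
posted code.

  import numpy as np

  def zero_unit(W2, unit):
      """Silence a hidden unit by zeroing its outgoing weights; it still costs compute."""
      W2 = W2.copy()
      W2[:, unit] = 0.0
      return W2

  def remove_unit(W1, b1, W2, unit):
      """Physically delete a hidden unit: drop its row of W1/b1 and its column of W2."""
      keep = np.arange(W1.shape[0]) != unit
      return W1[keep], b1[keep], W2[:, keep]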