Subtractive network design

Tue Nov 19 16:28:12 EST 1991

You say

"The solution proposed by Scott 
Fahlman i.e. to use the cross-validation performance as an indicator of 
when to stop is not complete, because as soon as you do this the cross-
validation dataset becomes part of the training dataset ... So any improvement 
in generalisation is probably due to the fact that we are using a larger
training dataset."

I think this is wrong because you only get a single number (when to stop
training) from the validation set.  So even if you made the validation set
contain infinitely many cases, you would still be limited by the size of the
original training set.
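The point about the validation set can be made concrete with a small sketch. It assumes plain batch gradient descent on a one-input linear model with squared error. The names `gd_step`, `train_for`, `best_stopping_epoch` and `make_data`, the synthetic line y = 3x + 1, and the "patience" stopping rule are all illustrative choices, not anything from this discussion. The weights are fitted to the training set alone; the validation set only produces the stopping epoch, a single integer.

```python
import random

# Hypothetical illustration: a one-input linear model y = w*x + b trained by
# batch gradient descent on squared error. None of these names come from the post.

def mse(w, b, data):
    """Mean squared error of y = w*x + b over (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gd_step(w, b, train, lr):
    """One batch gradient-descent step that looks only at the training set."""
    n = len(train)
    gw = sum(2 * (w * x + b - y) * x for x, y in train) / n
    gb = sum(2 * (w * x + b - y) for x, y in train) / n
    return w - lr * gw, b - lr * gb

def train_for(train, epochs, lr=0.01):
    """Weights after a fixed number of epochs, starting from zero."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        w, b = gd_step(w, b, train, lr)
    return w, b

def best_stopping_epoch(train, valid, lr=0.01, max_epochs=500, patience=20):
    """Return the epoch with the lowest validation error.

    The validation set never touches the gradients; its only output is this
    one integer. The patience rule (give up after `patience` epochs without
    improvement) is an assumed convention, not part of the original argument.
    """
    w, b = 0.0, 0.0
    best_epoch, best_err = 0, mse(w, b, valid)
    for epoch in range(1, max_epochs + 1):
        w, b = gd_step(w, b, train, lr)
        err = mse(w, b, valid)
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

def make_data(n, rng):
    """Synthetic cases from an assumed true line y = 3x + 1 plus Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in xs]

rng = random.Random(0)
train = make_data(10, rng)      # small training set
valid = make_data(1000, rng)    # much larger validation set
stop = best_stopping_epoch(train, valid)
w, b = train_for(train, stop)   # refit from the training set alone
print(f"stop after {stop} epochs: y = {w:.2f}x + {b:.2f}")
```

However large `valid` is made, the final weights are exactly `train_for(train, stop)`, so they can only reflect what the ten training cases contain; the validation set has merely chosen how far along that one trajectory to stop.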

Quite apart from this point, pruning techniques such as the soft-weight
sharing method recently advertised on this net by Steve Nowlan and me
seem to work noticeably better than using a validation set
to decide when to stop training.  However, the use of a validation set is much
simpler and therefore a good thing to try for people in a hurry.

Geoff