some questions on training neural nets...

Charles X. Ling ling at csd.uwo.ca
Tue Feb 1 03:37:10 EST 1994


Hi neural net experts,

I use backprop (and variations of it) quite often, although I have not
followed neural net (NN) research as closely as I would have liked. Some rather
basic issues in training NNs still puzzle me, and I hope to get advice
and help from the experts in the area. Sorry for being ignorant.

Say we are learning a function F (such as a Boolean function of n variables).
The training set (TR) and testing set (TS) are drawn randomly according to
the same probability distribution, with no noise added.

1. Is it true that, since there is no noise, the smaller the training error
on TR, the better the net will predict on TS in general? That is, stopping
training early is not needed (and so cross-validation is not needed to decide
when to stop).
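
To make the question concrete, here is a rough sketch of what I mean by
stopping early (plain numpy, a tiny one-hidden-layer net on a noise-free
Boolean target; the net size, learning rate, and all other numbers are just
my own illustration): instead of keeping the weights with the lowest training
error, keep the weights that did best on a held-out set.

import numpy as np

rng = np.random.default_rng(0)

def target(x):                       # noise-free Boolean function: 5-bit majority
    return (x.sum(axis=1) >= 3).astype(float)

X = rng.integers(0, 2, size=(200, 5)).astype(float)
y = target(X)
X_tr, y_tr = X[:100], y[:100]        # training set TR
X_va, y_va = X[100:], y[100:]        # held-out set, used only to decide when to stop

n_hidden = 20
W1 = rng.normal(0, 0.5, (5, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, n_hidden);      b2 = 0.0
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sig(X @ W1 + b1)
    return h, sig(h @ W2 + b2)

best_err, best_weights = np.inf, None
lr = 0.5
for epoch in range(2000):
    h, p = forward(X_tr)
    d_out = (p - y_tr) * p * (1 - p)            # output delta (squared error, sigmoid)
    d_hid = np.outer(d_out, W2) * h * (1 - h)   # hidden-layer deltas
    W2 -= lr * h.T @ d_out / len(X_tr);    b2 -= lr * d_out.mean()
    W1 -= lr * X_tr.T @ d_hid / len(X_tr); b1 -= lr * d_hid.mean(axis=0)
    va_err = np.mean((forward(X_va)[1] > 0.5) != y_va)
    if va_err < best_err:                       # early-stopping snapshot
        best_err = va_err
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2)

print("lowest held-out error seen during training:", best_err)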

2. Is it true that, to get reliable predictions (good or bad), we should
always choose a net architecture with the minimum number of hidden units
(or the minimum number of weights, via weight decay)? Will cross-validation
help if we give the net too much freedom (or could good results on the
validation set be coincidental)?
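
By weight decay I mean the usual penalized error E' = E + lam * sum_i w_i^2,
whose gradient step shrinks every weight on top of the ordinary backprop
update. A tiny sketch of that step (my own illustration, nothing more):

import numpy as np

def decayed_step(w, grad, lr=0.1, lam=0.001):
    # one gradient step on E + lam * sum(w**2): the data gradient plus a
    # shrinkage term 2*lam*w pulling every weight towards zero
    return w - lr * (grad + 2 * lam * w)

w = np.array([3.0, -2.0, 0.5])
print(decayed_step(w, grad=np.zeros(3)))   # with zero data gradient the weights just shrink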

3. If, for some reason, cross-validation is needed, and TR is split into
TR1 (for training) and TR2 (for validation), what is the proper way to do it?
Training on TR1 alone uses only part of the information in TR, but training on
TR1 to find the right parameters and then retraining on TR1+TR2 may call for
parameters different from those estimated on TR1.
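
The protocol I have in mind looks like the sketch below. To keep it short I
use ridge regression as a stand-in for the net (its decay constant lam plays
the role of the parameter being tuned); the split and the final refit on
TR1+TR2 are the part I am asking about.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10)); w_true = rng.normal(size=10)
y = X @ w_true                                   # noise-free target

X1, y1 = X[:60],   y[:60]     # TR1: used to fit candidate models
X2, y2 = X[60:90], y[60:90]   # TR2: used only to compare candidate lam values
Xts, yts = X[90:], y[90:]     # TS: never touched during selection

def fit(X, y, lam):           # "training" with weight decay lam
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def err(w, X, y):
    return np.mean((X @ w - y) ** 2)

candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: err(fit(X1, y1, lam), X2, y2))
w_final = fit(np.vstack([X1, X2]), np.concatenate([y1, y2]), best_lam)
print("chosen lam:", best_lam, " test error:", err(w_final, Xts, yts))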

4. In case the net has too much freedom (even different random seeds
produce very different predictive accuracies), how can we effectively
reduce this variation? Weight decay seems to be a powerful tool; are there
others? And what kind of "simple" functions is weight decay biased towards?
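
One trick I have seen besides weight decay is to train the same net from
several random seeds and average (or vote) the outputs, rather than trust a
single run. A sketch, with a made-up train_and_predict standing in for one
full backprop run:

import numpy as np

def train_and_predict(seed, X_test):
    # stand-in for one complete backprop run started from random seed `seed`
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X_test.shape[1])          # pretend these are learned weights
    return (X_test @ w > 0).astype(float)

X_test = np.random.default_rng(0).normal(size=(50, 8))
runs = np.array([train_and_predict(s, X_test) for s in range(10)])
committee = (runs.mean(axis=0) > 0.5).astype(float)   # majority vote over seeds
print("per-seed disagreement with the committee:",
      [(r != committee).mean() for r in runs])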

Thanks very much for your help,
Charles

