Subtractive network design

P.Refenes@cs.ucl.ac.uk
Wed Nov 20 06:13:34 EST 1991


You point out (quite correctly) that the validation set only
gives a single number. 
Now, suppose we have a dataset of k training vectors. We divide
this dataset into two subsets (N, M) of sizes  n, m such 
that n+m=k. We use the first subset as the training set,
and the second subset as the validation set. 
The only difference between N and M is that N is used during 
both the forward and backward passes whilst M is used only 
during the forward pass. 
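Concretely, the split-sample scheme just described might be sketched as follows. N drives both passes and M is only run forward, to decide when to stop. The toy linear model, the sizes n and m, the learning rate, and the patience threshold are my own illustrative assumptions, not details from the discussion:

```python
# Sketch of split-sample early stopping: train on N, watch M.
import numpy as np

rng = np.random.default_rng(0)
k = 100                       # total training vectors
n = 70                        # size of training subset N (m = k - n)
X = rng.normal(size=(k, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=k)
XN, yN = X[:n], y[:n]         # N: used in both passes
XM, yM = X[n:], y[n:]         # M: forward pass only

w = np.zeros(5)
best_w, best_val = w.copy(), np.inf
patience, bad = 20, 0
for epoch in range(1000):
    grad = XN.T @ (XN @ w - yN) / n    # backward pass uses N only
    w -= 0.05 * grad
    val = np.mean((XM @ w - yM) ** 2)  # forward pass on M
    if val < best_val - 1e-9:
        best_val, best_w, bad = val, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:            # M's error stopped improving
            break
w = best_w
```

The validation set thus contributes exactly one decision to training, the stopping point, which is the "single number" at issue.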

My argument is that if we used M for both passes we would 
still get better generalisation, because we would have more
points from which to approximate the polynomial, and more
constraints to satisfy. The only case in which this is not 
true is when N is already sufficiently large (and
representative), but this is hardly ever the case in practice.

You also say: 

> I think this is wrong because you only get a single number
> (when to stop training) from the validation set. So even if 
> you made the validation contain infinitely many cases, you 
> would still be limited by the size of the original training 
> set.

My conjecture is that if you used these "infinitely many cases"
for both passes (starting with a small network and increasing 
it gradually until convergence) you would get equally good, and 
perhaps better, generalisation.
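The grow-until-convergence idea might be sketched as follows, with a polynomial of increasing degree standing in for a gradually enlarged network. All k points are used for both passes, and capacity stops growing once the fit no longer improves; the data, the degree range, and the tolerance are illustrative assumptions only:

```python
# Sketch of constructive model selection: fit on ALL data,
# grow capacity until the error stops improving.
import numpy as np

rng = np.random.default_rng(0)
k = 100
x = rng.uniform(-1, 1, size=k)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=k)

chosen, prev_err = None, np.inf
for degree in range(1, 10):              # gradually increase capacity
    coeffs = np.polyfit(x, y, degree)    # both passes use all k points
    err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    if prev_err - err < 1e-3:            # convergence: no real gain
        break
    chosen, prev_err = (degree, coeffs), err
```

Here no data is withheld; the stopping decision comes from the whole sample's error curve rather than from a separate validation subset.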

Paul
