Subtractive network design
P.Refenes@cs.ucl.ac.uk
Wed Nov 20 06:13:34 EST 1991
You point out (quite correctly) that the validation set only
gives a single number.
Now, suppose we have a dataset of k training vectors. We divide
this dataset into two subsets (N, M) of sizes n, m such
that n+m=k. We use the first subset as the training set,
and the second subset as the validation set.
The only difference between N and M is that N is used during
both passes whilst M is only used during the forward pass.
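The split described above can be sketched as follows. This is a minimal illustration in my own words, not code from the post: the toy data, the linear model, the learning rate, and the patience threshold are all my own assumptions, chosen only to show N driving the weight updates (both passes) while M is consulted in the forward pass alone to decide when to stop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: k training vectors (n, m, k as in the post, n + m = k).
k = 100
X = rng.normal(size=(k, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=k)

# Subset N (size n) is used for both passes; subset M (size m) only forward.
n = 70
X_train, y_train = X[:n], y[:n]
X_val, y_val = X[n:], y[n:]

w = np.zeros(3)          # a simple linear model stands in for the network
lr = 0.01
best_val, best_w, patience = np.inf, w.copy(), 0

for epoch in range(1000):
    # Backward pass on N only: the gradient of the squared error updates w.
    grad = 2 * X_train.T @ (X_train @ w - y_train) / n
    w -= lr * grad
    # Forward pass on M only: its error decides when to stop, never updates w.
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:
        best_val, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 10:   # stop once M's error has stopped improving
            break

w = best_w
```

The single number the validation set yields here is the stopping epoch; the m points in M never contribute a gradient.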
My argument is that if we used M for both passes we would get
better generalisation anyway, because we would have more
points from which to approximate the polynomial, and more
constraints to satisfy. The only case in which this is not
true is when N is already sufficiently large (and
representative), but this is hardly ever the case in practice.
You also say:
> I think this is wrong because you only get a single number
> (when to stop training) from the validation set. So even if
> you made the validation contain infinitely many cases, you
> would still be limited by the size of the original training
> set.
My conjecture is that if you used these "infinitely many cases"
for both passes (starting with a small network and increasing
it gradually until convergence) you would get equally good, and
perhaps better, generalisation.
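The grow-until-convergence idea can be sketched with a polynomial in place of a network, echoing the post's own polynomial-approximation analogy. Everything here is my own illustrative assumption (the target function, the noise level, the 1e-3 convergence threshold): all k points constrain the fit, and capacity is added only while it still pays.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D data: every one of the k points is used for fitting
# ("both passes"), none are held out.
k = 60
x = np.linspace(-1, 1, k)
y = np.sin(3 * x) + 0.05 * rng.normal(size=k)

# Start with a small model and grow it gradually until convergence.
# The target is odd, so we step through odd degrees only.
prev_err = np.inf
for degree in range(1, 16, 2):
    coeffs = np.polyfit(x, y, degree)
    err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    if prev_err - err < 1e-3:   # growing the model no longer helps: stop
        break
    prev_err = err
```

The stopping rule plays the role the validation set played before, but every point now also supplies a constraint on the fit.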
Paul