redundancy (was Re: batch-mode implementations)
honavar@iastate.edu
Sat Oct 19 13:30:33 EDT 1991
Scott Fahlman wrote:
>>I guess you could measure redundancy by seeing if some subset of the
>>training data set produces essentially the same gradient vector as the full
>>set.
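Fahlman's proposed measure can be sketched concretely: compare the gradient computed on a random subset against the full-batch gradient, e.g. via cosine similarity. The least-squares loss, the data, and the function names below are my own illustrative choices, not anything from the original discussion.

```python
# Sketch (illustrative, not from the post): estimate training-set
# redundancy by how closely a subset's gradient matches the full one.
import numpy as np

def gradient(w, X, y):
    # Gradient of mean squared error 0.5 * ||Xw - y||^2 / n w.r.t. w.
    return X.T @ (X @ w - y) / len(y)

def subset_gradient_agreement(w, X, y, frac=0.3, seed=0):
    # Cosine similarity between subset and full-batch gradients;
    # values near 1.0 suggest the subset carries most of the signal.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=max(1, int(frac * len(y))), replace=False)
    g_full = gradient(w, X, y)
    g_sub = gradient(w, X[idx], y[idx])
    return float(g_full @ g_sub /
                 (np.linalg.norm(g_full) * np.linalg.norm(g_sub)))

# Redundant data: 1000 noisy samples of one underlying linear relation.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.01 * rng.normal(size=1000)
w0 = np.zeros(5)
print(subset_gradient_agreement(w0, X, y))  # should be close to 1.0 here
```

On data this redundant, a 30% subset reproduces the full gradient direction almost exactly, which is the sense in which the remaining examples add little.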
Yann Le Cun responded:
> Hmmm, I think any dataset for which you expect good generalization is redundant.
> Train your net on 30% of the dataset, and measure how many of the remaining
> 70% you get right. If you get a significant portion of them right, then
> accumulating gradients on these examples (without updating the weights) would
> be little more than a waste of time.
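Le Cun's 30%/70% check is easy to simulate. The nearest-centroid classifier and the synthetic two-class data below are my own illustrative choices; the post itself does not specify a model.

```python
# Sketch (illustrative): train on 30% of the data, measure how much of
# the held-out 70% is already predicted correctly.
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1)
                      for c in labels])
    return np.array(labels)[dists.argmin(axis=0)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (500, 2)), rng.normal(2, 1, (500, 2))])
y = np.repeat([0, 1], 500)
perm = rng.permutation(1000)
train, test = perm[:300], perm[300:]   # 30% train, 70% held out

model = nearest_centroid_fit(X[train], y[train])
acc = (nearest_centroid_predict(model, X[test]) == y[test]).mean()
print(f"accuracy on held-out 70%: {acc:.2f}")
```

A high score on the held-out 70% is exactly the situation Le Cun describes: accumulating gradients over those examples would contribute little beyond what the 30% already provides.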
It is probably useful to distinguish between redundancy WITHIN the training set
and redundancy BETWEEN the training and test sets (or, redundancy in
the combined training and test sets). I suspect Scott Fahlman was
referring to the redundancy (R1) within the training set, while Le Cun
was referring to the redundancy (R2) in the set formed by the union of the
training set and test set (please correct me if I am wrong). I would
expect the relationship between generalization and R1 to be quite different
from the relationship between generalization and R2.
Whether the two measures of redundancy will be the same or not will almost
certainly depend on the method(s) (e.g., sampling procedures, sample size
reduction techniques) used to arrive at the data actually given to the
network during training.
In fact, suppose a training set T (obtained, say, by random sampling
from some underlying distribution) were preprocessed in
some fashion (e.g., using statistical techniques) and a reduced
training set T' obtained from T by eliminating the "redundant" samples.
Clearly the redundancy (R1') within the reduced training set T' will be much
smaller than the redundancy (R1) in the original training set T, although the
overall redundancy (R2) in the set formed by the union of T and the test data
may be more or less equal to the redundancy (R2') in the set formed by the
union of T' and the test data. My guess is that the generalization on the test
data will be more or less the same irrespective of whether T or T' is used for
training the network.
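One simple way such a T → T' reduction could look (my own greedy scheme, purely for illustration, with a hypothetical distance threshold `eps`): keep a sample only if it is farther than `eps` from every sample already kept.

```python
# Sketch (illustrative): reduce T to T' by dropping near-duplicate samples.
import numpy as np

def reduce_training_set(X, eps=0.5):
    # Greedily keep a point only if no already-kept point is within eps.
    kept = []
    for i, x in enumerate(X):
        if all(np.linalg.norm(x - X[j]) > eps for j in kept):
            kept.append(i)
    return np.array(kept)

# Build a highly redundant T: five noisy copies of 20 base points.
rng = np.random.default_rng(1)
base = rng.normal(size=(20, 3))
X = np.vstack([base + 0.01 * rng.normal(size=(20, 3)) for _ in range(5)])
idx = reduce_training_set(X, eps=0.5)
print(len(X), "->", len(idx))   # T' is much smaller than T
```

Here R1' within T' is far lower than R1 within T, yet whatever overlap T had with the test distribution is largely preserved, which is the scenario the paragraph above considers.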
Vasant Honavar
honavar at iastate.edu