batch-mode parallel implementations
neural!lamoon.neural!yann@att.att.com
Fri Oct 18 11:08:03 EDT 1991
Scott Fahlman writes:
>I avoid using the term "on-line" for what I call "per-sample" or
>"continuous" updating of weights.
I personally prefer the phrase "stochastic gradient" to all of these.
>I guess you could measure redundancy by seeing if some subset of the
>training data set produces essentially the same gradient vector as the full
>set.
Hmmm, I think any dataset for which you expect good generalization is redundant.
Train your net on 30% of the dataset, and measure how many of the remaining
70% you get right. If you get a significant portion of them right, then
accumulating gradients on these examples (without updating the weights) would
be little more than a waste of time.
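The subset-gradient test quoted above can be sketched in a few lines. This is a hypothetical illustration, not code from the post: the linear model, squared loss, and toy "redundant" dataset (noisy copies of a few base patterns) are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy redundant dataset: 5 base patterns, each repeated 40 times with tiny noise.
base = rng.normal(size=(5, 10))
X = np.repeat(base, 40, axis=0) + 0.01 * rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true

def gradient(w, X, y):
    """Mean-squared-error gradient for a linear model (illustrative choice)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = rng.normal(size=10)           # arbitrary current weights
g_full = gradient(w, X, y)        # gradient over the full training set

idx = rng.choice(len(y), size=len(y) // 3, replace=False)   # ~30% subset
g_sub = gradient(w, X[idx], y[idx])

# Cosine similarity near 1 means the subset already yields essentially the
# same gradient direction: the remaining 70% add little new information.
cos = g_sub @ g_full / (np.linalg.norm(g_sub) * np.linalg.norm(g_full))
print(cos)
```

On a redundant set like this one the similarity comes out close to 1, which is exactly the situation where accumulating gradients over the rest of the data buys almost nothing.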
This suggests the following (unverified) postulate:
The better the generalization, the bigger the speed difference between
on-line (per-sample, stochastic....) and batch.
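The two regimes being compared can be sketched side by side. Again a toy illustration, not anything from the post: a noiseless linear problem, with the learning rates chosen only so both loops are stable.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true                     # noiseless, realizable toy problem

def batch_epoch(w, lr=0.1):
    # Accumulate the gradient over ALL examples, then make ONE weight update.
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def stochastic_epoch(w, lr=0.02):
    # Update the weights after EVERY example (on-line / per-sample /
    # stochastic gradient), visiting examples in random order.
    for i in rng.permutation(len(y)):
        grad_i = 2.0 * X[i] * (X[i] @ w - y[i])
        w = w - lr * grad_i
    return w

w_b = np.zeros(5)
w_s = np.zeros(5)
for _ in range(20):
    w_b = batch_epoch(w_b)
    w_s = stochastic_epoch(w_s)

print(np.linalg.norm(w_b - w_true), np.linalg.norm(w_s - w_true))
```

Both epochs touch every example once and cost the same number of gradient evaluations; the difference is that the stochastic loop makes 100 weight updates per epoch where batch makes one, which is where the redundancy argument says the speed gap comes from.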
In other words, any dataset interesting enough to be learned (as opposed to
stored) has to be redundant.
There might be no such thing as a large non-redundant dataset that is worth
learning.
-- Yann
More information about the Connectionists mailing list