batch-mode parallel implementations 
    neural!lamoon.neural!yann@att.att.com 
       
    Fri Oct 18 11:08:03 EDT 1991
    
    
  
   Scott Fahlman writes:
    >I avoid using the term "on-line" for what I call "per-sample" or
    >"continuous" updating of weights.

I personally prefer the phrase "stochastic gradient" to all of these.

   >I guess you could measure redundancy by seeing if some subset of the
   >training data set produces essentially the same gradient vector as the full
   >set.

Hmmm, I think any dataset for which you expect good generalization is redundant.
Train your net on 30% of the dataset, and measure how many of the remaining
70% you get right. If you get a significant portion of them right, then
accumulating gradients on these examples (without updating the weights) would
be little more than a waste of time.
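
A minimal sketch of that experiment, under assumptions that are not in the original message: a synthetic dataset labelled by a hypothetical linear rule (TRUE_W), a single logistic unit, and an arbitrary step size of 0.1. It trains on the first 30% with per-sample updates and reports the fraction of the remaining 70% that is already classified correctly.

```python
import math
import random

rng = random.Random(0)
TRUE_W = [1.0, -2.0, 0.5, 3.0, -1.5]  # hypothetical "teacher" weights (assumption)

def make_example():
    """Draw a random input and label it with the teacher's linear rule."""
    x = [rng.gauss(0, 1) for _ in TRUE_W]
    label = 1 if sum(w * xi for w, xi in zip(TRUE_W, x)) > 0 else 0
    return x, label

def predict(w, x):
    """Logistic output of a single linear unit."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

data = [make_example() for _ in range(2000)]
split = int(0.3 * len(data))
train, held_out = data[:split], data[split:]

# Per-sample updates on the 30% training portion only.
w = [0.0] * len(TRUE_W)
for _ in range(5):  # a few passes; the count is arbitrary
    for x, label in train:
        err = predict(w, x) - label
        w = [wi - 0.1 * err * xi for wi, xi in zip(w, x)]

# How many of the unseen 70% does the net already get right?
correct = sum((predict(w, x) > 0.5) == (label == 1) for x, label in held_out)
held_out_accuracy = correct / len(held_out)
print(f"held-out accuracy on the remaining 70%: {held_out_accuracy:.3f}")
```

If most of the held-out examples are already right, they would have contributed little new information to an accumulated gradient.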

This suggests the following (unverified) postulate:
 The better the generalization, the bigger the speed difference between
 on-line (per-sample, stochastic, ...) and batch updating.

In other words, any dataset interesting enough to be learned (as opposed to
stored) has to be redundant.
There might be no such thing as a large non-redundant dataset that is worth 
learning.
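
As a rough illustration of the speed difference on a redundant dataset, here is a sketch that gives per-sample and batch updating the same budget of one pass over the data. Everything specific in it is an assumption made for the example: the synthetic linear-regression data, the hypothetical target weights TRUE_W, and the untuned step sizes (0.05 per sample, 0.5 for the batch step). It shows one case and does not verify the postulate.

```python
import random

rng = random.Random(0)
TRUE_W = [1.0, -2.0, 0.5, 3.0, -1.5]  # hypothetical target weights (assumption)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Redundant by construction: every sample is a noisy view of the same rule.
data = []
for _ in range(2000):
    x = [rng.gauss(0, 1) for _ in TRUE_W]
    data.append((x, dot(TRUE_W, x) + rng.gauss(0, 0.1)))

def mean_squared_error(w):
    return sum((dot(w, x) - y) ** 2 for x, y in data) / len(data)

# Per-sample (stochastic gradient): update the weights after every example.
w_sgd = [0.0] * len(TRUE_W)
for x, y in data:
    err = dot(w_sgd, x) - y
    w_sgd = [wi - 0.05 * err * xi for wi, xi in zip(w_sgd, x)]

# Batch: accumulate the gradient over the whole pass, then update once.
w_batch = [0.0] * len(TRUE_W)
grad = [0.0] * len(TRUE_W)
for x, y in data:
    err = dot(w_batch, x) - y
    grad = [g + err * xi for g, xi in zip(grad, x)]
w_batch = [wi - 0.5 * g / len(data) for wi, g in zip(w_batch, grad)]

sgd_loss = mean_squared_error(w_sgd)
batch_loss = mean_squared_error(w_batch)
print(f"after one pass: per-sample MSE = {sgd_loss:.4f}, batch MSE = {batch_loss:.4f}")
```

Both methods evaluate the same 2000 per-example gradients; the per-sample version takes 2000 steps with them, the batch version takes one. A carefully tuned batch step would narrow the gap on a problem this simple.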
  -- Yann
    
    