problems with large training sets

Tony Robinson ajr%engineering.cambridge.ac.uk at NSFnet-Relay.AC.UK
Fri Jan 5 11:00:15 EST 1990


Dear Connectionists:

I have a problem which I believe is shared by many others.

In taking error propagation networks out of the "toy problem" domain, and into
the "real world", the number of examples in the training set increases
rapidly.  For weight updating, true gradient descent requires calculating the
partial gradient from every element in the training set and taking a small
step in the opposite direction to the total gradient.  Both these requirements
are impractical when the training set is large.  Adaptive step size techniques
can give an order of magnitude decrease in computation over a fixed scaling of
the gradient and, for initial training, small subsets can give a sufficiently
accurate estimation of the gradient.  My problem is that I don't have an
adaptive step size algorithm that works on the noisy gradient obtained from a
subset of the training set.  Does anyone have any ideas?  (I'd be glad to
coordinate suggestions and short summaries of published work and post back to
the list.)  To kick off, my best technique to date is included below.

Thanks,

Tony Robinson. 
(ajr at eng.cam.ac.uk)


