batch learning
Geoffrey Hinton
hinton at ai.toronto.edu
Fri Nov 1 10:19:01 EST 1991
Differentiation is a linear operator. So the derivative of the sum of the
individual errors is the sum of the derivatives of the individual errors.
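This linearity is exactly what justifies batch learning: the gradient of the total error over a batch is just the sum of the per-case gradients. A minimal sketch (the loss and variable names are my own, for illustration):

```python
# Sketch: for a squared error summed over training cases, the gradient
# of the total error equals the sum of the per-case gradients, because
# differentiation is linear.
def per_case_grad(w, x, t):
    # d/dw of 0.5 * (w*x - t)^2 for a single training case
    return (w * x - t) * x

def batch_grad(w, data):
    # gradient of the summed error: sum of per-case gradients
    return sum(per_case_grad(w, x, t) for x, t in data)

def total_error(w, data):
    return sum(0.5 * (w * x - t) ** 2 for x, t in data)

data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0)]
w, eps = 0.5, 1e-6

# check against a central finite difference of the total error
numeric = (total_error(w + eps, data) - total_error(w - eps, data)) / (2 * eps)
print(abs(batch_grad(w, data) - numeric) < 1e-5)  # True
```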
The fact that differentiation is linear is actually helpful for some fancier
ideas. To improve the conditioning of the error surface, it would be nice to
convolve it with a gaussian blurring function so that sharp curvatures across
ravines are reduced, but gentle slopes along ravines remain. Instead of
convolving the error surface and then differentiating, we can differentiate and
then convolve if we want. The momentum method is actually a funny version of
this where we convolve along the path using a one-sided exponentially decaying
filter.
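To make the filter view concrete, here is a sketch (decay constant and names are illustrative) showing that the standard momentum accumulator is exactly an exponentially decaying, one-sided (past-only) weighted sum of the gradients seen so far along the path:

```python
# Sketch: the momentum update v <- m*v + g, applied along the
# optimization path, is a one-sided exponentially decaying filter
# over the sequence of gradients.
def momentum_velocities(grads, m=0.9):
    v, history = 0.0, []
    for g in grads:
        v = m * v + g          # decay the past, add the new gradient
        history.append(v)
    return history

grads = [1.0, 0.5, -0.2, 0.3]
vs = momentum_velocities(grads)

# After n steps, v_n = sum_k m**(n-1-k) * g_k: older gradients are
# weighted down exponentially, future ones not at all.
n = len(grads)
explicit = sum(0.9 ** (n - 1 - k) * g for k, g in enumerate(grads))
print(abs(vs[-1] - explicit) < 1e-12)  # True
```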
PS: As Rick Szeliski pointed out years ago, convolving a quadratic surface
with a gaussian does not change its curvature, it just moves the whole thing
up a bit (I hope I got this right!).
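A quick numerical check of that point, using a discrete Gaussian blur of my own construction: smoothing f(x) = a*x^2 with a Gaussian of width sigma just adds the constant a*sigma^2, so the curvature f'' = 2a is unchanged and the whole surface moves up a bit.

```python
import math

def gauss_smooth(f, x, sigma, n=4001, half_width=8.0):
    # discrete approximation to convolving f with a Gaussian of
    # standard deviation sigma, truncated at +/- half_width sigmas
    total, norm = 0.0, 0.0
    for i in range(n):
        e = -half_width * sigma + 2 * half_width * sigma * i / (n - 1)
        w = math.exp(-e * e / (2 * sigma * sigma))
        total += w * f(x + e)
        norm += w
    return total / norm

a, sigma = 3.0, 0.5
f = lambda x: a * x * x
g = lambda x: gauss_smooth(f, x, sigma)

# the smoothed surface is the original shifted up by a*sigma^2 ...
print(abs(g(1.0) - (f(1.0) + a * sigma ** 2)) < 1e-3)  # True

# ... and its curvature (second difference) matches the original's 2a
h = 1e-2
curvature = (g(1.0 + h) - 2 * g(1.0) + g(1.0 - h)) / (h * h)
print(abs(curvature - 2 * a) < 1e-2)  # True
```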
But of course, our surfaces are not quadratic. They have plateaus, and
convolution with a gaussian causes nearby plateaus to smooth out nasty
ravines, and also allows the gradients in ravines to be "seen" on the plateaus.
Geoff