Instantaneous and Average performance measures

Thierry Guillerm thierry at elen.utah.edu
Thu Nov 7 18:23:09 EST 1991


ABOUT INSTANTANEOUS AND AVERAGE PERFORMANCE MEASURE:
  A gradient-descent learning rule based on an estimate of the
performance measure U(w) (w = weights) can be represented as
	dw = -a0 grad( est[U(w)] ) dt
where a0 is the step size, w represents the weights, and t the time.
  The usual technique of moving the weights after each training sample
can be represented as
	dw = -a0 grad( L(w,z) ) dt
where z represents the training sample and L(w,z) is the instantaneous
performance measure.
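The contrast between the two update rules can be sketched as follows (an invented toy example, not from the post: L(w,z) = 0.5*(w - z)^2, so U(w) = E_z[L(w,z)] is minimized at the mean of z; a0 follows the post's notation):

```python
# Per-sample (instantaneous) vs. averaged gradient descent on a toy
# quadratic loss.  The data z and the loss are illustrative only.
import random

random.seed(0)
z_samples = [random.gauss(2.0, 1.0) for _ in range(200)]
a0 = 0.05  # step size, as in the post

def grad_L(w, z):
    # gradient of the instantaneous measure L(w, z) = 0.5*(w - z)**2
    return w - z

# Instantaneous descent: dw = -a0 grad( L(w,z) ) dt, one step per sample
w_inst = 0.0
for z in z_samples:
    w_inst -= a0 * grad_L(w_inst, z)

# Averaged descent: estimate grad U(w) over all samples, then step
w_avg = 0.0
for _ in range(len(z_samples)):
    g = sum(grad_L(w_avg, z) for z in z_samples) / len(z_samples)
    w_avg -= a0 * g

mean_z = sum(z_samples) / len(z_samples)
print(w_inst, w_avg, mean_z)
```

Both runs drift toward the minimizer mean(z), but the per-sample run keeps fluctuating around it: the sampling noise never dies out with a constant step size, which is exactly the inherent noise the post describes.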
  A good point about using an instantaneous performance measure L(w,z)
in the gradient descent (instead of waiting a few epochs to estimate
U(w) before updating the weights) is that noise is inherently added to
the process.
  Under some conditions (which ones?), the instantaneous learning can
be rewritten as
	dw = -a0 grad( U(w) ) dt + b0 dx
where x is a standard Brownian motion.  This equation represents a
diffusion process, which can be viewed as shaking the current weight
point as it moves through weight space.  It is known that, when b0 is
decreased over time, this process is a simulated annealing process.
It is suspected that a minimum obtained this way will be better than
one obtained with an averaging method.
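A minimal Euler discretization of the diffusion above can be sketched like this (assumptions, not from the post: the double-well U(w) = (w^2 - 1)^2 and the 1/sqrt(k) noise schedule are invented for illustration; a0, b0, and x follow the post's notation):

```python
# Euler discretization of  dw = -a0 grad(U(w)) dt + b0 dx,
# with x a standard Brownian motion (increments ~ N(0, dt)).
# Decreasing b0 over time gives the annealing behaviour.
import math
import random

random.seed(1)

def grad_U(w):
    # double-well U(w) = (w**2 - 1)**2, minima at w = +1 and w = -1
    return 4.0 * w * (w * w - 1.0)

a0, dt = 1.0, 0.01
w = 2.0
for k in range(1, 20001):
    b0 = 1.0 / math.sqrt(k)                 # slowly decreasing noise
    dx = random.gauss(0.0, math.sqrt(dt))   # Brownian increment
    w += -a0 * grad_U(w) * dt + b0 * dx

print(w)
```

Early on, the large b0 lets the weight point jump between basins; as b0 shrinks, the drift term dominates and w settles near one of the minima at +/-1, which is the "shaking" picture of the weight point described above.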
  Has somebody done work on the quality of a solution obtained after a
given time of running BackProp, or simulated annealing?  Are there
quantitative results about how long it takes to reach a given quality
of solution?

send email to: thierry at signus.utah.edu
