Does backprop need the derivative ??

guy@cs.uq.oz.au
Tue Feb 9 17:25:35 EST 1993


The question has been asked whether the full derivative is needed for
backprop to work, or whether the sign of the derivative is sufficient. 

As far as I am aware, the discussion has not defined at what point the
derivative is truncated to +/-1. This might occur (1) for each input/output
pair, when the error is fed into the output layer; (2) in epoch-based
learning, where the exact derivative of each weight over the training set is
computed but the update to the weight is truncated to its sign; or (3) at
any of many intermediate points.
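
To make case (2) concrete, here is a small illustrative sketch of my own,
not taken from anyone's implementation. It assumes a linear model with
squared error and a hypothetical step size "lr"; the exact derivative is
accumulated over the whole training set, and only its sign enters the
update.

    import numpy as np

    def epoch_sign_update(w, X, y, lr=0.01):
        # Accumulate the exact derivative of the squared error over
        # the whole training set.
        grad = np.zeros_like(w)
        for x_i, y_i in zip(X, y):
            grad += (w @ x_i - y_i) * x_i
        # Truncate the derivative to +/-1 only at the update step.
        return w - lr * np.sign(grad)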

I believe one problem with limited-precision weights is as follows. The
magnitude of the update may be smaller than the precision limit of the
weight, which itself has far greater magnitude. If the machine arithmetic
then rounds the updated weight to the nearest representable value, it will
be rounded back to its old value, and no learning will occur.
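
A small numerical illustration of this, with made-up numbers (a weight grid
of step 2^-4 and an update of 0.01):

    step = 2.0 ** -4                           # weight precision: 0.0625
    w = 0.5                                    # a representable value
    update = 0.01                              # smaller than half a step
    w_new = round((w + update) / step) * step  # round to nearest representable
    print(w_new == w)                          # True: the update is lost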

I am a co-author of a technical report which addresses this problem. In our
algorithm, weights had very limited precision, but their derivatives over
the whole training set were computed exactly. The weight update step would
shift the weight to the next representable value with a probability
proportional to the size of the derivative.
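
For illustration only, here is a sketch of that kind of update rule in
Python; it is not the code from the report, and the names "lr" and "step"
(the learning rate and the weight precision) are my own.

    import random

    def stochastic_update(w, deriv, lr, step):
        # Probability of moving one precision step, proportional to the
        # size of the derivative (clipped to 1).
        p = min(1.0, lr * abs(deriv) / step)
        if random.random() < p:
            # Move one whole step in the downhill direction.
            w -= step if deriv > 0 else -step
        return w

In expectation (ignoring the clipping) the weight still moves by the full
gradient-descent step lr * deriv, even though each individual update is a
whole precision step or nothing.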

In our admittedly non-exhaustive testing, we found that very limited-precision
weights and activations could be used.

The technical report is available in hardcopy (limited numbers) and as
PostScript. My addresses are "guy@cs.uq.oz.au" and "Guy Smith, Department of
Computer Science, The University of Queensland, St Lucia 4072, Australia".

Guy Smith. 


