Limited precision implementations (updated posting)
MURRE%rulfsw.LeidenUniv.nl@BITNET.CC.CMU.EDU
Mon Feb 25 16:57:00 EST 1991
Connectionist researchers,
Here is an updated posting on limited precision implementations of
neural networks. My impression is that research in this area is still
fragmentary, which is surprising given how fast the literature on
analog and digital implementations is growing. There is a wide range of
possibly applicable rules of thumb. Claims about sufficient precision
range from a single bit to 20 bits or more for certain models. Hard
problems may need higher precision. There may be a trade-off between
few weights (nodes) with high-precision weights (activations) and many
weights (nodes) with low-precision weights (activations). The precise
relation between weight precision and activation precision remains
unclear, as does the relation between the effect of precision on
learning and its effect on recall.
Thanks for all comments so far.
Jaap
Jacob M.J. Murre
Unit of Experimental and Theoretical Psychology
Leiden University
P.O. Box 9555
2300 RB Leiden
The Netherlands
General comments by researchers
By Soheil Shams:
As far as the required precision for neural computation is concerned,
the precision is directly proportional to the difficulty of the problem
you are trying to solve. For example, in training a back-propagation
network to discriminate between two very similar classes of inputs, you
will need high-precision values and arithmetic to find the narrow
region of the space in which the separating hyperplane has to be drawn.
I believe that the lack of analytical information in this area is due
to this relationship between the specific application and the required
precision. At the NIPS90 workshop on massively parallel
implementations, some people indicated that they had determined,
EMPIRICALLY, that for most problems 16-bit precision is required for
learning and 8-bit for recall with back-propagation.
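[For concreteness, here is a minimal sketch (my own illustration, not
from the workshop) of how such a rule of thumb can be probed by
rounding weights onto a signed fixed-point grid of a given width; the
weight range and the use of Python/NumPy are assumptions.]

import numpy as np

def quantize(w, bits, w_max=8.0):
    # Round w to the nearest point on a signed fixed-point grid with
    # `bits` total bits covering [-w_max, w_max).  (Assumed range.)
    levels = 2 ** (bits - 1)
    step = w_max / levels
    return np.clip(np.round(w / step), -levels, levels - 1) * step

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=(4, 3))
print(quantize(w, bits=16))   # fine grid, close to the float weights
print(quantize(w, bits=8))    # coarser grid of the kind used for recall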
By Roni Rosenfeld:
Santosh Venkatesh (of Penn State, I believe, or is it U. Penn?) did some
work a few years ago on how many bits are needed per weight. The surprising
result was that 1 bit/weight did most of the work, with additional bits
contributing remarkably little.
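[One rough way to probe the 1 bit/weight observation is to keep only
the sign of each trained weight plus a single shared scale and compare
recall with the full-precision network. The sketch below is my own
illustration under that reading, not Venkatesh's construction.]

import numpy as np

def binarize(weights):
    # One bit per weight: sign(w) times one shared magnitude per matrix.
    return np.sign(weights) * np.mean(np.abs(weights))

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 10))
x = rng.normal(size=10)
full = np.tanh(W @ x)                # recall with full-precision weights
one_bit = np.tanh(binarize(W) @ x)   # recall with 1-bit (sign) weights
print(np.mean(np.sign(full) == np.sign(one_bit)))  # agreement of output signs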
By Thomas Baker:
... We have found that for backprop learning, between twelve and sixteen
bits are needed. I have seen several other papers with these same
results. After learning, we have been able to reduce the weights to
four to eight bits with no loss in network performance. I have also
seen others with similar results. One method that optical and analog
engineers use is to calculate the error by running the feed-forward
calculations with limited precision while learning the weights at
higher precision; the weights are quantized and updated during training.
I am currently collecting a bibliography on limited precision papers.
... I will try to keep in touch with others that are doing research in
this area.
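[In outline, the quantize-during-training scheme Baker describes might
look like the following: a high-precision master copy of the weights is
kept, the forward pass and error use a low-precision copy, and each
gradient step is folded back into the master copy. The single linear
layer, squared-error cost, learning rate, and 8-bit quantizer below are
assumptions chosen only for illustration.]

import numpy as np

def quantize(w, bits=8, w_max=8.0):
    # Signed fixed-point rounding (assumed weight range [-w_max, w_max)).
    levels = 2 ** (bits - 1)
    step = w_max / levels
    return np.clip(np.round(w / step), -levels, levels - 1) * step

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))          # toy inputs
y = X @ rng.normal(size=5)             # toy linear targets
w = np.zeros(5)                        # high-precision master weights

for _ in range(200):
    w_q = quantize(w)                  # low-precision copy for the forward pass
    err = X @ w_q - y                  # error computed with quantized weights
    w -= 0.1 * (X.T @ err) / len(X)    # gradient step on the master weights

print(np.mean((X @ quantize(w) - y) ** 2))   # recall error with 8-bit weights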
References
Brause, R. (1988). Pattern recognition and fault tolerance in non-linear
neural networks. Biological Cybernetics, 58, 129-139.
Hollis, P.W., J.S. Harper, J.J. Paulos (1990). The effects of precision
constraints in a backpropagation learning network. Neural Computation,
2, 363-373.
Holt, J.L., & J-N. Hwang (in prep.). Finite precision error analysis of
neural network hardware implementations. Univ. of Washington, FT-10,
WA 98195.
(Comments by the authors:
We are in the process of finishing up a paper which gives
a theoretical (systematic) derivation of finite-precision
neural network computation. The idea is a nonlinear extension
of the "general compound operators" widely used for error analysis
of linear computation. We derive several mathematical formulas
for both the retrieving and learning phases of neural networks. The
finite precision error in the retrieving phase can be written
as a function of several parameters, e.g., the number of bits of the
weights, the number of bits for multiplication and accumulation, the
size of the nonlinear table look-up, and truncation/rounding or
jamming approaches. We are then able to extend this retrieving-phase
error analysis to iterative learning to predict the necessary
number of bits. This can be shown using a ratio between the
finite precision error and the (floating-point) back-propagated
error. Simulations have been conducted and match the theoretical
prediction quite well.)
Hong, J. (1987). On connectionist models. Tech. Rep., Dept. Comp. Sci.,
Univ. of Chicago, May 1987.
(Demonstrates that a network of perceptrons needs only
finite-precision weights.)
Jou, J., & J.A. Abraham (1986). Fault-tolerant matrix arithmetic and
signal processing on highly concurrent computing structures. Proceedings
of the IEEE, 74, 732-741.
Kampf, F., P. Koch, K. Roy, M. Sullivan, Z. Delalic, & S. DasGupta
(1989). Digital implementation of a neural network. Tech. Rep. Temple
Univ., Philadelphia PA, Elec. Eng. Div.
Moore, W.R. (1988). Conventional fault-tolerance and neural computers.
In: R. Eckmiller, & C. Von der Malsburg (Eds.). Neural Computers. NATO
ASI Series, F41 (Berlin: Springer-Verlag), 29-37.
Nadal, J.P. (1990). On the storage capacity with sign-constrained
synaptic couplings. Network, 1, 463-466.
Nijhuis, J., & L. Spaanenburg (1989). Fault tolerance of neural
associative memories. IEE Proceedings, 136, 389-394.
Rao, A., M.R. Walker, L.T. Clark, & L.A. Akers (1989). Integrated
circuit emulation of ART networks. Proc. First IEEE Conf. Artificial
Neural Networks, 37-41, Institution of Electrical Engineers, London.
Rao, A., M.R. Walker, L.T. Clark, L.A. Akers, & R.O. Grondin (1990).
VLSI implementation of neural classifiers. Neural Computation, 2, 35-43.
(The paper by Rao et al. gives an equation for the number of bits of
resolution required for the bottom-up weights in ART 1:
t = (3 log N) / log(2), i.e., t = 3 log2(N),
where N is the size of the F1 layer in nodes.)
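[A small worked example of this formula; rounding up to whole bits is
my addition.]

import math

for N in (8, 64, 512):                 # example F1 layer sizes (assumed)
    t = 3 * math.log(N) / math.log(2)  # the Rao et al. expression, i.e. 3*log2(N)
    print(N, t, math.ceil(t))          # e.g. N = 64 gives 18 bits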