Limited precision implementations (updated posting)
MURRE%rulfsw.LeidenUniv.nl@BITNET.CC.CMU.EDU
Mon Feb 25 16:57:00 EST 1991
Connectionist researchers,
Here is an updated posting on limited precision implementations of
neural networks. My impression is that research in this area is still
fragmentary, which is surprising given how fast the literature on
analog and digital implementations is growing. There is a wide range of
possibly applicable rules of thumb. Claims about sufficient precision
range from a single bit to 20 bits or more for certain models. Hard
problems may need higher precision. There may be a trade-off between
few weights (nodes) with high-precision weights (activations) and many
weights (nodes) with low-precision weights (activations). The precise
relation between weight precision and activation precision remains
unclear, as does the relation between the effect of precision on
learning and its effect on recall.
Thanks for all comments so far.
Jaap
Jacob M.J. Murre
Unit of Experimental and Theoretical Psychology
Leiden University
P.O. Box 9555
2300 RB Leiden
The Netherlands
General comments by researchers
By Soheil Shams:
As far as the required precision for neural computation is concerned,
the precision is directly proportional to the difficulty of the problem
you are trying to solve. For example, in training a back-propagation
network to discriminate between two very similar classes of inputs, you
will need high-precision values and arithmetic to find the narrow
region of the space in which the separating hyperplane has to be drawn.
I believe that the lack of analytical information in this area is due
to this relationship between the specific application and the required
precision. At the NIPS90 workshop on massively parallel
implementations, some people indicated that they had determined,
EMPIRICALLY, that for most problems 16-bit precision is required for
learning and 8-bit for recall with back-propagation.
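[For concreteness, here is a minimal sketch (my own illustration, not
from the workshop) of how such a rule of thumb can be probed by
rounding weights onto a signed fixed-point grid of a given width; the
weight range and the use of Python/NumPy are assumptions.]

import numpy as np

def quantize(w, bits, w_max=8.0):
    # Round w to the nearest point on a signed fixed-point grid with
    # `bits` total bits covering [-w_max, w_max).  (Assumed range.)
    levels = 2 ** (bits - 1)
    step = w_max / levels
    return np.clip(np.round(w / step), -levels, levels - 1) * step

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=(4, 3))
print(quantize(w, bits=16))   # fine grid, close to the float weights
print(quantize(w, bits=8))    # coarser grid of the kind used for recall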
By Roni Rosenfeld:
Santosh Venkatesh (of Penn State, I believe, or is it U. Penn?) did some
work a few years ago on how many bits are needed per weight. The surprising
result was that 1 bit/weight did most of the work, with additional bits
contributing remarkably little.
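[One rough way to probe the 1 bit/weight observation is to keep only
the sign of each trained weight plus a single shared scale and compare
recall with the full-precision network. The sketch below is my own
illustration under that reading, not Venkatesh's construction.]

import numpy as np

def binarize(weights):
    # One bit per weight: sign(w) times one shared magnitude per matrix.
    return np.sign(weights) * np.mean(np.abs(weights))

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 10))
x = rng.normal(size=10)
full = np.tanh(W @ x)                # recall with full-precision weights
one_bit = np.tanh(binarize(W) @ x)   # recall with 1-bit (sign) weights
print(np.mean(np.sign(full) == np.sign(one_bit)))  # agreement of output signs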
By Thomas Baker:
... We have found that for backprop learning, between twelve and sixteen
bits are needed. I have seen several other papers with these same
results. After learning, we have been able to reduce the weights to
four to eight bits with no loss in network performance. I have also
seen others with similar results. One method that optical and analog
engineers use is to calculate the error by running the feed-forward
calculations with limited precision while learning the weights at
higher precision; the weights are quantized and updated during training.
I am currently collecting a bibliography on limited precision papers.
... I will try to keep in touch with others that are doing research in
this area.
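[In outline, the quantize-during-training scheme Baker describes might
look like the following: a high-precision master copy of the weights is
kept, the forward pass and error use a low-precision copy, and each
gradient step is folded back into the master copy. The single linear
layer, squared-error cost, learning rate, and 8-bit quantizer below are
assumptions chosen only for illustration.]

import numpy as np

def quantize(w, bits=8, w_max=8.0):
    # Signed fixed-point rounding (assumed weight range [-w_max, w_max)).
    levels = 2 ** (bits - 1)
    step = w_max / levels
    return np.clip(np.round(w / step), -levels, levels - 1) * step

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))          # toy inputs
y = X @ rng.normal(size=5)             # toy linear targets
w = np.zeros(5)                        # high-precision master weights

for _ in range(200):
    w_q = quantize(w)                  # low-precision copy for the forward pass
    err = X @ w_q - y                  # error computed with quantized weights
    w -= 0.1 * (X.T @ err) / len(X)    # gradient step on the master weights

print(np.mean((X @ quantize(w) - y) ** 2))   # recall error with 8-bit weights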
References
Brause, R. (1988). Pattern recognition and fault tolerance in non-linear
neural networks. Biological Cybernetics, 58, 129-139.
Hollis, P.W., J.S. Harper, J.J. Paulos (1990). The effects of precision
constraints in a backpropagation learning network. Neural Computation,
2, 363-373.
Holt, J.L., & J-N. Hwang (in prep.). Finite precision error analysis of
neural network hardware implementations. Univ. of Washington, FT-10,
WA 98195.
(Comments by the authors:
We are in the process of finishing up a paper which gives
a theoretical (systematic) derivation of finite-precision
neural network computation. The idea is a nonlinear extension
of the "general compound operators" widely used for error analysis
of linear computation. We derive several mathematical formulas
for both the retrieving and learning phases of neural networks. The
finite precision error in the retrieving phase can be written
as a function of several parameters, e.g., the number of bits of the
weights, the number of bits for multiplication and accumulation, the
size of the nonlinear table look-up, and truncation/rounding or
jamming approaches. We are then able to extend this retrieving-phase
error analysis to iterative learning to predict the necessary
number of bits. This can be shown using a ratio between the
finite precision error and the (floating-point) back-propagated
error. Simulations have been conducted and match the theoretical
prediction quite well.)
Hong, J. (1987). On connectionist models. Tech. Rep., Dept. Comp. Sci.,
Univ. of Chicago, May 1987.
(Demonstrates that a network of perceptrons needs only
finite-precision weights.)
Jou, J., & J.A. Abraham (1986). Fault-tolerant matrix arithmetic and
signal processing on highly concurrent computing structures. Proceedings
of the IEEE, 74, 732-741.
Kampf, F., P. Koch, K. Roy, M. Sullivan, Z. Delalic, & S. DasGupta
(1989). Digital implementation of a neural network. Tech. Rep. Temple
Univ., Philadelphia PA, Elec. Eng. Div.
Moore, W.R. (1988). Conventional fault-tolerance and neural computers.
In: R. Eckmiller, & C. Von der Malsburg (Eds.). Neural Computers. NATO
ASI Series, F41 (Berlin: Springer-Verlag), 29-37.
Nadal, J.P. (1990). On the storage capacity with sign-constrained
synaptic couplings. Network, 1, 463-466.
Nijhuis, J., & L. Spaanenburg (1989). Fault tolerance of neural
associative memories. IEE Proceedings, 136, 389-394.
Rao, A., M.R. Walker, L.T. Clark, & L.A. Akers (1989). Integrated
circuit emulation of ART networks. Proc. First IEEE Conf. Artificial
Neural Networks, 37-41, Institution of Electrical Engineers, London.
Rao, A., M.R. Walker, L.T. Clark, L.A. Akers, & R.O. Grondin (1990).
VLSI implementation of neural classifiers. Neural Computation, 2, 35-43.
(The paper by Rao et al. gives an equation for the number of bits of
resolution required for the bottom-up weights in ART 1:
t = (3 log N) / log(2), i.e., t = 3 log2(N),
where N is the size of the F1 layer in nodes.)
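[A small worked example of this formula; rounding up to whole bits is
my addition.]

import math

for N in (8, 64, 512):                 # example F1 layer sizes (assumed)
    t = 3 * math.log(N) / math.log(2)  # the Rao et al. expression, i.e. 3*log2(N)
    print(N, t, math.ceil(t))          # e.g. N = 64 gives 18 bits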