Efficient parallel Backprop

@neural.att.com at neural.att.com
Thu Nov 7 17:36:00 EST 1991


The trick mentioned by Hal McCartor, which consists of storing each weight
twice (one copy in the processor that takes care of the presynaptic unit, and
one copy in the processor that takes care of the postsynaptic unit) and
updating both copies independently, is probably one of the best techniques.
It does require making strong assumptions about the architecture of the
network, and it only costs you a factor of two in efficiency.  It requires
broadcasting the states, but in most cases there are far fewer states than
weights (on the order of n states versus n^2 weights for a fully connected
layer).  Unfortunately, it does not work as well in the case of shared-weight
networks.
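
As a rough illustration, here is a minimal NumPy sketch of the scheme for a
single fully connected layer, with the two processors simulated as Python
objects.  The class names, the sigmoid nonlinearity, and the squared-error
delta are my own choices for the example, not part of the original
description; a real implementation would also partition the units of each
layer across processors rather than give each processor a whole layer.  The
point is that only the O(n) state and delta vectors ever cross the processor
boundary, while each processor updates its own O(n^2) weight copy from
locally available quantities, so the two copies never diverge.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class PreSynapticProcessor:
    """Owns the input (presynaptic) units and one copy of the weights."""
    def __init__(self, w):
        self.w = w.copy()            # local copy, shape (n_out, n_in)
        self.x = None

    def forward(self, x):
        self.x = x                   # remember local states for the update
        return x                     # "broadcast" the states

    def backward(self, delta_out):
        # Uses only the local weight copy and the broadcast deltas.
        # In a multilayer net this would be multiplied by f' of the
        # previous layer and passed further back.
        return self.w.T @ delta_out

    def update(self, delta_out, eta):
        # Local states, broadcast deltas: same product as on the other side.
        self.w += eta * np.outer(delta_out, self.x)

class PostSynapticProcessor:
    """Owns the output (postsynaptic) units and the second weight copy."""
    def __init__(self, w):
        self.w = w.copy()
        self.x = None
        self.y = None

    def forward(self, x):
        self.x = x                   # states received via the broadcast
        self.y = sigmoid(self.w @ x)
        return self.y

    def backward(self, target):
        # Local error signal for a squared-error cost.
        return (target - self.y) * self.y * (1.0 - self.y)

    def update(self, delta_out, eta):
        # Broadcast deltas are local here; states arrived in forward().
        self.w += eta * np.outer(delta_out, self.x)

# One training step: both copies receive the identical update, so they
# stay in lockstep even though no weights are ever communicated.
rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.1, size=(3, 4))
pre, post = PreSynapticProcessor(w0), PostSynapticProcessor(w0)

x = rng.normal(size=4)
target = np.array([0.0, 1.0, 0.0])

states = pre.forward(x)              # broadcast: O(n) states
y = post.forward(states)
delta = post.backward(target)        # broadcast: O(n) deltas
_ = pre.backward(delta)              # would feed the previous layer
pre.update(delta, eta=0.1)
post.update(delta, eta=0.1)
assert np.allclose(pre.w, post.w)    # the two copies remain identical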

I first heard about it from Leon Bottou (then at University of Paris-Orsay)
in 1987. This trick was used in the L-NEURO backprop chip designed by Marc
Duranton at the Philips Labs in Paris.

  -- Yann Le Cun

