No subject


Tue Jun 6 06:52:25 EDT 2006


There is not much you can parallelize if you do per-sample training.  Take
the vanilla version of backprop, for example: assuming a network has 20
hidden and output units and 300 weights, all you can do in parallel is
evaluate 20 sigmoid functions and perform 300 multiply-adds (and you can't
even do all of that, because of the dependencies among the parameters).
Thus if you have thousands of processors in a parallel machine, most of
them will sit idle. In the strict per-sample case, sample i+1 needs to use
the weights updated by sample i, so you can't run multiple copies of the
same network. Yet running multiple copies is exactly the trick several
people came up with (independently) to speed up backprop training on
parallel machines. Unless we modify the algorithm a little bit, I can't
see a way to run multiple copies of a network in parallel in the
per-sample case.
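To make the point concrete, here is a minimal NumPy sketch (my own
illustration, not anyone's actual parallel implementation). The layer
sizes are chosen so the network has 20 hidden/output units and exactly 300
weights, matching the numbers above; the loss, learning rate, and random
data are arbitrary. The first loop is strict per-sample training, where
each update depends on the previous one, so the loop is inherently
sequential. The second loop is the modified (mini-batch) algorithm: every
gradient is computed against the same weights, so the per-sample gradient
computations are independent and could be farmed out to multiple copies
of the network on different processors.

import numpy as np

rng = np.random.default_rng(0)

# 14 inputs -> 20 sigmoid hidden units -> 1 output:
# 20*14 + 1*20 = 300 weights, 20 hidden/output units.
W1 = rng.normal(scale=0.1, size=(20, 14))
W2 = rng.normal(scale=0.1, size=(1, 20))
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(x, y, W1, W2):
    """Backprop gradients for one sample, squared-error loss."""
    h = sigmoid(W1 @ x)            # the 20 sigmoid evaluations
    err = W2 @ h - y
    dW2 = np.outer(err, h)
    dh = (W2.T @ err) * h * (1 - h)
    dW1 = np.outer(dh, x)          # the bulk of the multiply-adds
    return dW1, dW2

X = rng.normal(size=(64, 14))
Y = rng.normal(size=(64, 1))

# Strict per-sample training: sample i+1 must see the weights already
# updated by sample i, so this outer loop cannot be parallelized.
for x, y in zip(X, Y):
    dW1, dW2 = grads(x, y, W1, W2)
    W1 -= lr * dW1
    W2 -= lr * dW2

# Modified algorithm: accumulate gradients over a mini-batch. Each
# grads() call sees the same weights, so the calls are independent --
# this is the loop that multiple network copies could run in parallel.
acc1, acc2 = np.zeros_like(W1), np.zeros_like(W2)
for x, y in zip(X, Y):
    dW1, dW2 = grads(x, y, W1, W2)
    acc1 += dW1
    acc2 += dW2
W1 -= lr * acc1 / len(X)
W2 -= lr * acc2 / len(X)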

- Xiru Zhang



