online parallel implementation

Yoshua Bengio yoshua at psyche.mit.edu
Sat Oct 19 12:55:19 EDT 1991


This message concerns an attempt to apply some parallelism
to online back-propagation.

I recently had access to N = 20 to 40 NeXT workstations on which I could
perform learning experiments with back-propagation. My training database
was huge (TIMIT, more than half a million patterns, organized in
sequences - sentences - of about 100 'frames' each),
so I did not want to use a batch-based method.

The idea I attempted to implement was the following:

Split the database into N parts.
Run one copy of the network on each of the N parts (one per machine).
Share weights _asynchronously_ among the networks, after one or more sequences.

A 'server' program running on a separate machine received requests
from any of the other machines to collect its contribution
and return to it the current global moving average of the weights.
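The server's bookkeeping can be sketched as follows. This is a minimal single-process Python sketch under stated assumptions: the class name `WeightServer`, the mixing rate `alpha`, and the in-memory `exchange` call are all hypothetical names of mine - the original ran on separate NeXT machines with the requests arriving over the network.

```python
class WeightServer:
    """Maintains a global moving average of the weight vector.

    Each worker, after training on one or more sequences, sends its
    current weights and receives the updated global average to adopt.
    Requests may arrive asynchronously, in any order.
    """

    def __init__(self, n_weights, alpha=0.1):
        self.avg = [0.0] * n_weights  # global moving average of weights
        self.alpha = alpha            # weight given to each new contribution

    def exchange(self, worker_weights):
        # Fold the worker's contribution into the moving average,
        # then return the new average for the worker to adopt.
        self.avg = [(1.0 - self.alpha) * a + self.alpha * w
                    for a, w in zip(self.avg, worker_weights)]
        return list(self.avg)


# Each worker's loop would look roughly like:
#   train on one or more sequences, updating local_w by backprop
#   local_w = server.exchange(local_w)
```

A plain moving average is only one way to combine the contributions; the key property is that no worker ever waits for the others.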

Since I was running back-propagation through time, weights were
updated only after each sequence even in the single-machine
implementation; hence the parallel implementation was not much
less 'online' than the sequential one.

Unfortunately, I no longer have access to these machines
- because I have moved to a new institution - and I did not have
time to perform enough experiments to compare this approach
with others.

Yoshua Bengio
MIT
