No subject
Tom Osborn
osborn%rana.usc.edu at usc.edu
Fri Oct 6 16:59:50 EDT 1989
Steve Harnad asks:
> I have a simple question: What capabilities of PDP systems do and
> do not depend on the net's actually being implemented in parallel,
> rather than just being serially simulated? Is it only speed and
> capacity parameters, or something more?
An alternative question to ask is:
What differences does synchronous vs asynchronous processing make?
Both may be _implemented_ on serial or parallel machines - synch
on serial by keeping old state vectors, synch on parallel by using
some kind of lock-step control (with associated costs), asynch on
serial by adding a stochastic model of unit/neuron processing, and asynch
on parallel trivially.
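The contrast between the two updating regimes on a serial machine can be
sketched as follows (a minimal illustration, not anyone's actual
implementation; the net size, weights, and function names are all my own
invention):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2          # symmetric weights
np.fill_diagonal(W, 0.0)   # no self-connections

def synch_step(s, W):
    """Synchronous (Little-style): entire new state vector computed
    from the old one, which is kept intact during the computation."""
    return np.sign(W @ s)

def asynch_step(s, W, rng):
    """Asynchronous (Hopfield-style): one randomly selected unit
    updates from the current state; all others are unchanged."""
    s = s.copy()
    i = rng.integers(len(s))
    s[i] = np.sign(W[i] @ s)
    return s

s = np.sign(rng.standard_normal(n))
s_sync = synch_step(s, W)      # all n units change together
s_async = asynch_step(s, W, rng)  # at most one unit changes
```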
The _importance_ of synch vs asynch is apparent for Hopfield/Little nets
and Boltzmann machines:
For Hopfield (utilising asynch processing, with random selection of one
unit at a time and full connectivity), you get one Energy (Liapunov)
function.
BUT for Little nets (utilising synch processing - the entire new
state vector is computed from the old one), you have a different but
related Energy function. These two Energy functions have the same
stationary points, but the dynamics differ.
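For the asynch case the descent property is easy to check numerically:
with symmetric weights and no self-connections, a single-unit update never
increases Hopfield's Liapunov function E = -1/2 s'Ws. A sketch (my own
toy example, thresholds omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
W = rng.standard_normal((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

def energy(s, W):
    """Hopfield's Liapunov function E = -1/2 * s^T W s (no thresholds)."""
    return -0.5 * s @ W @ s

s = np.sign(rng.standard_normal(n))
e0 = energy(s, W)
for _ in range(200):
    i = rng.integers(n)
    h = W[i] @ s
    new_si = 1.0 if h >= 0 else -1.0
    # Flipping unit i changes E by -(new_si - s[i]) * h, which is <= 0
    e_before = energy(s, W)
    s[i] = new_si
    assert energy(s, W) <= e_before + 1e-9
```

Under synchronous (Little) updating the same quantity is not monotone; a
different Liapunov function governs the dynamics.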
[I can't comment on performance implications].
For Boltzmann machines, three different regimes may apply (if not all units
are connected). The same two as above (with different dynamics) and I
recall that there is no general convergence proof for the full synch case.
A third is parallel (ie, synch) updating in which sets of units
(no two directly connected) are processed together - dynamically, this
corresponds exactly to asynch updating, but with linear performance scaling
on parallel machines (assuming the partitioning was done ahead
of time).
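That partitioning is just graph colouring of the connectivity graph. A
greedy sketch (illustrative only; a real net would supply its actual
adjacency structure):

```python
def partition_units(adj):
    """Greedy graph colouring. adj maps each unit to the set of units it
    is directly connected to. Returns sets of units such that no two
    units in the same set are connected; each set can then be updated
    synchronously while remaining equivalent to asynch dynamics."""
    colour = {}
    for u in sorted(adj):
        used = {colour[v] for v in adj[u] if v in colour}
        c = 0
        while c in used:   # smallest colour not used by a neighbour
            c += 1
        colour[u] = c
    groups = {}
    for u, c in colour.items():
        groups.setdefault(c, set()).add(u)
    return list(groups.values())

# Example: a 4-cycle 0-1-2-3-0 (not fully connected) splits into two sets.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
groups = partition_units(adj)
```

(For a fully connected net every set is a singleton, so nothing is gained;
the benefit appears only with sparse connectivity.)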
Answers to the question for back-prop are more diverse: to maintain
equivalence with asynch processing, parallel implementations may synch-
process _layers_ at a time, or a pipeline effect may be set up,
or the data may be managed to optimise some measure of performance
(eg, for learning or info processing). HOWEVER, there _must_ be
synchronisation between the computed and desired output values for
back-prop learning to work (to compute the delta).
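The synchronisation point can be seen in even a minimal two-layer forward
pass (my own toy sizes and names): the output-layer delta can only be
formed once the forward pass has completed and its result is paired with
the matching desired output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
W1 = rng.standard_normal((3, 4))   # input->hidden weights (illustrative)
W2 = rng.standard_normal((4, 2))   # hidden->output weights

x = rng.standard_normal(3)
target = np.array([0.0, 1.0])      # desired output for this input

h = sigmoid(x @ W1)     # layer 1 must finish before layer 2 can start
y = sigmoid(h @ W2)     # forward pass complete for this input
# The delta requires the _computed_ y and the _desired_ target together:
delta = (y - target) * y * (1 - y)
grad_W2 = np.outer(h, delta)
```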
Someone else should comment.
Tom Osborn
On Sabbatical Leave (till Jan '90) at:
Center for Neural Engineering,
University of Southern California
Los Angeles CA 90089-0782
'Permanently', University of Technology, Sydney.