No subject
Tom Osborn
osborn%rana.usc.edu at usc.edu
Fri Oct 6 16:59:50 EDT 1989
Steve Harnad asks:
> I have a simple question: What capabilities of PDP systems do and
> do not depend on the net's actually being implemented in parallel,
> rather than just being serially simulated? Is it only speed and
> capacity parameters, or something more?
An alternative question to ask is:
What differences does synchronous vs asynchronous processing make?
Both may be _implemented_ on serial or parallel machines - synch
on serial by keeping old state vectors, synch on parallel by using
some kind of lock-step control (with associated costs), asynch on
serial by adding a stochastic model of unit/neuron processing, and asynch
on parallel trivially.
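The contrast between the two updating regimes on a serial machine can be
sketched as follows (a minimal illustration, not anyone's actual
implementation; the net size, weights, and function names are all my own
invention):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2          # symmetric weights
np.fill_diagonal(W, 0.0)   # no self-connections

def synch_step(s, W):
    """Synchronous (Little-style): entire new state vector computed
    from the old one, which is kept intact during the computation."""
    return np.sign(W @ s)

def asynch_step(s, W, rng):
    """Asynchronous (Hopfield-style): one randomly selected unit
    updates from the current state; all others are unchanged."""
    s = s.copy()
    i = rng.integers(len(s))
    s[i] = np.sign(W[i] @ s)
    return s

s = np.sign(rng.standard_normal(n))
s_sync = synch_step(s, W)      # all n units change together
s_async = asynch_step(s, W, rng)  # at most one unit changes
```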
The _importance_ of synch vs asynch is apparent for Hopfield/Little nets
and Boltzmann machines:
For Hopfield (utilising asynch processing, with random selection of one
unit at a time and full connectivity), you get one Energy (Liapunov)
function.
BUT for Little nets (utilising synch processing - the entire new
state vector is computed from the old one), you have a different but
related Energy function. These two Energy functions have the same
stationary points, but the dynamics differ.
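For the asynch case the descent property is easy to check numerically:
with symmetric weights and no self-connections, a single-unit update never
increases Hopfield's Liapunov function E = -1/2 s'Ws. A sketch (my own
toy example, thresholds omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
W = rng.standard_normal((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

def energy(s, W):
    """Hopfield's Liapunov function E = -1/2 * s^T W s (no thresholds)."""
    return -0.5 * s @ W @ s

s = np.sign(rng.standard_normal(n))
e0 = energy(s, W)
for _ in range(200):
    i = rng.integers(n)
    h = W[i] @ s
    new_si = 1.0 if h >= 0 else -1.0
    # Flipping unit i changes E by -(new_si - s[i]) * h, which is <= 0
    e_before = energy(s, W)
    s[i] = new_si
    assert energy(s, W) <= e_before + 1e-9
```

Under synchronous (Little) updating the same quantity is not monotone; a
different Liapunov function governs the dynamics.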
[I can't comment on performance implications].
For Boltzmann machines, three different regimes may apply (if not all units
are connected). The same two as above (with different dynamics) and I
recall that there is no general convergence proof for the full synch case.
A third is parallel (ie, synch) updating in which sets of units
(no two directly connected) are processed together - dynamically, this
corresponds exactly to asynch updating, but with linear performance scaling
on parallel machines (assuming the partitioning was done ahead
of time).
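That partitioning is just graph colouring of the connectivity graph. A
greedy sketch (illustrative only; a real net would supply its actual
adjacency structure):

```python
def partition_units(adj):
    """Greedy graph colouring. adj maps each unit to the set of units it
    is directly connected to. Returns sets of units such that no two
    units in the same set are connected; each set can then be updated
    synchronously while remaining equivalent to asynch dynamics."""
    colour = {}
    for u in sorted(adj):
        used = {colour[v] for v in adj[u] if v in colour}
        c = 0
        while c in used:   # smallest colour not used by a neighbour
            c += 1
        colour[u] = c
    groups = {}
    for u, c in colour.items():
        groups.setdefault(c, set()).add(u)
    return list(groups.values())

# Example: a 4-cycle 0-1-2-3-0 (not fully connected) splits into two sets.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
groups = partition_units(adj)
```

(For a fully connected net every set is a singleton, so nothing is gained;
the benefit appears only with sparse connectivity.)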
Answers to the question for back-prop are more diverse: to maintain
equivalence with asynch processing, parallel implementations may synch-
process _layers_ at a time, or a pipeline effect may be set up,
or the data may be managed to optimise some measure of performance
(eg, for learning or info processing). HOWEVER, there _must_ be
synchronisation between the computed and desired output values for
back-prop learning to work (to compute the delta).
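The synchronisation point can be seen in even a minimal two-layer forward
pass (my own toy sizes and names): the output-layer delta can only be
formed once the forward pass has completed and its result is paired with
the matching desired output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
W1 = rng.standard_normal((3, 4))   # input->hidden weights (illustrative)
W2 = rng.standard_normal((4, 2))   # hidden->output weights

x = rng.standard_normal(3)
target = np.array([0.0, 1.0])      # desired output for this input

h = sigmoid(x @ W1)     # layer 1 must finish before layer 2 can start
y = sigmoid(h @ W2)     # forward pass complete for this input
# The delta requires the _computed_ y and the _desired_ target together:
delta = (y - target) * y * (1 - y)
grad_W2 = np.outer(h, delta)
```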
Someone else should comment.
Tom Osborn
On Sabbatical Leave (till Jan '90) at:
Center for Neural Engineering,
University of Southern California
Los Angeles CA 90089-0782
'Permanently', University of Technology, Sydney.