No subject

Fri Apr 29 23:58:16 EDT 1988

To: connectionists					29 apr 88
From: wfreeman at garnet
Re: a physiologist's view of connectionism
i'd like some feedback on this essay before it gets frozen into print
and invite commentary of any sort.	thanks in advance, walter

       Why neural networks don't yet fly: inquiry into the
            neurodynamics of biological intelligence.

                Walter J Freeman
                Department of Physiology-Anatomy
                University of California
                Berkeley CA 94720 USA

        2nd Annual Intern. Conf. on Neural Networks
            San Diego CA 23 - 27 July 1988

                               Abstract

               Sensory  and  perceptual  information  exists   as
       space-time  patterns  of  neural activity in cortex in two
       modes: axonal pulses and dendritic currents.  Which one we
       observe  depends  on the experimental techniques we choose
       in order to make our observations.   The  brain  does  its
       analysis  of  sensory  input, as in feature extraction and
       preprocessing, in the medium of action potentials as point
       processes in networks of individual neurons.  It does syn-
       thesis of its sensory input with past experience  and  ex-
       pectancy  of  future action in the medium of dendritic in-
       tegration in local mean fields.  Both  kinds  of  activity
       are  found to coexist in olfactory and visual cortex, each
       preceding and then following the other.   The  transforma-
       tion  of  information from the pulse mode to the dendritic
       mode involves a state transition of the  cortical  network
       that can be modeled by a Hopf bifurcation in both software
       and hardware embodiments.  This state  transition  appears
       to  be  an essential precursor to an act of neural pattern
       classification.  However,  the  models  suggest  that  the
       classification  of  a  given  stimulus into one of several
       learned classes is done by a mapping of the stimulus  into
       a  landscape  that  has been shaped by prior learning, and
       that it is not done by a multiple bifurcation into one  of
       a  collection  of  limit cycle attractors at the moment of
       choice.

                        Introduction

     The strongest justification at present for the study  of  neural  net-
works  is the inspiration they draw from the performance characteristics of
their biological cousins.  Yet it is often unclear what is to be copied and
what  omitted.    John  Denker among others has pointed out that both birds
and airplanes have wings, but that only birds have feathers.  While  it  is
true that brains and neural networks share certain structural features such
as massive parallelism, biological networks solve complex  problems  easily
and  creatively,  and  existing neural networks do not.  Whereas Wilbur and
Orville Wright solved first the problems of lift and  then  of  control  in
flight,  neural networkers have solved only the problems of statics and not
the problems of dynamic control.  Neural networks have  not  yet  begun  to
soar.

     One reason, I will argue here, is that most theoreticians have pursued
a false goal of stability and have not reckoned with the intrinsic
instability of wetware brains that enables their remarkable adaptiveness.

     A related key limitation in many  current approaches is  the  lack  of
application  by engineers of the hierarchical modes in which wetware brains
sustain information for  storage,  transformation,  and  other  operations.
There  are  two  documented modes that in some senses are diametrically op-
posed but in other senses are strongly complementary. Probably  others  ex-
ist,  but they need not concern us here.  One is typified by the action po-
tential and the point process, the other by the synaptic potential and  the
local  mean field.  In sensory systems the one is the basis for feature ex-
traction, preprocessing and analysis.  The other is the basis for experien-
tial  integration,  classification  and synthesis.  Neither can supplant or
function without the other.  They coexist in the same  layers  of  neurons,
and  whether we observe one or the other depends on how we acquire, process
and measure our biological data.

     My aim in this brief review is to exemplify these two modes of  infor-
mation, describe how they are derived from brains and how they are convert-
ed each to the other, and explain their significance for the design of  new
and more successful neural networks.

                Examples of biological information

     Information in biological networks of the kind  I  am  concerned  with
here  takes  the form of space-time neural activity patterns.  Each pattern
is relational and neither  symbolic  nor  representational.   It  does  not
"stand  for"  something outside the brain, as a letter does in an alphabet,
nor does it reside in fixed form as a goal or a "teacher".  It is a dynamic
process that mediates adaptive behavior.  It results from a stimulus and in
some sense causes a response, but it also incorporates past experience  and
the intent of future action.  These being private and unique to each brain,
we cannot in principle as observers know the exact information  content  of
each pattern or even the coordinate system in which it is embedded.

     What we can do is to establish statistically the relation of  a  given
space-time  pattern  of  neural  activity  to an antecedent or a consequent
event in the outside world.  We do this by repeatedly  presenting  each  of
two or more stimuli to a subject and then demonstrating some invariant con-
tiguity between each stimulus and a  consequent  neural  activity  pattern.
Because  we do not know the metric of the internal computational spaces, we
must collect numerous input-output pairs and rely on statistical invariants
that  emerge from one or another form of ensemble averaging.  For the point
process each ensemble is collected over time from one or more  points,  and
for  the  mean  field  it is collected simultaneously at multiple points in
space in the form of a set of recordings in time.  The distinction is  cru-
cial though subtle.

     I will cite examples from the primary visual cortex and from  the  ol-
factory bulb, a specialized form of sensory cortex that is located close to
the input of the olfactory system.

     The paradigmatic experiment in the pulse mode in olfaction consists in
locating  a  single neuron in the bulb with a microelectrode, presenting in
succession odorants A, B, C,... at the same  or  different  concentrations,
and  measuring  the  pulse firing rate of the neuron.  This is repeated for
neuron i, ii, iii,... at different spatial  locations  in  the  bulb.   The
results are presented in the form of a table, which shows that each odorant
over some concentration range (typically narrow) excites some  neurons  but
not most others, indicating that each odorant establishes a spatial pattern
of selective activation in the bulb, putatively resembling a  constellation
of stars in the night sky, although each neuron typically responds to a
variety of odorants.  This is a form of labeled line coding, with pulse
rate or probability as the state variable for each line, channel or axon.

     The paradigmatic experiment in the wave mode is to  record  the  elec-
troencephalogram  (EEG)  from  an  array  of macroelectrodes (optimally 64)
placed on the surface of the bulb.  All of the simultaneously recorded  EEG
traces contain a common waveform or carrier that differs across the spatial
array in amplitude.  Odorant-specific patterns of amplitude i, ii,  iii,...
are  seen to recur on presentation of odorants A, B, C,..., but only if the
subjects are trained to discriminate them each from the others and only  if
they are motivated to do so (Skarda & Freeman, 1987).  Learning and arousal
are both essential.  The odorant information is expressed as a spatial  am-
plitude  modulation  of  the common carrier for the duration of a sniff, on
the order of 0.1 sec.  It can be likened to a monochromatic half-tone  pic-
ture in a newspaper.  The information density is spatially uniform, because
no one dot in the picture carries by its size any more or less  information
than  any  other.   The  carrier is identified by making a spatial ensemble
average of the 64 traces that are recorded during a sniff and then regress-
ing  this ensemble average onto each unaveraged EEG trace to derive its am-
plitude coefficient.  One cannot use time ensemble averaging  over  sniffs,
because the spectrum of the carrier and its phase relations to the initiat-
ing stimulus vary unpredictably across inhalations.  The result of measure-
ment is a 64x1 vector that expresses the spatial pattern of amplitude.
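
     The carrier regression just described can be sketched numerically.
The Python fragment below is an illustration only, not the analysis code
used in the cited studies: the function name carrier_amplitudes, the 1 kHz
sampling rate, the 40 Hz carrier and the noise level are all assumptions
made for the sake of a runnable example.

```python
import numpy as np

def carrier_amplitudes(eeg):
    """Estimate the common carrier and per-channel amplitude coefficients.

    eeg: array of shape (n_channels, n_samples) covering one sniff.
    Returns (carrier, amplitudes), where amplitudes has one entry per channel.
    """
    eeg = np.asarray(eeg, dtype=float)
    eeg = eeg - eeg.mean(axis=1, keepdims=True)   # remove each channel's offset
    carrier = eeg.mean(axis=0)                    # spatial ensemble average
    # Least-squares fit of each trace to the carrier: a_k = <x_k, c> / <c, c>
    amplitudes = eeg @ carrier / (carrier @ carrier)
    return carrier, amplitudes

# Synthetic demonstration (all numbers below are assumptions).
rng = np.random.default_rng(0)
t = np.arange(100) / 1000.0                       # 0.1 s sniff at 1 kHz
true_amp = rng.uniform(0.5, 1.5, size=64)         # hidden spatial pattern
phase = rng.uniform(0.0, 2.0 * np.pi)             # unpredictable carrier phase
wave = np.sin(2.0 * np.pi * 40.0 * t + phase)     # shared 40 Hz waveform
noise = 0.05 * rng.standard_normal((64, t.size))
eeg = true_amp[:, None] * wave[None, :] + noise

carrier, amp = carrier_amplitudes(eeg)            # amp is the 64x1 pattern
```

     Because each trace is regressed on the average of all traces, the
coefficients have a mean of 1 by construction, so the resulting vector
expresses relative rather than absolute amplitude.  The fit does not
depend on the phase of the carrier, which is why it works on a single
sniff where stimulus-locked time averaging would not.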

     These two kinds of information, pulse and wave, coexist in  each  area
of  the bulb, and in other stages of the olfactory system as well.  Whether
one sees the one or the other kind depends on the  experimental  procedures
that one uses, which in turn depends on one's goals and hypotheses.

     Comparable results hold for the primary visual cortex.  The well-known
paradigm  in the pulse mode is to measure the pulse rate of a single neuron
while repeatedly presenting patterned light stimuli to the retina so as  to
define its receptive field.  This is repeated for a large number of neurons,
and the results are presented in the form of  graphs  showing  the  spatial
structures  of orientation and ocular dominance columns and the topographic
mapping of the retina onto the dozen or more specialized  subareas  of  the
visual cortex for color, motion detection, etc.  The inference is made that
"features" of the visual world are extracted and mapped spatially onto  the
cortex in the firing rates of labeled lines.  The information is said to be
encoded in the pulse trains of single neurons.

     Activity in the wave mode is likewise recorded with arrays of
macroelectrodes on the visual cortex of an awake, motivated, trained rhesus
monkey (Freeman & van Dijk, 1987).  A common carrier is retrieved from  the
EEG traces by linear decomposition, and its spatial pattern is expressed in
the matrix of coefficients that are obtained by fitting each trace  to  the
spatial  ensemble average.  A specific, identifiable spatial pattern of am-
plitude modulation is found to recur on each trial when the motivated  sub-
ject  is inferred to be discriminating a specific visual cue.  This is evi-
dence for distributed coding of information similar to the wave mode of ol-
factory coding.  Evidence for this mode of activity has also been found re-
cently in the visual cortex of the cat (Gray & Singer, 1988).

     Again, it is apparent that these two kinds of information in the pulse
and wave modes coexist in the cortex.

                Neural mechanisms of transformation

     In this section I will consider the mechanisms by which the discrete
activity in the pulse mode is transformed into the wave mode and then back
again.  In doing so I will draw on experiments in software (Freeman, Yao &
Burke, 1988) and hardware (Freeman, Eisenberg & Burke, 1987) modeling of
cortical dynamics.  I will argue that the activity patterns in the pulse
mode constitute the end result of stimulus analysis by neural
preprocessing, and that the patterns in the wave mode manifest the results
of spatial integration of the pulse activity with past experience and
present motivational state.

     The conversion from pulses to waves takes place  at  synapses.   There
are  many kinds and locations of modifiable synapses, two of which are par-
ticularly important for information processing.  One type  is  the  primary
synapse between an incoming axon and its target cortical neuron.  It
changes with recent use, either by nonspecific facilitation or by
attenuation that depends on the local volume of input into a neighborhood.
Attenuation is  a  multiplicative  form  of  inhibition  that  operates  in
processes of dynamic range compression and signal normalization.

     The other type supports the long  range  excitatory  connections  that
form  innumerable  feedback  loops  of  mutual excitation within a cortical
layer.  These secondary synapses among  cortical  neurons  are  subject  to
change  in  respect to associative learning in accordance with some variant
of the Hebb rule.  The matrix of  numbers  representing  the  strengths  of
synaptic  action corresponds to the W matrix of  Amari (1977) and the T ma-
trix of Hopfield (1982).  When an animal is trained to discriminate an odor
A, B, C,... a Hebbian nerve cell assembly is formed among the cortical neu-
rons by the strengthened synaptic connections between each  pair  of  coac-
tivated neurons (Freeman, 1968, 1975).  This nerve cell assembly is a basis
for the classification of odorants by trained subjects (Skarda  &  Freeman,
1987).
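
     The formation of such an assembly can be illustrated with the
simplest outer-product variant of the Hebb rule.  The sketch below is a
minimal example under stated assumptions, not the learning rule of the
cited models: the function hebbian_update, the learning rate, the weight
ceiling and the eight-unit activity pattern are all hypothetical.

```python
import numpy as np

def hebbian_update(W, activity, rate=0.1, w_max=1.0):
    """Strengthen connections between coactive units (outer-product Hebb rule).

    W: (n, n) matrix of excitatory connection strengths.
    activity: length-n vector, nonzero for units driven by the stimulus.
    """
    a = np.asarray(activity, dtype=float)
    dW = rate * np.outer(a, a)        # large only where both units are active
    np.fill_diagonal(dW, 0.0)         # no self-connections
    return np.clip(W + dW, 0.0, w_max)

n = 8
W = np.zeros((n, n))
# Units 0, 1 and 4 are coactivated by odor A (made-up pattern).
odor_A = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=float)
for _ in range(5):                    # five reinforced presentations
    W = hebbian_update(W, odor_A)
```

     After training, only the connections among units 0, 1 and 4 have
been strengthened; this block of mutually excitatory connections is the
model counterpart of the nerve cell assembly.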

     Conversion of waves to pulses occurs  at  trigger  zones  of  neurons,
where  the sum of dendritic currents regulates the firing rate of each neu-
ron.  The relation between membrane current  and  pulse  density,  both  of
which  are  continuous  variables,  has  the  form of a sigmoid.  The range
between threshold (zero firing rate) and asymptotic maximum  is  much  nar-
rower  than  a comparable input-output sigmoid relation at the synapses, so
that as a general rule the pulse-wave conversion takes place  in  a  small-
signal  near-linear range.  It follows from this and related considerations
that the operation of the local neighborhood can be expressed as  a  linear
time-invariant  integrator  cascaded  with a static nonlinear bilateral sa-
turation function (Freeman, 1967, 1968, 1975).

     An important feature that distinguishes the  biological  sigmoid  from
its  neural  network  cousins  is the finding that the maximal slope of the
curve and thereby the maximal gain of the local ensemble  is  displaced  to
the  excitatory  side  (Freeman, 1979).  Input not only excites neurons, it
increases their forward gains.  Furthermore, the slope of the curve is  in-
creased with factors that increase arousal and motivation in animals.  When
a stimulus is given to which a subject has been sensitized  by  discrimina-
tive  training,  so  that  a  nerve  cell assembly has been formed, the re-
excitation within the assembly is enhanced by both the input-dependent non-
linearity  and  by arousal.  The result is regenerative feedback in a high-
gain system that leads to instability.  The large collection  of  intercon-
nected  and interactive ensembles undergoes a state transition from a pres-
timulus state to a stimulated state.
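
     The displacement of maximal gain to the excitatory side can be made
concrete with an asymmetric sigmoid.  The functional form used below,
Q(v) = Qm (1 - exp(-(exp(v) - 1)/Qm)), is an assumption adopted for
illustration; neither it nor the value Qm = 5 is given in this text.  The
sketch locates the input level at which the slope, and hence the gain, is
largest.

```python
import numpy as np

def asymmetric_sigmoid(v, q_max=5.0):
    """Assumed form: Q(v) = q_max * (1 - exp(-(exp(v) - 1) / q_max))."""
    return q_max * (1.0 - np.exp(-(np.exp(v) - 1.0) / q_max))

def slope(v, q_max=5.0):
    """Analytic derivative dQ/dv = exp(v) * exp(-(exp(v) - 1) / q_max)."""
    return np.exp(v) * np.exp(-(np.exp(v) - 1.0) / q_max)

# v = 0 is the resting level; positive v is excitatory.
v = np.linspace(-3.0, 4.0, 7001)
v_peak = v[np.argmax(slope(v))]       # input level of maximal gain
```

     For this assumed curve the slope peaks at v = ln(q_max), which is
positive whenever q_max exceeds 1, so excitatory input moves the
operating point toward higher gain.  Raising q_max both steepens the
curve and shifts the peak further to the excitatory side, which is one
way to model the dependence on arousal described above.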

     The state prior to entry of input is low-amplitude  and  low-gain,  so
that  neurons not interacting strongly with each other are free to react to
input on incoming lines.  When sufficient input arrives to one  or  another
nerve  cell assembly in an aroused subject, the amplitude and gain both in-
crease, and the neurons strongly interact with each other.  In this  highly
interactive  state  the information that each received during the preceding
input state is disseminated rapidly over the entire extent of  an  interac-
tive  area  of  neurons,  apparently within a few milliseconds over regions
comprising many square mm or cm of surface area and many millions  of  neu-
rons.   The  spatial density of the information becomes uniform, just as it
does in a 2-dimensional Fourier transform of a visual scene.

     Simulations of these transitions have shown that the input information
in  the pulse mode is not degraded or lost in the conversion from the pulse
mode to the wave mode.  The point-wise input is mapped  under  spatial  in-
tegration  into a distribution of spatial activity that introduces the past
experience of the subject through the nerve cell assemblies and the present
state  of  expectation  embodied  in  the  factors  relating to arousal and
motivation, that is, brain state with respect to future action.

     When it is read out, both in the brain and in the models that simulate
the  process, the output is coarse-grained at the surface and summed in ab-
solute value or by squaring to give a positive quantity  at  each  channel,
which re-establishes the pulse mode at the input to the next stage.  In the
process of coarse-graining the output from the preceding stage is "cleaned"
under  spatiotemporal  integration  to  attenuate  all activity that is not
shared by the entire transmitting array of neurons.  Only  the  cooperative
activity that is shared by the whole is successfully injected into the next
succeeding stage.  This completes the inverse transformation  back  to  the
pulse mode.

     In brief, input on labeled lines that is injected by  axons  into  the
cortex  can  destabilize  the neural mass, depending on past experience and
present motivation, and the cortex can converge to a distributed pattern of
fluctuating activity that expresses that confluence of stimulus, experience
and expectation.  The key to understanding is  the  state  transition  that
changes  the  properties of the cortex and extends the information from the
local to the distributed mode.  It is an input-induced  transition  from  a
low-energy disordered state to a high-energy more ordered state (Prigogine,
1984; Skarda & Freeman, 1987).

     Transmission outside of the cortex has not yet  been  studied,  but  I
postulate  that  similar state transitions may occur in subcortical masses,
so that with each transfer of information from one brain mass  to  another,
there  is injection of information on labeled lines, transition to integra-
tion in the wave mode, and reconversion of the integrand to a labeled  line
pattern on the output channels, the last neural output being the discharges
of motor neurons in the brainstem and spinal cord.

                  Implications for neural networks

     The current theory and practice of neural  networks  has  incorporated
many  of  the  important  features of the static design of nervous systems,
particularly those based on parallel feedforward nets,  but  has  neglected
the  dynamics  of real nervous systems in favor of unrealistic abstractions
that do not do justice to the ceaseless fluctuations  of  neural  activity.
These  are  commonly  designated  as "noise" and removed by stimulus-locked
time ensemble averaging in order to impose the ideal of an invariant  base-
line  that  precedes the stimulus arrival, or to which the system converges
as it "learns" or "perceives".

     It is this artificial creation of a stable  equilibrium  to  represent
the  desired  state of neural networks, the incorporation of it as a design
criterion, that has crippled their performance.  Wet nervous systems do not
have  equilibria except under deep anesthesia, surgical isolation, or near-
terminal damage of one kind or another.  These reduced states are  in  fact
useful for measuring the open loop time and space constants of parts of wet
nervous systems, but clearly, and in the case of general anesthesia by  de-
finition,  there is no information processing in such reduced systems.  In-
stead, nervous systems appear to be designed by evolution to  be  destabil-
ized  by  input.  They seek input as a means of inducing state changes that
disseminate and integrate fresh input with past experience as the basis for
impending action.

     There is no intrinsic hardware or software barrier to constructing
neural networks that can effect these transformations.  We have
demonstrated the principles by which they occur in simple systems built
with well-known components and algorithms, which are used in novel ways as
dictated by the theory and by the correspondence of performance to the
wetware (Freeman, Eisenberg & Burke, 1987; Freeman, Yao & Burke, 1988).
The key attributes are the biological sigmoid curve, the associational
connectivity that is subject to modification by learning, the variable
global gain under motivational factors, and, most importantly, the ability
to change from a low-level receiving state to a high-level transmitting
state.

     In the low-level state the input injects information into the  system.
In  the  high-level  state induced by the input in the prepared system, the
information is integrated, globally distributed, and  incorporated  into  a
novel  form  of display.  The forms of spatial integration of the output in
the wetware brain are not yet known.  We replace them  with  a  simple  Eu-
clidean  distance  measure  in  n-space,  where n is the number of channels
simulated each with its amplitude of output of  the  common  carrier.   The
models show robust abilities for rapid classification of input into learned
categories despite the presence of noise,  incomplete  inputs,  overlap  of
templates and component variability in the case of the hardware embodiment.
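
     Classification by a Euclidean distance measure in n-space amounts to
assigning an amplitude pattern to the nearest learned template.  The
fragment below is a minimal sketch of that idea, not the decision
function of the cited models: the three random templates, the noise
level and the way an incomplete input is simulated are all assumptions.

```python
import numpy as np

def nearest_template(pattern, templates):
    """Return (index of closest template, distances to all templates)."""
    d = np.linalg.norm(templates - pattern, axis=1)   # Euclidean distance
    return int(np.argmin(d)), d

rng = np.random.default_rng(1)
n_channels = 64
# Three learned amplitude patterns, one per class (made-up values).
templates = rng.uniform(0.5, 1.5, size=(3, n_channels))

# A noisy, incomplete presentation of class 1.
noisy = templates[1] + 0.1 * rng.standard_normal(n_channels)
noisy[:16] = templates.mean()         # a quarter of the channels carry no pattern

label, dist = nearest_template(noisy, templates)
```

     With these settings the degraded pattern remains much closer to its
own template than to the other two, which illustrates in miniature why a
distributed code tolerates noise and missing channels.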

     In these models the input on labeled lines induces a global
oscillation by a state transition corresponding to a Hopf bifurcation.  The
classification is performed by the use of a Euclidean distance  measure  in
64-space.  The output is by step functions on labeled lines from a decision
function operating on the distributed pattern in the wave  mode.    Conver-
gence  to a pattern depends on the input and not on the initial conditions.
Classification succeeds well before  asymptotic  convergence  to  a  steady
state.   Thereby  the  frame  rate  for successive input samples can exceed
10/sec, so that a fluctuating and unpredictable environment can be  tracked
by the rapidly adapting device.
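
     The kind of state transition referred to here can be illustrated with
the textbook normal form of a Hopf bifurcation, written in polar
coordinates as dr/dt = mu r - r^3, dtheta/dt = omega.  This is a generic
sketch and not the network model of the cited papers: the function
simulate_hopf, the use of a single control parameter mu to stand for
input-dependent gain, and all numerical values are assumptions.

```python
import numpy as np

def simulate_hopf(mu, omega=2.0 * np.pi, r0=0.01, dt=1e-3, steps=40000):
    """Euler-integrate the Hopf normal form in polar coordinates.

    dr/dt = mu * r - r**3,  dtheta/dt = omega.
    Returns the observable x(t) = r(t) * cos(theta(t)).
    """
    r, theta = float(r0), 0.0
    x = np.empty(steps)
    for k in range(steps):
        r += dt * (mu * r - r**3)     # amplitude grows only if mu > 0
        theta += dt * omega           # steady rotation at the carrier rate
        x[k] = r * np.cos(theta)
    return x

quiet = simulate_hopf(mu=-0.5)        # below the bifurcation: activity decays
driven = simulate_hopf(mu=+0.5)       # above it: a global oscillation emerges
amp_quiet = np.abs(quiet[-2000:]).max()
amp_driven = np.abs(driven[-2000:]).max()
```

     Below the bifurcation the oscillation dies away; above it the
amplitude settles near sqrt(mu) whatever small value it started from, in
keeping with the statement that convergence depends on the input and not
on the initial conditions.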

References

Amari S (1977a) Neural theory of association and concept-formation.
        Biological Cybernetics 26: 175-185.
Amari S (1977b) Dynamics of pattern formation in lateral-inhibition
        type neural fields.  Biological Cybernetics 27: 77-87.
Freeman WJ (1967) Analysis of function of cerebral cortex by use
        of control systems theory.  Logistics Review 1: 5-40.
Freeman WJ (1968) Analog simulation of prepyriform cortex in the
        cat.  Mathematical Biosciences 2: 181-190.
Freeman WJ (1975) Mass Action in the Nervous System. Academic Press,
        New York.
Freeman WJ (1979) Nonlinear dynamics of paleocortex manifested in
        the olfactory EEG.  Biological Cybernetics 35: 21-37.
Freeman WJ, Eisenberg J & Burke B (1987) Hardware simulation of
        dynamics in learning: the SPOCK.  Proceedings 1st Int. Conf.
        Neural Networks San Diego III: 435-442.
Freeman WJ & van Dijk B (1987) Spatial patterns of visual cortical
        fast EEG during conditioned reflex in a rhesus monkey.
        Brain Research 422: 267-276.
Freeman WJ, Yao Y & Burke B (1988) Central pattern generating and
        recognizing in olfactory bulb: a correlation rule.  Neural
        Networks, in press.
Gray CM & Singer W (1988) Nonlinear cooperativity mediates oscillatory
        responses in orientation columns of cat visual cortex.
        Submitted.
Hopfield JJ (1982) Neural networks and physical systems with emergent
        collective computational abilities.  Proc. Nat'l. Acad.
        Sci. USA 79: 3088-3092.
Prigogine I (1984) From Being to Becoming.  Freeman, New York.
Skarda CA & Freeman WJ (1987) How brains make chaos in order to
        make sense of the world.  Behavioral and Brain Sciences 10:
        161-195.

Supported by grants MH06686 from the National Institute of Mental
Health and 87NE129 from the Air Force Office of Scientific Research.