No subject
wfreeman%garnet.Berkeley.EDU@violet.berkeley.edu
Fri Apr 29 23:58:16 EDT 1988
To: connectionists 29 apr 88
From: wfreeman at garnet
Re: a physiologist's view of connectionism
i'd like some feedback on this essay before it gets frozen into print
and invite commentary of any sort. thanks in advance, walter
Why neural networks don't yet fly: inquiry into the
neurodynamics of biological intelligence.
Walter J Freeman
Department of Physiology-Anatomy
University of California
Berkeley CA 94720 USA
2nd Annual Intern. Conf. on Neural Networks
San Diego CA 23 - 27 July 1988
Abstract
Sensory and perceptual information exists as
space-time patterns of neural activity in cortex in two
modes: axonal pulses and dendritic currents. Which one we
observe depends on the experimental techniques we choose
in order to make our observations. The brain does its
analysis of sensory input, as in feature extraction and
preprocessing, in the medium of action potentials as point
processes in networks of individual neurons. It does syn-
thesis of its sensory input with past experience and ex-
pectancy of future action in the medium of dendritic in-
tegration in local mean fields. Both kinds of activity
are found to coexist in olfactory and visual cortex, each
preceding and then following the other. The transforma-
tion of information from the pulse mode to the dendritic
mode involves a state transition of the cortical network
that can be modeled by a Hopf bifurcation in both software
and hardware embodiments. This state transition appears
to be an essential precursor to an act of neural pattern
classification. However, the models suggest that the
classification of a given stimulus into one of several
learned classes is done by a mapping of the stimulus into
a landscape that has been shaped by prior learning, and
that it is not done by a multiple bifurcation into one of
a collection of limit cycle attractors at the moment of
choice.
Introduction
The strongest justification at present for the study of neural net-
works is the inspiration they draw from the performance characteristics of
their biological cousins. Yet it is often unclear what is to be copied and
what omitted. John Denker among others has pointed out that both birds
and airplanes have wings, but that only birds have feathers. While it is
true that brains and neural networks share certain structural features such
as massive parallelism, biological networks solve complex problems easily
and creatively, and existing neural networks do not. Whereas Wilbur and
Orville Wright solved first the problems of lift and then of control in
flight, neural networkers have solved only the problems of statics and not
the problems of dynamic control. Neural networks have not yet begun to
soar.
One reason, I will argue here, is that most theoreticians have pursued
a false goal of stability, and have not reckoned with the intrinsic
instability of wetware brains that enables their remarkable adaptiveness.
A related key limitation in many current approaches is the lack of
application by engineers of the hierarchical modes in which wetware brains
sustain information for storage, transformation, and other operations.
There are two documented modes that in some senses are diametrically op-
posed but in other senses are strongly complementary. Probably others ex-
ist, but they need not concern us here. One is typified by the action po-
tential and the point process, the other by the synaptic potential and the
local mean field. In sensory systems the one is the basis for feature ex-
traction, preprocessing and analysis. The other is the basis for experien-
tial integration, classification and synthesis. Neither can supplant or
function without the other. They coexist in the same layers of neurons,
and whether we observe one or the other depends on how we acquire, process
and measure our biological data.
My aim in this brief review is to exemplify these two modes of infor-
mation, describe how they are derived from brains and how they are convert-
ed each to the other, and explain their significance for the design of new
and more successful neural networks.
Examples of biological information
Information in biological networks of the kind I am concerned with
here takes the form of space-time neural activity patterns. Each pattern
is relational and neither symbolic nor representational. It does not
"stand for" something outside the brain, as a letter does in an alphabet,
nor does it reside in fixed form as a goal or a "teacher". It is a dynamic
process that mediates adaptive behavior. It results from a stimulus and in
some sense causes a response, but it also incorporates past experience and
the intent of future action. These being private and unique to each brain,
we cannot in principle as observers know the exact information content of
each pattern or even the coordinate system in which it is embedded.
What we can do is to establish statistically the relation of a given
space-time pattern of neural activity to an antecedent or a consequent
event in the outside world. We do this by repeatedly presenting each of
two or more stimuli to a subject and then demonstrating some invariant con-
tiguity between each stimulus and a consequent neural activity pattern.
Because we do not know the metric of the internal computational spaces, we
must collect numerous input-output pairs and rely on statistical invariants
that emerge from one or another form of ensemble averaging. For the point
process each ensemble is collected over time from one or more points, and
for the mean field it is collected simultaneously at multiple points in
space in the form of a set of recordings in time. The distinction is cru-
cial though subtle.
I will cite examples from the primary visual cortex and from the ol-
factory bulb, a specialized form of sensory cortex that is located close to
the input of the olfactory system.
The paradigmatic experiment in the pulse mode in olfaction consists in
locating a single neuron in the bulb with a microelectrode, presenting in
succession odorants A, B, C,... at the same or different concentrations,
and measuring the pulse firing rate of the neuron. This is repeated for
neuron i, ii, iii,... at different spatial locations in the bulb. The
results are presented in the form of a table, which shows that each odorant
over some concentration range (typically narrow) excites some neurons but
not most others, indicating that each odorant establishes a spatial pattern
of selective activation in the bulb, putatively resembling a constellation
of stars in the night sky, although each neuron typically responds to a
variety of odorants. This is a form of labeled line coding, with pulse
rate or probability as the state variable for each line, channel or axon.
The paradigmatic experiment in the wave mode is to record the elec-
troencephalogram (EEG) from an array of macroelectrodes (optimally 64)
placed on the surface of the bulb. All of the simultaneously recorded EEG
traces contain a common waveform or carrier that differs across the spatial
array in amplitude. Odorant-specific patterns of amplitude i, ii, iii,...
are seen to recur on presentation of odorants A, B, C,..., but only if the
subjects are trained to discriminate them each from the others and only if
they are motivated to do so (Skarda & Freeman, 1987). Learning and arousal
are both essential. The odorant information is expressed as a spatial am-
plitude modulation of the common carrier for the duration of a sniff, on
the order of 0.1 sec. It can be likened to a monochromatic half-tone pic-
ture in a newspaper. The information density is spatially uniform, because
no one dot in the picture carries by its size any more or less information
than any other. The carrier is identified by making a spatial ensemble
average of the 64 traces that are recorded during a sniff and then regress-
ing this ensemble average onto each unaveraged EEG trace to derive its am-
plitude coefficient. One cannot use time ensemble averaging over sniffs,
because the spectrum of the carrier and its phase relations to the initiat-
ing stimulus vary unpredictably across inhalations. The result of measure-
ment is a 64x1 vector that expresses the spatial pattern of amplitude.
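The regression step can be illustrated with a short NumPy sketch. It is not the analysis code behind the results above; the synthetic 40 Hz carrier, the 1 kHz sampling rate, the noise level, and the names `carrier_amplitudes`, `true_amp` and `pattern` are all assumptions made for the example.

```python
import numpy as np

def carrier_amplitudes(eeg):
    """eeg: array of shape (n_channels, n_samples) recorded during one sniff.

    Returns the common carrier (spatial ensemble average) and one
    regression coefficient per channel.
    """
    eeg = eeg - eeg.mean(axis=1, keepdims=True)      # remove each channel's DC offset
    carrier = eeg.mean(axis=0)                       # spatial ensemble average over channels
    # Least-squares coefficient of each unaveraged trace regressed on the carrier.
    amplitudes = eeg @ carrier / (carrier @ carrier)
    return carrier, amplitudes

# Synthetic data, purely hypothetical: 64 channels share one waveform
# that differs across the array only in amplitude.
rng = np.random.default_rng(0)
t = np.arange(100) * 0.001                           # 0.1 s sniff, assumed 1 kHz sampling
true_amp = rng.uniform(0.5, 1.5, size=64)            # assumed spatial amplitude pattern
common = np.sin(2 * np.pi * 40.0 * t)                # assumed 40 Hz carrier
eeg = true_amp[:, None] * common[None, :] + 0.05 * rng.standard_normal((64, t.size))

carrier, amp = carrier_amplitudes(eeg)
pattern = amp / np.linalg.norm(amp)                  # 64-element spatial pattern, unit length
```

Because the carrier is the mean of the traces, the coefficients average to one by construction. Only their spatial pattern carries information, which is why the sketch also normalizes the vector.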
These two kinds of information, pulse and wave, coexist in each area
of the bulb, and in other stages of the olfactory system as well. Whether
one sees the one or the other kind depends on the experimental procedures
that one uses, which in turn depends on one's goals and hypotheses.
Comparable results hold for the primary visual cortex. The well-known
paradigm in the pulse mode is to measure the pulse rate of a single neuron
while repeatedly presenting patterned light stimuli to the retina so as to
define its receptive field. This is repeated for a large number of neurons,
and the results are presented in the form of graphs showing the spatial
structures of orientation and ocular dominance columns and the topographic
mapping of the retina onto the dozen or more specialized subareas of the
visual cortex for color, motion detection, etc. The inference is made that
"features" of the visual world are extracted and mapped spatially onto the
cortex in the firing rates of labeled lines. The information is said to be
encoded in the pulse trains of single neurons.
Activity in the wave mode is likewise recorded with arrays of
macroelectrodes on the visual cortex of an awake, motivated, trained rhesus
monkey (Freeman & van Dijk, 1987). A common carrier is retrieved from the
EEG traces by linear decomposition, and its spatial pattern is expressed in
the matrix of coefficients that are obtained by fitting each trace to the
spatial ensemble average. A specific, identifiable spatial pattern of am-
plitude modulation is found to recur on each trial when the motivated sub-
ject is inferred to be discriminating a specific visual cue. This is evi-
dence for distributed coding of information similar to the wave mode of ol-
factory coding. Evidence for this mode of activity has also been found re-
cently in the visual cortex of the cat (Gray & Singer, 1988).
Again, it is apparent that these two kinds of information in the pulse
and wave modes coexist in the cortex.
Neural mechanisms of transformation
In this section I will consider the mechanisms by which the discrete
activity in the pulse mode is transformed into the wave mode and then back
again. In doing so I will draw on experiments in software (Freeman, Yao &
Burke, 1988) and hardware (Freeman, Eisenberg & Burke, 1987) modeling of cortical
dynamics. I will argue that the activity patterns in the pulse mode con-
stitute the end result of stimulus analysis by neural preprocessing, and
that the patterns in the wave mode manifest the results of spatial integra-
tion of the pulse activity with past experience and present motivational
state.
The conversion from pulses to waves takes place at synapses. There
are many kinds and locations of modifiable synapses, two of which are par-
ticularly important for information processing. One type is the primary
synapse between an incoming axon and its target cortical neuron. It is
subject to change in respect to recent use by nonspecific facilitation or
by attenuation in respect to the local volume of input into a neighborhood.
Attenuation is a multiplicative form of inhibition that operates in
processes of dynamic range compression and signal normalization.
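One simple way to picture attenuation as multiplicative inhibition is divisive scaling by the summed input to a neighborhood. The sketch below is an assumption for illustration only; the form of the scaling, the constant `sigma`, and the function name `attenuate` do not come from the physiology.

```python
import numpy as np

def attenuate(inputs, sigma=1.0):
    """Scale every input by a common factor that shrinks as the total
    input to the neighborhood grows (hypothetical divisive form)."""
    total = np.sum(inputs)
    return inputs / (sigma + total)

weak = np.array([0.1, 0.2, 0.1])     # hypothetical low-concentration input
strong = 100.0 * weak                # same pattern, 100 times stronger

out_weak = attenuate(weak)
out_strong = attenuate(strong)
```

In this toy case a hundredfold change in input produces less than a fourfold change in summed output, while the relative pattern across lines is unchanged.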
The other type supports the long range excitatory connections that
form innumerable feedback loops of mutual excitation within a cortical
layer. These secondary synapses among cortical neurons are subject to
change in respect to associative learning in accordance with some variant
of the Hebb rule. The matrix of numbers representing the strengths of
synaptic action corresponds to the W matrix of Amari (1977) and the T ma-
trix of Hopfield (1982). When an animal is trained to discriminate an odor
A, B, C,... a Hebbian nerve cell assembly is formed among the cortical neu-
rons by the strengthened synaptic connections between each pair of coac-
tivated neurons (Freeman, 1968, 1975). This nerve cell assembly is a basis
for the classification of odorants by trained subjects (Skarda & Freeman,
1987).
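The formation of such an assembly can be sketched with the plainest outer-product form of the Hebb rule. Which variant applies in cortex is left open above, so the update below, the learning rate, the network size, and the names `hebbian_update` and `odor_A` are assumptions chosen for illustration.

```python
import numpy as np

def hebbian_update(W, activity, rate=0.1):
    """Strengthen the connection between every pair of coactive units."""
    coactive = np.outer(activity, activity)   # 1 where both units are active
    np.fill_diagonal(coactive, 0.0)           # no self-connections
    return W + rate * coactive

n = 8                                          # hypothetical number of cortical units
W = np.zeros((n, n))                           # connection strengths before training
odor_A = np.array([1, 1, 0, 1, 0, 0, 0, 0], dtype=float)  # units coactivated by odorant A

for _ in range(10):                            # ten training presentations
    W = hebbian_update(W, odor_A)
```

After training, only the pairs among units 0, 1 and 3 are connected, which is the sense in which the strengthened synapses define an assembly for odorant A.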
Conversion of waves to pulses occurs at trigger zones of neurons,
where the sum of dendritic currents regulates the firing rate of each neu-
ron. The relation between membrane current and pulse density, both of
which are continuous variables, has the form of a sigmoid. The range
between threshold (zero firing rate) and asymptotic maximum is much nar-
rower than a comparable input-output sigmoid relation at the synapses, so
that as a general rule the pulse-wave conversion takes place in a small-
signal near-linear range. It follows from this and related considerations
that the operation of the local neighborhood can be expressed as a linear
time-invariant integrator cascaded with a static nonlinear bilateral sa-
turation function (Freeman, 1967, 1968, 1975).
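The cascade can be sketched as follows. This is a minimal stand-in, not the model in the cited work: a first-order leaky integrator takes the place of the linear time-invariant stage, `tanh` takes the place of the bilateral saturation, and the time constant, step size and function name `simulate_node` are assumptions.

```python
import numpy as np

def simulate_node(drive, dt=0.001, tau=0.01, gain=1.0):
    """Linear time-invariant integrator followed by a static saturation."""
    v = 0.0
    wave = np.empty_like(drive, dtype=float)
    for k, x in enumerate(drive):
        v += dt * (-v + gain * x) / tau      # forward-Euler step of tau*dv/dt = -v + gain*x
        wave[k] = v
    pulse = np.tanh(wave)                    # static, bilateral saturation in (-1, 1)
    return wave, pulse

step = np.ones(200)                          # hypothetical step input lasting 0.2 s
wave, pulse = simulate_node(step)
wave2, pulse2 = simulate_node(2.0 * step)    # doubled input
```

Doubling the input doubles the integrator output, since that stage is linear, but the saturated output rises by much less than a factor of two.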
An important feature that distinguishes the biological sigmoid from
its neural network cousins is the finding that the maximal slope of the
curve and thereby the maximal gain of the local ensemble is displaced to
the excitatory side (Freeman, 1979). Input not only excites neurons, it
also increases their forward gains. Furthermore, the slope of the curve is
increased by factors that increase arousal and motivation in animals. When
a stimulus is given to which a subject has been sensitized by discriminative
training, so that a nerve cell assembly has been formed, the re-excitation
within the assembly is enhanced both by the input-dependent nonlinearity
and by arousal. The result is regenerative feedback in a high-gain system
that leads to instability. The large collection of interconnected and
interactive ensembles undergoes a state transition from a prestimulus
state to a stimulated state.
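A displaced maximal slope of this kind can be illustrated numerically. No equation is given in this essay, so the functional form below, the value `q_max=5.0`, and the name `asymmetric_sigmoid` are assumptions chosen only because they yield a curve whose steepest point lies on the excitatory side.

```python
import numpy as np

def asymmetric_sigmoid(v, q_max=5.0):
    """Q(v) = q_max * (1 - exp(-(exp(v) - 1) / q_max)), floored at -1.

    v is the wave (dendritic) variable; Q is the normalized pulse density.
    """
    q = q_max * (1.0 - np.exp(-(np.exp(v) - 1.0) / q_max))
    return np.maximum(q, -1.0)

v = np.linspace(-3.0, 6.0, 9001)
q = asymmetric_sigmoid(v)
slope = np.gradient(q, v)                  # numerical gain dQ/dv
v_peak = v[np.argmax(slope)]               # location of maximal gain
```

For this form the gain peaks at v = ln(q_max), which is positive whenever q_max exceeds one. Raising `q_max`, as a crude stand-in for arousal, both steepens the curve and moves the peak further to the excitatory side.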
The state prior to entry of input is low-amplitude and low-gain, so
that neurons not interacting strongly with each other are free to react to
input on incoming lines. When sufficient input arrives to one or another
nerve cell assembly in an aroused subject, the amplitude and gain both in-
crease, and the neurons strongly interact with each other. In this highly
interactive state the information that each received during the preceding
input state is disseminated rapidly over the entire extent of an interac-
tive area of neurons, apparently within a few milliseconds over regions
comprising many square mm or cm of surface area and many millions of neu-
rons. The spatial density of the information becomes uniform, just as it
does in a 2-dimensional Fourier transform of a visual scene.
Simulations of these transitions have shown that the input information
in the pulse mode is not degraded or lost in the conversion from the pulse
mode to the wave mode. The point-wise input is mapped under spatial in-
tegration into a distribution of spatial activity that introduces the past
experience of the subject through the nerve cell assemblies and the present
state of expectation embodied in the factors relating to arousal and
motivation, that is, brain state with respect to future action.
When it is read out, both in the brain and in the models that simulate
the process, the output is coarse-grained at the surface and summed in ab-
solute value or by squaring to give a positive quantity at each channel,
which re-establishes the pulse mode at the input to the next stage. In the
process of coarse-graining the output from the preceding stage is "cleaned"
under spatiotemporal integration to attenuate all activity that is not
shared by the entire transmitting array of neurons. Only the cooperative
activity that is shared by the whole is successfully injected into the next
succeeding stage. This completes the inverse transformation back to the
pulse mode.
In brief, input on labeled lines that is injected by axons into the
cortex can destabilize the neural mass, depending on past experience and
present motivation, and the cortex can converge to a distributed pattern of
fluctuating activity that expresses that confluence of stimulus, experience
and expectation. The key to understanding is the state transition that
changes the properties of the cortex and extends the information from the
local to the distributed mode. It is an input-induced transition from a
low-energy disordered state to a high-energy more ordered state (Prigogine,
1984; Skarda & Freeman, 1987).
Transmission outside of the cortex has not yet been studied, but I
postulate that similar state transitions may occur in subcortical masses,
so that with each transfer of information from one brain mass to another,
there is injection of information on labeled lines, transition to integra-
tion in the wave mode, and reconversion of the integrand to a labeled line
pattern on the output channels, the last neural output being the discharges
of motor neurons in the brainstem and spinal cord.
Implications for neural networks
The current theory and practice of neural networks has incorporated
many of the important features of the static design of nervous systems,
particularly those based on parallel feedforward nets, but has neglected
the dynamics of real nervous systems in favor of unrealistic abstractions
that do not do justice to the ceaseless fluctuations of neural activity.
These are commonly designated as "noise" and removed by stimulus-locked
time ensemble averaging in order to impose the ideal of an invariant base-
line that precedes the stimulus arrival, or to which the system converges
as it "learns" or "perceives".
It is this artificial creation of a stable equilibrium to represent
the desired state of neural networks, the incorporation of it as a design
criterion, that has crippled their performance. Wet nervous systems do not
have equilibria except under deep anesthesia, surgical isolation, or near-
terminal damage of one kind or another. These reduced states are in fact
useful for measuring the open loop time and space constants of parts of wet
nervous systems, but clearly, and in the case of general anesthesia by de-
finition, there is no information processing in such reduced systems. In-
stead, nervous systems appear to be designed by evolution to be destabil-
ized by input. They seek input as a means of inducing state changes that
disseminate and integrate fresh input with past experience as the basis for
impending action.
There is no intrinsic hardware or software barrier to constructing
neural networks that can effect these transformations. We have
demonstrated the principles by which they occur in simple systems built
with well-known components and algorithms, used in novel ways as dictated
by the theory and by the correspondence of performance to the wetware
(Freeman, Eisenberg & Burke, 1987; Freeman, Yao & Burke, 1988). The key
attributes are the biological sigmoid curve, the associational connectivity
that is subject to modification by learning, the variable global gain under
motivational factors, and, most importantly, the ability to change from a
low-level receiving state to a high-level transmitting state.
In the low-level state the input injects information into the system.
In the high-level state induced by the input in the prepared system, the
information is integrated, globally distributed, and incorporated into a
novel form of display. The forms of spatial integration of the output in
the wetware brain are not yet known. We replace them with a simple Eu-
clidean distance measure in n-space, where n is the number of channels
simulated each with its amplitude of output of the common carrier. The
models show robust abilities for rapid classification of input into learned
categories despite the presence of noise, incomplete inputs, overlap of
templates and component variability in the case of the hardware embodiment.
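The distance-based readout can be sketched as a nearest-template rule. The templates, noise level and names (`classify`, `templates`, `noisy`) below are hypothetical; only the use of Euclidean distance over the channel amplitudes is taken from the text.

```python
import numpy as np

def classify(pattern, templates):
    """Return the index of the nearest template and all Euclidean distances."""
    distances = np.linalg.norm(templates - pattern, axis=1)
    return int(np.argmin(distances)), distances

rng = np.random.default_rng(1)
n_channels = 64                                        # one amplitude per simulated channel
templates = rng.standard_normal((3, n_channels))       # hypothetical learned class patterns
noisy = templates[2] + 0.3 * rng.standard_normal(n_channels)   # noisy instance of class 2

label, d = classify(noisy, templates)
```

Here `label` picks out the class whose stored pattern lies closest to the observed amplitude vector, so moderate noise on every channel does not change the decision.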
In these models the input on labeled lines induces a global oscil-
lation by a state transition corresponding to a Hopf bifurcation. The
classification is performed by the use of a Euclidean distance measure in
64-space. The output is by step functions on labeled lines from a decision
function operating on the distributed pattern in the wave mode. Conver-
gence to a pattern depends on the input and not on the initial conditions.
Classification succeeds well before asymptotic convergence to a steady
state. Thereby the frame rate for successive input samples can exceed
10/sec, so that a fluctuating and unpredictable environment can be tracked
by the rapidly adapting device.
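The qualitative behavior at a Hopf bifurcation can be shown with the radial part of its normal form, dr/dt = mu*r - r^3, where r is the oscillation amplitude. This is a textbook reduction and not the cortical model itself; treating `mu` as a stand-in for net loop gain, the 40 Hz carrier, and the names below are assumptions.

```python
import numpy as np

def hopf_amplitude(mu, r0, dt=1e-3, t_end=5.0):
    """Forward-Euler integration of dr/dt = mu*r - r**3."""
    n = int(round(t_end / dt))
    r = np.empty(n + 1)
    r[0] = r0
    for k in range(n):
        r[k + 1] = r[k] + dt * (mu * r[k] - r[k] ** 3)
    return r

t = np.arange(5001) * 1e-3
omega = 2 * np.pi * 40.0                       # assumed 40 Hz carrier frequency

rest = hopf_amplitude(mu=-5.0, r0=0.01)        # low gain: activity decays to rest
burst_a = hopf_amplitude(mu=5.0, r0=0.01)      # high gain: oscillation grows to a limit cycle
burst_b = hopf_amplitude(mu=5.0, r0=3.0)       # same gain, different initial condition
waveform = burst_a * np.cos(omega * t)         # oscillation with the computed envelope
```

With negative `mu` the amplitude collapses to zero. With positive `mu` it settles at sqrt(mu) from either starting point, so the final state here is set by the gain and not by the initial condition.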
References
Amari S (1977a) Neural theory of association and concept-formation.
Biological Cybernetics 26: 175-185.
Amari S (1977b) Dynamics of pattern formation in lateral-inhibition
type neural fields. Biological Cybernetics 27: 77-87.
Freeman WJ (1967) Analysis of function of cerebral cortex by use
of control systems theory. Logistics Review 1: 5-40.
Freeman WJ (1968) Analog simulation of prepyriform cortex in the
cat. Mathematical Biosciences 2: 181-190.
Freeman WJ (1975) Mass Action in the Nervous System. Academic Press,
New York.
Freeman WJ (1979) Nonlinear dynamics of paleocortex manifested in
the olfactory EEG. Biological Cybernetics 35: 21-37.
Freeman WJ, Eisenberg J & Burke B (1987) Hardware simulation of
dynamics in learning: the SPOCK. Proceedings 1st Int. Conf.
Neural Networks San Diego III: 435-442.
Freeman WJ & van Dijk B (1987) Spatial patterns of visual cortical
fast EEG during conditioned reflex in a rhesus monkey.
Brain Research 422: 267-276.
Freeman WJ, Yao Y & Burke B (1988) Central pattern generating and
recognizing in olfactory bulb: a correlation rule. Neural
Networks, in press.
Gray CM & Singer W (1988) Nonlinear cooperativity mediates oscillatory
responses in orientation columns of cat visual cortex.
Submitted.
Hopfield JJ (1982) Neural networks and physical systems with emergent
collective computational abilities. Proc. Nat'l. Acad.
Sci. USA 79: 3088-3092.
Prigogine I (1984) From Being to Becoming. Freeman, New York.
Skarda CA & Freeman WJ (1987) How brains make chaos in order to
make sense of the world. Behavioral and Brain Sciences 10:
161-195.
Supported by grants MH06686 from the National Institute of Mental
Health and 87NE129 from the Air Force Office of Scientific Research.