
In response to your invitation for comments on
cogpsy at coglab.psy.soton.ac.uk I have the following:

I believe it is past time to reconsider the foundations.  The
critical deficiency in current connectionism is lack of
understanding of the meaning of the term "system architecture".
Any system which performs a complex function using large numbers
of components is subject to severe constraints on its system
architecture if building, repairing, or adding features to the
system are to be feasible. Electronic systems currently in
production use billions of individual transistors.  Such
systems must have a simple functional architecture which can (for
example) relate failures experienced at the high functional level
to defects occurring at the device level. The reason the von
Neumann architecture is ubiquitous for electronic systems  is
that it provides the means for such a simple relationship. It
achieves this by partitioning functionality into consistent
elements at a number of levels of detail. The consistent element
is the instruction, and instruction based descriptions can be
seen at every level. Typical levels include device (instruction
'open gate');  assembly code (instruction 'jump'); software
(instruction 'do:[  ]');  procedure call (instruction 'disconnect
x'); through features (instruction 'carry out Y') to major system
functions (instruction 'periodically test for problems') and
overall function.
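
To make the layered consistency concrete, here is a toy Python
sketch; all the function names are invented for the illustration
and do not come from any real system.

def periodically_test_for_problems():      # major system function
    carry_out_diagnostic_sweep()           # feature-level instruction

def carry_out_diagnostic_sweep():          # feature: 'carry out Y'
    disconnect("line-x")                   # procedure-call instruction

def disconnect(line):                      # procedure: 'disconnect x'
    # Software-level instruction; beneath this sit assembly-level
    # instructions ('jump') and device-level ones ('open gate').
    print("disconnecting", line)

periodically_test_for_problems()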

The requirement for a function to operate within a simple
architecture is crucial. To illustrate, if I needed to design a
function to connect telephones together, many designs would be
possible, and many would carry out the function efficiently, in
some cases more efficiently than the designs actually in use.
However, the vast majority of those designs would be useless once
it was necessary for them to interact with and support functions
like testing (a connection did not work; which component is
defective?), billing (who should be charged how much for that
call?), or adding features (allow the recipient of a call to
identify the caller). Conclusions drawn about human cognition from
a simulation which performs (for example) face recognition
without considering how that function would fit within the total
cognitive/behavioral system are almost certainly invalid.

In my 1990 book I argued that although the brain obviously does
not have a von Neumann architecture, very similar pressures
exist for a simple functional architecture (for example, the
need to build many copies from DNA 'blueprints'). I went on to
develop a new and complete system architecture based on an
element of functionality appropriate to the brain, the pattern
extraction/action recommendation element.  This architecture
separates cognition into a few major functions, which can in
turn be partitioned further all the way down to neurons. The
functionality required in neurons is defined by the functional
partitioning at a higher level, and that functional partitioning
is in turn constrained by the information available to individual
neurons, the kind of changes which can be made in neuron
connectivity, and the timescale of such changes (see Coward
1997a, 1997b).

This architecture exhibits phenomena which bear a remarkable
resemblance to those of the human brain, including unguided
learning by categorization, which generates declarative memory;
dream sleep; procedural memory; emotional arousal; and even
internally generated image sequences. All these phenomena
play a functional role in generating behavior from sensory
input, and have also been demonstrated by electronic simulation
(Coward 1996).

My response to the questions to be posed to the panelists
would be:

1.  Should memory  be used for learning? Is memoryless learning
an unnecessary restriction on learning algorithms?

In the pattern extraction hierarchy architecture,  which appears
to me to be the only option other than the obviously inapplicable
von Neumann,  one major type of learning (associated with a major
separation in the architecture) is the process of sorting
experience into categories and associating behaviors with those
categories. A category is established by extracting and recording
a set of patterns from one unfamiliar object, and developed by
adding patterns extracted from any subsequent object which
contains many of the patterns already in the category. Memory of
objects is thus a prerequisite for this type of learning,
which is associated with the cortex. Memoryless learning occurs
in other major functions and is an appropriate model in those
functions.
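
As an illustration, here is a minimal Python sketch of this
categorization process under my own simplifying assumptions (a
pattern is just a hashable feature, and "many" is a fixed overlap
fraction); none of the names below come from my papers:

OVERLAP_THRESHOLD = 0.5   # assumed fraction defining "many" shared patterns

def categorize(objects):
    """objects: an iterable of sets of extracted patterns."""
    categories = []                    # each category is a set of patterns
    for patterns in objects:
        best, best_overlap = None, 0.0
        for category in categories:
            overlap = len(patterns & category) / len(patterns)
            if overlap > best_overlap:
                best, best_overlap = category, overlap
        if best is not None and best_overlap >= OVERLAP_THRESHOLD:
            best |= patterns           # develop an existing category
        else:
            categories.append(set(patterns))   # establish a new category
    return categories

# Categories emerge with no labels, guidance, or feedback:
print(categorize([{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]))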

2.  Is local learning a sensible idea? Can better learning
algorithms be developed without this restriction?

The real issues here are first to identify what information
could feasibly be made available to a neuron (e.g.   past
firing of the neuron itself;  correlated firing of neurons
in its neighborhood;  correlated firing between the neuron and
another neuron;  correlated firing within a separate functional
group of neurons;  feedback from pleasure or pain;  or feedback
of some expected result).  The second issue is to identify the
nature of the feasible changes to the neuron which could be
produced (e.g.  assignment or removal of a neuron;  addition or
deletion of an input;  correlated addition or deletion of a set
of inputs;  changes in relative strength of inputs;  correlated
changes in the strength of a set of inputs;  general change in
effective input strengths (i.e. threshold change);  how long a
change lasts).  Only after these qualitative factors have been
defined by higher functional requirements can quantitative
algorithms be developed.
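
One way to make these qualitative factors concrete is to write
them out as explicit design axes before any learning rule is
chosen. The sketch below is my own framing for illustration, not
notation from the papers:

from enum import Enum, auto

class InfoSource(Enum):                # what a change can be based on
    OWN_PAST_FIRING = auto()
    NEIGHBORHOOD_CORRELATED_FIRING = auto()
    PAIRWISE_CORRELATED_FIRING = auto()
    FUNCTIONAL_GROUP_CORRELATED_FIRING = auto()
    PLEASURE_PAIN_FEEDBACK = auto()
    EXPECTED_RESULT_FEEDBACK = auto()

class ChangeType(Enum):                # what kind of change is made
    ASSIGN_OR_REMOVE_NEURON = auto()
    ADD_OR_DELETE_INPUT = auto()
    ADD_OR_DELETE_INPUT_SET = auto()
    REWEIGHT_INPUT = auto()
    REWEIGHT_INPUT_SET = auto()
    CHANGE_THRESHOLD = auto()

class Timescale(Enum):                 # how long the change lasts
    WHILE_SOURCE_PRESENT = auto()
    SHORT_TERM = auto()
    LONG_TERM = auto()

# A neuron-level learning rule for a given brain function is then a
# set of (InfoSource, ChangeType, Timescale) triples fixed by the
# functional requirements at the level above.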

3.  Who designs the network inside an autonomous learning system
such as the brain?

Within the pattern extraction hierarchy architecture it is
possible to start from random connectivity and sort experienced
objects into categories without guidance or feedback.

References:

Coward, L. A. (1990), 'Pattern Thinking', New York: Praeger
(Greenwood).

Coward, L. A. (1996), 'Understanding of Consciousness through
Application of Techniques for Design of Extremely Complex
Electronic Systems', Towards a Science of Consciousness, Tucson,
Arizona.

Coward, L. A. (1997a), 'Unguided Categorization, Direct and
Symbolic Representation, and Evolution of Cognition in a
Modified Connectionist Theory', to be published in Proceedings
of the Conference on New Trends in Cognitive Science, Austria,
1997.

Coward, L. A. (1997b), 'The Pattern Extraction Architecture: a
Connectionist Alternative to the von Neumann Architecture',
to be published in Proceedings of the International Workshop
on Artificial and Natural Neural Networks, Canary Islands, 1997.

================================================================

An additional note from Prof. Andrew Coward
<ANDREW.COWARD.0082863 at nt.com>

I believe the brain has a functional architecture which I label
the pattern extraction hierarchy. At the highest level,
functionality separates into five major functions. The first
extracts constant patterns from the environment (e.g. object
color independent of illumination). The second allows the set
of patterns which have been extracted from one object to enter
the third function. The third function  generates a set of
alternative behavioral recommendations with respect to the
selected object. The fourth function selects one (or perhaps
none) of the alternatives to proceed to action, and the fifth
function implements the action. This functional separation can
be observed in the major physiology (roughly, the primary
sensory cortex, the thalamus, the cortex,  the basal ganglia,
and the cerebellum).  There are of course levels of detail and
complexity below this.  Within each function, the needs of that
function determine the functionality required from neurons
within the function, subject to what neuron functionality is
possible (which in turn is one factor forcing the use of the
architecture).
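
As a toy rendering of this five-way separation (the function
names, stand-in bodies, and data values are mine, chosen only to
make the flow explicit):

def extract_constant_patterns(raw_input):      # ~ primary sensory cortex
    # Stand-in invariance test: keep every pattern in this sketch.
    return set(raw_input)

def gate_object(patterns):                     # ~ thalamus
    # Admit the pattern set extracted from one selected object.
    return patterns

def generate_alternatives(patterns):           # ~ cortex
    # Each pattern recommends an action with some weight.
    return [(len(p), "act-on-" + p) for p in patterns]

def select_alternative(alternatives):          # ~ basal ganglia
    # Pick the strongest recommendation, or none at all.
    return max(alternatives, default=None)

def implement_action(selected):                # ~ cerebellum
    if selected is not None:
        print("performing", selected[1])

implement_action(select_alternative(generate_alternatives(
    gate_object(extract_constant_patterns(["red", "round", "small"])))))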

I mentioned in the earlier note that functional partitioning is
constrained by the possible neuron functionality given
limitations in the areas of the information available to
individual neurons, the kind of changes which can be made in
neuron connectivity, and the timescale of such changes.
Expanding on this somewhat:

The source of information which controls changes to extracted
pattern could be: feedback from comparison with expected result; 
feedback from pleasure or pain;  past firing of neuron itself;
correlated firing of neurons in the neighbourhood;  correlated
firing between the neuron and another neuron;  correlated firing
within a separate functional group of neurons.

The nature of the changes to the extracted pattern produced
in the neuron could be:  assignment or removal of a neuron;
addition or deletion of an input;  correlated addition or
deletion of a set of inputs;  changes in relative strength of
inputs;  correlated changes in the strength of a set of inputs;
general change in effective input strengths (i.e. threshold
change);  changes in sensitivity to other parameters.

The permanence of the changes to the extracted pattern could be:
change only at time source of information is present;  change
for limited time following source of information being present;
change for long period following source of information being
present.

In each high level function, the particular combination of
information used, changes which occur, and timescale is dictated
by the type of high level functionality. For example, in the
behavioral alternative generation region the changes which are
required include assignment of neurons, biased random assignment
of inputs, setting sensitivity to the arousal factor for the
assigned region, deletion of inactive inputs, and threshold
reduction. The sources of information which ultimately control
these changes are correlated firing of neurons in neighbourhood,
correlated firing of neurons in the neighbourhood with firing
of an input, correlated firing of a separate functional group
of neurons, and firing of the neuron itself.  The timescale is
short for some operations and long for others. Each type of
change has associated sources of information and a timescale.  This
combination of functionality at the neuron level gives rise
to all the phenomena of declarative memory at the higher level,
including dream sleep.
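
Tabulated as a plain data structure, the combination might look
as follows; note that the pairing of sources to changes is my own
guess for illustration, since the lists above do not fix the
mapping:

GENERATION_REGION_RULES = {
    # change made to the neuron:  (assumed driving information, timescale)
    "assign neuron":              ("neighbourhood correlated firing", "long"),
    "biased random input assignment":
                                  ("neighbourhood correlation with an input", "long"),
    "set arousal sensitivity":    ("separate functional group firing", "long"),
    "delete inactive inputs":     ("firing of the neuron itself", "long"),
    "reduce threshold":           ("firing of the neuron itself", "short"),
}

for change, (source, timescale) in GENERATION_REGION_RULES.items():
    print(f"{change}: driven by {source}; {timescale}-term")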

A different combination of neuron parameters is required in the
behavioral alternative selection function, and gives rise to
learning which is not declarative. In this function, pleasure
and pain act on recently firing neurons to modulate the ability
of similar firing in the future to gain control of action. I
apply the term 'memoryless' to this learning because no record
of prior states is preserved (although the memory of pleasure
and pain may be preserved in the alternative generation function).
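
Here is a minimal sketch of this memoryless modulation under a
simple scheme of my own devising: units which fired recently have
their action-gating weights nudged by a pleasure or pain signal,
and the firing record itself is then discarded.

class SelectionUnit:
    def __init__(self):
        self.weight = 1.0            # ability to gain control of action
        self.recently_fired = False

def reinforce(units, reward, rate=0.1):
    """reward > 0 for pleasure, reward < 0 for pain."""
    for unit in units:
        if unit.recently_fired:
            unit.weight = max(0.0, unit.weight + rate * reward)
        unit.recently_fired = False  # the firing record itself is discarded

units = [SelectionUnit() for _ in range(3)]
units[0].recently_fired = True
reinforce(units, reward=-1.0)        # pain weakens whatever just fired
print([unit.weight for unit in units])   # [0.9, 1.0, 1.0]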

I regard the perceptron and even the adaptive resonance neurons
as simplistic in the system sense, although it turns out that
the Hebbian neuron plays an important system role.

The above is a summary of some discussion in the papers which
have been accepted for publication in the proceedings of a
couple of conferences in the next month or so. I could send
copies if you are interested.

I appreciate the opportunity to discuss.

Andrew Coward.

================================================================
APPENDIX:

1997 International Conference on Neural Networks (ICNN'97)
Houston, Texas (June 8 -12, 1997)
----------------------------------------------------------------
Further information on the conference is available on the
conference web page:

http://www.mindspring.com/~pci-inc/ICNN97/
------------------------------------------------------------------
PANEL DISCUSSION ON

"CONNECTIONIST LEARNING: IS IT TIME TO RECONSIDER THE FOUNDATIONS?"

-------------------------------------------------------------------
This is to announce that a panel will discuss the above question
at ICNN'97 on Monday afternoon (June 9). Below is the abstract
for the panel discussion broadly outlining the questions to be
addressed. I am also attaching a slightly modified version of a
subsequent note sent to the panelists. I think the issues are
very broad and the questions are simple. The questions are not
tied to any specific "algorithm" or "network architecture" or
"task to be performed." However, the answers to these simple
questions may have an enormous effect on the "nature of
algorithms" that we would call "brain-like" and for the design
and construction of autonomous learning systems and robots. I
believe these questions also have a bearing on other brain
related sciences such as neuroscience, neurobiology and
cognitive science.

Asim Roy
Arizona State University

-------------------------
PANEL MEMBERS

1. Igor Aleksander
2. Shunichi Amari
3. Eric Baum
4. Jim Bezdek
5. Rolf Eckmiller
6. Lee Giles
7. Geoffrey Hinton
8. Dan Levine
9. Robert Marks
10. Jean Jacques Slotine
11. John G. Taylor
12. David Waltz
13. Paul Werbos
14. Nicolaos Karayiannis (Panel Moderator, ICNN'97 General Chair)
15. Asim Roy

Six of the above members are plenary speakers at the meeting.
-------------------------

PANEL TITLE:

"CONNECTIONIST LEARNING: IS IT TIME TO RECONSIDER THE FOUNDATIONS?"

ABSTRACT

Classical connectionist learning is based on two key ideas.
First, no training examples are to be stored by the learning
algorithm in its memory (memoryless learning). It can use and
perform whatever computations are needed on any particular
training example, but must forget that example before examining
others. The idea is to obviate the need for large amounts of
memory to store a large number of training examples. The second
key idea is that of local learning - that the nodes of a network
are autonomous learners. Local learning embodies the viewpoint
that simple, autonomous learners, such as the single nodes of a
network, can in fact produce complex behavior in a collective
fashion. This second idea, in its purest form, implies a
predefined net being provided to the algorithm for learning,
such as in multilayer perceptrons.
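
Both ideas are visible in the bare online perceptron sketched
below: each example is used once and then discarded (memoryless
learning), and each weight is updated from purely local
quantities, its own input and the shared error signal (local
learning).

def train(stream, n_inputs, rate=0.1):
    weights = [0.0] * n_inputs
    bias = 0.0
    for x, target in stream:         # examples arrive one at a time
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        error = target - (1 if activation > 0 else 0)
        for i in range(n_inputs):
            weights[i] += rate * error * x[i]   # purely local update
        bias += rate * error
        # x and target go out of scope here: nothing is stored
    return weights, bias

print(train([([1, 0], 1), ([0, 1], 0)] * 20, n_inputs=2))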

Recently, some questions have been raised about the validity
of these classical ideas. The arguments against classical ideas
are simple and compelling. For example, it is a commonly
observed fact that humans do remember and recall information that
is provided to them as part of learning. And the task of learning
is considerably easier when one remembers relevant facts and
information than when one doesn't. Second, strict local learning
(e.g. back propagation type learning) is not a feasible idea for
any system, biological or otherwise. It implies predefining a
network "by the system" without having seen a single training
example and without having any knowledge at all of the
complexity of the problem. Again, there is no system that can
do that in a meaningful way. The other fallacy of the local
learning idea is that it acknowledges the existence of a
"master" system that provides the design so that autonomous
learners can learn.

Recent work has shown that much better learning algorithms, in
terms of computational properties (e.g. designing and training
a network in polynomial time complexity, etc.) can be developed
if we don't constrain them with the restrictions of classical
learning. It is, therefore, perhaps time to reexamine the ideas
of what we call "brain-like learning."

This panel will attempt to address some of the following
questions on classical connectionist learning:

1.  Should memory  be used for learning? Is memoryless learning
an unnecessary restriction on learning algorithms?
2.  Is local learning a sensible idea? Can better learning
algorithms be developed without this restriction?
3.  Who designs the network inside an autonomous learning system
such as the brain?

-------------------------

A SUBSEQUENT NOTE SENT TO THE PANELISTS


The panel abstract was written to question the two pillars of
classical connectionist learning - memoryless learning and pure
local learning. With regards to memoryless learning, the basic
argument against it is that humans do store information
(remember facts/information) in order to learn. So memoryless
learning, as far as I understand, cannot be justified by any
behavioral or biological observations/facts. That does not mean
that humans store any and all information provided to them. They
are definitely selective and parsimonious in the choice of
information/facts to collect and store.

We have been arguing that it is the "combination" of memoryless
learning and pure local learning that is not feasible for any
system, biological or otherwise. Pure local learning, in this
context, implies that the system somehow puts together a set of
"local learners" that start learning with each learning example
given to it (e.g. in back propagation) without having seen a
single training example before and without knowing anything
about the complexity of the problem. Such a system can be
demonstrated to do well in some cases, but would not work in
general.

Note that not all existing neural network algorithms are of
this pure local learning type. For example, if I understand
correctly, in constructive algorithms such as ART, RBF,
RCE/hypersphere and others,  a "decision" to create a new node
is made by a "global decision-maker" based on evidence on
performance of the existing system. So there is quite a bit of
global coordination and "decision-making" in those algorithms
beyond the simple "local learning".
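
The point can be made with a toy hypersphere-style constructive
classifier, a much simplified rendition rather than a faithful
RCE implementation: each node's response is computed locally, but
the decision to create a new node is taken globally, by checking
whether any existing node of the right class already covers the
example.

import math

class Node:
    def __init__(self, center, label, radius=1.0):
        self.center, self.label, self.radius = center, label, radius

    def covers(self, x):             # local computation at the node
        return math.dist(self.center, x) <= self.radius

def train(examples):
    nodes = []
    for x, label in examples:
        covering = [n for n in nodes if n.covers(x)]
        if not any(n.label == label for n in covering):
            nodes.append(Node(x, label))    # global decision: add a node
        for n in covering:                  # shrink wrongly covering nodes
            if n.label != label:
                n.radius = 0.9 * math.dist(n.center, x)
    return nodes

nodes = train([((0.0, 0.0), "a"), ((2.0, 2.0), "b"), ((0.2, 0.1), "a")])
print(len(nodes))   # 2: no new node needed for the third example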

Anyway, if we "accept" the idea that memory can indeed be used
for the purpose of learning (Paul Werbos indicated so in one of
his notes), the terms of the debate/discussion change
dramatically. We then open the door to the development of far
more robust and reliable learning algorithms with much nicer
properties than before. We can then start to develop algorithms
that are closer to "normal human learning processes". Normal
human learning includes processes such as (1) collection and
storage of information about a problem, (2) examination of the
information at hand to determine the complexity of the problem,
(3) development of trial solutions (nets) for the problem, (4)
testing of trial solutions (nets), (5) discarding such trial
solutions (nets) if they are not good enough, and (6) repetition
of these processes until an acceptable solution is found. And
these learning processes are implemented within the brain,
without doubt, using local computing mechanisms of different
types. But these learning processes cannot exist without
allowing for storage of information about the problem.
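
Written as a loop, the six processes amount to an ordinary
memory-based model-selection procedure. In the sketch below the
helpers (propose_net, evaluate, good_enough) are placeholders
that any net-construction and evaluation scheme could fill in:

def learn(stream, propose_net, evaluate, good_enough, max_trials=100):
    memory = list(stream)                  # (1) collect and store examples
    complexity = len({label for _, label in memory})  # (2) crude complexity probe
    best, best_score = None, float("-inf")
    for trial in range(max_trials):        # (6) repeat until acceptable
        net = propose_net(complexity, trial)   # (3) develop a trial net
        score = evaluate(net, memory)          # (4) test it against memory
        if good_enough(score):
            return net
        if score > best_score:             # (5) otherwise discard weaker trials
            best, best_score = net, score
    return best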

One of the "large" missing pieces in the neural network field
is the definition or characterization of an autonomous learning
system such as the brain. We have never defined the external
behavioral characteristics of our learning algorithms. We have
largely pursued algorithm development from an "internal
mechanisms" point of view (local learning, memoryless learning)
rather than from the point of view of "external behavior or
characteristics" of these resulting algorithms. Some of these
external characteristics of our learning algorithms might be: (1)
the capability to design the net on their own, (2) polynomial
time complexity of the algorithm in design and training of the
net, (3) generalization capability, and (4) learning from as
few examples as possible (quickness in learning).
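
One way to make such external characteristics precise is to
state them as an interface contract that is deliberately silent
about internal mechanism. The sketch below is only one possible
formalization:

from typing import Any, Iterable, Protocol, Tuple

class AutonomousLearner(Protocol):
    def fit(self, examples: Iterable[Tuple[Any, Any]]) -> None:
        """Design and train a net from the examples alone: no
        architecture is supplied, and the claim is that this runs
        in polynomial time and learns from few examples."""
        ...

    def predict(self, x: Any) -> Any:
        """Generalize to unseen inputs."""
        ...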

It is perhaps time to define a set of desirable external
characteristics for our learning algorithms. We need to
define characteristics that are "independent of": (1) a
particular architecture, (2) the problem to be solved
(function approximation, classification, memory, etc.),
(3) local/global learning issues, and (4) issues of whether
to use memory or not to learn. We should rather argue about
these external properties than issues of global/local learning
and of memoryless learning.

With best regards,
Asim Roy
Arizona State University



