Registration form and tentative program for conference previously announced.
    B344DSL@UTARLG.UTA.EDU 
       
    Mon Jan 20 20:09:00 EST 1992
    
    
  
TENTATIVE schedule for Optimality Conference, UT Dallas, Feb. 6-8,
1992
ORAL PRESENTATIONS --
Thursday, Feb. 6, AM:
Daniel Levine, U. of Texas, Arlington -- Don't Just Stand There, Optimize Something!
Samuel Leven, Radford U. -- (title to be announced)
Mark DeYong, New Mexico State U. -- Properties of Optimality in Neural Networks
Wesley Elsberry, Battelle Research Labs -- Putting Optimality in its Place: Arguments on
       Context, Systems, and Neural Networks
Graham Tattersall, University of East Anglia -- Optimal Generalisation in Artificial Neural
       Networks 
Thursday, Feb. 6, PM:
Steven Hampson, U. of Cal., Irvine -- Problem Solving in a Connectionist World Model
Ian Parberry, University of North Texas -- (title to be announced)
Richard Golden, U. of Texas, Dallas -- Identifying a Neural Network's Computational Goals:
       a Statistical Optimization Perspective
Arun Jagota, SUNY at Buffalo -- Efficient Optimizing Dynamics in a Hopfield-style network
Friday, Feb. 7, AM:
Gershom Rosenstein, Hebrew University -- For What are Brains Striving?
Gail Carpenter, Boston University -- Supervised Minimax Learning and Prediction of
       Nonstationary Data by Self-Organizing Neural Networks
Stephen Grossberg, Boston University -- Vector Associative Maps: Self-Organizing Neural
       Networks for Error-based Learning, Spatial Orientation, and Sensory-Motor Control
Haluk Ogmen, University of Houston -- Self-Organization via Active Exploration in Robotics
Friday, Feb. 7, PM:
David Stork, Ricoh California Research Center -- Non-optimality in Neurobiological Systems
David Chance, Central Oklahoma University -- Real-time Neuronal Models Examined in a
       Classical Conditioning Network
Samy Bengio, Université de Montréal -- On the Optimization of a Synaptic Learning Rule
Harold Szu, Naval Surface Warfare Center -- Why Do We Study Neural Networks on VLSI
       Chips and Why Are Wavelets More Natural for Brain-Style Computing?
Saturday, Feb. 8, AM:
Karl Pribram, Radford University -- The Least Action Principle: Does it Apply to Cognitive
       Processes?
Herve Abdi, University of Texas, Dallas -- Generalization of the Linear Auto-Associator
Paul Prueitt, Georgetown University -- (title to be announced)
Sylvia Candelaria de Ram, New Mexico State U. -- Interactive Sub-systems of Natural
       Language and the Treatment of Specialized Function
Saturday, Feb. 8, PM:
Panel discussion on the basic themes of the conference
POSTERS
Basari Bhaumik, Indian Inst. of Technology, New Delhi -- A Multilayer Network for
       Determining Subjective Contours
Joachim Buhmann, Lawrence Livermore Labs -- Complexity Optimized Data Clustering by
       Competitive Neural Networks
John Johnson, University of Mississippi -- The Genetic Adaptive Neural Network Training
       Algorithm for Generic Feedforward Artificial Neural Systems
Subhash Kak, Louisiana State U. -- State Generators and Complex Neural Memories
Harold Szu, Naval Surface Warfare Center -- Moving Beyond LMS Energy for Natural
       Classifiers
                 ABSTRACTS RECEIVED SO FAR FOR OPTIMIZATION CONFERENCE
                            (alphabetical by first author):
                             Generalization of the Linear
                                    Auto-Associator
                 Herve Abdi, Dominique Valentin, and Alice J. O'Toole
                             University of Texas at Dallas
       The classical auto-associator can be used to model some processes in prototype
abstraction.  In particular, the eigenvectors of the auto-associative matrix have been
interpreted as prototypes or macro-features (Anderson et al., 1977; Abdi, 1988; O'Toole and
Abdi, 1989).  It has also been noted that computing these eigenvectors is equivalent to
performing the principal component analysis of the matrix of objects to be stored in the
memory.
       This paper describes a generalization of the linear auto-associator in which units (i.e.,
cells) can be of differential importance, can be non-independent, or can have a bias.  The
stimuli to be stored in the memory can also be of differential importance (or can be
non-independent).  The constraints expressing response bias and differential importance of
stimuli are implemented as positive semi-definite matrices. The Widrow-Hoff learning rule
is applied to the weight matrix in a generalized form which takes the bias and the differential
importance constraints into account to compute the error.  Conditions for the convergence
of the learning rule are examined and convergence is shown to be dependent only on the
ratio of the learning constant to the smallest non-zero eigenvalue of the weight matrix.  The
maximal responses of the memory correspond to generalized eigenvectors; these vectors are
biased-orthogonal (i.e., they are orthogonal after the response bias is implemented).  It is
also shown that (with an appropriate choice of matrices for response bias and differential
importance) the generalized auto-associator is able to implement the general linear model
of statistics (including correspondence analysis, dual scaling, optimal scaling, canonical
correlation analysis, generalized principal component analysis, etc.).
       Applications and Monte Carlo simulation of the generalized auto-associator dealing
with face processing will be presented and discussed.
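
       As a minimal sketch of the classical (non-generalized) case described in the
abstract above, the following Python/NumPy fragment trains a linear auto-associator
with a batch Widrow-Hoff rule and compares the learned weight matrix with the projector
built from the principal components of the stimulus matrix.  The random stimuli, the
step size, the epoch count, and the function name are illustrative assumptions; the
bias and differential-importance matrices of the generalized model are not included.

```python
import numpy as np

def train_autoassociator(X, eta, epochs):
    """Batch Widrow-Hoff learning for a plain linear auto-associator.

    X holds one stimulus per column; W is trained so that W @ X approximates X.
    """
    n_units = X.shape[0]
    W = np.zeros((n_units, n_units))
    for _ in range(epochs):
        error = X - W @ X        # reconstruction error for every stimulus
        W += eta * error @ X.T   # delta-rule correction
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))  # 3 hypothetical stimuli over 6 units

# Assumed step size: below 2 / (largest eigenvalue of X X^T), so the iteration converges.
largest = np.linalg.eigvalsh(X @ X.T).max()
W = train_autoassociator(X, eta=1.0 / largest, epochs=5000)

# The left singular vectors of X are its (uncentered) principal components.
U, s, _ = np.linalg.svd(X, full_matrices=False)
projector = U @ U.T
max_gap = np.abs(W - projector).max()
print("largest difference between W and the PCA projector:", max_gap)
```

       In this plain setting the learned matrix approaches the projector onto the space
spanned by the stored stimuli, which is one way to see the link between the
eigenvectors of the memory and principal component analysis.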
                               On the Optimization of a
                                Synaptic Learning Rule
                          Samy Bengio, Université de Montréal
                 Yoshua Bengio, Massachusetts Institute of Technology
                       Jocelyn Cloutier, Université de Montréal
                          Jan Gecsei, Université de Montréal
       This paper presents an original approach to neural modeling based on the idea of
tuning synaptic learning rules with optimization methods.  This approach relies on the idea
of considering the synaptic modification rule as a parametric function which has local inputs,
and is the same for many neurons.  Because the space of learning algorithms is very large,
we propose to use biological knowledge about synaptic mechanisms, in order to design the
form of such rules. The optimization methods used for this search do not have to be
biologically plausible, although the net result of this search may be a biologically plausible
learning rule.
       In the experiments described in this paper, a local optimization method (gradient
descent) as well as a global optimization method (simulated annealing) were used to search
for new learning rules.  Estimation of parameters of synaptic modification rules consists of
a joint global optimization of the rules themselves as well as of the multiple networks that
learn to perform some tasks with these rules.
       Experiments are described in order to assess the feasibility of the proposed method
for very simple tasks. Experiments on classical conditioning in Aplysia yielded a rule that
allowed a network to reproduce five basic conditioning phenomena. Experiments with
two-dimensional categorization problems yielded a rule for a network with a hidden layer
that could be used to learn some simple but non-linearly separable classification tasks.  The
rule parameters were optimized for a set of classification tasks and the generalization was
tested successfully on a different set of tasks.  Initial experiments can be found in [1, 2].
                                      References
[1] Bengio, Y. & Bengio, S. (1990).  Learning a synaptic learning rule. Technical Report
       #751.  Computer Science Department. Université de Montréal.
[2] Bengio, Y., Bengio, S., & Cloutier, J. (1991).  Learning a synaptic learning rule.
       IJCNN-91-Seattle.
                        Complexity Optimized Data Clustering by
                              Competitive Neural Networks
                Joachim Buhmann, Lawrence Livermore National Laboratory
                      Hans Kühnel, Technische Universität München
       Data clustering is a complex optimization problem with applications ranging from
vision and speech processing to data transmission and data storage in technical as well as
in biological systems.  We discuss a clustering strategy which explicitly reflects the tradeoff
between simplicity and precision of a data representation.  The resulting clustering algorithm
jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of
the clustering cost function yields an optimal number of clusters, their positions and their
occupation probabilities.  An iterative version of complexity optimized clustering is imple-
mented by an artificial neural network with winner-take-all connectivity.  Our approach
establishes a unifying framework for different clustering methods such as K-means clustering,
fuzzy clustering, entropy-constrained vector quantization, topological feature maps, and
competitive neural networks.
                              Interactive Sub-systems of
                               Natural Language and the
                               Treatment of Specialized
                                       Function
 
                 Sylvia Candelaria de Ram, New Mexico State University
       Context-sensitivity and rapidity of communication are two things that become
ecological essentials as cognition advances. They become ``optimals'' as cognition develops
into something elaborate, long-lasting, flexible, and social.  For successful operation of
language's default speech/gesture mode, articulation response must be rapid and
context-sensitive.  It does not follow that all linguistic cognitive function will or can be
equally fast or re-equilibrating.  But it may follow that articulation response mechanisms are
specialized in different ways than those for other cognitive functions.  The special properties
of the varied mechanisms would then interact in language use.  In actuality, our own
architecture is of this sort [1,2,3,4].  Major formative effects on our language, society, and
individual cognition apparently result [5]. ``Optimization'' leads to perpetual linguistic drift
(and adaptability) and hypercorrection effects (mitigated by emotion), so that we have
multitudes of distinct but related languages and speech communities.
       Consider modelling the variety of co-functioning mechanisms for utterance and
gesture articulation, interpretation, monitoring and selection.  Wherein lies the source of the
differing function in one and another mechanism?  Suppose [parts of] mechanisms are
treated as parts of a multi-layered, temporally parallel, staged architecture (like ours).  The
layers may be inter-connected selectively [6].  Any given portion may be designed to deal
with particular sorts of excitation [7,8,9].  A multi-level belief/knowledge logic enhanced for
such structures [10] has properties extraordinary for a logic, properties which point up some
critical features of ``neural nets'' having optimization properties pertinent to intelligent,
interactive systems.
                                      References
[1] Candelaria de Ram, S. (1984).  Genesis of the mechanism for sound change as suggested
       by auditory reflexes. Linguistic Association of the Southwest, El Paso.
[2] Candelaria de Ram, S. (1988).  Neural feedback and causation of evolving speech styles. 
       New Ways of Analyzing Language Variation (NWAV-XVII), Centre de recherches
       mathématiques, Montreal, October.
[3] Candelaria de Ram, S. (1989).  Sociolinguistic style shift and recent evidence on `prese-
       mantic' loci of attention to fine acoustic difference.  New Ways of Analyzing
       Language Variation joint with American Dialect Society (NWAV-XVIII/ ADSC),
       Durham, NC, October.
[4] Candelaria de Ram, S. (1991b).  Language processing: mental access and sublanguages. 
       Annual Meeting, Linguistic Association of the Southwest (LASSO), Austin, Sept.
       1991.
[5] Candelaria de Ram, S. (1990b).  The sensory basis of mind: feasibility and functionality
       of a phonetic sensory store.  [Commentary on R. Näätänen, The role of attention in
       auditory information processing as revealed by event-related potentials and other
       brain measures of cognitive function.] Behav. Brain Sci. 13, 235-236.
[6] Candelaria de Ram, S. (1990c).  Sensors & concepts: Grounded cognition.  Working
       Session on Algebraic Approaches to Problem Solving and Representation, June
       27-29, Briarcliff, NY.
[7] Candelaria de Ram, S. (1990a).  Belief/knowledge dependency graphs with sensory
       groundings.  Third Int. Symp. on Artificial Intelligence Applications of Engineering
       Design and Manufacturing in Industrialized and Developing Countries, Monterrey,
       Mexico, Oct. 22-26, pp. 103-110.
[8] Candelaria de Ram, S. (1991a).  From sensors to concepts: Pragmasemantic system
       constructivity.  Int. Conf. on Knowledge Modeling and Expertise Transfer KMET'91,
       Sophia-Antipolis, France, April 22-24.  Also in Knowledge Modeling and Expertise
       Transfer, IOS Publishing, Paris, 1991, pp. 433-448.
[9] Ballim, A., Candelaria de Ram, S., & Fass, D. (1989).  Reasoning using inheritance from
       a mixture of knowledge and beliefs.  In S. Ramani, R. Chandrasekar, & K.S.R.
       Anjaneyulu (Eds.), Knowledge Based Computer Systems.  Delhi: Narosa, pp. 387-396;
       republished by Vedams Books International, New Delhi, 1990.  Also in Lecture Notes
       in Computer Science series No. 444, Springer-Verlag, 1990.
[10] Candelaria de Ram, S. (1991c).  Why to enter into dialogue is to come out with
       changed speech: Cross-linked modalities, emotion, and language.  Second Invitational
       Venaco Workshop and European Speech Communication Association Tutorial and
       Research Workshop on the Structure of Multimodal Dialogue, Maratea, Italy, Sept.
       16-20, 1991.
               Fuzzy ARTMAP: Adaptive Resonance for Supervised Learning
                           Gail Carpenter, Boston University
       A neural network architecture for incremental supervised learning of recognition
categories and multidimensional maps in response to arbitrary sequences of analog or binary
input vectors will be described.  The architecture, called Fuzzy ARTMAP, achieves a
synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by
exploiting a close formal similarity between the computations of fuzzy subsethood and ART
category choice, response, and learning.  Fuzzy ARTMAP also realizes a new Minimax
Learning Rule that conjointly minimizes predictive error and maximizes code compression,
or generalization.  This is achieved by a match tracking process that increases the ART
vigilance parameter by the minimum amount needed to correct a predictive error.  As a
result, the system automatically learns a minimal number of recognition categories, or
"hidden units," to meet accuracy criteria.  Category proliferation is prevented by normalizing
input vectors at a preprocessing stage.  A normalization procedure called complement
coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator
(∨) of fuzzy logic play complementary roles.  Complement coding uses on-cells and off-cells
to represent the input pattern, and preserves individual feature amplitudes while normalizing
the total on-cell/off-cell vector.  Learning is stable because all adaptive weights can only
decrease in time.  Decreasing weights corresponds to increasing sizes of category "boxes." 
Improved prediction is achieved by training the system several times using different orderings
of the input set.  This voting strategy can also be used to assign probability estimates to
competing predictions given small, noisy, or incomplete training sets.  Simulations illustrate
Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic
algorithm systems.  These simulations include (i) finding points inside vs. outside a circle; (ii)
learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous
function; (iv) a letter recognition database; and (v) a medical database.
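
       The complement coding and weight-decrease properties mentioned in the abstract
above can be sketched in a few lines of Python.  This is only an illustration under
stated assumptions: the feature values, the fast-learning setting (beta = 1), and all
function names are hypothetical, and the full ARTMAP machinery (category choice, match
tracking, the map field) is omitted.

```python
def complement_code(a):
    """Return the on-cell/off-cell vector (a, 1 - a); features are assumed to lie in [0, 1]."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(x, y):
    """Component-wise MIN, the fuzzy-logic AND."""
    return [min(p, q) for p, q in zip(x, y)]

def norm1(x):
    """City-block norm of a vector with nonnegative components."""
    return sum(x)

def match_value(i, w):
    """Degree of match |I AND w| / |I|, the quantity compared with vigilance."""
    return norm1(fuzzy_and(i, w)) / norm1(i)

def learn(w, i, beta=1.0):
    """Move w toward (I AND w); no component of w can increase."""
    return [beta * m + (1.0 - beta) * old for m, old in zip(fuzzy_and(i, w), w)]

def box_size(w, n_features):
    """Size of the category box encoded by w, computed as M - |w|."""
    return n_features - norm1(w)

a1 = [0.2, 0.9]              # hypothetical two-feature inputs
a2 = [0.6, 0.5]
i1 = complement_code(a1)
i2 = complement_code(a2)

w = i1                       # a new category starts as a single point
w_new = learn(w, i2)         # after a second input the box has grown

print("norm of coded input:", norm1(i1))
print("match of second input:", match_value(i2, w))
print("box size before and after:", box_size(w, 2), box_size(w_new, 2))
```

       Every complement-coded input has the same norm (the number of features), and
each learning step can only lower weight components, which corresponds to a growing
category box.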
                      Properties of Optimality in Neural Networks
             Mark DeYong and Thomas Eskridge, New Mexico State University
       This presentation discusses issues concerning optimality in neural and cognitive
functioning.  We discuss these issues in terms of the tradeoffs they impose on the design of
neural network systems.  We illustrate the issues with example systems based on a novel
VLSI neural processing element developed, fabricated, and tested by the first author.  There
are four general issues of interest:
* Biological Realism vs. Computational Power.  Many implementations of neurons
sacrifice computational power for biological realism.  Biological realism imposes a set of
constraints on the structure and timing of certain operations in the neuron.  Taken as an
absolute requirement, these constraints, though realistic, reduce the computational power
of individual neurons, and of systems built on those neurons.  However, to ignore the
biological characteristics of neurons is to ignore the best example of the type of device
we are trying to implement.  In particular, simple non-biologically inspired neurons
perform a completely different style of processing than biologically inspired ones.  Our
work allows for biological realism in areas where it increases computational power, while
ignoring the low-level details that are simply by-products of organic systems.
* Task-Specific Architecture vs. Uniform Element, Massive Parallelism.  A central issue in
developing neural network systems is whether to design networks specific to a particular
task or to adapt a general-purpose network to accomplish the task.  Developing task-
specific architectures allows for small, fast networks that approach optimality in
performance, but require more effort during the design stage.  General-purpose
architectures approach optimality in design that merely needs to be adapted via weight
modifications to a new problem, but suffer from performance inefficiencies due to
unneeded and/or redundant nodes.  Our work hypothesizes that task-specific architec-
tures that use a building-block approach combined with fine-tuning by training will
produce the greatest benefits in the tradeoff between design and performance optimality.
* Time Independence vs. Time Dependence.  Many neural networks assume that each
input vector is independent of other inputs, and the job of the neural network is to
extract patterns within the input vector that are sufficient to characterize it.  For
problems of this type, a network that assumes time independence will provide acceptable
performance.  However, if the input vectors cannot be assumed to be independent, the
network must process the vector with respect to its temporal characteristics.  Networks
that assume time independence have a variety of well-known training and performance
algorithms, but will be unwieldy when applied to a problem in which time independence
does not hold.  Although temporal characteristics can be converted into levels, there will
be a loss of information that may be critical to solving the problem efficiently.  Networks
that assume time dependence have the advantage of being able to handle both time
dependent and time independent data, but do not have well known, generally applicable
training and performance algorithms.  Our approach is to assume time dependence, with
the goal of handling a larger range of problems rather than having general training and
performance methods.
* Hybrid Implementation vs. Analog or Digital Only.  The optimality of hardware
implementations of neural networks depends in part on the resolution of the second
tradeoff mentioned above.  Analog devices generally afford faster processing at a lower
hardware overhead than digital, whereas digital devices provide noise immunity and a
building-block approach to system design.  Our work adopts a hybrid approach where
the internal computation of the neuron is implemented in analog, and the extracellular
communication is performed digitally.  This gives the best of both worlds: the speed and
low hardware overhead of analog and the noise immunity and building-block nature of
digital components.
       Each of these issues has individual ramifications for neural network design, but
optimality of the overall system must be viewed as their composite.  Thus, design decisions
made in one area will constrain the decisions that can be made in the other areas.
          Putting Optimality in its Place: Arguments on Context, Systems and 
                                    Neural Networks
                    Wesley Elsberry, Battelle Research Laboratories
       Determining the "optimality" of a particular neural network should be an exercise in
multivariate analysis.  Too often, performance concerning a narrowly defined problem has
been accepted as prima facie evidence that some ANN architecture has a specific level of
optimality.  Taking a cue from the field of genetic algorithms (and the theory of natural
selection from which GA's are derived), I offer the observation that optimality is selected
in the phenotype, i.e., the level of performance of an ANN is inextricably bound to the
system of which it is a part.  The context in which the evaluation of optimality is performed
will influence the results of that evaluation greatly.  While compartmentalized and
specialized tests of ANN performance can offer insights, the construction of effective systems
may require additional consideration to be given to the assumptions of such tests.  Many
benchmarks and other tests assume a static problem set, while many real-world applications
offer dynamical problems.  An ANN which performs "optimally" in a test may perform
miserably in a putatively similar real-world application.  Recognizing the assumptions which
underlie evaluations is important for issues of optimal system design; recognizing the need
for "optimally sub-optimal" response in adaptive systems applied to dynamic problems is
critical to proper placement of priority given to optimality of ANN's.
   
                            Identifying a Neural Network's
                                 Computational Goals:       
                        A Statistical Optimization Perspective
                            
                   Richard M. Golden, University of Texas at Dallas
       The importance of identifying the computational goal of a neural network
computation is first considered from the perspective of Marr's levels of descriptions theory
and Simon's theory of satisficing.  A "statistical optimization perspective" is proposed as a
specific implementation of the more general theories of Marr and Simon.  The empirical
"testability" of the "statistical optimization perspective" is also considered.  It is argued that
although such a hypothesis is only occasionally empirically testable, such a hypothesis plays
a fundamental role in understanding complex information processing systems.
       The usefulness of the above theoretical framework is then considered with respect
to both artificial neural networks and biological neural networks.  An argument is made that
almost any artificial neural networks may be viewed as optimizing a statistical cost function.
To support this claim, the large class of back-propagation feed-forward artificial neural
networks and Cohen-Grossberg type recurrent artificial neural networks are formally viewed
as optimizing specific statistical cost functions. Specific statistical tests for deciding whether
the statistical environment of the neural network is "compatible" with the statistical cost
function the network is presumably optimizing are also proposed.  Next, some ideas
regarding the applicability of such analyses to much more complicated artificial neural
networks which are "closer approximations" to real biological neural networks will also be
discussed.
                            Vector Associative Maps: Self-
                            organizing Neural Networks for
                             Error-based Learning, Spatial
                        Orientation, and Sensory-motor Control
                         Stephen Grossberg, Boston University
       This talk describes a new class of neural models for unsupervised error-based
learning.  Such a Vector Associative Map, or VAM, is capable of autonomously calibrating
the spatial maps and arm trajectory parameters used during visually guided reaching.
VAMs illustrate how spatial and motor representations can self-organize using a unified
computational format.  They clarify how an autonomous agent can build a self-optimizing
hierarchy of goal-oriented actions based upon more primitive, endogenously generated
exploration of its environment.  Computational properties of ART and VAM systems are
complementary.  This complementarity reflects different processing requirements of sensory-
cognitive versus spatial-motor systems, and suggests that no single learning algorithm can be
used to design an autonomous behavioral agent.
                                           
                         Problem Solving in a Connectionistic
                                      World Model
                  Steven Hampson, University of California at Irvine
       Stimulus-Response (S-R), Stimulus-Evaluation (S-E), and Stimulus-Stimulus (S-S)
models of problem solving are central to animal learning theory.  When applicable, the
procedural S-R and S-E models can be quite space efficient, as they can potentially learn
compact generalizations over the functions they are taught to compute.  On the other hand,
learning these generalizations can be quite time consuming, and adjusting them when
conditions change can take as long as learning them in the first place.  In contrast, the S-S
model developed here does not learn a particular input-to-output mapping, but simply
records a series of "facts" about possible state transitions in the world.  This declarative
world model provides fast learning, easy update and flexible use, but is space expensive. 
The procedural/declarative distinction observed in biological behavior suggests that both
types of mechanisms are available to an organism in its attempts to balance, if not optimize,
both time and space requirements.  The work presented here investigates the type of
problems that are most effectively addressed in an S-S model.
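
       The declarative S-S idea in the abstract above, recording "facts" about state
transitions and using them flexibly, can be illustrated with a small Python sketch.
The class, the room names, and the use of breadth-first search for planning are
assumptions made for illustration only; the abstract does not say how the actual
model stores or searches its facts.

```python
from collections import deque

class WorldModel:
    """Declarative store of observed (state, action) -> next-state facts."""

    def __init__(self):
        self.transitions = {}    # state -> {action: next_state}

    def record(self, state, action, next_state):
        """Learning is a single write; updating a fact overwrites the old one."""
        self.transitions.setdefault(state, {})[action] = next_state

    def plan(self, start, goal):
        """Breadth-first search over recorded facts; returns a shortest action list or None."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, actions = frontier.popleft()
            if state == goal:
                return actions
            for action, nxt in self.transitions.get(state, {}).items():
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, actions + [action]))
        return None

model = WorldModel()
model.record("hall", "east", "kitchen")       # hypothetical rooms and actions
model.record("kitchen", "east", "pantry")
first_plan = model.plan("hall", "pantry")

model.record("hall", "shortcut", "pantry")    # one new fact changes the best plan
second_plan = model.plan("hall", "pantry")
unreachable = model.plan("pantry", "hall")

print(first_plan, second_plan, unreachable)
```

       The store grows with every fact recorded, which reflects the space cost the
abstract mentions, while a single new observation is enough to change the plan.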
                          Efficient Optimising Dynamics in a
                                Hopfield-style Network
                 Arun Jagota, State University of New York at Buffalo
       Definition: A set of vertices (nodes, points) of a graph such that every pair is
connected by an edge (arc, line) is called a clique.  An example in the ANN context is a set
of units such that all pairs are mutually excitatory. The following description applies to
optimisation issues in any problems that can be modeled with cliques.
       Background I: We earlier proposed a variant (essentially a special case) of the
Hopfield network which we called the Hopfield-Style Network (HSN).  We showed that the
stable states of HSN are exactly the maximal cliques of an underlying graph. The depth of
a local minimum (stable state) is directly proportional (although not linearly) to the size of
the corresponding clique.  Any graph can be made the underlying graph of HSN.  These
three facts suggest that HSN -- with suitable optimising dynamics -- can be applied to the
CLIQUE (optimisation) problem, namely that of ``Finding the largest clique in any given
graph''.
       Background II: The CLIQUE problem is NP-Complete, suggesting that it is most
likely intractable. Recent results from Computer Science suggest that even approximately
solving this problem is probably hard.  Researchers have shown that on most (random)
graphs, however, it can be approximated fairly well.  The CLIQUE problem has many
applications, including (1) Content-addressable memories can be modeled as cliques. (2)
Constraint-Satisfaction Problems (CSPs) can be represented as the CLIQUE problem.  Many
problems in AI from Computer Vision, NLP, KR, etc., have been cast as CSPs. (3) Certain
object recognition problems in Computer Vision can be modeled as the CLIQUE problem.
Given an image object A and a reference object B, one problem is to find a sub-object of
B which ``matches'' A.  This can be represented as the CLIQUE problem. 
       Abstract: We will present details of the modeling of optimisation problems related
to those described in Background II (and perhaps others) as the CLIQUE problem. We will
discuss how HSN can be used to obtain optimal or approximate solutions.  In particular, we
will describe three (efficient) gradient-descent dynamics on HSN, discuss their optimisation
capabilities, and present theoretical and/or empirical evidence for such.  The dynamics are
Discrete: (1) steepest gradient-descent, (2) rho-annealing; and Continuous: (3) mean-field
annealing.  We will discuss characterising properties of these dynamics, including that (1)
emulates a well-known graph algorithm, (2) is suited only for HSN, and (3) originates from
statistical mechanics and has gained wide attention for its optimisation properties.  We will
also discuss the continuous Hopfield network dynamics as a special case of (3).
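
       To make the clique definition and the CLIQUE problem above concrete, here is a
minimal Python sketch that checks the clique property and grows a maximal clique
greedily.  It is not the HSN dynamics of the abstract, and the abstract does not name
the graph algorithm that steepest descent emulates; the greedy rule, the example
graph, and the function names are assumptions chosen for illustration.

```python
def is_clique(adj, nodes):
    """True if every pair of the given vertices is joined by an edge."""
    nodes = list(nodes)
    return all(v in adj[u] for i, u in enumerate(nodes) for v in nodes[i + 1:])

def greedy_maximal_clique(adj):
    """Grow a clique by repeatedly adding the candidate with most neighbours among candidates.

    The result is maximal (cannot be extended) but is not guaranteed to be the largest clique.
    """
    clique = set()
    candidates = set(adj)
    while candidates:
        # Ties are broken alphabetically so the run is reproducible.
        best = max(sorted(candidates), key=lambda v: len(adj[v] & candidates))
        clique.add(best)
        candidates = candidates & adj[best]   # keep only vertices adjacent to all chosen ones
    return clique

# Hypothetical undirected graph without self-loops, stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

found = greedy_maximal_clique(adj)
extendable = any(found <= adj[v] for v in adj if v not in found)
print(sorted(found), is_clique(adj, found), extendable)
```

       On this small graph the greedy pass happens to return the largest clique; in
general it only guarantees maximality, which matches the distinction between stable
states (maximal cliques) and the optimisation target (the largest clique).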
                             State Generators and Complex
                                    Neural Memories
                      Subhash C. Kak, Louisiana State University
       The mechanism of self-indexing for feedback neural networks that generates
memories from short subsequences is generalized so that a single bit together with an
appropriate update order suffices for each memory.  This mechanism can explain how
stimulating an appropriate neuron can then recall a memory.  Although the information is
distributed in this model, our self-indexing mechanism [1] makes it appear localized.
Also a new complex valued neuron model is presented to generalize McCulloch-Pitts
neurons.
       There are aspects to biological memory that are distributed [2] and others that are
localized [3].  In the currently popular artificial neural network models the synaptic weights
reflect the stored memories, which are thus distributed over the network.  The question then
arises whether these models can explain Penfield's observations on memory localization.  This
paper shows that such memory localization does occur in these models if self-indexing is
used.  It is also shown how a generalization of
the McCulloch-Pitts model of neurons appears essential in order to account for certain
aspects of distributed information processing.  One particular generalization, described in
the paper, allows one to deal with some recent findings of Optican & Richmond [4].
       Consider the model of the mind where each mental event corresponds to some neural
event. Neurons that deal with mental events may be called cognitive neurons.  There would
be other neurons that simply compute without cognitive function.  Consider now cognitive
neurons dealing with sensory input that directly affects their behaviour.  We now show that
independent cognitive centers will lead to competing behaviour.
       Even non-competing cognitive centres would show odd behaviour since collective
choice is associated with non-transitive logic.  This is clear from the ordered choice paradox
that occurs for any collection of cognitive individuals.
       This indicates that a scalar energy function cannot be associated with a neural
network that performs logical processing.  If that were possible, then all choices made
by a network could be defined in an unambiguous hierarchy, with at worst more than one
choice having a particular value, and the question of cyclicity of choices, as in the ordered
choice paradox, would not arise.
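       The argument above can be illustrated with a small sketch.  The abstract does not spell
out the ordered choice paradox, so the code assumes the familiar three-voter cycle, with
hypothetical rankings over options A, B, and C standing in for three cognitive centres.  It
checks whether any single ordering (the ordering a scalar energy function would impose) agrees
with every pairwise majority preference.

```python
from itertools import permutations

# Hypothetical rankings for three cognitive centres (best option first).
# These values are an assumption chosen to produce a cycle.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, rankings):
    """True if a strict majority of rankings place x ahead of y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

def consistent_energy_order(options, rankings):
    """Return an ordering of options (lowest energy first) that agrees with
    every pairwise majority preference, or None if no such ordering exists."""
    for order in permutations(options):
        agrees = all(
            majority_prefers(order[i], order[j], rankings)
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if agrees:
            return order
    return None

options = ["A", "B", "C"]

# Pairwise majority preferences: A beats B, B beats C, and C beats A.
cycle = [(x, y) for x in options for y in options
         if x != y and majority_prefers(x, y, rankings)]

# No scalar ranking reproduces a cyclic preference, so this is None.
energy_order = consistent_energy_order(options, rankings)
```

       If all three rankings are identical, the same function returns that ranking, which is
the unambiguous hierarchy a scalar energy function would require.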
                                      References
[1] Kak, S.C. (1990a).  Physics Letters A 143, 293.
[2] Lashley, K.S. (1963).  Brain Mechanisms and Learning.  New York: Dover.
[3] Penfield, P. & Roberts, L. (1959).  Speech and Brain Mechanisms.  Princeton: Princeton
       University Press.
[4] Optican, L.M. & Richmond, B.J. (1987).  J. Neurophysiol. 57, 162.
                      Don't Just Stand There, Optimize Something!
                    Daniel Levine, University of Texas at Arlington
       Perspectives on optimization in a variety of disciplines, including physics, biology,
psychology, and economics, are reviewed.  The major debate is over whether optimization
is a description of nature, a normative prescription, both, or neither.  The presenter leans
toward a belief that optimization is a normative prescription and not a description of nature.
       In neural network theory, the attempt to explain all behavior as the optimization of
some variable (no matter how tortuously defined the variable is!) has spawned some work
that has been seminal to the field.  This includes both the "hedonistic neuron" theory of
Harry Klopf, which led to some important work in conditioning theory and robotics, and the
"dynamic programming" of Paul Werbos, which led to back propagation networks.  Yet if all
human behavior is optimal, this means that no improvement is possible on wasteful wars,
environmental destruction, and unjust social systems.
       The presenter will review work on the effects of frontal lobe damage, specifically the
dilemma of perseveration in unrewarding behavior combined with hyperattraction to novelty,
and describe these effects as prototypes of non-optimal cognitive function.  It can be argued
(David Stork, personal communication) that lesion effects do not demonstrate non-optimality
because they are the result of system malfunction.  If that is so, then such malfunction is far
more pervasive than generally believed and is not dependent on actual brain damage. 
Architectural principles such as associative learning, lateral inhibition, opponent processing,
and resonant feedback, which enable us to interact with a complex environment, also
sometimes trap us in inappropriate metaphors (Lakoff and Johnson, 1980).  Even intact
frontal lobes do not always perform their executive function (Pribram, 1991) with optimal
efficiency.  
                                      References
Lakoff, G. & Johnson, M. (1980).  Metaphors We Live By.  University of Chicago Press.
Pribram, K. (1991).  Brain and Perception.  Erlbaum.
                             For What Are Brains Striving?
                       Gershom-Zvi Rosenstein, Hebrew University
       My aim is to outline a possibility of a unified approach to several yet unsolved
problems of behavioral regulation, most of them related to the puzzle of schizophrenia.  This
Income-Choice Approach (ICA), proposed originally in the seventies, was summarized only
recently in the book of the present author [1].  One of the main problems the approach was
applied to is the model of behavior disturbances.  The income (the value of the goal-function
of our model) is defined, by assumption, on the intensities of streams of impulses directed
to the reward system.  The income can be accumulated and spent on different activities of
the model.  The choices made by the model depend on the income they are expected to
bring.
       Now the ICA is applied to the following problems:
       The catecholamine distribution change (CDC) in the schizophrenic brain.  I try to
prove the idea that CDC is caused by the same augmented (in comparison with the norm)
stimulation of the reward system that was proposed by us earlier as a possible cause for the
behavioral disturbance.  The role of dopamine in the brain processing of information is
discussed.  The dopamine is seen as one of the forms of representation of income in the
brain.
       The main difference between the psychology of "normal" and schizophrenic subjects,
according to many researchers, is that in schizophrenics "observations prevail over
expectations."  This property can be shown to be a formal consequence of our model.  It was
used earlier to describe the behavior of schizophrenics versus normal people in delusion
formation (as Scharpantje delusion, etc.).
       ICA strongly supports the known anhedonia hypothesis of the action of neuroleptics. 
In fact, that hypothesis can be derived from ICA if some simple and natural assumptions
are accepted.
       A hypothesis about the nature of stereotypes as an adjunctive type of behavior is
proposed.  They are seen as behaviors concerned not with the direct physiological needs of
the organism but with the regulation of activity of its reward system.  The proposition can
be tested partly in animal experiments.
       The problem of origination of so-called "positive" and "negative" symptoms in
schizophrenia is discussed.  The positive symptoms are seen as attempts and sometimes
means to produce an additional income for the brain whose external sources of income are
severely limited.  The negative symptoms are seen as behaviors chosen in the condition
whereby the quantity of income that can be used to provide these behaviors is small and
cannot be increased.
       The last part of the presentation is dedicated to the old problem of the relationship
between "genius" and schizophrenia.  It is a continuation of material introduced in [1].  The
remark is made that the phenomenon of uric acid excess thought by some investigators to
be connected to high intellectual achievement can be related to the uric acid excess found
to be produced by augmented stimulation of the reward system in the self-stimulation
paradigm.
                                      References
[1] Rosenstein, G.-Z. (1991).  Income and Choice in Biological Systems.  Lawrence Erlbaum
       Associates.
       
                       Non-optimality in Neurobiological Systems
                     David Stork, Ricoh California Research Center
       I will argue strongly in two ways that neurobiological systems are "non-optimal."  I
note that "optimal" implies a match between some (human) notion of function (or
structure,...) and the implementation itself.  My first argument addresses the dubious
approach which tries to impose notions of what is being optimized, i.e., stating what the
desired function is.  For instance Gabor-function theorists claim that human visual receptive
fields attempt to optimize the product of the sensitivity bandwidths in the spatial and the
spatial-frequency domains [1].  I demonstrate how such bandwidth notions have an implied
measure, or metric, of localization; I examine the implied metric and find little or no
justification for preferring it over any of a number of other plausible metrics [2].  I also show
that the visual system has an overabundance of visual cortical cells (by a factor of 500) relative
to what is implied by the Gabor approach; thus the Gabor approach makes this important fact
hard to understand.  Then I review arguments of others describing visual receptive fields as
being "optimally" tuned to visual gratings [3], and show that here too an implied metric is
unjustified [4].  These considerations lead to skepticism of the general approach of imposing
or guessing the actual "true" function of neural systems, even in specific mathematical cases. 
Only in the most compelling cases can the function be stated confidently.
       My second criticism of the notion of optimality is that even if in such extreme cases
the neurobiological function is known, biological systems generally do not implement it
in an "optimal" way.  I demonstrate this for a non-optimal ("useless") synapse in the crayfish
tailflip circuit.  Such non-optimality can be well explained by appealing to the process of
preadaptation from evolutionary theory [5,6].  If a neural circuit (or organ, or behavior ...)
which evolves to solve one problem is later called upon to solve a different problem, then
the evolving circuit must be built upon the structure appropriate to the previous task.  Thus,
for instance the non-optimal synapse in the crayfish tail flipping circuit can be understood
as a holdover from a previous evolutionary epoch in which the circuit was used instead for
swimming.  In other words, evolutionary processes are gradual, and even if locally optimal
(i.e., optimal on relatively short time scales), they need not be optimal after longer epochs. 
(This is analogous to local minima that plague some gradient-descent methods in
mathematics.)
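       The gradient-descent analogy can be made concrete with a minimal sketch.  The function
f below, the learning rate, and the starting points are all assumptions chosen for
illustration and do not come from the abstract.  f has a shallow minimum on the right and a
deeper one on the left; plain gradient descent settles in whichever basin it starts in.

```python
def f(x):
    """Hypothetical objective with two minima of different depth."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain fixed-step gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

# Starting on the right, descent stops in the shallow minimum (x roughly 1.13).
x_right = gradient_descent(2.0)

# Starting on the left, descent reaches the deeper minimum (x roughly -1.30).
x_left = gradient_descent(-2.0)

# Both end points are stationary, but only one is the global minimum.
stuck_in_local_minimum = f(x_right) > f(x_left)
```

       The run that starts on the right never leaves its basin, just as a circuit shaped by
an earlier task stays close to the structure it inherited.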
       Such an analysis highlights the role of evolutionary history in understanding the
structure and function of current neurobiological systems, and along with our previous
analysis, strongly argues against optimality in neurobiological systems.  I therefore concur
with the recent statement that in neural systems "... elegance of design counts for little." [7]
                                      References
[1] Daugman, J. (1985).  Uncertainty relation for resolution in space, spatial frequency, and
       orientation optimized by two-dimensional visual cortical filters.  J. Opt. Soc. Am. A
       2, 1160-1169.
[2] Stork, D. G. & Wilson, H. R. (1990).  Do Gabor functions provide appropriate descriptions
       of visual cortical receptive fields?  J. Opt. Soc. Am. A 7, 1362-1373.
[3] Albrecht, D. G., DeValois, R. L., & Thorell, L. G. (1980).  Visual cortical neurons: Are
       bars or gratings the optimal stimuli?  Science 207, 88-90.
[4] Stork, D. G. & Levinson, J. Z. (1982).  Receptive fields and the optimal stimulus. 
       Science 216, 204-205.
[5] Stork, D. G., Jackson, B., & Walker, S. (1991).  "Non-optimality" via preadaptation in
       simple neural systems.  In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen
       (Eds.), Artificial Life II.  Addison-Wesley and Santa Fe Institute, pp. 409-429.
[6] Stork, D. G. (1992, in press).  Preadaptation and principles of organization in organisms. 
       In A. Baskin & J. Mittenthal (Eds.), Principles of Organization in Organisms. 
       Addison-Wesley and Santa Fe Institute.
[7] Dumont, J. P. C. & Robertson, R. M. (1986).  Neuronal circuits: An evolutionary
       perspective.  Science 233, 849-853.
                          Why Do We Study Neural Nets on VLSI
                        Chips and Why Are Wavelets More Natural
                              for Brain-Style Computing?
                       Harold Szu, Naval Surface Warfare Center
       Neural nets, natural or artificial, are structured by the desire to accomplish certain
information processing goals.  An example of this occurs in the celebrated exclusive-OR
computation.  "You are as intelligent as you can hear and see," according to an ancient
Chinese saying.  Thus, the mismatch between human-created sensor data used for input
knowledge representation and the nature-evolved brain-style computing architectures is one
of the major impediments for neural net applications.
       After a review of classical neural nets with fixed layered architectures and small-
perturbation Hebbian learning, we will show a videotape of "live" neural nets on VLSI chips. 
These chips provide a tool, a "fishnet," to capture live neurons in order to investigate one
of the most challenging frontiers: the self-architecturing of neural nets.  The singlet and
pair correlation functions can be measured to define a hairy neuron model.  The minimum
set of three hairy neurons ("Peter, Paul, and Mary") seems to behave "intelligently" to form
a selective network.  Then, the convergence proof for self-architecturing hairy neurons will
be given.
       A more powerful tool, however, is the wavelet transform, an adaptive wide-band
Fourier analysis developed in 1985 by French oil explorers.  This transform goes beyond the
(preattentive) Gabor transform by developing (attentive C.O.N.) wavelet perception in a
noisy environment.  The utility of wavelets in brain-style computing can be recognized from
two observations.  First, the "cocktail party effect," namely, you hear what you wish to hear,
can be explained by the wavelet matched filter which can achieve a tremendous bandwidth
noise reduction.  Second, "end-cut" contour filling may be described by Gibbs overshooting
in this wavelet manner.  In other words, wavelets form a very natural way of describing real
scenes and real signals.  For this reason, it seems likely that the future of neural net
applications may be in learning to do wavelet analyses by a self-learning of the "mother
wavelet" that is most appropriate for a specific dynamic input-output medium.
                         Optimal Generalisation in Artificial
                                    Neural Networks
                     Graham Tattersall, University of East Anglia
       A key property of artificial neural networks is their ability to produce a useful output
when presented with an input of previously unseen data even if the network has only been
trained on a small set of examples of the input-output function underlying the data.  This
process is called generalisation and is effectively a form of function completion.
       ANNs such as the MLP and RBF sometimes appear to work effectively as
generalisers on this type of problem, but there is now widespread recognition that the form
of generalisation which arises is very dependent on the architecture of the ANN, and is often
completely inappropriate, particularly when dealing with symbolic data.
       This paper will argue that generalisation should be done in such a way that the
chosen completion is the most probable and is consistent with the training examples.  These
criteria dictate that the generalisation should not depend in any way upon the architecture
or functionality of components of the generalising system, and that the generalisation will
depend entirely on the statistics of the training exemplars.
       A practical method for generalising in accordance with the probability and consistency
criteria is to find the minimum entropy generalisation using the Shannon-Hartley relationship
between entropy and spatial bandwidth.
       The usefulness of this approach can be demonstrated using a number of binary data
functions which contain both first and higher order structure.  However, this work has shown
very clearly that, in the absence of an architecturally imposed generalisation strategy, many
function completions are equally possible unless a very large proportion of all possible
function domain points are contained in the training set.
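       The scale of this ambiguity can be checked by counting.  The sketch below is an
illustration under stated assumptions, not the method of the paper: it takes a binary function
of n input bits, assumes no architectural bias at all, and counts by brute force how many
complete truth tables agree with a training set.  With k distinct training points out of 2^n,
the count is 2^(2^n - k).  The function name and the training examples are hypothetical.

```python
from itertools import product

def count_consistent_completions(n_bits, training):
    """Brute-force count of Boolean functions on n_bits inputs that agree
    with every (input tuple -> output bit) pair in the training dict."""
    domain = list(product([0, 1], repeat=n_bits))
    count = 0
    # Enumerate every possible truth table over the full domain.
    for outputs in product([0, 1], repeat=len(domain)):
        table = dict(zip(domain, outputs))
        if all(table[x] == y for x, y in training.items()):
            count += 1
    return count

# Hypothetical training set: 5 of the 8 points of a 3-bit function.
training = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 1): 0,
    (1, 0, 1): 0,
    (1, 1, 0): 0,
}

n_bits = 3
completions = count_consistent_completions(n_bits, training)

# Closed form: each unseen domain point can independently take either value.
expected = 2 ** (2 ** n_bits - len(training))
```

       Only when the training set covers the whole domain does the count fall to one, which is
why some additional generalisation strategy is needed to choose among the completions.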
       It therefore appears desirable to design generalising systems such as neural networks
so that they generalise, not only in accordance with the optimal generalisation criteria of
maximum probability and training set consistency, but also subject to a generalisation
strategy which is specified by the user.
       Two approaches to the imposition of a generalisation strategy are described.  In the
first method, the characteristic autocorrelation function or functions belonging to a specified
family are used as the weight set in a Kosko net.  The second method uses Wiener Filtering
to remove the "noise" implicit in an incomplete description of a function.  The transfer
function of the Wiener Filter is specific to a particular generalisation strategy.
                            E-mail Addresses of Presenters
Abdi                       abdi at utdallas.edu
Bengio                     bengio at iro.umontreal.ca
Bhaumik                    netearth!bhaumik at shakti.ernet.in
Buhmann                    jb at s1.gov
Candelaria de Ram          sylvia at nmsu.edu
Carpenter                  gail at park.bu.edu
Chance                     u0503aa at vms.ucc.okstate.edu
DeYong                     mdeyong at nmsu.edu
Elsberry                   elsberry at hellfire.pnl.gov
Golden                     golden at utdallas.edu
Grossberg                  steve at park.bu.edu
Hampson                    hampson at ics.uci.edu
Jagota                     jagota at cs.buffalo.edu
Johnson                    ecjdj at nve.mcsr.olemiss.edu
Kak                        kak at max.ee.lsu.edu
Leven                      (reach via Pribram, see below)
Levine                     b344dsl at utarlg.uta.edu
Ogmen                      elee52f at jetson.uh.edu
Parberry                   ian at hercule.csci.unt.edu
Pribram                    kpribram at ruacad.runet.edu
Prueitt                    prueitt at guvax.georgetown.edu
Rosenstein                 NONE
Stork                      stork at crc.ricoh.com
Szu                        btelfe at bagheera.nswc.navy.mil
Tattersall                 ?
    
    