A New Series of Virtual Textbooks on Neural Networks
Michael A. Arbib
arbib at pollux.usc.edu
Fri Oct 6 13:50:05 EDT 1995
Yesterday, a visitor to my office, while speaking of his enthusiasm
for "The Handbook of Brain Theory and Neural Networks",
mentioned that some of his colleagues had criticized the fact that
the [266] articles [in Part III] were arranged in alphabetical order,
thus lacking the "logical order" to make the book easy to use for
teaching.
The purpose of this note is to answer such concerns.
1. The boring answer is that a Handbook is not a Textbook.
Indeed, given that the 266 articles provide such a comprehensive
overview - including detailed models of single neurons; analysis of
a wide variety of neurobiological systems; connectionist studies;
mathematical analyses of abstract neural networks; and
technological applications of adaptive, artificial neural networks
and related methodologies - it is hard to imagine a course that
would cover the whole book, no matter in what order the articles
were presented.
2. The exciting answer is that THE HANDBOOK IS A VIRTUAL
LIBRARY OF TWENTY-THREE TEXTBOOKS!!
Before the 266 articles of Part III come Part I and Part II.
Part I provides a textbook-level introduction
to Neural Networks.
Part II provides 23 "road maps", each of which lists the
articles on a particular theme, followed by an essay which
offers a "logical order" in which to read these articles.
Thus, the Handbook can be used to provide a "virtual textbook"
on any one of the following 23 topics:
Applications of Neural Networks
Artificial Intelligence and Neural Networks
Biological Motor Control
Biological Networks
Biological Neurons
Computability and Complexity
Connectionist Linguistics
Connectionist Psychology
Control Theory and Robotics
Cooperative Phenomena
Development and Regeneration of Neural Networks
Dynamic Systems and Optimization
Implementation of Neural Networks
Learning in Artificial Neural Networks, Deterministic
Learning in Artificial Neural Networks, Statistical
Learning in Biological Systems
Mammalian Brain Regions
Mechanisms of Neural Plasticity
Motor Pattern Generators and Neuroethology
Primate Motor Control
Self-Organization in Neural Networks
Other Sensory Systems
Vision
In each case, the instructor can follow the road map to traverse
the articles to provide full coverage of the topic, using the cross-
references to choose supplementary material from within the
Handbook, and the carefully selected list of readings at the end
of each article to choose supplementary material from the general
literature.
As an appendix to this message, I include a sample road map, that
on "Learning in Artificial Neural Networks, Deterministic". All
the road maps are available on the Web at:
http://www-mitpress.mit.edu/mitp/recent-books/comp/handbook-brain-theo.html
If you have other queries about how best to use the Handbook, or
suggestions for improving the Handbook, please feel free to contact
me by email: arbib at pollux.usc.edu.
With best wishes
Michael Arbib
*****
APPENDIX:
The Road Map for
"Learning in Artificial Neural Networks, Deterministic"
from Part II of The Handbook of Brain Theory and Neural
Networks, (M.A. Arbib, Ed.), A Bradford Book, copyright 1995,
The MIT Press.
LEARNING IN ARTIFICIAL NEURAL NETWORKS,
DETERMINISTIC
[Articles in the Road Map, listed in Alphabetical Order.]
Adaptive Resonance Theory
Associative Networks
Backpropagation: Basics and New Developments
Convolutional Networks for Images, Speech, and Time-Series
Coulomb Potential Learning
Kolmogorov's Theorem
Learning by Symbolic and Neural Methods
Learning as Hill-Climbing in Weight Space
Learning as Adaptive Control of Synaptic Matrices
Modular Neural Net Systems, Training of
Neocognitron: A Model for Visual Pattern Recognition
Neurosmithing: Improving Neural Network Learning
Nonmonotonic Neuron Associative Memory
Pattern Recognition
Perceptrons, Adalines, and Backpropagation
Recurrent Networks: Supervised Learning
Reinforcement Learning
Topology-Modifying Neural Network Algorithms
[Articles in the Road Map, discussed in Logical Order.]
Much of our concern is with supervised learning, getting a network
to behave in a way which successfully approximates some specified
pattern of behavior or input-output relationship. In particular,
much emphasis has been placed on feedforward networks, that is,
networks which have no loops, so that the output of the net
depends on its input alone, since there is then no internal state
defined by reverberating activity. The most direct form of this is a
synaptic matrix, a one-layer neural network for which input lines
directly drive the output neurons and a "supervised Hebbian" rule
sets synapses so that the network will exhibit specified input-
output pairs in its response repertoire. This is addressed in the
article on ASSOCIATIVE NETWORKS, which notes the problems
that arise if the input patterns (the "keys" for associations) are not
orthogonal vectors. Association also extends to recurrent networks
obtained from one-layer networks by feedback connections from the
output to the input, but in such systems of "dynamic memories" (e.g.,
Hopfield networks) there are no external inputs as such. Rather the
"input" is the initial state of the network, and the "output" is the
"attractor" or equilibrium state to which the network then settles.
Unfortunately, the usual "attractor network" memory model, with
neurons whose output is a sigmoid function of the linear combination
of their inputs, has many spurious memories, i.e., equilibria other
than the memorized patterns, and there is no way to decide whether
a memorized pattern has been recalled or not. The article on
NONMONOTONIC NEURON ASSOCIATIVE MEMORY shows that, if
the output of each neuron is a nonmonotonic function of its input, the
capacity of the network can be increased, and the network does not
exhibit spurious memories: when the network fails to recall a
correct memorized pattern, the state shows a chaotic behavior
instead of falling into a spurious memory.
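To make the "dynamic memory" idea concrete, here is a minimal sketch (my own illustration, not taken from any Handbook article) of a Hopfield-style attractor memory: a pattern is stored by a Hebbian outer-product rule, and recall begins from a corrupted key and settles to the stored attractor.

```python
# Illustrative sketch (not from the Handbook): a tiny Hopfield-style
# "dynamic memory". A +/-1 pattern is stored with a Hebbian
# outer-product rule; recall starts from a corrupted key and the
# network settles to an attractor (here, the stored pattern).

def store(patterns):
    """Build a weight matrix W from +/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W, state, steps=10):
    """Synchronous sign-threshold updates until a fixed point."""
    n = len(state)
    s = list(state)
    for _ in range(steps):
        new = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if new == s:
            break
        s = new
    return s

pattern = [1, -1, 1, -1, 1, -1, 1, -1]
W = store([pattern])
noisy = list(pattern)
noisy[0] = -noisy[0]          # corrupt one bit of the key
print(recall(W, noisy))       # settles back to the stored pattern
```

With the usual sigmoid or sign units sketched here, spurious equilibria appear as more patterns are stored; the nonmonotonic-output networks of the article above are designed to avoid exactly that failure mode.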
Historically, the earliest forms of supervised learning involved
changing synaptic weights to oppose the error in a neuron with a
binary output (the perceptron error-correction rule), or to minimize
the sum of squares of errors of output neurons in a network with real-
valued outputs (the Widrow-Hoff rule). This work is charted in
the article on PERCEPTRONS, ADALINES, AND BACKPROPAGATION,
which also charts the extension of these classic ideas to
multilayered feedforward networks. Multilayered networks pose
the structural credit assignment problem: when an error is made at
the output of a network, how is credit (or blame) to be assigned to
neurons deep within the network? One of the most popular
techniques is called backpropagation, whereby the error of output
units is propagated back to yield estimates of how much a given
"hidden unit" contributed to the output error. These estimates are
used in the adjustment of synaptic weights to these units within the
network. The article on BACKPROPAGATION: BASICS AND NEW
DEVELOPMENTS places this idea in a broader mathematical and
historical framework in which backpropagation is seen as a
general method for calculating derivatives to adjust the weights of
nonlinear systems, whether or not they are neural networks. The
underlying theoretical grounding is that, given any function f: X ->
Y for which X and Y are codable as input and output patterns of a
neural network, then, as shown in the article on KOLMOGOROV'S
THEOREM, f can be approximated arbitrarily well by a
feedforward network with one layer of hidden units. The catch, of
course, is that many, many hidden units may be required for a close
fit. It is often an empirical question whether there exists a
sufficiently good approximation achievable in principle by a
network of a given size, an approximation which a given learning
rule may or may not find (it may, for example, get stuck in a local
optimum rather than a global one). The article on
NEUROSMITHING: IMPROVING NEURAL NETWORK LEARNING
provides a number of "rules of thumb" to be used in applying
backpropagation in trying to find effective settings for network size
and for various coefficients in the learning rules.
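The perceptron error-correction rule discussed above can be sketched in a few lines (a minimal illustration of my own, not code from the Handbook): weights change only when the binary output is wrong, and each change opposes the error.

```python
# Illustrative sketch (not from the Handbook): the perceptron
# error-correction rule for a single binary-output neuron.
# Weights are adjusted only on errors, in the direction that
# opposes the error.

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, lr=1.0, epochs=20):
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)   # +1, 0, or -1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

# Toy linearly separable task: logical OR.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 1, 1, 1]
```

The Widrow-Hoff rule differs in minimizing a sum of squared errors for real-valued outputs, and backpropagation extends the same error-driven idea to the hidden units of multilayered networks.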
One useful perspective for supervised learning views LEARNING AS
HILL-CLIMBING IN WEIGHT SPACE, so that each "experience"
adjusts the synaptic weights of the network to climb (or descend) a
metaphorical hill for which "height" at a particular point in
"weight space" corresponds to some measure of the performance of
the network (or the organism or robot of which it is a part). When
the aim is to minimize this measure, one of the basic techniques for
learning is what mathematicians call "gradient descent";
optimization theory also provides alternative methods such as,
e.g., that of conjugate gradients, which are also used in the neural
network literature. REINFORCEMENT LEARNING describes a form of
"semi-supervised" learning where the network is not provided
with an explicit form of error at each time step but rather receives
only generalized reinforcement ("you're doing well"; "that was
bad!") which yields little immediate indication of how any neuron
should change its behavior. Moreover, the reinforcement is
intermittent, thus raising the temporal credit assignment problem:
how is an action at one time to be credited for positive
reinforcement at a later time? One solution is to build an "adaptive
critic" which learns to evaluate actions of the network on the basis
of how often they occur on a path leading to positive or negative
reinforcement.
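The hill-descending picture above can be reduced to a one-line update (a minimal sketch of my own, not from the Handbook): repeatedly step against the gradient of the error surface until a minimum is reached.

```python
# Illustrative sketch (not from the Handbook): gradient descent on a
# one-dimensional "error surface" E(w) = (w - 3)^2, whose gradient is
# dE/dw = 2 * (w - 3). Stepping against the gradient moves the weight
# toward the minimum at w = 3.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # move downhill in weight space
    return w

w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))     # close to 3.0
```

In a real network, w is the whole vector of synaptic weights and the gradient is supplied by a rule such as backpropagation; conjugate-gradient and related methods replace the fixed step with smarter search directions.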
Another perspective on supervised learning is presented in
LEARNING AS ADAPTIVE CONTROL OF SYNAPTIC MATRICES,
which views learning as a control problem (controlling synaptic
matrices to yield a given network behavior) and then uses the
adjoint equations of control theory to derive synaptic adjustment
rules. Gradient descent methods have also been extended to adapt
the synaptic weights of recurrent networks, as discussed in
RECURRENT NETWORKS: SUPERVISED LEARNING, where the aim
is to match the time course of network activity, rather than the
(input, output) pairs of some training set.
The task par excellence for supervised learning is pattern
recognition, the problem of classifying objects, often represented as
vectors or as strings of symbols, into categories. Historically, the
field of pattern recognition started with early efforts in neural
networks (see PERCEPTRONS, ADALINES, AND
BACKPROPAGATION). While neural networks played a less central
role in pattern recognition for some years, recent progress has made
them the method of choice for many applications. As PATTERN
RECOGNITION demonstrates, multilayer networks, when properly
designed, can learn complex mappings in high-dimensional spaces
without requiring complicated hand-crafted feature extractors. To
rely more on learning, and less on detailed engineering of feature
extractors, it is crucial to tailor the network architecture to the
task, incorporating prior knowledge to be able to learn complex
tasks without requiring excessively large networks and training
sets.
Many specific architectures have been developed to solve
particular types of learning problem. ADAPTIVE RESONANCE
THEORY (ART) bases learning on internal expectations. When the
external world fails to match an ART network's expectations or
predictions, a search process selects a new category, representing a
new hypothesis about what is important in the present
environment. The neocognitron (see NEOCOGNITRON: A MODEL
FOR VISUAL PATTERN RECOGNITION) was developed as a neural
network model for visual pattern recognition which addresses the
specific question "how can a pattern be recognized despite
variations in size and position?" by using a multilayer architecture
in which local features are replicated in many different scales and
locations. More generally, as shown in CONVOLUTIONAL
NETWORKS FOR IMAGES, SPEECH, AND TIME-SERIES, shift
invariance in convolutional networks is obtained by forcing the
replication of weight configurations across space. Moreover, the
topology of the input is taken into account, enabling such networks
to force the extraction of local features by restricting the receptive
fields of hidden units to be local. COULOMB POTENTIAL LEARNING
derives its name from its functional form's likeness to a Coulomb
charge potential, replacing the linear separability of a simple
perceptron with a network that is capable of constructing arbitrary
nonlinear boundaries for classification tasks.
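The weight-sharing idea behind convolutional networks can be seen in one dimension (an illustrative sketch of my own, not from the Handbook): a single small kernel is replicated at every position, so a shifted input produces a correspondingly shifted feature map.

```python
# Illustrative sketch (not from the Handbook): weight sharing in a
# 1-D convolutional layer. One kernel is replicated across every
# position, so shifting the input shifts the feature map in step
# (shift equivariance), which is what underlies shift invariance
# after pooling.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = [1.0, -1.0]                 # a local "edge" detector
x = [0, 0, 1, 1, 1, 0, 0]
x_shifted = [0, 0, 0, 1, 1, 1, 0]    # same pattern, one step later

print(conv1d(x, kernel))
print(conv1d(x_shifted, kernel))     # same responses, shifted by one
```

Restricting each hidden unit's receptive field to a local window, as here, is what forces the network to extract local features.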
We have already noted that networks that are too small cannot
learn the desired input-output mapping. However, networks can
also be too large. Just as a polynomial of too high a degree is not
useful for curve-fitting, a network that is too large will fail to
generalize well, and will require longer training times. Smaller
networks, with fewer free parameters, enforce a smoothness
constraint on the function found. For best performance, it is,
therefore, desirable to find the smallest network that will
"properly" fit the training data. The article TOPOLOGY-
MODIFYING NEURAL NETWORK ALGORITHMS reviews algorithms
which adjust network topology (i.e., adding or removing neurons
during the learning process) to arrive at a network appropriate to a
given task.
The last two articles in this road map take a somewhat different
viewpoint from that of adjusting the synaptic weights in a single
network. MODULAR NEURAL NET SYSTEMS, TRAINING OF presents
the idea that, although single neural networks are theoretically
capable of learning complex functions, many problems are better
solved by designing systems in which several modules cooperate
to perform a global task, replacing the complexity of a
large neural network by the cooperation of neural network modules
whose size is kept small. The article on LEARNING BY SYMBOLIC
AND NEURAL METHODS focuses on the distinction between
symbolic learning based on producing discrete combinations of the
features used to describe examples and neural approaches which
adjust continuous, nonlinear weightings of their inputs. The article
not only compares but also combines the two approaches, showing
for example how symbolic knowledge may be used to set the initial
state of an adaptive network.
[This Road Map is then followed by one on "Learning in Artificial
Neural Networks, Statistical"]