Thesis/TR available
Steven J. Nowlan
nowlan at helmholtz.sdsc.edu
Thu Jun 27 14:38:58 EDT 1991
The following technical report version of my thesis is now available from
the School of Computer Science, Carnegie Mellon University:
-------------------------------------------------------------------------------
Soft Competitive Adaptation:
Neural Network Learning Algorithms
based on Fitting Statistical Mixtures
CMU-CS-91-126
Steven J. Nowlan
School of Computer Science
Carnegie Mellon University
ABSTRACT
In this thesis, we consider learning algorithms for neural networks that are
based on fitting a mixture probability density to a set of data.
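(Concretely, in standard mixture-model notation, which may differ from the
notation used in the thesis, such a density has the form

    p(x) = \sum_k \pi_k \, p(x | \theta_k)

where the mixing proportions \pi_k sum to one and each component density
corresponds to one competing unit or expert.)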
We begin with an unsupervised algorithm which is an
alternative to the classical winner-take-all competitive algorithms. Rather
than updating only the parameters of the ``winner'' on each case, the
parameters of all competitors are updated in proportion to their relative
responsibility for the case.
Use of such a ``soft'' competitive algorithm is shown to give better
performance than the more traditional algorithms, with little additional cost.
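To give the flavor of one update, here is a minimal NumPy sketch (not code
from the thesis; the Gaussian units, fixed spherical variance, and all names
are assumptions of the sketch):

    import numpy as np

    def soft_competitive_step(x, means, log_pi, var=1.0, lr=0.1):
        # Responsibilities: posterior probability of each Gaussian unit
        # given the case x, instead of a one-hot "winner".
        sq_dist = np.sum((means - x) ** 2, axis=1)
        log_resp = log_pi - sq_dist / (2.0 * var)
        log_resp -= log_resp.max()          # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum()
        # Every competitor moves toward x, weighted by its responsibility;
        # winner-take-all is the limit in which resp becomes one-hot.
        means += lr * resp[:, None] * (x - means)
        return resp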
We then consider a supervised modular architecture in which a number
of simple ``expert'' networks compete to solve distinct pieces of a large
task. A soft competitive mechanism is used to determine how much an expert
learns on a case, based on how well the expert performs relative to the other
expert networks. At the same time, a separate gating network learns to weight
the output of each expert according to a prediction of its relative performance
based on the input to the system.
Experiments on a number of tasks illustrate that this architecture is capable
of uncovering interesting task decompositions and, when training sets are
small, of generalizing better than a single network.
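One training step can be sketched schematically as follows (illustrative
only; the linear experts, linear-softmax gating network, and Gaussian error
model are assumptions of the sketch, not the architecture as it appears in
the thesis):

    import numpy as np

    def moe_step(x, y, experts, gater, var=1.0, lr=0.01):
        # Forward pass: each expert predicts; the gating network
        # weights the experts based only on the input x.
        outs = [W @ x for W in experts]
        logits = gater @ x
        g = np.exp(logits - logits.max())
        g /= g.sum()
        # Responsibility of each expert: gating weight times the
        # likelihood of the target y under that expert's prediction.
        lik = np.array([np.exp(-np.sum((y - o) ** 2) / (2 * var))
                        for o in outs])
        resp = g * lik
        resp /= resp.sum()
        # Each expert learns in proportion to its responsibility.
        for k, (W, o) in enumerate(zip(experts, outs)):
            W += lr * resp[k] * np.outer(y - o, x)
        # The gating network moves its weighting toward the
        # responsibilities, i.e. toward relative performance.
        gater += lr * np.outer(resp - g, x)
        return resp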
Finally, we consider learning algorithms in which we assume that the actual
output of the network should fall into one of a small number of classes or
clusters. The objective of learning is to make the variance of these classes as
small as possible.
In the classical decision-directed algorithm, we decide that an
output belongs to the class it is closest to and minimize the squared distance
between the output and the center (mean) of this closest class. In the
``soft'' version of this algorithm, we minimize the squared
distance between the actual output and a weighted average of the means of all
of the classes. The weighting factors are the relative probabilities that
the output belongs to each class. This idea may also be used to model the
weights of a network, to produce networks which generalize better from small
training sets.
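The soft target can be sketched in a few lines (again only a sketch; the
scalar v stands for either a network output or a single weight, and the
class means, mixing proportions, and shared variance are assumptions):

    import numpy as np

    def soft_target(v, means, log_pi, var=1.0):
        # Responsibility of each class for the value v.
        log_resp = log_pi - (v - means) ** 2 / (2.0 * var)
        log_resp -= log_resp.max()
        resp = np.exp(log_resp)
        resp /= resp.sum()
        # Instead of the single closest class mean (the hard,
        # decision-directed choice), v is pulled toward the
        # responsibility-weighted average of all the class means.
        return resp @ means, resp

Minimizing the squared distance between v and this soft target recovers the
hard decision-directed rule in the limit where one responsibility dominates.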
-------------------------------------------------------------------------------
Unfortunately there is NOT an electronic version of this TR. Copies may be
ordered by sending a request for TR CMU-CS-91-126 to:
Computer Science Documentation
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
USA
There will be a charge of $10.00 U.S. for orders from the U.S., Canada or
Mexico and $15.00 U.S. for overseas orders to cover copying and mailing
costs (the TR is 314 pages in length). Checks and money orders should
be made payable to Carnegie Mellon University. Note that if your institution
is part of the Carnegie Mellon Technical Report Exchange Program there will
be NO charge for this TR.
REQUESTS SENT DIRECTLY TO MY E-MAIL ADDRESS WILL BE FILED IN /dev/null.
- Steve
(P.S. Please note my new e-mail address is nowlan at helmholtz.sdsc.edu).