oh boy, more tech reports...
Michael C. Mozer
mozer%neuron@boulder.Colorado.EDU
Wed Jan 18 16:19:46 EST 1989
Please e-mail requests to "kate@boulder.colorado.edu".
Skeletonization: A Technique for Trimming the Fat
from a Network via Relevance Assessment
Michael C. Mozer
Paul Smolensky
University of Colorado
Department of Computer Science
Tech Report # CU-CS-421-89
This paper proposes a means of using the knowledge in a network to
determine the functionality or _relevance_ of individual units, both for
understanding the network's behavior and for improving its performance.
The basic idea is to iteratively train the network to a certain
performance criterion, compute a measure of relevance that identifies
which input or hidden units are most critical to performance, and
automatically trim the least relevant units. This _skeletonization_
technique can be used to simplify networks by eliminating units that
convey redundant information; to improve learning performance by first
learning with spare hidden units and then trimming the unnecessary ones
away, thereby constraining generalization; and to understand the behavior
of networks in terms of minimal "rules."
[An abridged version of this TR will appear in NIPS proceedings.]
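
For the impatient, here is a little Python/numpy sketch of the trimming
loop (my illustration, not code from the TR; it uses the brute-force
relevance measure, the increase in error when a unit is ablated, where
the TR develops a cheaper approximation, and the XOR task, network size,
and learning rate are invented for the example):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(X, W1, b1, W2, b2, mask):
        # mask gates each hidden unit: 1 = active, 0 = trimmed.
        H = sigmoid(X @ W1 + b1) * mask
        return H, sigmoid(H @ W2 + b2)

    def error(X, T, params, mask):
        _, Y = forward(X, *params, mask)
        return np.mean((Y - T) ** 2)

    # Toy problem: XOR, learned with spare hidden units.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    n_hid = 8
    W1 = rng.normal(0, 1, (2, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 1, (n_hid, 1)); b2 = np.zeros(1)
    mask = np.ones(n_hid)

    lr = 2.0
    for epoch in range(5000):               # train to criterion
        H, Y = forward(X, W1, b1, W2, b2, mask)
        dY = (Y - T) * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(0)

    params = (W1, b1, W2, b2)
    base = error(X, T, params, mask)
    # Relevance of unit i: error with unit i ablated, minus baseline error.
    relevance = np.array([
        error(X, T, params, mask * (1 - np.eye(n_hid)[i])) - base
        for i in range(n_hid)])
    trim = int(np.argmin(relevance))
    mask[trim] = 0.0                        # trim the least relevant unit
    print("relevance:", np.round(relevance, 4), "-> trimmed unit", trim)

In the full procedure this train/measure/trim cycle repeats until trimming
any further unit would push the error back above criterion.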
---------------------------------------------------------------------------
And while I'm at it, some other recent junk, I mean stuff...
A Focused Back-Propagation Algorithm
for Temporal Pattern Recognition
Michael C. Mozer
University of Toronto
Connectionist Research Group
Tech Report # CRG-TR-88-3
Time is at the heart of many pattern recognition tasks, e.g., speech
recognition. However, connectionist learning algorithms to date are not
well suited to dealing with time-varying input patterns. This paper
introduces a specialized connectionist architecture and a corresponding
specialization of the back-propagation learning algorithm that operates
efficiently on temporal sequences. The key feature of the architecture is
a layer of self-connected hidden units that integrate their current value
with the new input at each time step to construct a static representation
of the temporal input sequence. This architecture avoids two deficiencies
found in other models of sequence recognition: first, it reduces the
difficulty of temporal credit assignment by focusing the back-propagated
error signal; second, it eliminates the need for a buffer to hold the
input sequence and/or intermediate activity levels. The latter property
holds because, during the forward (activation) phase, incremental activity
_traces_ can be computed locally that hold all the information necessary
for back propagation in time. It is argued that this architecture should
scale better than conventional recurrent architectures with respect to
sequence length. The architecture has been used to implement a temporal
version of Rumelhart and McClelland's verb past-tense model. The hidden
units learn to behave something like Rumelhart and McClelland's
"Wickelphones," a rich and flexible representation of temporal
information.
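
To make the trace idea concrete, a rough Python sketch (mine, not the
TR's; the toy task, sizes, learning rate, and the clip on the decay terms
are my own simplifications): each hidden unit keeps a running state
c = d*c + s plus two traces, dc/dW and dc/dd, updated locally at each
time step, so the gradient at sequence end needs no stored history:

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n_in, n_hid = 4, 3
    W = rng.normal(0, 0.5, (n_hid, n_in))   # input -> hidden weights
    d = np.full(n_hid, 0.5)                 # self-connection (decay) per unit
    v = rng.normal(0, 0.5, n_hid)           # hidden -> output weights
    lr = 0.1

    for epoch in range(500):
        # Invented toy task: does the first input component ever exceed 0.9?
        xs = rng.random((5, n_in))
        target = float(xs[:, 0].max() > 0.9)

        c = np.zeros(n_hid)                 # integrated (static) hidden state
        trW = np.zeros_like(W)              # trace dc_i/dW_ij, carried forward
        trd = np.zeros(n_hid)               # trace dc_i/dd_i
        for x in xs:
            s = sigmoid(W @ x)              # new input to each hidden unit
            trW = d[:, None] * trW + (s * (1 - s))[:, None] * x[None, :]
            trd = d * trd + c               # uses c from the previous step
            c = d * c + s                   # self-connected integration

        y = sigmoid(v @ c)                  # static representation -> output
        dy = (y - target) * y * (1 - y)
        dc = dy * v                         # error focused onto hidden units
        v -= lr * dy * c
        W -= lr * dc[:, None] * trW         # no unrolling in time: the traces
        d -= lr * dc * trd                  # already hold the needed history
        d = np.clip(d, 0.0, 1.0)            # keep decay terms stable (my hack)

Note the storage is O(weights), independent of sequence length, which is
the point of the buffer-free claim above.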
---------------------------------------------------------------------------
A Connectionist Model of Selective Attention in Visual Perception
Michael C. Mozer
University of Toronto
Connectionist Research Group
Tech Report # CRG-TR-88-4
This paper describes a model of selective attention that is part of a
connectionist object recognition system called MORSEL. MORSEL is capable
of identifying multiple objects presented simultaneously on its "retina,"
but because of capacity limitations, it requires attention to prevent it
from trying to do too much at once. Attentional selection is performed by
a network of simple computing units that constructs a variable-diameter
"spotlight" on the retina, allowing sensory information within the
spotlight to be preferentially processed. Simulations of the model
demonstrate that attention is more critical for less familiar items and
that attention can be used to reduce inter-item crosstalk. The model
suggests four distinct roles of attention in visual information
processing, as well as a novel view of attentional selection that has
characteristics of both early and late selection theories.
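
A toy Python sketch of the gating effect (mine, not MORSEL's; in the
model the spotlight is constructed by an iterative network of simple
units, which I replace here with a closed-form Gaussian gain, and all
the numbers are invented):

    import numpy as np

    def spotlight_gain(shape, center, diameter):
        # Gaussian gain field standing in for the network-built spotlight.
        ys, xs = np.indices(shape)
        r2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
        sigma = diameter / 2.0
        return np.exp(-r2 / (2.0 * sigma ** 2))

    rng = np.random.default_rng(2)
    retina = rng.random((16, 16))           # raw sensory activity
    gain = spotlight_gain(retina.shape, center=(4, 10), diameter=5.0)
    attended = retina * gain                # preferential processing inside
    # Activity far from the spotlight is attenuated, which is how attention
    # can cut down inter-item crosstalk when several objects are present.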