From eric at mcc.com Mon Jan 2 17:02:06 1989 From: eric at mcc.com (Eric Hartman) Date: Mon, 2 Jan 89 16:02:06 CST Subject: Tech Report Announcement Message-ID: <8901022202.AA11713@legendre.aca.mcc.com> The following MCC Technical Report is now available. Requests may be sent to eric at mcc.com or Eric Hartman Microelectronics and Computer Technology Corporation 3500 West Balcones Center Drive Austin, TX 78759-6509 U.S.A. ------------------------------------------------------------------------ Explorations of the Mean Field Theory Learning Algorithm Carsten Peterson* and Eric Hartman Microelectronics and Computer Technology Corporation 3500 West Balcones Center Drive Austin, TX 78759-6509 MCC Technical Report Number: ACA-ST/HI-065-88 Abstract: The mean field theory (MFT) learning algorithm is elaborated and explored with respect to a variety of tasks. MFT is benchmarked against the back propagation learning algorithm (BP) on two different feature recognition problems: two-dimensional mirror symmetry and eight-dimensional statistical pattern classification. We find that while the two algorithms are very similar with respect to generalization properties, MFT normally requires a substantially smaller number of training epochs than BP. Since the MFT model is bidirectional, rather than feed-forward, its use can be extended naturally from purely functional mappings to a content addressable memory. A network with N visible and N hidden units can store up to approximately 2N patterns with good content-addressability. We stress an implementational advantage for MFT: it is natural for VLSI circuitry. Also, its inherent parallelism can be exploited with fully synchronous updating, allowing efficient simulations on SIMD architectures. *Present Address: Department of Theoretical Physics University of Lund Solvegatan 14A, S-22362 Lund, Sweden From Scott.Fahlman at B.GP.CS.CMU.EDU Mon Jan 2 21:57:10 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Mon, 02 Jan 89 21:57:10 EST Subject: Benchmarks mailing list Message-ID: Two or three weeks ago I sent a message to this mailing list announcing our intention to set up at CMU a collection of learning benchmarks, accessible via FTP from the Arpanet. The hope is that this collection will help the research community in our joint effort to characterize the speed and quality of various learning algorithms on a variety of different learning tasks. There were a few problems in getting the new mailing lists set up over the holidays, but I believe we're now ready to proceed. I anticipate that there will be considerable discussion about the usefulness of various benchmarks, how they should be run, results, etc. Rather than clog the "connectionists" mailing list with these benchmark-related messages, we have set up a new mailing list whose Arpanet address is "nn-bench at cs.cmu.edu". If you want to be added to this mailing list, send an "add me" message to "nn-bench-request at cs.cmu.edu". Please include a valid netmail address that we can reach from the Arpanet. If messages to the address you give start bouncing, we'll have to delete you from the list. The "nn-bench-request" address is also the proper destination for "delete me" requests, address changes, and other messages intended only for the mailing list and data base maintainers. Please do not send such messages to "nn-bench" -- you will inconvenience a lot of people and make yourself look like a fool. At present, the mailing list maintainers are Michael Witbrock and me. 
If you just want to access the benchmark collection and not participate in the related discussions, you don't have to join the "nn-bench" mailing list. Once there is a useful collection of files in one place, I will tell people on the "connectionists" mailing list how to access them. I suggest we wait until January 15 or so before we start discussing substantive issues on the "nn-bench" list. This will give people time to join the mailing list before the fun begins. We will archive old messages for those who join later. -- Scott Fahlman, CMU From kruschke at cogsci.berkeley.edu Tue Jan 3 03:30:12 1989 From: kruschke at cogsci.berkeley.edu (John Kruschke) Date: Tue, 3 Jan 89 00:30:12 PST Subject: No subject Message-ID: <8901030830.AA09915@cogsci.berkeley.edu> Here is the compilation of responses to my request for info on weight decay. I have kept editing to a minimum, so you can see exactly what the author of the reply said. Where appropriate, I have included some comments of my own, set off in square brackets. The responses are arranged into three broad topics: (1) Boltzmann-machine related; (2) back-prop related; (3) psychology related. Thanks to all, and happy new year! --John ----------------------------------------------------------------- ORIGINAL REQUEST: I'm interested in all the information I can get regarding WEIGHT DECAY in back-prop, or in other learning algorithms. *In return* I'll collate all the info contributed and send the compilation out to all contributors. Info might include the following: REFERENCES: - Applications which used weight decay - Theoretical treatments Please be as complete as possible in your citation. FIRST-HAND EXPERIENCE - Application domain, details of I/O patterns, etc. - exact decay procedure used, and results (Please send info directly to me: kruschke at cogsci.berkeley.edu Don't use the reply command.) T H A N K S ! --John Kruschke. ----------------------------------------------------------------- From: Geoffrey Hinton Date: Sun, 4 Dec 88 13:57:45 EST Weight-decay is a version of what statisticians call "Ridge Regression". We used weight-decay in Boltzmann machines to keep the energy barriers small. This is described in section 6.1 of: Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984) Boltzmann Machines: Constraint satisfaction networks that learn. Technical Report CMU-CS-84-119, Carnegie-Mellon University. I used weight decay in the family trees example. Weight decay was used to improve generalization and to make the weights easier to interpret (because, at equilibrium, the magnitude of a weight = its usefulness). This is in: Rumelhart, D.~E., Hinton, G.~E., and Williams, R.~J. (1986) Learning representations by back-propagating errors. {\it Nature}, {\bf 323}, 533--536. I used weight decay to achieve better generalization in a hard generalization task that is reported in: Hinton, G.~E. (1987) Learning translation invariant recognition in a massively parallel network. In Goos, G. and Hartmanis, J., editors, {\it PARLE: Parallel Architectures and Languages Europe}, pages~1--13, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Weight-decay can also be used to keep "fast" weights small. The fast weights act as a temporary context. One use of such a context is described in: Hinton, G.~E. and Plaut, D.~C. (1987) Using fast weights to deblur old memories. {\it Proceedings of the Ninth Annual Conference of the Cognitive Science Society}, Seattle, WA. --Geoff
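-----------------------------------------------------------------
[ For concreteness, Geoff's "Ridge Regression" remark can be written out as a small worked equation. This is an illustrative note only; the setting (a one-layer linear net trained on squared error) is an assumption, not something taken from the replies. Ridge regression minimizes

E(w) = 1/2 \sum_p ( t_p - w \cdot x_p )^2 + \lambda/2 \sum_i w_i^2

and gradient descent on E gives

\Delta w_i = \eta [ \sum_p ( t_p - w \cdot x_p ) x_{p,i} - \lambda w_i ]

i.e. the usual LMS error-correction step plus a shrinkage of each weight by \eta \lambda w_i on every step, which is weight decay. For multilayer nets the correspondence holds at the level of the added penalty term \lambda/2 \sum w^2 rather than of the closed-form ridge solution. ]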
----------------------------------------------------------------- [In his lecture at the International Computer Science Institute, Berkeley CA, on 16-DEC-88, Geoff also mentioned that weight decay is good for wiping out the initial values of weights so that only the effects of learning remain. In particular, if the change (due to learning) on two weights is the same for all updates, then the two weights converge to the same value. This is one way to generate symmetric weights from non-symmetric starting values. --John] ----------------------------------------------------------------- From: Michael.Franzini at SPEECH2.CS.CMU.EDU Date: Sun, 4 Dec 1988 23:24-EST My first-hand experience confirms what I'm sure many other people have told you: that (in general) weight decay in backprop increases generalization. I've found that it's particularly important for small training sets, and its effect diminishes as the training set size increases. Weight decay was first used by Barak Pearlmutter. The first mention of weight decay is, I believe, in an early paper of Hinton's (possibly the Plaut, Nowlan, and Hinton CMU CS tech report), and it is attributed to "Barak Pearlmutter, Personal Communication" there. The version of weight decay that (I'm fairly sure) all of us at CMU use is one in which each weight is multiplied by 0.999 every epoch. Scott Fahlman has a more complicated version, which is described in his QUICKPROP tech report. [QuickProp is also described in his paper in the Proceedings of the 1988 Connectionist Models Summer School, published by Morgan Kaufmann. --John] The main motivation for using it is to eliminate spurious large weights which happen not to interfere with recognition of training data but would interfere with recognizing testing data. (This was Barak's motivation for trying it in the first place.) However, I have heard more theoretical justifications (which, unfortunately, I can't reproduce.) In case Barak didn't reply to your message, you might want to contact him directly at bap at cs.cmu.edu. --Mike ----------------------------------------------------------------- From: Barak.Pearlmutter at F.GP.CS.CMU.EDU Date: 8 Dec 1988 16:36-EST We first used weight decay as a way to keep weights in a boltzmann machine from growing too large. We added a term to the thing being minimized, G, so that G' = G + 1/2 h \sum_{i<j} w_{ij}^2. ----------------------------------------------------------------- Date: Tue, 6 Dec 88 09:34 CST Probably he will respond to you himself, but Alex Weiland of MITRE presented a paper at INNS in Boston on shaping, in which the order of presentation of examples in training a back-prop net was altered to reflect a simpler rule at first. Over a number of epochs he gradually changed the examples to slowly change the rule to the one desired. The nets learned much faster than if he just tossed the examples at the net in random order. He told me that it would not work without weight decay. He said their rule-of-thumb was the decay should give the weights a half-life of 2 to 3 dozen epochs (usually a value such as 0.9998). But I neglected to ask him if he felt that the number of epochs or the number of presentations was important. Perhaps if one had a significantly different training set size, that rule-of-thumb would be different? I have started some experiments similar to his shaping, using some random variation of the training data (where the random variation grows over time). Weiland also discussed this in his talk. I haven't yet compared decay with no-decay.
I did try (as a lark) using decay with a regular (non-shaping) training, and it did worse than we usually get (on same data and same network type/size/shape). Perhaps I was using a stupid decay value (0.9998 I think) for that situation. I hope to get back to this, but at the moment we are preparing for a software release to our shareholders (MCC is owned by 20 or so computer industry corporations). In the next several weeks a lot of people will go on Christmas vacation, so I will be able to run a bunch of nets all at once. They call me the machine vulture. ----------------------------------------------------------------- From: Tony Robinson Date: Sat, 3 Dec 88 11:10:20 GMT Just a quick note in reply to your message to `connectionists' to say that I have tried to use weight decay with back-prop on networks with order 24 i/p, 24 hidden, 11 o/p units. The problem was vowel recognition (I think), it was about 18 months ago, and the problem was of the unsolvable type (i.e. non-zero final energy). My conclusion was that weight decay only made matters worse, and my justification (to myself) for abandoning weight decay was that you are not even pretending to do gradient descent any more, and any good solution formed quickly becomes garbaged by scaling the weights. If you want to avoid hidden units sticking on their limiting values, why not use hidden units with no limiting values, for instance I find the activation function f(x) = x * x works better than f(x) = 1.0 / (1.0 + exp(-x)) anyway. Sorry I haven't got anything formal to offer, but I hope these notes help. Tony Robinson. ----------------------------------------------------------------- From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sat, 3 Dec 88 11:54:02 EST Actually, "costs" or "penalty" functions are probably better terms. We had a poster last week at NIPS that discussed some of the pitfalls and advantages of two kinds of costs. I can send you the paper when we have a version available. Stephen J. Hanson (jose at bellcore.com) ----------------------------------------------------------------- [ In a conversation in his office on 06-DEC-88, Dave Rumelhart described to me several cost functions he has tried. The motive for the functions he has tried is different from the motive for standard weight decay. Standard weight decay, \sum_{i,j} w_{i,j}^2 , is used to *distribute* weights more evenly over the given connections, thereby increasing robustness (cf. earlier replies). He has tried several other cost functions in an attempt to *localize*, or concentrate, the weights on a small subset of the given connections. The goal is to improve generalization. His favorite is \sum_{i,j} ( w_{i,j}^2 / ( K + w_{i,j}^2 ) ) where K is a constant, around 1 or 2. Note that this function is negatively accelerating, whereas standard weight decay is positively accelerating. This function penalizes small weights (proportionally) more than large weights, just the opposite of standard weight decay. He has also tried, with less satisfying results, \sum_{i,j} ( 1 - \exp( -\alpha w_{i,j}^2 ) ) and \sum_{i,j} \ln ( K + w_{i,j}^2 ). Finally, he has tried a cost function designed to make all the fan-in weights of a single unit decay, when possible. That is, the unit is effectively cut out of the network. The function is \sum_i (\sum_j w_{i,j}^2) / ( K + \sum_j w_{i,j}^2 ). Each weight is thereby penalized (inversely) proportionally to the total fan-in weight of its node. --John ]
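-----------------------------------------------------------------
[ A minimal code sketch of the decay procedures described in the replies above, for concreteness. It is illustrative only: the function names, the learning rate, and the penalty strengths are assumptions; the 0.999 and 0.9998 scale factors are the values quoted in the replies. ]

import numpy as np

def epoch_update(W, grad_W, lr=0.1, decay=0.999):
    # Plain multiplicative decay: an ordinary back-prop step, then every
    # weight is scaled by a constant slightly below 1 once per epoch
    # (0.999 in the CMU variant; 0.9998 gives the longer half-life
    # mentioned in connection with shaping).
    W = W - lr * grad_W
    return decay * W

def epoch_update_penalty(W, grad_W, lr=0.1, h=0.01):
    # Equivalent view: add a penalty (h/2) * sum(W**2) to the objective.
    # Its gradient contributes h*W, so the update becomes
    # W <- (1 - lr*h)*W - lr*grad_W, i.e. decay by the factor (1 - lr*h).
    return W - lr * (grad_W + h * W)

def localizing_penalty_grad(W, K=1.0):
    # Gradient of the "localizing" cost sum_ij w_ij^2 / (K + w_ij^2)
    # described above: d/dw [ w^2/(K + w^2) ] = 2*K*w / (K + w^2)**2,
    # which penalizes small weights proportionally more than large ones.
    return 2.0 * K * W / (K + W ** 2) ** 2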
----------------------------------------------------------------- [ This is also a relevant place to mention my paper in the Proceedings of the 1988 Connectionist Models Summer School, "Creating local and distributed bottlenecks in back-propagation networks". I have since developed those ideas, and have expressed the localized bottleneck method as gradient descent on an additional cost term. The cost term is quite general, and some forms of decay are simply special cases of it. --John] ----------------------------------------------------------------- From: john moody Date: Sun, 11 Dec 88 22:54:11 EST Scalettar and Zee did some interesting work on weight decay with back prop for associative memory. They found that a Unary Representation emerged (see Baum, Moody, and Wilczek; Bio Cybernetics Aug or Sept 88 for info on Unary Reps). Contact Tony Zee at UCSB (805)961-4111 for info on weight decay paper. --John Moody ----------------------------------------------------------------- From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Sat, 10 Dec 88 16:51:29 PST I'd appreciate a copy of your weight decay collation. I have a paper in MS form which illustrates how adding weight decay to the linear-LMS one-layer net improves its ability to predict human generalization in classification learning. mark gluck dept of psych stanford univ, stanford, ca 94305 ----------------------------------------------------------------- From: INAM000 (Tony Marley) Date: SUN 04 DEC 1988 11:16:00 EST I have been exploring some ideas re COMPETITIVE LEARNING with "noisy weights" in modeling simple psychophysics. The task is the classical one of identifying one of N signals by a simple (verbal) response -e.g. the stimuli might be squares of different sizes, and one has to identify the presented one by saying the appropriate integer. We know from classical experiments that people cannot perform this task perfectly once N gets larger than about 7, but performance degrades smoothly for larger N. I have been developing simulations where the mapping is learnt by competitive learning, with the weights decaying/varying over time when they are not reset by relevant inputs. I have not got too many results to date, as I have been taking the psychological data seriously, which means worrying about reaction times, sequential effects, "end effects" (stimuli at the end of the range more accurately identified), range effects (increasing the stimulus range has little effect), etc. Tony Marley ----------------------------------------------------------------- From: aboulanger at bbn.com (Albert Boulanger) Date: Fri, 2 Dec 88 19:43:14 EST This one concerns the Hopfield model. In James D Keeler, "Basin of Attraction of Neural Network Models", Snowbird Conference Proceedings (1986), 259-264, it is shown that the basins of attraction become very complicated as the number of stored patterns increases. He uses a weight modification method called "unlearning" to smooth out these basins. Albert Boulanger BBN Systems & Technologies Corp. aboulanger at bbn.com ----------------------------------------------------------------- From: Joerg Kindermann Date: Mon, 5 Dec 88 08:21:03 -0100 We used a form of weight decay not for learning but for recall in multilayer feedforward networks. See the following abstract. Input patterns are treated as ``weights'' coming from a constant valued external unit. If you would like a copy of the technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr.
Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input DECAY, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition. ----------------------------------------------------------------- From: Helen M. Gigley Date: Mon, 05 Dec 88 11:03:23 -0500 I am responding to your request even though my use of decay is not with respect to learning in connectionist-like models. My focus has been on a functioning system that can be lesioned. One question I have is what is the behavioral association to weight decay? What aspects of learning is it intended to reflect? I can understand that activity decay over time of each cell is meaningful and reflects a cellular property, but what is weight decay in comparable terms? Now, I will send you offprints if you would like of my work and am including a list of several publications which you may be able to peruse. The model, HOPE, is a hand-tuned structural connectionist model that is designed to enable lesioning without redesign or reprogramming to study possible processing causes of aphasia. Decay factors as an integral part of dynamic time-dependent processes are one of several aspects of processing in a neural environment which potentially affect the global processing results even though they are defined only locally. If I can be of any additional help please let me know. Helen Gigley References: Gigley, H.M. Neurolinguistically Constrained Simulation of Sentence Comprehension: Integrating Artificial Intelligence and Brain Theory. Ph.D. Dissertation, UMass/Amherst, 1982. Available University Microfilms, Ann Arbor, MI. Gigley, H.M. HOPE--AI and the dynamic process of language behavior. In Cognition and Brain Theory 6(1): 39-88, 1983. Gigley, H.M. Grammar viewed as a functioning part of a cognitive system. Proceedings of ACL 23rd Annual Meeting, Chicago, 1985. Gigley, H.M. Computational Neurolinguistics -- What is it all about? In IJCAI Proceedings, Los Angeles, 1985. Gigley, H.M. Studies in Artificial Aphasia--experiments in processing change. In Journal of Computer Methods and Programs in Biomedicine, 22 (1): 43-50, 1986. Gigley, H.M. Process Synchronization, Lexical Ambiguity Resolution, and Aphasia. In Steven L. Small, Garrison Cottrell, and Michael Tanenhaus (eds.) Lexical Ambiguity Resolution, Morgan Kaufmann, 1988. ----------------------------------------------------------------- From: bharucha at eleazar.Dartmouth.EDU (Jamshed Bharucha) Date: Tue, 13 Dec 88 16:56:00 EST I haven't tried weight decay but am curious about it. I am working on back-prop learning of musical sequences using a Jordan-style net. The network develops a musical schema after learning lots of sequences that have culture-specific regularities. I.e., it learns to generate expectancies for tones following a sequential context.
I'm interested in knowing how to implement forgetting, whether short term or long term. Jamshed. ----------------------------------------------------------------- From will at ida.org Tue Jan 3 10:50:14 1989 From: will at ida.org (Craig Will) Date: Tue, 3 Jan 89 10:50:14 EST Subject: Copies of DARPA Request for Proposals Available Message-ID: <8901031550.AA16284@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:12:06 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:12:06 EST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901041512.AA11296@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this article, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ THE CONNECTIONIST CONSTRUCTION OF CONCEPTS Adrian Cussins, New College, Oxford Keywords: connectionism, representation, cognition, perception, nonconceptual content, concepts, learning, objectivity, semantics Computational modelling of cognition depends on an underlying theory of representation. Classical cognitive science has exploited the syntax/semantics theory of representation derived from formal logic. As a consequence, the kind of psychological explanation supported by classical cognitive science is "conceptualist": psychological phenomena are modelled in terms of relations between concepts and between the sensors/effectors and concepts. This kind of explanation is inappropriate according to Smolensky's "Proper Treatment of Connectionism" [BBS 11(1) 1988]. Is there an alternative theory of representation that retains the advantages of classical theory but does not force psychological explanation into the conceptualist mold? I outline such an alternative by introducing an experience-based notion of nonconceptual content and by showing how a complex construction out of nonconceptual content can satisfy classical constraints on cognition. Cognitive structure is not interconceptual but intraconceptual. The theory of representational structure within concepts allows psychological phenomena to be explained as the progressive emergence of objectivity. This can be modelled computationally by transformations of nonconceptual content which progressively decrease its perspective-dependence through the formation of a cognitive map. 
Stevan Harnad ARPA/INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From will at ida.org Wed Jan 4 10:59:54 1989 From: will at ida.org (Craig Will) Date: Wed, 4 Jan 89 10:59:54 EST Subject: Copies of DARPA Req for Prop Available Message-ID: <8901041559.AA13970@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:18:00 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:18:00 EST Subject: Speech Perception: BBS Multiple Book Review Message-ID: <8901041518.AA11306@psycho.Princeton.EDU> Below is the abstract of a book that will be multiply reviewed in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Reviewers must be current BBS Associates or nominated by a current BBS Associate. To be considered as a reviewer for this book, to suggest other appropriate reviewers, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ BBS Multiple Book review of: SPEECH PERCEPTION BY EAR AND EYE: A PARADIGM FOR PSYCHOLOGICAL INQUIRY (Hillsdale NJ: LE Erlbaum Associates 1987) Dominic William Massaro Program in Experimental Psychology University of California, Santa Cruz Keywords: speech perception; vision; audition; categorical perception; connectionist models; fuzzy logic; sensory impairment; decision making This book is about the processing of information, particularly in face-to-face spoken communication where both audible and visible information are available. Experimental tasks were designed to manipulate many of these sources of information independently and to test mathematical fuzzy logical and other models of performance and the underlying stages of information processing. Multiple sources of information are evaluated and integrated to achieve speech perception. Graded information seems to be derived about the degree to which an input fits a given category rather than just all-or-none categorical information. Sources of information are evaluated independently, with the integration process insuring that the least ambiguous sources have the most impact on the judgment. The processes underlying speech-perception also occur in a variety of other behaviors, ranging from categorization to sentence interpretation, decision making and forming impressions about people. 
----- Stevan Harnad INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From mesard at BBN.COM Thu Jan 5 09:37:12 1989 From: mesard at BBN.COM (mesard@BBN.COM) Date: Thu, 05 Jan 89 09:37:12 -0500 Subject: Tech Report Announcement In-Reply-To: Your message of Mon, 02 Jan 89 16:02:06 -0600. <8901022202.AA11713@legendre.aca.mcc.com> Message-ID: Please send me a copy of the tech report Explorations of the Mean Field Theory Learning Algorithm Thanks. Wayne Mesard Mesard at BBN.COM 70 Fawcett St. Cambridge, MA 02138 617-873-1878 From gluck at psych.Stanford.EDU Thu Jan 5 10:20:17 1989 From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Thu, 5 Jan 89 07:20:17 PST Subject: Human Learning & Connectionist Models Message-ID: I would be grateful to receive information about people using connectionist/neural-net approaches within cognitive psychology to model human learning and memory data. Citations to published work, information about work in progress, and copies of reprints or preprints would be most welcome and appreciated. Mark Gluck Dept. of Psychology Jordan Hall; Bldg. 420 Stanford University Stanford, CA 94305 (415) 725-2434 gluck at psych.stanford.edu. From kanderso at BBN.COM Thu Jan 5 16:30:15 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Thu, 05 Jan 89 16:30:15 -0500 Subject: No subject In-Reply-To: Your message of Tue, 03 Jan 89 00:30:12 -0800. <8901030830.AA09915@cogsci.berkeley.edu> Message-ID: I enjoyed John's summary of weight decay, but it raised a few questions. Just as John did, I'll be glad to summarize the responses to the group. 1. Geoff Hinton mentioned that "Weight-decay is a version of what statisticians call "Ridge Regression"." What do you mean by "version"? Is it exactly the same, or just slightly different? I think I know what Ridge Regression is, but I don't see an obvious strong connection. I see a weak one, and after I think about it more maybe I'll say something about it. The ideas behind Ridge regression probably came from Levenberg and Marquardt who used it in nonlinear least squares: Levenberg K., A Method for the solution of certain nonlinear problems in least squares, Q. Appl. Math, Vol 2, pages 164-168, 1944. Marquardt, D.W., An algorithm for least squares estimation of non-linear parameters, J. Soc. Industrial and Applied Math., 11:431-441, 1963. 2. John quoted Dave Rumelhart as saying that standard weight decay distributes weights more evenly over the given connections, thereby increasing robustness. Why does smearing out large weights increase robustness? What does robustness mean here, the ability to generalize? k From dreyfus at cogsci.berkeley.edu Thu Jan 5 21:04:34 1989 From: dreyfus at cogsci.berkeley.edu (Hubert L. Dreyfus) Date: Thu, 5 Jan 89 18:04:34 PST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901060204.AA02484@cogsci.berkeley.edu> Stevan: Stuart and I would like to write a joint comment on Cussins' paper. Please send us the latest version by e-mail or regular mail whichever you prefer.
Hubert Dreyfus From daugman%charybdis at harvard.harvard.edu Fri Jan 6 10:41:42 1989 From: daugman%charybdis at harvard.harvard.edu (j daugman) Date: Fri, 6 Jan 89 10:41:42 EST Subject: Neural Networks in Natural and Artificial Vision Message-ID: For preparation of 1989 conference tutorials and reviews, I would be grateful to receive any available p\reprints reporting research on neural network models of human / biological vision and applications in artificial vision. Thanks in advance. John Daugman Harvard University 950 William James Hall Cambridge, Mass. 02138 From josh at flash.bellcore.com Fri Jan 6 14:32:55 1989 From: josh at flash.bellcore.com (Joshua Alspector) Date: Fri, 6 Jan 89 14:32:55 EST Subject: VLSI Implementations of Neural Networks Message-ID: <8901061932.AA07422@flash.bellcore.com> I will be giving a tuturial on the above topic at the Custom Integrated Circuits Conference. Vu grafs are due at the end of February and I would like to include as complete a description as possible of current efforts in the VLSI implementation of neural networks. I would appreciate receiving any preprints or hard copies of vu grafs regarding any work you are doing. E-mail reports are also acceptable. Please send to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-1910 From neural!jsd Fri Jan 6 12:45:14 1989 From: neural!jsd (John Denker) Date: Fri, 6 Jan 89 12:45:14 EST Subject: confidence / runner-up activation Message-ID: <8901061744.AA10566@neural.UUCP> Yes, we've been using the activation level of the runner-up neurons to provide confidence information in our character recognizer for some time. The work was reported at the last San Diego mtg and at the last Denver mtg. --- jsd (John Denker) From netlist at psych.Stanford.EDU Tue Jan 10 09:43:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Tue, 10 Jan 89 06:43:16 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Departments of Psychology and Electrical Engineering Winter Quarter 1989 Schedule ---------------------------- Jan. 12th (Thursday, 3:30pm): ----------------------------- STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) Jan. 24th (Tuesday, 3:30pm): ---------------------------- LARRY MALONEY LEARNING BY ASSERTION: Department of Psychology CALIBRATING A SIMPLE VISUAL SYSTEM New York University email: ltm at xp.psych.nyu.edu Feb. 9th (Thursday, 3:30pm): ---------------------------- CARVER MEAD VLSI MODELS OF NEURAL NETWORKS Moore Professor of Computer Science California Institute of Technology Feb. 21st (Tuesday, 3:30pm): ---------------------------- PIERRE BALDI ON SPACE AND TIME IN NEURAL COMPUTATIONS Jet Propulsion Laboratory California Institute of Technology email: pfbaldi at caltech.bitnet Mar. 14th (Tuesday, 3:30pm): ---------------------------- ALAN LAPEDES NONLINEAR SIGNAL PROCESSING WITH NEURAL NETS Theoretical Division - MS B213 Los Alamos National Laboratory email: asl at lanl.gov Additional Information ---------------------- The talks (including discussion) last about one hour and fifteen minutes. Following each talk, there will be a reception. Unless otherwise noted, all talks will be held in room 380-380F, which is in the basement of the Mathematical Sciences buildings. 
To be placed on an electronic-mail distribution list for information about these and other adaptive network events in the Stanford area, send email to netlist at psych.stanford.edu. For additional information, contact: Mark Gluck, Department of Psychology, Bldg. 420, Stanford University, Stanford, CA 94305 (phone 415-725-2434 or email to gluck at psych.stanford.edu). Program Committee: Bernard Widrow (E.E.), David Rumelhart, Misha Pavel, Mark Gluck (Psychology). This series is supported by the Departments of Psychology and Electrical Engineering and by a gift from the Thomson-CSF Corporation. Coming this Spring: D. Parker, B. McNaughton, G. Lynch & R. Granger From hinton at ai.toronto.edu Tue Jan 10 10:09:11 1989 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 10 Jan 89 10:09:11 EST Subject: new tech report Message-ID: <89Jan10.100924est.10956@ephemeral.ai.toronto.edu> The following report can be obtained by sending an email request to carol at ai.toronto.edu If this fails try carol%ai.toronto.edu at relay.cs.net Please do not send email to me about it (so don't use "reply" or "answer"). "Deterministic Boltzmann Learning Performs Steepest Descent in Weight-space." Geoffrey E. Hinton Department of Computer Science University of Toronto Technical report CRG-TR-89-1 ABSTRACT The Boltzmann machine learning procedure has been successfully applied in deterministic networks of analog units that use a mean field approximation to efficiently simulate a truly stochastic system {Peterson and Anderson, 1987}. This type of ``deterministic Boltzmann machine'' (DBM) learns much faster than the equivalent ``stochastic Boltzmann machine'' (SBM), but since the learning procedure for DBM's is only based on an analogy with SBM's, there is no existing proof that it performs gradient descent in any function, and it has only been justified by simulations. By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities. A very simple way of forcing the weights to become symmetrical is also described, and this makes the DBM more biologically plausible than back-propagation.
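For readers who have not seen the deterministic (mean field) version referred to in the abstract, the two phases can be sketched as follows. This is a generic illustration of a Peterson-and-Anderson style mean field rule, not the derivation in the report; the variable names, the temperature, and the settling schedule are all assumptions.

import numpy as np

def mean_field_settle(W, s, clamped, T=1.0, n_iter=50):
    # Iterate the mean field equations s_i = tanh((1/T) * sum_j W_ij * s_j),
    # holding the clamped units (a dict of index -> value) fixed.
    for _ in range(n_iter):
        s = np.tanh(W.dot(s) / T)
        for i, v in clamped.items():
            s[i] = v
    return s

def dbm_weight_step(W, s_plus, s_minus, lr=0.05):
    # Boltzmann-style learning from the two mean field equilibria:
    #   s_plus  : equilibrium with inputs and desired outputs clamped
    #   s_minus : equilibrium with only the inputs clamped
    # Co-activations seen in the clamped phase are reinforced and those of
    # the free-running phase are unlearned; dW is symmetric, so a symmetric
    # W stays symmetric.
    dW = np.outer(s_plus, s_plus) - np.outer(s_minus, s_minus)
    np.fill_diagonal(dW, 0.0)
    return W + lr * dW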
From netlist at psych.Stanford.EDU Wed Jan 11 09:29:01 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Wed, 11 Jan 89 06:29:01 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 12th (Thursday, 3:30pm): ----------------------------- ******************************************************************************** STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) ******************************************************************************** Abstract Connectionist modeling holds the promise of making important contributions to our understanding of human language. For example, such models can explore the role of parallel processing, constraint satisfaction, neurologically realistic architectures, and efficient pattern-matching in linguistic processes. However, the current connectionist program of language modeling seems to be motivated by a different set of goals: reviving classical associationism, eliminating levels of linguistic representation, and maximizing the role of top-down, knowledge-driven processing. I present evidence (developed in collaboration with Alan Prince) that these goals are ill-advised, because the empirical assumptions they make about human language are simply false. Specifically, evidence from adults' and children's abilities with morphology, semantics, and syntax suggests that people possess formal linguistic rules and autonomous linguistic representations, which are not based on the statistical correlations among microfeatures that current connectionist models rely on so heavily. Moreover, I suggest that treating the existence of mentally-represented rules and representations as an empirical question will lead to greater progress than rejecting them on a priori methodological grounds. The data suggest that some linguistic processes are saliently rule-like, and call for a suitable symbol-processing architecture, whereas others are associative, and can be insightfully modeled using connectionist mechanisms. Thus taking the facts of human language seriously can lead to an interesting rapprochement between standard psycholinguistics and connectionist modeling. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From unido!gmdzi!joerg at uunet.UU.NET Thu Jan 12 04:30:50 1989 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Thu, 12 Jan 89 08:30:50 -0100 Subject: CALL FOR PARTICIPATION Message-ID: <8901120730.AA03021@gmdzi.UUCP> Workshop ``DANIP'' Distributed Adaptive Neural Information Processing. 24.-25.4.1989 Gesellschaft fuer Mathematik und Datenverarbeitung mbH Sankt Augustin Neural information processing is gaining increasing attention in many scientific areas. As a consequence the first ``Workshop Konnektionismus'' at the GMD was organized in February 1988. It gave an overview of research activities in neural networks and their applications to Artificial Intelligence. Now, almost a year later, the time has come to focus on the state of neural information processing itself. The aim of the workshop is to discuss TECHNICAL aspects of information processing in neural networks on the basis of personal contributions in one of the following areas: - new or improved learning algorithms (including evaluations) - self organization of structured (non-localist) neural networks - time series analysis by means of neural networks - adaptivity, e.g. the problem of relearning - adequate coding of information for neural processing - generalization - weight interpretation (correlative and other) Presentations which report on ``work in progress'' are encouraged. The size of the workshop will be limited to 15 contributions of 30 minutes in length.
A limited number of additional participants may attend the workshop and take part in the discussions. To apply for the workshop as a contributor, please send information about your contribution (1-2 pages in English or a relevant publication). If you want to participate without giving an oral presentation, please include a description of your background in the field of neural networks. Proceedings on the basis of workshop contributions will be published after the workshop. SCHEDULE: 28 February 1989: deadline for submission of applications 20 March 1989: notification of acceptance 24 - 25 April 1989: workshop ``DANIP'' 31 July 1989: deadline for submission of full papers to be included in the proceedings Applications should be sent to the following address: Dr. Joerg Kindermann or Alexander Linden Gesellschaft fuer Mathematik und Datenverarbeitung mbH - Schloss Birlinghoven - Postfach 1240 D-5205 Sankt Augustin 1 WEST GERMANY e-mail: joerg at gmdzi al at gmdzi From pwh at ece-csc.ncsu.edu Fri Jan 13 17:28:39 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 13 Jan 89 17:28:39 EST Subject: No subject Message-ID: <8901132228.AA05092@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. The 1989 IEEE/INNS International Joint Conference on Neural Networks (IJCNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 18-22, 1989. IJCNN-89 is the first conference in a new series devoted to the technology and science of neurocomputing and neural networks in all of their aspects. The series replaces the previous IEEE ICNN and INNS Annual Meeting series and is jointly sponsored by the IEEE Technical Activities Board Neural Network Committee and the International Neural Network Society (INNS). IJCNN-89 will be the only major neural net- work meeting of 1989 (IEEE ICNN-89 and the 1989 INNS Annual Meeting have both been cancelled). Thus, it behooves all members of the neural network community who have important new results for presentation to prepare their papers now and submit them by the IJCNN-89 deadline of 1 FEBRUARY 1989. The Conference Proceedings will be distributed AT THE REGISTRATION DESK to all regular conference registrants as well as to all student registrants. The conference will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual trade show), plenary talks, and social events. Mark your calendar today and plan to attend IJCNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 is FEBRUARY 1, 1989. Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Architectures & Theory -Speech Recognition FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Confer- ence Coordinator, at the address below. 
For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From rudnick at cse.ogc.edu Sat Jan 14 18:05:27 1989 From: rudnick at cse.ogc.edu (Mike Rudnick) Date: Sat, 14 Jan 89 15:05:27 PST Subject: genetic search and neural nets Message-ID: <8901142305.AA07774@ogccse.OGC.EDU> I am a Ph.D. candidate in computer science at Oregon Graduate Center. My research interest is in using genetic search to tackle artificial neural network (ANN) scaling issues. My particular orientation is to view minimizing interconnections as a central issue, partly motivated by VLSI implementation issues. I am starting a mailing list for those interested in applying genetic search to/with/for ANNs. Mail a request to Neuro-evolution-request at cse.ogc.edu to have your name added to the list. A bibliography of work relating artificial neural networks (ANNs) and genetic search is available. It is organized/oriented for someone familiar with the ANN literature but unfamiliar with the genetic search literature. Send a request to Neuro-evolution-request at cse.ogc.edu for a copy. If there is sufficient interest I will post the bibliography here. -------------------------------------------------------------------------- Mike Rudnick CSnet: rudnick at cse.ogc.edu Computer Science & Eng. Dept. ARPAnet: rudnick%cse.ogc.edu at relay.cs.net Oregon Graduate Center BITNET: rudnick%cse.ogc.edu at relay.cs.net 19600 N.W. von Neumann Dr. UUCP: {tektronix,verdix}!ogccse!rudnick Beaverton, OR. 97006-1999 (503) 690-1121 X7390 -------------------------------------------------------------------------- From sontag at fermat.rutgers.edu Tue Jan 17 14:08:03 1989 From: sontag at fermat.rutgers.edu (sontag@fermat.rutgers.edu) Date: Tue, 17 Jan 89 14:08:03 EST Subject: Kolmogorov's superposition theorem Message-ID: <8901171908.AA00964@control.rutgers.edu> *** I am posting this for Professor Rui de Figueiredo, a researcher in Control Theory and Circuits who does not subscribe to this list. Please direct cc's of all responses to his e-mail address (see below). -eduardo s. *** KOLMOGOROV'S SUPERPOSITION THEOREM AND ARTIFICIAL NEURAL NETWORKS Rui J. P. de Figueiredo Dept. of Electrical and Computer Engineering Rice University, Houston, TX 77251-1892 e-mail: rui at zeta.rice.edu The implementation of the Kolmogorov-Arnold-Sprecher Superposition Theorem [1-3] in terms of artificial neural networks was first presented and fully discussed by me in 1980 [4]. I also discussed, then [4], applications of these structures to statistical pattern recognition and image and multi-dimensional signal processing. However, I did not use the words "neural networks" in defining the underlying networks. For this reason, the current researchers on neural nets including Robert Hecht-Nielsen [5] do not seem to be aware of my contribution [4]. I hope that this note will help correct history. Incidentally, there is a misprint in [4]. In [4], please insert "no" in the statement before eqn.(4). That statement should read: "Sprecher showed that lambda can be any nonzero number which satisfies no equation ..." [1] A.N. Kolmogorov, "On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 369-373, 1957. [2] V.I. Arnol'd, "On functions of three variables," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 953-956, 1957.
[3] D.A. Sprecher, "An improvement in the superposition theorem of Kolmogorov," J. Math. Anal. Appl., Vol. 38, pp. 208-213, 1972. [4] Rui J.P. de Figueiredo, "Implications and applications of Kolmogorov's superposition theorem," IEEE Trans. Auto. Contr., Vol. AC-25, pp. 1227-1231, 1980. [5] R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," IEEE 1st Int. Conf. on Neural Networks, San Diego, CA, June 21-24, 1987, paper III-11. From ncr-fc!avery at ncr-sd.sandiego.ncr.com Tue Jan 17 19:43:43 1989 From: ncr-fc!avery at ncr-sd.sandiego.ncr.com (ncr-fc!avery@ncr-sd.sandiego.ncr.com) Date: Tue, 17 Jan 89 17:43:43 MST Subject: new address Message-ID: <8901180043.AA19084@ncr-fc.FtCollins.NCR.com> I have a new e-mail address. Not the one in the reply field but this one. avery%ncr-fc at ncr-sd.sandiego.ncr.com Will you please get me back on the discussion group. From MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU Tue Jan 17 23:22:00 1989 From: MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU (MUMME%IDCSVAX.BITNET@CUNYVM.CUNY.EDU) Date: Tue, 17 Jan 89 20:22 PST Subject: Tech. Report Available Message-ID: The following tech. report is available from the University of Illinois Dept. of Computer Science: UIUCDCS-R-88-1485 STORAGE CAPACITY OF THE LINEAR ASSOCIATOR: BEGINNINGS OF A THEORY OF COMPUTATIONAL MEMORY by Dean C. Mumme May, 1988 ABSTRACT This thesis presents a characterization of a simple connectionist-system, the linear-associator, as both a memory and a classifier. Toward this end, a theory of memory based on information-theory is devised. The principles of the information-theory of memory are then used in conjunction with the dynamics of the linear-associator to discern its storage capacity and classification capabilities as they scale with system size. To determine storage capacity, a set of M vector-pairs called "items" are stored in an associator with N connection-weights. The number of bits of information stored by the system is then determined to be about (N/2)logM. The maximum number of items storable is found to be half the number of weights so that the information capacity of the system is quantified to be (N/2)logN. Classification capability is determined by allowing vectors not stored by the associator to appear at its input. Conditions necessary for the associator to make a correct response are derived from constraints of information theory and the geometry of the space of input-vectors. Results include derivation of the information-throughput of the associator, the amount of information that must be present in an input-vector and the number of vectors that can be classified by an associator of a given size with a given storage load. Figures of merit are obtained that allow comparison of capabilities of general memory/classifier systems. For an associator with a simple non-linearity on its output, the merit figures are evaluated and shown to be suboptimal. Constant attention is devoted to relative parameter size required to obtain the derived performance characteristics. Large systems are shown to perform nearest to the optimum performance limits and suggestions are made concerning system architecture needed for best results. Finally, avenues for extension of the theory to more general systems are indicated. This tech. report is essentially my Ph.D. thesis completed last May and can be obtained by sending e-mail to: erna at a.cs.uiuc.edu Please do not send requests to me since I now live in Idaho and don't have access to the tech. reports. When replying to this notice, please do not use REPLY or send a note to "CONNECTIONISTS...". Send your request directly to Erna. Comments, questions and suggestions about the work can be sent directly to me at the address below. Thank You! Dean C. Mumme bitnet: mumme at idcsvax Dept. of Computer Science University of Idaho Moscow, ID 83843
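For a concrete sense of scale, an illustrative calculation using the figures quoted in the abstract (with logs taken base 2, which the abstract does not specify): an associator with N = 10^4 connection weights can store at most M = N/2 = 5000 items, and its information capacity is about (N/2) \log_2 N \approx 5000 \times 13.3 \approx 6.6 \times 10^4 bits, i.e. roughly 13 bits per stored item.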
From poggio at wheaties.ai.mit.edu Tue Jan 17 22:47:17 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem In-Reply-To: sontag@fermat.rutgers.edu's message of Tue, 17 Jan 89 14:08:03 EST <8901171908.AA00964@control.rutgers.edu> Message-ID: <8901180347.AA21088@rice-chex.ai.mit.edu> Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). From mozer%neuron at boulder.Colorado.EDU Wed Jan 18 16:19:46 1989 From: mozer%neuron at boulder.Colorado.EDU (Michael C. Mozer) Date: Wed, 18 Jan 89 14:19:46 MST Subject: oh boy, more tech reports... Message-ID: <8901182119.AA00413@neuron> Please e-mail requests to "kate at boulder.colorado.edu". Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment Michael C. Mozer Paul Smolensky University of Colorado Department of Computer Science Tech Report # CU-CS-421-89 This paper proposes a means of using the knowledge in a network to determine the functionality or _relevance_ of individual units, both for the purpose of understanding the network's behavior and improving its performance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This _skeletonization_ technique can be used to simplify networks by eliminating units that convey redundant information; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules." [An abridged version of this TR will appear in NIPS proceedings.] --------------------------------------------------------------------------- And while I'm at it, some other recent junk, I mean stuff... A Focused Back-Propagation Algorithm for Temporal Pattern Recognition Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-3 Time is at the heart of many pattern recognition tasks, e.g., speech recognition. However, connectionist learning algorithms to date are not well-suited for dealing with time-varying input patterns. This paper introduces a specialized connectionist architecture and corresponding specialization of the back-propagation learning algorithm that operates efficiently on temporal sequences. The key feature of the architecture is a layer of self-connected hidden units that integrate their current value with the new input at each time step to construct a static representation of the temporal input sequence. This architecture avoids two deficiencies found in other models of sequence recognition: first, it reduces the difficulty of temporal credit assignment by focusing the back propagated error signal; second, it eliminates the need for a buffer to hold the input sequence and/or intermediate activity levels. The latter property is due to the fact that during the forward (activation) phase, incremental activity _traces_ can be locally computed that hold all information necessary for back propagation in time. It is argued that this architecture should scale better than conventional recurrent architectures with respect to sequence length. The architecture has been used to implement a temporal version of Rumelhart and McClelland's verb past-tense model. The hidden units learn to behave something like Rumelhart and McClelland's "Wickelphones," a rich and flexible representation of temporal information.
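The integration step described in the abstract can be pictured with a short sketch. This is one possible reading of "self-connected hidden units that integrate their current value with the new input"; the per-unit decay weights d, the squashing function, and all names below are assumptions rather than the report's actual equations.

import numpy as np

def focused_context_step(c_prev, x, W_in, d):
    # One time step for a layer of self-connected context units: each unit
    # keeps a weighted trace of its own previous value (d holds the per-unit
    # self-connection strengths) and adds in the squashed contribution of
    # the current input.
    return d * c_prev + np.tanh(W_in.dot(x))

# After the whole sequence has been presented, the final context vector is a
# fixed-width summary on which an ordinary feed-forward classifier, trained
# by back-propagation, can operate.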
--------------------------------------------------------------------------- A Connectionist Model of Selective Attention in Visual Perception Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-4 This paper describes a model of selective attention that is part of a connectionist object recognition system called MORSEL. MORSEL is capable of identifying multiple objects presented simultaneously on its "retina," but because of capacity limitations, MORSEL requires attention to prevent it from trying to do too much at once. Attentional selection is performed by a network of simple computing units that constructs a variable-diameter "spotlight" on the retina, allowing sensory information within the spotlight to be preferentially processed. Simulations of the model demonstrate that attention is more critical for less familiar items and that attention can be used to reduce inter-item crosstalk. The model suggests four distinct roles of attention in visual information processing, as well as a novel view of attentional selection that has characteristics of both early and late selection theories. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Jan 18 13:54:02 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 18 Jan 89 13:54:02 EST Subject: Benchmark collection Message-ID: The mailing list "nn-bench at cs.cmu.edu" is now in operation. I believe that all the "add me" requests received prior to 1/17/89 have been serviced. Of course, it's possible that we messed up some of the requests. If you sent in a request more than a couple of days ago and if you have not yet seen any "nn-bench" mail, please contact "nn-bench-request at cs.cmu.edu" and we'll investigate. New requests should be sent to that same address. The list currently has about 80 subscribers, plus two rebroadcast sites. -- Scott Fahlman, CMU From pollack at cis.ohio-state.edu Fri Jan 20 15:40:09 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Fri, 20 Jan 89 15:40:09 EST Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: <8901202040.AA13239@orange.cis.ohio-state.edu> Preprint of a NIPS paper is now available. Request LAIR 89-JP-NIPS From: Randy Miller CIS Dept/Ohio State University 2036 Neil Ave Columbus, OH 43210 or respond to this message but MODIFY THE To: AND Cc: LINES!!!!! ------------------------------------------------------------------------------ IMPLICATIONS OF RECURSIVE DISTRIBUTED REPRESENTATIONS Jordan B.
Pollack Laboratory for AI Research Ohio State University Columbus, OH 43210 I will describe my recent results on the automatic development of fixed-width recursive distributed representations of variable-sized hierarchical data structures. One implication of this work is that certain types of AI-style data-structures can now be represented in fixed-width analog vectors. Simple inferences can be performed using the type of pattern associations that neural networks excel at. Another implication arises from noting that these representations become self-similar in the limit. Once this door to chaos is opened, many interesting new questions about the representational basis of intelligence emerge, and can (and will) be discussed. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Jan 21 00:06:08 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sat, 21 Jan 89 00:06:08 EST Subject: No subject Message-ID: i was reading through the abstracts of the Boston 1988 INNS conference and noticed H. Bourlard and C. Welleken's paper on the relations between Hidden Markov Models and Multi Layer Perceptron. Does anybody have any pointers to papers on the subject by the same (preferably) or other authors? or the e-mail address of these two authors? Thanasis Kehagias 
From netlist at psych.Stanford.EDU Sun Jan 22 18:23:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:23:16 PST Subject: (Tues. 1/24): Larry Maloney on Visual Calibration Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Department of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From rsun at cs.brandeis.edu Sun Jan 22 17:02:48 1989 From: rsun at cs.brandeis.edu (Ron Sun) Date: Sun, 22 Jan 89 17:02:48 est Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: Please send this TR to Ron Sun Brandeis U CS Waltham, MA 02254 Thank you. From koch%HAMLET.BITNET at VMA.CC.CMU.EDU Mon Jan 23 14:29:31 1989 From: koch%HAMLET.BITNET at VMA.CC.CMU.EDU (Christof Koch) Date: Mon, 23 Jan 89 11:29:31 PST Subject: Gimme a break! Message-ID: <890123112923.20203114@Hamlet.Caltech.Edu> re. "Call for papers IJCNN, the only major neural network meeting of 1989 [sic]" Neural Information Processing Systems 1989 at Denver will be held this year from November 28th until November 30th, followed by a workshop on December 1/2. This is the third annual meeting held under the auspices of the IEEE, Society of Neuroscience, and APS. For further information contact Scott Kirkpatrick, General Chairman (kirk at ibm.com) or wait for the Call for Papers which is in preparation. Christof From jbower at bek-mc.caltech.edu Mon Jan 23 16:20:39 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:20:39 pst Subject: NIPS 89 Message-ID: <8901232120.AA14266@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. 
While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower From jbower at bek-mc.caltech.edu Mon Jan 23 16:17:05 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:17:05 pst Subject: NIPS Message-ID: <8901232117.AA14257@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower From Dave.Touretzky at B.GP.CS.CMU.EDU Mon Jan 23 18:46:25 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Mon, 23 Jan 89 18:46:25 EST Subject: message from Jim Bower Message-ID: <331.601602385@DST.BOLTZ.CS.CMU.EDU> ================================================================ Date: Sun, 22 Jan 89 20:37:57 pst From: jbower at bek-mc.caltech.edu (Jim Bower) To: Connectionists-Request at q.cs.cmu.edu Subject: NIPS 89 To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance, and size of the exhibit area, an equally important measure might be the overall quality of the work presented and therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. 
From movellan%garnet.Berkeley.EDU at violet.berkeley.edu Mon Jan 23 23:32:11 1989 From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu (movellan%garnet.Berkeley.EDU@violet.berkeley.edu) Date: Mon, 23 Jan 89 20:32:11 pst Subject: Weight Decay Message-ID: <8901240432.AA18293@garnet.berkeley.edu> Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T, where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see though how these ridge solutions relate to weight decay. -Javier From ILPG0 at ccuab1.uab.es Tue Jan 24 09:23:00 1989 From: ILPG0 at ccuab1.uab.es (CORTO MALTESE) Date: Tue, 24 Jan 89 14:23 GMT Subject: Suscription Message-ID: Dear list owner, I should be grateful if you could add my name to the list of subscribers of Connectionists. My name is O. S. Vilageliu, and the e-mail address is: ilpg0 at ccuab1.uab.es I thank you beforehand, Sincerely yours, Olga Soler From pollack at cis.ohio-state.edu Tue Jan 24 11:51:15 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Tue, 24 Jan 89 11:51:15 EST Subject: Gimme a break! In-Reply-To: Christof Koch's message of Mon, 23 Jan 89 11:29:31 PST <890123112923.20203114@Hamlet.Caltech.Edu> Message-ID: <8901241651.AA02067@toto.cis.ohio-state.edu> Speaking of NIPS versus IJCNN, at least NIPS is pronounceable, even though, as Terry S pointed out, Nabisco already holds it as a trademark. If the international joint conference is to be as lasting a success as, say, IJCAI, then its acronym should smoothly roll off the tongue. Here are some of the alternatives I've just come up with:

Minor variations:
  JINNC   (Jink) Permute the word order
  IJCONN  Same name, but include the "ON"
  ICONN   Leave out the "Joint" (for a drug free meeting?)
  ICONS   International Conf. on Neural Systems (Hey! This is even a Word!)

The most elegant name is simply NN "Neural Networks", which can be spoken as either "N Squared" signifying both its size and technical nature, or "Double-N", signifying both the need for a big spread and the yearly "round-up" of research results like cattle...

Of course the search for acronyms usually generates useless debris:
  NIPSOID  Neural Information Processing Systems On an International Dimension
  MANIC    Most (of the) Artificially Neural International Community
  DNE      (Sounds like DNA?) Dear Neural Enthusiast...
  BNANA    Big Network of Artificial Neural Aficionados
  ARTIST   Adaptive Resonance Theory as International Science and Technology
  IBSH     I better Stop Here. 
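Returning to the ridge-regression question raised above by Javier: the claimed equivalence is easy to check numerically. The following is only an illustrative sketch, not anything from the original exchange: it is written in Python with NumPy, and every name in it (n_patterns, n_inputs, k, lrate, and so on) is invented for the example. It fits a one-layer linear net by gradient descent on squared error plus a weight-decay term kW'W and compares the result with the direct ridge solution (I'I + k<1>)W = I'T.

import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_inputs, n_targets = 20, 5, 3
k = 0.1                                        # "shrinkage" / weight-decay constant

I = rng.normal(size=(n_patterns, n_inputs))    # input matrix (patterns x input units)
T = rng.normal(size=(n_patterns, n_targets))   # teacher matrix (patterns x teacher units)

# Direct ridge solution:  (I'I + k<1>) W = I'T
W_ridge = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)

# Gradient descent on  E = |T - IW|^2 + k|W|^2  (squared error plus weight decay)
W = np.zeros((n_inputs, n_targets))
lrate = 0.01
for _ in range(20000):
    grad = I.T @ (I @ W - T) + k * W           # dE/dW, dropping a factor of 2
    W = W - lrate * grad

print(abs(W - W_ridge).max())                  # effectively zero: the two agree

With the decay constant playing the role of the ridge constant, the two weight matrices agree to numerical precision, which is the algebraic point made in the next message.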
From kanderso at BBN.COM Tue Jan 24 13:54:04 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Tue, 24 Jan 89 13:54:04 -0500 Subject: Weight Decay In-Reply-To: Your message of Mon, 23 Jan 89 20:32:11 -0800. <8901240432.AA18293@garnet.berkeley.edu> Message-ID: Date: Mon, 23 Jan 89 20:32:11 pst From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu Message-Id: <8901240432.AA18293 at garnet.berkeley.edu> To: connectionists at cs.cmu.edu Subject: Weight Decay Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T. Where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (Diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see though how these ridge solutions relate to weight decay. -Javier Yes i was confused by this too. Here is what the connection seems to be. Say we are trying to minimize an energy function E(w) of the weight vector for our network. If we add a constraint that also attempts to minimize the length of w we would add a term kw'w to our energy function. Taking your linear least squares problem we would have E = (T-IW)'(T-IW) + kW'W dE/dW = I'IW - I'T + kW setting dE/dW = 0 gives [I'I +k<1>]W = I'T, ie. Ridge Regression. W = [I'I + k<1>]^-1 I'T The covariance matrix is [I'I + k<1>]^-1 so the effect of increasing k 1. Make the matrix more invertable. 2. Reduces the covariance so that new training data will have less effect on your weights. 3. You loose some resolution in weight space. I agree that collinearity is probably very important, and i'll be glad to discuss that off line. k From jose at tractatus.bellcore.com Wed Jan 25 10:02:09 1989 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Wed, 25 Jan 89 10:02:09 EST Subject: Weight Decay Message-ID: <8901251502.AA05090@tractatus.bellcore.com> actually I think the connection is more general--ridge regression is a special case of variance techniques in regression called "biased regression" (including principle components), biases are introduced in order to remove effects of collinearity as has been discussed and to attempt to achieve estimators that may have a lower variance then the theoretical best least squares unbiased estimator ("blue") since when assumptions of linearity and independence are violated LSE are not particularly attractive and will not necessarily achieve "blue"s. Conseqently nonlinear regression and ordinary linear least squares regression with collinear variables may be able to achieve lower variance estimators by entertaining biases. 
In the nonlinear case a bias term would enter as a "constraint" to be minimized along with the error, (y - yhat)^2. This constraint is actually a term that can push weights differentially towards zero--in terms of regression it is a bias; in terms of neural networks, weight decay. Ridge regression is a specific case in linear LSE where the off-diagonal terms of the correlation matrix are given less weight by adding a small constant to the diagonal in order to reduce the collinearity problem--it is still controversial in statistical arenas--not everyone subscribes to the notion of introducing biases--since it is hard a priori to know what bias might be optimal for a given problem. I have a paper with Lori Pratt that describes this relationship more generally; it was given at the last NIPS and should be available soon as a tech report. Steve Hanson From rui at rice.edu Wed Jan 25 18:34:38 1989 From: rui at rice.edu (Rui DeFigueiredo) Date: Wed, 25 Jan 89 17:34:38 CST Subject: No subject Message-ID: <8901252334.AA01804@zeta.rice.edu> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - In-Reply-To: poggio at wheaties.ai.mit.edu's message of Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). - - - - - - - - - - - - end of message - - - - - - - - - - - - Comment: Poggio and Reichardt's paper, "On the representation of multi-input systems: Computational properties of polynomial algorithms" (Biol. Cyber., 37, 167-186, 1980) appeared not earlier but in the same year as deFigueiredo's, "Implications and applications of Kolmogorov's superposition theorem" (IEEE Trans. on Automatic Control, AC-25, 1227-1231, 1980). From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:55:49 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:55:49 EST Subject: DARPA Program announcement (long, 2 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT(BAA#89-04): NEURAL NETWORKS: HARDWARE TECHNOLOGY BASE DEVELOPMENT SOL BAA#89-04 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop hardware system components that capitalize on the inherent massive parallelism and expected robustness of neural network models. The objective of the present effort is to lay the groundwork for future construction of full-scale artificial neural network computing machines through the development of advanced hardware implementation technologies. DARPA does not intend to build full-scale machines at this stage of the program. Areas of interest include modifiable-weight synaptic connections, neuron processing unit devices, and scalable neural net architecture designs. The technologies proposed may be analog or digital, using silicon or other materials, and may be electronic, optoelectronic, optical, or other. The technology should be robust to manufacturing and environmental variability. 
It should be flexible and modular to accommodate evolving neural network system architectures and to allow for scale-up to large-sized systems through assembly/interconnection of smaller subsystems. It should be appropriate for future compact, low-power systems. It must accommodate the high fan-out/high fan-in properties characteristic of artificial neural network systems with high density interconnects, and it must have high throughput capability to achieve rapid processing of large volumes of data. Only those proposals that clearly delineate how the objective enumerated above are to be achieved and that demonstrate extensive prior experience in hardware design and fabrication will be favorably considered. If the proposal addresses a component technology, proposers should provide a detailed description of the interface features required for integration into a working artificial neural network system. Whether the proposed technology is adapted to a specific neural net model or, conversely, is applicable to a broad range of models, the proposer should clearly define the specific features of the proposed hardware that underlie its particular applicability. To the extent that availability of the proposed technology will facilitate the implementation of advanced systems other that artificial neural network systems, that potential impact should be described. Hardware developers are encouraged to work in close coordination with neural network modelers to better understand the range of current projected architectural requirements. DARPA will also entertain a limited number of proposals to develop near-term prototypes with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions to the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. 
The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From poggio at wheaties.ai.mit.edu Thu Jan 26 13:01:23 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Thu, 26 Jan 89 13:01:23 EST Subject: No subject In-Reply-To: Rui DeFigueiredo's message of Wed, 25 Jan 89 17:34:38 CST <8901252334.AA01804@zeta.rice.edu> Message-ID: <8901261801.AA15158@wheat-chex.ai.mit.edu> ... From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 13:00:34 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 13:00:34 EST Subject: DARPA Program Announcement (long, 3 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT(BAA#89-03): NEURAL NETWORKS: THEORY AND MODELING SOL BAA#89-03 DUE 030189 POC Douglas M. Pollock, Contracts(202)694-1771; Dr. Barbara L. Yoon, Technical(202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop and analyze new artificial neural network system architectures/structures and training procedures; define the requirements for scale-up to large-sized artificial neural networks; and characterize the properties, limitations, and data requirements of new and existing artificial neural network systems. 
Proposers are encouraged to submit proposals that deal with, but are not limited to, any combination of the following thrusts within these areas: (1) New artificial neural architectures with one or more of the following features: (a) Potential for addressing real-time sensory data processing and real-time sensorimonitor control; (b) Networks that incorporate features of sensory, motor, and perceptual processing in biological systems; (c) Nodal elements with increased processing capability, including sensitivity to temporal variations in synaptic inputs; (d) Modular networks composed of multiple interconnected subnets; (e) Hybrid systems combining neural and conventional information processing techniques; (f) Mechanisms to achieve modifications of network behavior in response to external consequences of initial actions; (g) Mechanisms that exhibit selective attention; (h) Strategies for developing conceptual systems and internal data representations well adapted to specific tasks; (i) Means for recognizing and producing sequences of temporal patterns. (2) Faster, more efficient training procedures that: (a) Are robust to noisy data and able to accommodate delayed feedback; (b) Minimize the need for external intervention for feedback; (c) Identify optimal choices of initial classification features or categories; (d) Generate internal models of the external world to guide appropriate responses to external stimuli. (3) Theoretical analyses that address; (a) Data representations; (b) Scaling properties for new and existing systems; (c) Matching of system complexity to the nature and amount of training data; (d) Tolerance to nodal element and synaptic failure; (e) Stability and convergence of new and existing systems; (f) Relationships between neural networks and conventional approaches. DARPA will also entertain a limited number of proposals to address special applications with high potential for demonstrating the expected power of artificial neural networks.This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions to the technical proposal should be specifically identified. 
Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:49:35 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:49:35 EST Subject: DARPA Program announcement (long, 1 of 3) Message-ID: Barbara Yoon at DARPA has apparently been flooded with requests for the three DARPA program announcements in the neural network area. To lighten the load, she asked us to send out the full text of these announcements to members of this mailing list. The text in this and the following two messages is copied verbatim from the Commerce Business Daily. We have resisted the temptation to insert paragraph breaks to improve readability. I apologize for dumping so much text on people who alrady have copies of the announcements or who are not interested, but this seems the best way to get the word out to a large set of potentially interested people. 
Please don't contact us about this program -- the appropriate phone numbers and addresses are listed in the announcements. -- Scott Fahlman, CMU =========================================================================== Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-02): NEURAL NETWORKS: COMPARATIVE PERFORMANCE MEASUREMENTS SOL BAA#89-02 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO is interested in receiving proposals to construct and test software simulations of artificial neural networks (or software simulations of hybrid systems incorporating artificial neural networks) that perform defined, complex classification tasks in the following application areas: (1) Automatic target recognition; (2) Continuous speech recognition; (3) Sonar signal discrimination; and (4) Seismic signal discrimination. The objectives of this program are to advance the state-of-the-art in application of artificial neural network approaches to classification problems; to investigate the optimal role of artificial neural networks in hybrid classification systems; and to measure the projected performance of artificial neural networks (or hybrid systems containing neural networks) in order to support a comparison with the performance of alternative, competing technologies. DARPA will provide application developers with a standard set of training data, appropriate to the application, to be used as the basis for training (or otherwise developing) their classification systems. The systems developed will then be evaluated independently in classification of standard sets of test data, distinct from the training set. The four application tasks are more fully described below. (1) Automatic target recognition: (a) Given a multi-spectral training set of time-correlated images of up to ten land vehicles (which may be partially obscured and in cluttered environments) with ground truth provided, identify and classify these vehicles in a new set of images (outside the training set); (b) Given images of two or more new land vehicles, recognize these vehicles as distinct from the original set and distinguish them from one another (with no system reprogramming or retraining); (c) Given a new training set of data on air vehicles, with system reprogramming and/or retraining, modify the system to identify and classify this new class of targets. (2) Continuous speech recognition: (a) Given a training set of 2800 spoken English sentences (with a 1000 word vocabulary), transcribe to written text spoken English sentences from a test set (outside the training set); (b) With no system reprogramming or retraining, transcribe to text spoken English sentences using vocabulary outside the initial vocabulary (given only the phonetic spelling of the new words); (c) Given training data on spoken foreign language sentences (with characteristics similar to those of the English sentence data base described in application (2)(a) above), with system programming and/or retraining, modify the system to transcribe to text spoken foreign language sentences. 
(3) Sonar signal discrimination: (a) Given a training set of several acoustic signature transients and passive marine acoustic signals (both signal types in noisy environments), detect and classify each signal type in a test set (outside the training set); (b) Given two or more new passive marine acoustic signals, with no system reprogramming or retraining, recognize these signals as distinct from the original set and distinguish them from one another; (c) Given a new training set of data on underwater echoes from active sonar returns, with system reprogramming and/or retraining, modify the system to detect and classify each signal type in this new class of signals and distinguish them from the original set of acoustic signals. (4) Seismic signal discrimination: (a) Given a training set of seismic signals (and associated parameters) from different types of seismic events of varying magnitudes, each event recorded at two or more seismic stations with ground truth provided, classify (as to signal type), locate, and estimate the magnitude of similar events in a test set of seismic signals (outside the training set); (b) Given one or more new types of seismic signals, recognize these signals as distinct from the original set (with no system reprogramming or retraining); (c) Given a new training set of seismic signals from seismic stations located in different geological regions from the original stations, with system reprogramming and/or retrain- ing, modify the system to classify and characterize this new set of signals. The criteria for evaluating the performance of the classification systems will include: (a) Classification accuracy (the appropriate accuracy metric for the task addressed, e.g., percentage or correct detections, identifications, and/or classifications, including false alarms where applicable; or total error rates); (b) System development time (the time required to develop and train the system); (c) Fault tolerance (the percentage of original performance when subjected to failure of some of the processing elements); (d) Generality (the accuracy of the system for new input data significantly outside the range of training data); (e) Adaptability (the time and effort required to modify the system to address similar classification problems with different classes of data); (f) Computational efficiency (the period solution speed when optimally implemented in hardware); (g) Size and power requirements (the projected size and power requirements of the computational hardware); (h) Performance vs training data (the rate of improvement in performance with increasing size of the training data set). This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28 month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. 
Proprietary portions to the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of this principal investigators and other key personnel to be employed in the conduct of this research, with brief, resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance; (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be sub- mitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 2209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA. From pwh at ece-csc.ncsu.edu Thu Jan 26 17:31:04 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Thu, 26 Jan 89 17:31:04 EST Subject: No subject Message-ID: <8901262231.AA03761@ece-csc.ncsu.edu> REVISED SUBMISSION DEADLINE FOR IJCNN-89 PAPERS--FEBRUARY 15, 1989 International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 has been revised to FEBRUARY 15, 1989. 
Papers of 8 pages or less are solicited in the following areas:

-Real World Applications
-Associative Memory
-Supervised Learning Theory
-Image Analysis
-Reinforcement Learning Theory
-Self-Organization
-Robotics and Control
-Neurobiological Models
-Optical Neurocomputers
-Vision
-Speech Processing and Recognition
-Electronic Neurocomputers
-Neural Network Architectures & Theory
-Optimization

FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU Fri Jan 27 12:55:00 1989 From: REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU (Rex C. Bontrager) Date: Fri, 27 Jan 1989 12:55 EST Subject: INNS membership Message-ID: Who do I contact regarding INNS membership? (more precisely, to whom do I send my money?) Rex C. Bontrager Bitnet: rexb at purccvm Internet: rexb at vm.cc.purdue.edu Phone: (317) 494-1787 ext. 256 From neural!yann Wed Jan 25 15:13:58 1989 From: neural!yann (Yann le Cun) Date: Wed, 25 Jan 89 15:13:58 -0500 Subject: Weight Decay Message-ID: <8901252012.AA00971@neural.UUCP> Consider a single layer linear network with N inputs. When the number of training patterns is smaller than N, the set of solutions (in weight space) is a proper linear subspace. Adding weight decay will select the minimum norm solution in this subspace (if the weight decay coefficient is decreased with time). The minimum norm solution happens to be the solution given by the pseudo-inverse technique (cf. Kohonen), and the solution which optimally cancels out uncorrelated zero mean additive noise on the input. - Yann Le Cun From reggia at mimsy.umd.edu Fri Jan 27 19:41:19 1989 From: reggia at mimsy.umd.edu (James A. Reggia) Date: Fri, 27 Jan 89 19:41:19 EST Subject: call for papers Message-ID: <8901280041.AA04500@mimsy.umd.edu> CALL FOR PAPERS The 13th Annual Symposium on Computer Applications in Medical Care will have a track this year on applications of neural models (connectionist models, etc.) in medicine. The Symposium will be held in Washington DC, as in previous years, on November 5 - 8, 1989. Submissions are refereed and if accepted, appear in the Symposium Proceedings. Deadline for submission of manuscripts (six copies, double spaced, max. of 5000 words) is March 3, 1989. For further information and/or a copy of the detailed call for papers, contact: SCAMC Office of Continuing Medical Education George Washington University Medical Center 2300 K Street, NW Washington, DC 20037 The detailed call for papers includes author information sheets that must be returned with a manuscript. From elman at amos.ling.ucsd.edu Sat Jan 28 01:24:24 1989 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Fri, 27 Jan 89 22:24:24 PST Subject: UCSD Cog Sci faculty opening Message-ID: <8901280624.AA11066@amos.ling.ucsd.edu> ASSISTANT PROFESSOR COGNITIVE SCIENCE UNIVERSITY OF CALIFORNIA, SAN DIEGO The Department of Cognitive Science at UCSD expects to receive permission to hire one person for a tenure-track position at the Assistant Professor level. The Department takes a broadly based approach to the study of cognition, including its neurological basis, in individuals and social groups, and machine intelligence. We seek someone whose interests cut across conventional disciplines. 
Interests in theory, computational modeling (especially PDP), or applications are encouraged. Candidates should send a vita, reprints, a short letter describing their background and interests, and names and addresses of at least three references to: Search Committee Cognitive Science, C-015-E University of California, San Diego La Jolla, CA 92093 Applications must be received prior to March 15, 1989. Salary will be commensurate with experience and qualifications, and will be based upon UC pay schedules. Women and minorities are especially encouraged to apply. The University of California, San Diego is an Affirmative Action/Equal Opportunity Employer. From Dave.Touretzky at B.GP.CS.CMU.EDU Sat Jan 28 07:14:37 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Sat, 28 Jan 89 07:14:37 EST Subject: INNS membership In-Reply-To: Your message of Fri, 27 Jan 89 12:55:00 -0500. Message-ID: <462.601992877@DST.BOLTZ.CS.CMU.EDU> PLEASE: Do not send requests for general information (like how to join INNS) to the CONNECTIONISTS list! This list is intended for serious scientific discussion only. If you need help with an address or something equally trivial, send mail to connectionists-request if you must. Better yet, use the Neuron Digest. Don't waste people's time on CONNECTIONISTS. -- Dave From norman%cogsci at ucsd.edu Sun Jan 29 13:36:36 1989 From: norman%cogsci at ucsd.edu (Donald A Norman-UCSD Cog Sci Dept) Date: Sun, 29 Jan 89 10:36:36 PST Subject: addendum to UCSD Cog Sci faculty opening Message-ID: <8901291836.AA22314@sdics.COGSCI> Jef Ellman's posting of the job at UCSD in the Cognitive Science Department was legally and technically accurate, but he should have added one important sentence: Get the application -- or at least, a letter of interest -- to us immediately. We are very late in getting the word out, and decisions will have to be made quickly. The sooner we know of the pool of applicants, the better. (Actually, I now discover one inaccuracy -- the ad says we "expect to receive permission to hire ..." In fact, we now do have that permission. If you have future interests -- say you are interested not now, but in a year or two or three -- that too is important for us to know, so tell us. don norman From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sun Jan 29 22:39:30 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sun, 29 Jan 89 22:39:30 EST Subject: speech list? Message-ID: does anyone know of a mailing list where speech questions are discussed? (not necessarily as related to connectionist methods; just speech questions in general). thanks a lot, Thanasis From pwh at ece-csc.ncsu.edu Mon Jan 30 14:48:25 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Mon, 30 Jan 89 14:48:25 EST Subject: IJCNN Call for Papers Amendment Message-ID: <8901301948.AA25787@ece-csc.ncsu.edu> Amendment to IJCNN call for papers Sorry...Upon reflection the wording in the IJCNN call for papers did not convey the proper meaning. Perhaps a better way to say it would have been, "IJCNN-89 is replacing both the ICNN and INNS meetings in 1989." The intent was for people to realize that if they planned to submit to either ICNN or INNS or both in 1989, the joint conference is the only opportunity to do so. Part of the reason for extending the deadline is to allow for the short notice (no INNS call for papers had previously been issued, since the merger of the two conferences just occurred). The original text was meant to imply the above and nothing more. 
No offense should be taken because none was intended. By the way, I was at last year's NIPS conference and thought it was an excellent conference. I plan to be there again next year. Also there has been some confusion over the revised deadline for paper submissions to IJCNN. The revised deadline STILL STANDS as FEBRUARY 15. P.S. Following the precedent set at the IJCAI, my pronunciation of IJCNN is idge-kin. The acronyms were good though! Wes Snyder, Co-Chairman of the Organization Committee, IJCNN-89 January 30, 1989 
From kruschke at cogsci.berkeley.edu Tue Jan 3 03:30:12 1989 From: kruschke at cogsci.berkeley.edu (John Kruschke) Date: Tue, 3 Jan 89 00:30:12 PST Subject: No subject Message-ID: <8901030830.AA09915@cogsci.berkeley.edu> Here is the compilation of responses to my request for info on weight decay. I have kept editing to a minimum, so you can see exactly what the author of the reply said. Where appropriate, I have included some comments of my own, set off in square brackets. The responses are arranged into three broad topics: (1) Boltzmann-machine related; (2) back-prop related; (3) psychology related. Thanks to all, and happy new year! --John ----------------------------------------------------------------- ORIGINAL REQUEST: I'm interested in all the information I can get regarding WEIGHT DECAY in back-prop, or in other learning algorithms. *In return* I'll collate all the info contributed and send the compilation out to all contributors. Info might include the following: REFERENCES: - Applications which used weight decay - Theoretical treatments Please be as complete as possible in your citation. FIRST-HAND EXPERIENCE - Application domain, details of I/O patterns, etc. - exact decay procedure used, and results (Please send info directly to me: kruschke at cogsci.berkeley.edu Don't use the reply command.) T H A N K S ! --John Kruschke. ----------------------------------------------------------------- From: Geoffrey Hinton Date: Sun, 4 Dec 88 13:57:45 EST Weight-decay is a version of what statisticians call "Ridge Regression". We used weight-decay in Boltzmann machines to keep the energy barriers small. This is described in section 6.1 of: Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984) Boltzmann Machines: Constraint satisfaction networks that learn. Technical Report CMU-CS-84-119, Carnegie-Mellon University. I used weight decay in the family trees example. Weight decay was used to improve generalization and to make the weights easier to interpret (because, at equilibrium, the magnitude of a weight = its usefulness). This is in: Rumelhart, D.~E., Hinton, G.~E., and Williams, R.~J. (1986) Learning representations by back-propagating errors. {\it Nature}, {\bf 323}, 533--536. I used weight decay to achieve better generalization in a hard generalization task that is reported in: Hinton, G.~E. (1987) Learning translation invariant recognition in a massively parallel network. In Goos, G. 
and Hartmanis, J., editors, {\it PARLE: Parallel Architectures and Languages Europe}, pages~1--13, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Weight-decay can also be used to keep "fast" weights small. The fast weights act as a temporary context. One use of such a context is described in: Hinton, G.~E. and Plaut, D.~C. (1987) Using fast weights to deblur old memories. {\it Proceedings of the Ninth Annual Conference of the Cognitive Science Society}, Seattle, WA. --Geoff ----------------------------------------------------------------- [In his lecture at the International Computer Science Institute, Berkeley CA, on 16-DEC-88, Geoff also mentioned that weight decay is good for wiping out the initial values of weights so that only the effects of learning remain. In particular, if the change (due to learning) on two weights is the same for all updates, then the two weights converge to the same value. This is one way to generate symmetric weights from non-symmetric starting values. --John] ----------------------------------------------------------------- From: Michael.Franzini at SPEECH2.CS.CMU.EDU Date: Sun, 4 Dec 1988 23:24-EST My first-hand experience confirms what I'm sure many other people have told you: that (in general) weight decay in backprop increases generalization. I've found that it's particularly important for small training sets, and its effect diminishes as the training set size increases. Weight decay was first used by Barak Pearlmutter. The first mention of weight decay is, I believe, in an early paper of Hinton's (possibly the Plaut, Nowlan, and Hinton CMU CS tech report), and it is attributed to "Barak Pearlmutter, Personal Communication" there. The version of weight decay that (I'm fairly sure) all of us at CMU use is one in which each weight is multiplied by 0.999 every epoch. Scott Fahlman has a more complicated version, which is described in his QUICKPROP tech report. [QuickProp is also described in his paper in the Proceedings of the 1988 Connectionist Models Summer School, published by Morgan Kaufmann. --John] The main motivation for using it is to eliminate spurious large weights which happen not to interfere with recognition of training data but would interfere with recognizing testing data. (This was Barak's motivation for trying it in the first place.) However, I have heard more theoretical justifications (which, unfortunately, I can't reproduce.) In case Barak didn't reply to your message, you might want to contact him directly at bap at cs.cmu.edu. --Mike ----------------------------------------------------------------- From: Barak.Pearlmutter at F.GP.CS.CMU.EDU Date: 8 Dec 1988 16:36-EST We first used weight decay as a way to keep weights in a Boltzmann machine from growing too large. We added a term to the thing being minimized, G, so that G' = G + 1/2 h \sum_{i<j} w_{ij}^2. ----------------------------------------------------------------- Date: Tue, 6 Dec 88 09:34 CST Probably he will respond to you himself, but Alex Weiland of MITRE presented a paper at INNS in Boston on shaping, in which the order of presentation of examples in training a back-prop net was altered to reflect a simpler rule at first. Over a number of epochs he gradually changed the examples to slowly change the rule to the one desired. The nets learned much faster than if he just tossed the examples at the net in random order. He told me that it would not work without weight decay. He said their rule-of-thumb was the decay should give the weights a half-life of 2 to 3 dozen epochs (usually a value such as 0.9998).
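[A quick check of that rule of thumb, on the assumption that the 0.9998 factor is applied once per pattern presentation rather than once per epoch: a per-step decay factor d gives a weight half-life of ln 2 / ln(1/d) steps, so d = 0.9998 works out to about 3466 presentations. With on the order of a hundred training patterns per epoch that is roughly 35 epochs, i.e. about three dozen, which matches the quoted half-life; applied literally once per epoch, 0.9998 would instead give a half-life of several thousand epochs.]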
But I neglected to ask him if he felt that the number of epochs or the number of presentations was important. Perhaps if one had a significantly different training set size, that rule-of-thumb would be different? I have started some experiments similar to his shaping, using some random variation of the training data (where the random variation grows over time). Weiland also discussed this in his talk. I haven't yet compared decay with no-decay. I did try (as a lark) using decay with regular (non-shaping) training, and it did worse than we usually get (on same data and same network type/size/shape). Perhaps I was using a stupid decay value (0.9998 I think) for that situation. I hope to get back to this, but at the moment we are preparing for a software release to our shareholders (MCC is owned by 20 or so computer industry corporations). In the next several weeks a lot of people will go on Christmas vacation, so I will be able to run a bunch of nets all at once. They call me the machine vulture. ----------------------------------------------------------------- From: Tony Robinson Date: Sat, 3 Dec 88 11:10:20 GMT Just a quick note in reply to your message to `connectionists' to say that I have tried to use weight decay with back-prop on networks with order 24 i/p, 24 hidden, 11 o/p units. The problem was vowel recognition (I think), it was about 18 months ago, and the problem was of the unsolvable type (i.e. non-zero final energy). My conclusion was that weight decay only made matters worse, and my justification (to myself) for abandoning weight decay was that you are not even pretending to do gradient descent any more, and any good solution formed quickly becomes garbaged by scaling the weights. If you want to avoid hidden units sticking on their limiting values, why not use hidden units with no limiting values? For instance, I find the activation function f(x) = x * x works better than f(x) = 1.0 / (1.0 + exp(- x)) anyway. Sorry I haven't got anything formal to offer, but I hope these notes help. Tony Robinson. ----------------------------------------------------------------- From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Sat, 3 Dec 88 11:54:02 EST Actually, "costs" or "penalty" functions are probably better terms. We had a poster last week at NIPS that discussed some of the pitfalls and advantages of two kinds of costs. I can send you the paper when we have a version available. Stephen J. Hanson (jose at bellcore.com) ----------------------------------------------------------------- [ In a conversation in his office on 06-DEC-88, Dave Rumelhart described to me several cost functions he has tried. The motive for the functions he has tried is different from the motive for standard weight decay. Standard weight decay, \sum_{i,j} w_{i,j}^2, is used to *distribute* weights more evenly over the given connections, thereby increasing robustness (cf. earlier replies). He has tried several other cost functions in an attempt to *localize*, or concentrate, the weights on a small subset of the given connections. The goal is to improve generalization. His favorite is \sum_{i,j} ( w_{i,j}^2 / ( K + w_{i,j}^2 ) ) where K is a constant, around 1 or 2. Note that this function is negatively accelerating, whereas standard weight decay is positively accelerating. This function penalizes small weights (proportionally) more than large weights, just the opposite of standard weight decay.
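[To spell out the comparison with a quick gradient check: standard decay contributes a gradient proportional to w itself, whereas the gradient of w^2 / ( K + w^2 ) is 2 K w / ( K + w^2 )^2, which behaves like (2/K) w for weights small relative to sqrt(K) but falls off roughly as 1/w^3 for large weights. So the localizing penalty keeps pushing small weights toward zero while leaving well-established large weights essentially alone.]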
He has also tried, with less satisfying results, \sum ( 1 - \exp - (\alpha w_{i,j}^2) ) and \sum \ln ( K + w_{i,j}^2 ). Finally, he has tried a cost function designed to make all the fan-in weights of a single unit decay, when possible. That is, the unit is effectively cut out of the network. The function is \sum_i (\sum_j w_{i,j}^2) / ( K + \sum_j w_{i,j}^2 ). Each weight is thereby penalized (inversely) proportionally to the total fan-in weight of its node. --John ] ----------------------------------------------------------------- [ This is also a relevant place to mention my paper in the Proceedings of the 1988 Connectionist Models Summer School, "Creating local and distributed bottlenecks in back-propagation networks". I have since developed those ideas, and have expressed the localized bottleneck method as gradient descent on an additional cost term. The cost term is quite general, and some forms of decay are simply special cases of it. --John] ----------------------------------------------------------------- From: john moody Date: Sun, 11 Dec 88 22:54:11 EST Scalettar and Zee did some interesting work on weight decay with back prop for associative memory. They found that a Unary Representation emerged (see Baum, Moody, and Wilczek; Bio Cybernetics Aug or Sept 88 for info on Unary Reps). Contact Tony Zee at UCSB (805)961-4111 for info on weight decay paper. --John Moody ----------------------------------------------------------------- From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Sat, 10 Dec 88 16:51:29 PST I'd appreciate a copy of your weight decay collation. I have a paper in MS form which illustrates how adding weight decay to the linear-LMS one-layer net improves its ability to predict human generalization in classification learning. mark gluck dept of psych stanford univ, stanford, ca 94305 ----------------------------------------------------------------- From: INAM000 (Tony Marley) Date: SUN 04 DEC 1988 11:16:00 EST I have been exploring some ideas re COMPETITIVE LEARNING with "noisy weights" in modeling simple psychophysics. The task is the classical one of identifying one of N signals by a simple (verbal) response -e.g. the stimuli might be squares of different sizes, and one has to identify the presented one by saying the appropriate integer. We know from classical experiments that people cannot perform this task perfectly once N gets larger than about 7, but performance degrades smoothly for larger N. I have been developing simulations where the mapping is learnt by competitive learning, with the weights decaying/varying over time when they are not reset by relevant inputs. I have not got too many results to date, as I have been taking the psychological data seriously, which means worrying about reaction times, sequential effects, "end effects" (stimuli at the end of the range more accurately identified), range effects (increasing the stimulus range has little effect), etc.. Tony Marley ----------------------------------------------------------------- From: aboulanger at bbn.com (Albert Boulanger) Date: Fri, 2 Dec 88 19:43:14 EST This one concerns the Hopfield model. In James D Keeler, "Basin of Attraction of Neural Network Models", Snowbird Conference Proceedings (1986), 259-264, it is shown that the basins of attraction become very complicated as the number of stored patterns increase. He uses a weight modification method called "unlearning" to smooth out these basins. Albert Boulanger BBN Systems & Technologies Corp. 
aboulanger at bbn.com ----------------------------------------------------------------- From: Joerg Kindermann Date: Mon, 5 Dec 88 08:21:03 -0100 We used a form of weight decay not for learning but for recall in multilayer feedforward networks. See the following abstract. Input patterns are treated as ``weights'' coming from a constant valued external unit. If you would like a copy of the technical report, please send e-mail to joerg at gmdzi.uucp or write to: Dr. Joerg Kindermann Gesellschaft fuer Mathematik und Datenverarbeitung Schloss Birlinghoven Postfach 1240 D-5205 St. Augustin 1 WEST GERMANY Detection of Minimal Microfeatures by Internal Feedback J. Kindermann & A. Linden Abstract We define the notion of minimal microfeatures and introduce a new method of internal feedback for multilayer networks. Error signals are used to modify the input of a net. When combined with input DECAY, internal feedback allows the detection of sets of minimal microfeatures, i.e. those subpatterns which the network actually uses for discrimination. Additional noise on the training data increases the number of minimal microfeatures for a given pattern. The detection of minimal microfeatures is a first step towards a subsymbolic system with the capability of self-explanation. The paper provides examples from the domain of letter recognition. ----------------------------------------------------------------- From: Helen M. Gigley Date: Mon, 05 Dec 88 11:03:23 -0500 I am responding to your request even though my use of decay is not with respect to learning in connectionist-like models. My focus has been on a functioning system that can be lesioned. One question I have is what is the behavioral association to weight decay? What aspects of learning is it intended to reflect. I can understand that activity decay over time of each cell is meaningful and reflects a cellular property, but what is weight decay in comparable terms? Now, I will send you offprints if you would like of my work and am including a list of several publications which you may be able to peruse. The model, HOPE, is a hand-tuned structural connectionist model that is designed to enable lesioning without redesign or reprogramming to study possible processing causes of aphasia. Decay factors as an integral part of dynamic time-dependent processes are one of several aspects of processing in a neural environment which potentially affect the global processing results even though they are defined only locally. If I can be of any additional help please let me know. Helen Gigley References: Gigley, H.M. Neurolinguistically Constrained Simulation of Sentence Comprehension: Integrating Artificial Intelligence and Brain Theorym Ph.D. Dissertation, UMass/Amherst, 1982. Available University Microfilms, Ann Arbor, MI. Gigley, H.M. HOPE--AI and the dynamic process of language behavior. in Cognition and Brain Theory 6(1) :39-88, 1983. Gigley, H.M. Grammar viewed as a functioning part of of a cognitive system. Proceedings of ACL 23rd Annual Meeting, Chicago, 1985 . Gigley, H.M. Computational Neurolinguistics -- What is it all about? in IJCAI Proceedings, Los Angeles, 1985. Gigley, H.M. Studies in Artificial Aphasia--experiments in processing change. In Journal of Computer Methods and Programs in Biomedicine, 22 (1): 43-50, 1986. Gigley, H.M. Process Synchronization, Lexical Ambiguity Resolution, and Aphasia. In Steven L. Small, Garrison Cottrell, and Michael Tanenhaus (eds.) Lexical Ambiguity Resolution, Morgen Kaumann, 1988. 
----------------------------------------------------------------- From: bharucha at eleazar.Dartmouth.EDU (Jamshed Bharucha) Date: Tue, 13 Dec 88 16:56:00 EST I haven't tried weight decay but am curious about it. I am working on back-prop learning of musical sequences using a Jordan-style net. The network develops a musical schema after learning lots of sequences that have culture-specific regularities. I.e., it learns to generate expectancies for tones following a sequential context. I'm interested in knowing how to implement forgetting, whether short term or long term. Jamshed. ----------------------------------------------------------------- From will at ida.org Tue Jan 3 10:50:14 1989 From: will at ida.org (Craig Will) Date: Tue, 3 Jan 89 10:50:14 EST Subject: Copies of DARPA Request for Proposals Available Message-ID: <8901031550.AA16284@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:12:06 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:12:06 EST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901041512.AA11296@psycho.Princeton.EDU> Below is the abstract of a forthcoming target article to appear in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this article, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ THE CONNECTIONIST CONSTRUCTION OF CONCEPTS Adrian Cussins, New College, Oxford Keywords: connectionism, representation, cognition, perception, nonconceptual content, concepts, learning, objectivity, semantics Computational modelling of cognition depends on an underlying theory of representation. Classical cognitive science has exploited the syntax/semantics theory of representation derived from formal logic. As a consequence, the kind of psychological explanation supported by classical cognitive science is "conceptualist": psychological phenomena are modelled in terms of relations between concepts and between the sensors/effectors and concepts. This kind of explanation is inappropriate according to Smolensky's "Proper Treatment of Connectionism" [BBS 11(1) 1988]. Is there an alternative theory of representation that retains the advantages of classical theory but does not force psychological explanation into the conceptualist mold? 
I outline such an alternative by introducing an experience-based notion of nonconceptual content and by showing how a complex construction out of nonconceptual content can satisfy classical constraints on cognition. Cognitive structure is not interconceptual but intraconceptual. The theory of representational structure within concepts allows psychological phenomena to be explained as the progressive emergence of objectivity. This can be modelled computationally by transformations of nonconceptual content which progressively decrease its perspective-dependence through the formation of a cognitive map. Stevan Harnad ARPA/INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From will at ida.org Wed Jan 4 10:59:54 1989 From: will at ida.org (Craig Will) Date: Wed, 4 Jan 89 10:59:54 EST Subject: Copies of DARPA Req for Prop Available Message-ID: <8901041559.AA13970@csed-1> Copies of DARPA Request for Proposals Available Copies of the DARPA Neural Network Request for Propo- sals are now available (free) upon request. This is the same text as that published December 16 in the Commerce Business Daily, but reformatted and with bigger type for easier reading. This version was sent as a 4-page "Special supplementary issue" to subscribers of Neural Network Review in the United States. To get a copy mailed to you, send your US postal address to either: Michele Clouse clouse at ida.org (milnet) or: Neural Network Review P. O. Box 427 Dunn Loring, VA 22027 From harnad at Princeton.EDU Wed Jan 4 10:18:00 1989 From: harnad at Princeton.EDU (Stevan Harnad) Date: Wed, 4 Jan 89 10:18:00 EST Subject: Speech Perception: BBS Multiple Book Review Message-ID: <8901041518.AA11306@psycho.Princeton.EDU> Below is the abstract of a book that will be multiply reviewed in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Reviewers must be current BBS Associates or nominated by a current BBS Associate. To be considered as a reviewer for this book, to suggest other appropriate reviewers, or for information about how to become a BBS Associate, please send email to: harnad at confidence.princeton.edu or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] ____________________________________________________________________ BBS Multiple Book review of: SPEECH PERCEPTION BY EAR AND EYE: A PARADIGM FOR PSYCHOLOGICAL INQUIRY (Hillsdale NJ: LE Erlbaum Associates 1987) Dominic William Massaro Program in Experimental Psychology University of California, Santa Cruz Keywords: speech perception; vision; audition; categorical perception; connectionist models; fuzzy logic; sensory impairment; decision making This book is about the processing of information, particularly in face-to-face spoken communication where both audible and visible information are available. Experimental tasks were designed to manipulate many of these sources of information independently and to test mathematical fuzzy logical and other models of performance and the underlying stages of information processing. Multiple sources of information are evaluated and integrated to achieve speech perception. 
Graded information seems to be derived about the degree to which an input fits a given category rather than just all-or-none categorical information. Sources of information are evaluated independently, with the integration process insuring that the least ambiguous sources have the most impact on the judgment. The processes underlying speech-perception also occur in a variety of other behaviors, ranging from categorization to sentence interpretation, decision making and forming impressions about people. ----- Stevan Harnad INTERNET harnad at confidence.princeton.edu harnad at princeton.edu harnad at mind.princeton.edu srh at flash.bellcore.com harnad at elbereth.rutgers.edu CSNET: harnad%mind.princeton.edu at relay.cs.net UUCP: harnad at princeton.uucp BITNET: harnad at pucc.bitnet harnad1 at umass.bitnet Phone: (609)-921-7771 From mesard at BBN.COM Thu Jan 5 09:37:12 1989 From: mesard at BBN.COM (mesard@BBN.COM) Date: Thu, 05 Jan 89 09:37:12 -0500 Subject: Tech Report Announcement In-Reply-To: Your message of Mon, 02 Jan 89 16:02:06 -0600. <8901022202.AA11713@legendre.aca.mcc.com> Message-ID: Please send me a copy of the tech report Explorations of the Mean Field Theory Learning Algorithm Thanks. Wayne Mesard Mesard at BBN.COM 70 Fawcett St. Cambridge, MA 02138 617-873-1878 From gluck at psych.Stanford.EDU Thu Jan 5 10:20:17 1989 From: gluck at psych.Stanford.EDU (Mark Gluck) Date: Thu, 5 Jan 89 07:20:17 PST Subject: Human Learning & Connectionist Models Message-ID: I would be grateful to receive information about people using connectionist/neural-net approaches within cognitive psychology to model human learning and memory data. Citations to published work, information about work in progress, and copies of reprints or preprints would be most welcome and appreciated. Mark Gluck Dept. of Psychology Jordan Hall; Bldg. 420 Stanford University Stanford, CA 94305 (415) 725-2434 gluck at psych.stanford.edu. From kanderso at BBN.COM Thu Jan 5 16:30:15 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Thu, 05 Jan 89 16:30:15 -0500 Subject: No subject In-Reply-To: Your message of Tue, 03 Jan 89 00:30:12 -0800. <8901030830.AA09915@cogsci.berkeley.edu> Message-ID: I enjoyed John's summary of weight decay, but it raised a few questions. Just as John did, I'll be glad to summarize the responses to the group. 1. Geoff Hinton mentioned that "Weight-decay is a version of what statisticians call "Ridge Regression"." What do you mean by "version"? Is it exactly the same, or just similar? I think I know what Ridge Regression is, but I don't see an obvious strong connection. I see a weak one, and after I think about it more maybe I'll say something about it. The ideas behind Ridge regression probably came from Levenberg and Marquardt who used it in nonlinear least squares: Levenberg K., A Method for the solution of certain nonlinear problems in least squares, Q. Appl. Math, Vol 2, pages 164-168, 1944. Marquardt, D.W., An algorithm for least squares estimation of non-linear parameters, J. Soc. Industrial and Applied Math., 11:431-441, 1963. 2. John quoted Dave Rumelhart as saying that standard weight decay distributes weights more evenly over the given connections, thereby increasing robustness. Why does smearing out large weights increase robustness? What does robustness mean here, the ability to generalize? k From dreyfus at cogsci.berkeley.edu Thu Jan 5 21:04:34 1989 From: dreyfus at cogsci.berkeley.edu (Hubert L.
Dreyfus) Date: Thu, 5 Jan 89 18:04:34 PST Subject: Connectionist Concepts: BBS Call for Commentators Message-ID: <8901060204.AA02484@cogsci.berkeley.edu> Stevan: Stuart and I would like to write a joint comment on Cussins' paper. Please send us the latest version by e-mail or regular mail whichever you prefer. Hubert Dreyfus From daugman%charybdis at harvard.harvard.edu Fri Jan 6 10:41:42 1989 From: daugman%charybdis at harvard.harvard.edu (j daugman) Date: Fri, 6 Jan 89 10:41:42 EST Subject: Neural Networks in Natural and Artificial Vision Message-ID: For preparation of 1989 conference tutorials and reviews, I would be grateful to receive any available p\reprints reporting research on neural network models of human / biological vision and applications in artificial vision. Thanks in advance. John Daugman Harvard University 950 William James Hall Cambridge, Mass. 02138 From josh at flash.bellcore.com Fri Jan 6 14:32:55 1989 From: josh at flash.bellcore.com (Joshua Alspector) Date: Fri, 6 Jan 89 14:32:55 EST Subject: VLSI Implementations of Neural Networks Message-ID: <8901061932.AA07422@flash.bellcore.com> I will be giving a tuturial on the above topic at the Custom Integrated Circuits Conference. Vu grafs are due at the end of February and I would like to include as complete a description as possible of current efforts in the VLSI implementation of neural networks. I would appreciate receiving any preprints or hard copies of vu grafs regarding any work you are doing. E-mail reports are also acceptable. Please send to: Joshua Alspector Bellcore, MRE 2E-378 445 South St. Morristown, NJ 07960-1910 From neural!jsd Fri Jan 6 12:45:14 1989 From: neural!jsd (John Denker) Date: Fri, 6 Jan 89 12:45:14 EST Subject: confidence / runner-up activation Message-ID: <8901061744.AA10566@neural.UUCP> Yes, we've been using the activation level of the runner-up neurons to provide confidence information in our character recognizer for some time. The work was reported at the last San Diego mtg and at the last Denver mtg. --- jsd (John Denker) From netlist at psych.Stanford.EDU Tue Jan 10 09:43:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Tue, 10 Jan 89 06:43:16 PST Subject: Stanford Adaptive Networks Colloquium Message-ID: Stanford University Interdisciplinary Colloquium Series: ADAPTIVE NETWORKS AND THEIR APPLICATIONS Co-sponsored by the Departments of Psychology and Electrical Engineering Winter Quarter 1989 Schedule ---------------------------- Jan. 12th (Thursday, 3:30pm): ----------------------------- STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) Jan. 24th (Tuesday, 3:30pm): ---------------------------- LARRY MALONEY LEARNING BY ASSERTION: Department of Psychology CALIBRATING A SIMPLE VISUAL SYSTEM New York University email: ltm at xp.psych.nyu.edu Feb. 9th (Thursday, 3:30pm): ---------------------------- CARVER MEAD VLSI MODELS OF NEURAL NETWORKS Moore Professor of Computer Science California Institute of Technology Feb. 21st (Tuesday, 3:30pm): ---------------------------- PIERRE BALDI ON SPACE AND TIME IN NEURAL COMPUTATIONS Jet Propulsion Laboratory California Institute of Technology email: pfbaldi at caltech.bitnet Mar. 
14th (Tuesday, 3:30pm): ---------------------------- ALAN LAPEDES NONLINEAR SIGNAL PROCESSING WITH NEURAL NETS Theoretical Division - MS B213 Los Alamos National Laboratory email: asl at lanl.gov Additional Information ---------------------- The talks (including discussion) last about one hour and fifteen minutes. Following each talk, there will be a reception. Unless otherwise noted, all talks will be held in room 380-380F, which is in the basement of the Mathematical Sciences buildings. To be placed on an electronic-mail distribution list for information about these and other adaptive network events in the Stanford area, send email to netlist at psych.stanford.edu. For additional information, contact: Mark Gluck, Department of Psychology, Bldg. 420, Stanford University, Stanford, CA 94305 (phone 415-725-2434 or email to gluck at psych.stanford.edu). Program Committe: Committee: Bernard Widrow (E.E.), David Rumelhart, Misha Pavel, Mark Gluck (Psychology). This series is supported by the Departments of Psychology and Electrical Engineering and by a gift from the Thomson-CSF Corporation. Coming this Spring: D. Parker, B. McNaughton, G. Lynch & R. Granger From hinton at ai.toronto.edu Tue Jan 10 10:09:11 1989 From: hinton at ai.toronto.edu (Geoffrey Hinton) Date: Tue, 10 Jan 89 10:09:11 EST Subject: new tech report Message-ID: <89Jan10.100924est.10956@ephemeral.ai.toronto.edu> The following report can be obtained by sending an email request to carol at ai.toronto.edu If this fails try carol%ai.toronto.edu at relay.cs.net Please do not send email to me about it (so don't use "reply" or "answer"). "Deterministic Boltzmann Learning Performs Steepest Descent in Weight-space." Geoffrey E. Hinton Department of Computer Science University of Toronto Technical report CRG-TR-89-1 ABSTRACT The Boltzmann machine learning procedure has been successfully applied in deterministic networks of analog units that use a mean field approximation to efficiently simulate a truly stochastic system {Peterson and Anderson, 1987}. This type of ``deterministic Boltzmann machine'' (DBM) learns much faster than the equivalent ``stochastic Boltzmann machine'' (SBM), but since the learning procedure for DBM's is only based on an analogy with SBM's, there is no existing proof that it performs gradient descent in any function, and it has only been justified by simulations. By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities. A very simple way of forcing the weights to become symmetrical is also described, and this makes the DBM more biologically plausible than back-propagation. From netlist at psych.Stanford.EDU Wed Jan 11 09:29:01 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Wed, 11 Jan 89 06:29:01 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 
12th (Thursday, 3:30pm): ----------------------------- ******************************************************************************** STEVEN PINKER CONNECTIONISM AND Department of Brain & Cognitive Sciences THE FACTS OF HUMAN LANGUAGE Massachusetts Institute of Technology email: steve at psyche.mit.edu (with commentary by David Rumelhart) ******************************************************************************** Abstract Connectionist modeling holds the promise of making important contributions to our understanding of human language. For example, such models can explore the role of parallel processing, constraint satisfaction, neurologically realistic architectures, and efficient pattern-matching in linguistic processes. However, the current connectionist program of language modeling seems to be motivated by a different set of goals: reviving classical associationism, elminating levels of linguistic representation, and maximizing the role of top-down, knowledge-driven processing. I present evidence (developed in collaboration with Alan Prince) that these goals are ill-advised, because the empirical assumptions they make about human language are simply false. Specifically, evidence from adults' and children's abilities with morphology, semantics, and syntax suggests that people possess formal linguistic rules and autonomous linguistic representations, which are not based on the statistical correlations among microfeatures that current connectionist models rely on so heavily. Moreover, I suggest that treating the existence of mentally-represented rules and representations as an empirical question will lead to greater progress than rejecting them on a priori methodological grounds. The data suggest that some linguistic processes are saliently rule-like, and call for a suitable symbol-processing architecture, whereas others are associative, and can be insightfully modeled using connectionist mechanisms. Thus taking the facts of human language seriously can lead to an interesting rapprochement between standard psycholinguistics and connectionist modeling. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From unido!gmdzi!joerg at uunet.UU.NET Thu Jan 12 04:30:50 1989 From: unido!gmdzi!joerg at uunet.UU.NET (Joerg Kindermann) Date: Thu, 12 Jan 89 08:30:50 -0100 Subject: CALL FOR PARTICIPATION Message-ID: <8901120730.AA03021@gmdzi.UUCP> Workshop ``DANIP'' Distributed Adaptive Neural Information Processing. 24.-25.4.1989 Gesellschaft fuer Mathematik und Datenverarbeitung mbH Sankt Augustin Neural information processing is constantly gaining increasing attention in many scientific areas. As a consequence the first ``Workshop Konnektionismus'' at the GMD was organized in February 1988. It gave an overview of research activities in neural networks and their applications to Artificial Intelligence. 
Now, almost a year later, the time has come to focus on the state of neural information processing itself. The aim of the workshop is to discuss TECHNICAL aspects of information processing in neural networks on the basis of personal contributions in one of the following areas: - new or improved learning algorithms (including evaluations) - self organization of structured (non-localist) neural networks - time series analysis by means of neural networks - adaptivity, e.g the problem of relearning - adequate coding of information for neural processing - generalization - weight interpretation (correlative and other)} Presentations which report on ``work in progress'' are encouraged. The size of the workshop will be limited to 15 contributions of 30 minutes in length. A limited number of additional participants may attend the workshop and take part in the discussions. To apply for the workshop as a contributor, please send information about your contribution (1-2 pages in English or a relevant publication). If you want to participate without giving an oral presentation, please include a description of your background in the field of neural networks. Proceedings on the basis of workshop contributions will be published after the workshop. SCHEDULE: 28 February 1989: deadline for submission of applications 20 March 1989: notification of acceptance 24 - 25 April 1989: workshop ``DANIP'' 31 July 1989: deadline for submission of full papers to be included in the proceedings Applications should be sent to the following address: Dr. Joerg Kindermann or Alexander Linden Gesellschaft fuer Mathematik und Datenverarbeitung mbH - Schloss Birlinghoven - Postfach 1240 D-5205 Sankt Augustin 1 WEST GERMANY e-mail: joerg at gmdzi al at gmdzi From pwh at ece-csc.ncsu.edu Fri Jan 13 17:28:39 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Fri, 13 Jan 89 17:28:39 EST Subject: No subject Message-ID: <8901132228.AA05092@ece-csc.ncsu.edu> NEURAL NETWORKS CALL FOR PAPERS International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. The 1989 IEEE/INNS International Joint Conference on Neural Networks (IJCNN-89) will be held at the Sheraton Washington Hotel in Washington, D.C., USA from June 18-22, 1989. IJCNN-89 is the first conference in a new series devoted to the technology and science of neurocomputing and neural networks in all of their aspects. The series replaces the previous IEEE ICNN and INNS Annual Meeting series and is jointly sponsored by the IEEE Technical Activities Board Neural Network Committee and the International Neural Network Society (INNS). IJCNN-89 will be the only major neural net- work meeting of 1989 (IEEE ICNN-89 and the 1989 INNS Annual Meeting have both been cancelled). Thus, it behooves all members of the neural network community who have important new results for presentation to prepare their papers now and submit them by the IJCNN-89 deadline of 1 FEBRUARY 1989. The Conference Proceedings will be distributed AT THE REGISTRATION DESK to all regular conference registrants as well as to all student registrants. The conference will include a day of tutorials (June 18), the exhibit hall (the neurocomputing industry's primary annual trade show), plenary talks, and social events. Mark your calendar today and plan to attend IJCNN-89 -- the definitive annual progress report on the neurocomputing revolution! DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 is FEBRUARY 1, 1989. 
Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Optimization -Electronic Neurocomputers -Neural Network Architectures & Theory -Speech Recognition FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Confer- ence Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator 3770 Tansy Street, San Diego, CA 92121 (619) 453-6222 From rudnick at cse.ogc.edu Sat Jan 14 18:05:27 1989 From: rudnick at cse.ogc.edu (Mike Rudnick) Date: Sat, 14 Jan 89 15:05:27 PST Subject: genetic search and neural nets Message-ID: <8901142305.AA07774@ogccse.OGC.EDU> I am a phd candidate in computer science at Oregon Graduate Center. My research interest is in using genetic search to tackle artificial neural network (ANN) scaling issues. My particular orientation is to view minimizing interconnections as a central issue, partly motivated by VLSI implementation issues. I am starting a mailing list for those interested in applying genetic search to/with/for ANNs. Mail a request to Neuro-evolution-request at cse.ogc.edu to have your name added to the list. A bibliography of work relating artificial neural networks (ANNs) and genetic search is available. It is organized/oriented for someone familiar with the ANN literature but unfamiliar with the genetic search literature. Send a request to Neuro-evolution-request at cse.ogc.edu for a copy. If there is sufficient interest I will post the bibliography here. -------------------------------------------------------------------------- Mike Rudnick CSnet: rudnick at cse.ogc.edu Computer Science & Eng. Dept. ARPAnet: rudnick%cse.ogc.edu at relay.cs.net Oregon Graduate Center BITNET: rudnick%cse.ogc.edu at relay.cs.net 19600 N.W. von Neumann Dr. UUCP: {tektronix,verdix}!ogccse!rudnick Beaverton, OR. 97006-1999 (503) 690-1121 X7390 -------------------------------------------------------------------------- From sontag at fermat.rutgers.edu Tue Jan 17 14:08:03 1989 From: sontag at fermat.rutgers.edu (sontag@fermat.rutgers.edu) Date: Tue, 17 Jan 89 14:08:03 EST Subject: Kolmogorov's superposition theorem Message-ID: <8901171908.AA00964@control.rutgers.edu> *** I am posting this for Professor Rui de Figuereido, a researcher in Control Theory and Circuits who does not subscribe to this list. Please direct cc's of all responses to his e-mail address (see below). -eduardo s. *** KOLMOGOROV'S SUPERPOSITION THEOREM AND ARTIFICIAL NEURAL NETWORKS Rui J. P. de Figueiredo Dept. of Electrical and Computer Engineering Rice University, Houston, TX 77251-1892 e-mail: rui at zeta.rice.edu The implementation of the Kolmogorov-Arnold-Sprecher Superposition Theorem [1-3] in terms of artificial neural networks was first presented and fully discussed by me in 1980 [4]. I also discussed, then [4], applications of these structures to statistical pattern recognition and image and multi- dimensional signal processing. However, I did not use the words "neural networks" in defining the underlying networks. For this reason, the current researchers on neural nets including Robert Hecht-Nielsen [5] do not seem to be aware of my contribution [4]. I hope that this note will help correct history. 
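For readers who have not seen it, the theorem in question can be stated (in the form usually quoted from [1]) as follows: every continuous function f of n variables on the unit cube can be written as f(x_1,...,x_n) = \sum_{q=1}^{2n+1} \Phi_q ( \sum_{p=1}^{n} \psi_{pq}(x_p) ), where the \Phi_q and the \psi_{pq} are continuous functions of a single variable and the inner functions \psi_{pq} do not depend on f. Read as a network, this is an exact representation of an arbitrary continuous mapping by a structure with a single layer of 2n+1 summation units between the inputs and the output, which is the reading taken in [4] and [5].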
Incidentally, there is a misprint in [4]. In [4], please insert "no" in the statement before eqn.(4). That statement should read: "Sprecher showed that lambda can be any nonzero number which satisfies no equation ..." [1] A. N. Kolmogorov, "On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 369-373, 1957. [2] V. I. Arnol'd, "On functions of three variables," Dokl. Akad. Nauk SSSR, Vol. 114, pp. 953-956, 1957. [3] D. A. Sprecher, "An improvement in the superposition theorem of Kolmogorov," J. Math. Anal. Appl., Vol. 38, pp. 208-213, 1972. [4] Rui J. P. de Figueiredo, "Implications and applications of Kolmogorov's superposition theorem," IEEE Trans. Auto. Contr., Vol. AC-25, pp. 1227-1231, 1980. [5] R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," IEEE 1st Int. Conf. on Neural Networks, San Diego, CA, June 21-24, 1987, paper III-11. From ncr-fc!avery at ncr-sd.sandiego.ncr.com Tue Jan 17 19:43:43 1989 From: ncr-fc!avery at ncr-sd.sandiego.ncr.com (ncr-fc!avery@ncr-sd.sandiego.ncr.com) Date: Tue, 17 Jan 89 17:43:43 MST Subject: new address Message-ID: <8901180043.AA19084@ncr-fc.FtCollins.NCR.com> I have a new e-mail address. Not the one in the reply field but this one. avery%ncr-fc at ncr-sd.sandiego.ncr.com Will you please get me back on the discussion group. From MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU Tue Jan 17 23:22:00 1989 From: MUMME%IDCSVAX.BITNET at CUNYVM.CUNY.EDU (MUMME%IDCSVAX.BITNET@CUNYVM.CUNY.EDU) Date: Tue, 17 Jan 89 20:22 PST Subject: Tech. Report Available Message-ID: The following tech. report is available from the University of Illinois Dept. of Computer Science: UIUCDCS-R-88-1485 STORAGE CAPACITY OF THE LINEAR ASSOCIATOR: BEGINNINGS OF A THEORY OF COMPUTATIONAL MEMORY by Dean C. Mumme May, 1988 ABSTRACT This thesis presents a characterization of a simple connectionist-system, the linear-associator, as both a memory and a classifier. Toward this end, a theory of memory based on information-theory is devised. The principles of the information-theory of memory are then used in conjunction with the dynamics of the linear-associator to discern its storage capacity and classification capabilities as they scale with system size. To determine storage capacity, a set of M vector-pairs called "items" are stored in an associator with N connection-weights. The number of bits of information stored by the system is then determined to be about (N/2)logM. The maximum number of items storable is found to be half the number of weights so that the information capacity of the system is quantified to be (N/2)logN. Classification capability is determined by allowing vectors not stored by the associator to appear at its input. Conditions necessary for the associator to make a correct response are derived from constraints of information theory and the geometry of the space of input-vectors. Results include derivation of the information-throughput of the associator, the amount of information that must be present in an input-vector and the number of vectors that can be classified by an associator of a given size with a given storage load. Figures of merit are obtained that allow comparison of capabilities of general memory/classifier systems. For an associator with a simple non-linearity on its output, the merit figures are evaluated and shown to be suboptimal. Constant attention is devoted to relative parameter size required to obtain the derived performance characteristics.
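[For scale, taking logs to base 2: an associator with N = 10,000 connection-weights would on this analysis store at most about 5,000 item pairs, for an information capacity of roughly (10,000/2) log 5,000, i.e. on the order of 60,000 bits.]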
Large systems are shown to perform nearest the optimum performance limits and suggestions are made concerning system architecture needed for best results. Finally, avenues for extension of the theory to more general systems are indicated. This tech. report is essentially my Ph.D. thesis completed last May and can be obtained by sending e-mail to: erna at a.cs.uiuc.edu Please do not send requests to me since I now live in Idaho and don't have access to the tech. reports. When replying to this notice, please do not use REPLY or send a note to "CONNECTIONISTS...". Send your request directly to Erna. Comments, questions and suggestions about the work can be sent directly to me at the address below. Thank You! Dean C. Mumme bitnet: mumme at idcsvax Dept. of Computer Science University of Idaho Moscow, ID 83843 From poggio at wheaties.ai.mit.edu Tue Jan 17 22:47:17 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem In-Reply-To: sontag@fermat.rutgers.edu's message of Tue, 17 Jan 89 14:08:03 EST <8901171908.AA00964@control.rutgers.edu> Message-ID: <8901180347.AA21088@rice-chex.ai.mit.edu> Kolmogorov 's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). From mozer%neuron at boulder.Colorado.EDU Wed Jan 18 16:19:46 1989 From: mozer%neuron at boulder.Colorado.EDU (Michael C. Mozer) Date: Wed, 18 Jan 89 14:19:46 MST Subject: oh boy, more tech reports... Message-ID: <8901182119.AA00413@neuron> Please e-mail requests to "kate at boulder.colorado.edu". Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment Michael C. Mozer Paul Smolensky University of Colorado Department of Computer Science Tech Report # CU-CS-421-89 This paper proposes a means of using the knowledge in a network to deter- mine the functionality or _relevance_ of individual units, both for the purpose of understanding the network's behavior and improving its perfor- mance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This _skeletonization_ technique can be used to simplify networks by eliminating units that convey redundant infor- mation; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules." [An abridged version of this TR will appear in NIPS proceedings.] --------------------------------------------------------------------------- And while I'm at it, some other recent junk, I mean stuff... A Focused Back-Propagation Algorithm for Temporal Pattern Recognition Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-3 Time is at the heart of many pattern recognition tasks, e.g., speech recog- nition. However, connectionist learning algorithms to date are not well- suited for dealing with time-varying input patterns. 
This paper introduces a specialized connectionist architecture and corresponding specialization of the back-propagation learning algorithm that operates efficiently on temporal sequences. The key feature of the architecture is a layer of self-connected hidden units that integrate their current value with the new input at each time step to construct a static representation of the tem- poral input sequence. This architecture avoids two deficiencies found in other models of sequence recognition: first, it reduces the difficulty of temporal credit assignment by focusing the back propagated error signal; second, it eliminates the need for a buffer to hold the input sequence and/or intermediate activity levels. The latter property is due to the fact that during the forward (activation) phase, incremental activity _traces_ can be locally computed that hold all information necessary for back propagation in time. It is argued that this architecture should scale better than conventional recurrent architectures with respect to sequence length. The architecture has been used to implement a temporal version of Rumelhart and McClelland's verb past-tense model. The hidden units learn to behave something like Rumelhart and McClelland's "Wickelphones," a rich and flexible representation of temporal information. --------------------------------------------------------------------------- A Connectionist Model of Selective Attention in Visual Perception Michael C. Mozer University of Toronto Connectionist Research Group Tech Report # CRG-TR-88-4 This paper describes a model of selective attention that is part of a con- nectionist object recognition system called MORSEL. MORSEL is capable of identifying multiple objects presented simultaneously on its "retina," but because of capacity limitations, MORSEL requires attention to prevent it from trying to do too much at once. Attentional selection is performed by a network of simple computing units that constructs a variable-diameter "spotlight" on the retina, allowing sensory information within the spotlight to be preferentially processed. Simulations of the model demon- strate that attention is more critical for less familiar items and that at- tention can be used to reduce inter-item crosstalk. The model suggests four distinct roles of attention in visual information processing, as well as a novel view of attentional selection that has characteristics of both early and late selection theories. From Scott.Fahlman at B.GP.CS.CMU.EDU Wed Jan 18 13:54:02 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Wed, 18 Jan 89 13:54:02 EST Subject: Benchmark collection Message-ID: The mailing list "nn-bench at cs.cmu.edu" is now in operation. I believe that all the "add me" requests received prior to 1/17/89 have been serviced. Of course, it's possible that we messed up some of the requests. If you sent in a request more than a couple of days ago and if you have not yet seen any "nn-bench" mail, please contact "nn-bench-request at cs.cmu.edu" and we'll investigate. New requests should be sent to that same address. The list currently has about 80 subscribers, plus two rebroadcast sites. -- Scott Fahlman, CMU From pollack at cis.ohio-state.edu Fri Jan 20 15:40:09 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Fri, 20 Jan 89 15:40:09 EST Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: <8901202040.AA13239@orange.cis.ohio-state.edu> Preprint of a NIPS paper is now available. 
Request LAIR 89-JP-NIPS From: Randy Miller CIS Dept/Ohio State University 2036 Neil Ave Columbus, OH 43210 or respond to this message but MODIFY THE To: AND Cc: LINES!!!!! ------------------------------------------------------------------------------ IMPLICATIONS OF RECURSIVE DISTRIBUTED REPRESENTATIONS Jordan B. Pollack Laboratory for AI Research Ohio State University Columbus, OH 43210 I will describe my recent results on the automatic development of fixed-width recursive distributed representations of variable-sized hierarchal data structures. One implication of this work is that certain types of AI-style data-structures can now be represented in fixed-width analog vectors. Simple inferences can be performed using the type of pattern associations that neural networks excel at. Another implication arises from noting that these representations become self-similar in the limit. Once this door to chaos is opened, many interesting new questions about the representational basis of intelligence emerge, and can (and will) be discussed. From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sat Jan 21 00:06:08 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sat, 21 Jan 89 00:06:08 EST Subject: No subject Message-ID: i was reading through the abstracts of the Boston 1988 INNS conference and noticed H. Bourlard and C. Welleken's paper on the relations between Hidden Markov Models and Multi Layer Perceptron. Does anybody have any pointers to papers on the subject by the same (preferrably) or other authors? or the e-mail address of these two authors? Thanasis Kehagias From netlist at psych.Stanford.EDU Sun Jan 22 18:16:33 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:16:33 PST Subject: Thurs (1/12): Steven Pinker on Language Models Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Deptartment of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. 
For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From netlist at psych.Stanford.EDU Sun Jan 22 18:23:16 1989 From: netlist at psych.Stanford.EDU (Mark Gluck) Date: Sun, 22 Jan 89 15:23:16 PST Subject: (Tues. 1/24): Larry Maloney on Visual Calibration Message-ID: Stanford University Interdisciplinary Colloquium Series: Adaptive Networks and their Applications Jan. 24th (Tuesday, 3:30pm): ----------------------------- ******************************************************************************** Learning by Assertion: Calibrating a Simple Visual System LARRY MALONEY Deptartment of Psychology 6 Washington Place; 8th Floor New York University New York, NY 10003 email: ltm at xp.psych.nyu.edu ******************************************************************************** Abstract An ideal visual system is calibrated if its estimates reflect the actual state of the scene: Straight lines, for example, should be judged to be straight. If an ideal visual system is modeled as a neural network, then it is calibrated only if the weights linking elements of the the network are assigned correct values. I describe a method (`Learning by Assertion') for calibrating an ideal visual system by adjusting the weights. The method requires no explicit feedback or prior knowledge concerning the contents of the environment. This work is relevant to biological visual development and calibration, to the calibration of machine vision systems, and to the design of adaptive network algorithms. Additional Information ---------------------- Location: Room 380-380F, which can be reached through the lower level between the Psychology and Mathematical Sciences buildings. Technical Level: These talks will be technically oriented and are intended for persons actively working in related areas. They are not intended for the newcomer seeking general introductory material. Mailing lists: To be added to the network mailing list, netmail to netlist at psych.stanford.edu. For additional information, or contact Mark Gluck (gluck at psych.stanford.edu). Co-Sponsored by: Departments of Electrical Engineering (B. Widrow) and Psychology (D. Rumelhart, M. Pavel, M. Gluck), Stanford Univ. From rsun at cs.brandeis.edu Sun Jan 22 17:02:48 1989 From: rsun at cs.brandeis.edu (Ron Sun) Date: Sun, 22 Jan 89 17:02:48 est Subject: Technical Report: LAIR 89-JP-NIPS Message-ID: Please send this TR to Ron Sun Brandeis U CS Waltham, MA 02254 Thank you. From koch%HAMLET.BITNET at VMA.CC.CMU.EDU Mon Jan 23 14:29:31 1989 From: koch%HAMLET.BITNET at VMA.CC.CMU.EDU (Christof Koch) Date: Mon, 23 Jan 89 11:29:31 PST Subject: Gimme a break! Message-ID: <890123112923.20203114@Hamlet.Caltech.Edu> re. "Call for papers IJCNN, the only major neural network meeting of 1989 [sic]" Neural Information Processing Systems 1989 at Denver will be held this year from November 28-th until November 30-th followed by a workshop on December 1/2. This is the third annual meeting held under the auspices of the IEEE, Society of Neuroscience, and APS. For further information contact Scott Kirkpatrick, General Chairman (kirk at ibm.com) or wait for the Call for Papers which is in preparation. 
Christof

From jbower at bek-mc.caltech.edu Mon Jan 23 16:20:39 1989 From: jbower at bek-mc.caltech.edu (Jim Bower) Date: Mon, 23 Jan 89 13:20:39 pst Subject: NIPS 89 Message-ID: <8901232120.AA14266@bek-mc.caltech.edu> To whom it may concern: A few days ago there was an announcement on the connectionist network that only one "major" neural network meeting would be held in 1989. While "major" in past meeting announcements for the INNS and the IEEE San Diego meetings has seemed most often to be equated with total attendance and size of the exhibit area, an equally important measure might be the overall quality of the work presented and, therefore, the importance of the meeting to the field. Accordingly, the previous announcement should probably be amended to include the fact that the Third annual Neural Information Processing Systems (NIPS) meeting will be held in late 1989 in Denver. While the objective of this meeting is not to be the biggest meeting ever, and submitted papers are refereed, authors might consider submitting important results to this meeting anyway. A call for papers will be announced, as usual, on this network. Jim Bower
From movellan%garnet.Berkeley.EDU at violet.berkeley.edu Mon Jan 23 23:32:11 1989 From: movellan%garnet.Berkeley.EDU at violet.berkeley.edu (movellan%garnet.Berkeley.EDU@violet.berkeley.edu) Date: Mon, 23 Jan 89 20:32:11 pst Subject: Weight Decay Message-ID: <8901240432.AA18293@garnet.berkeley.edu> Referring to the compilation about weight decay from John: I cannot see the analogy between weight decay and ridge regression. The weight solutions in a linear network (Ordinary Least Squares) are the solutions to (I'I) W = I'T where: I is the input matrix (rows are # of patterns in epoch and columns are # of input units in net). T is the teacher matrix (rows are # of patterns in epoch and columns are # of teacher units in net). W is the matrix of weights (net is linear with only one layer!). The weight solutions in ridge regression would be given by (I'I + k<1>) W = I'T, where k is a "shrinkage" constant and <1> represents the identity matrix. Notice that k<1> has the same effect as increasing the variances of the inputs (diagonal of I'I) without increasing their covariances (rest of the I'I matrix). The final effect is biasing the W solutions but reducing the extreme variability to which they are subject when I'I is near singular (multicollinearity). Obviously collinearity may be a problem in nets with a large # of hidden units. I am presently studying how and why collinearity in the hidden layer affects generalization and whether ridge solutions may help in this situation. I cannot see, though, how these ridge solutions relate to weight decay. -Javier

From ILPG0 at ccuab1.uab.es Tue Jan 24 09:23:00 1989 From: ILPG0 at ccuab1.uab.es (CORTO MALTESE) Date: Tue, 24 Jan 89 14:23 GMT Subject: Subscription Message-ID: Dear list owner, I should be grateful if you could add my name to the list of subscribers of Connectionists. My name is O. S. Vilageliu, and my e-mail address is ilpg0 at ccuab1.uab.es. I thank you beforehand, Sincerely yours, Olga Soler

From pollack at cis.ohio-state.edu Tue Jan 24 11:51:15 1989 From: pollack at cis.ohio-state.edu (Jordan B. Pollack) Date: Tue, 24 Jan 89 11:51:15 EST Subject: Gimme a break! In-Reply-To: Christof Koch's message of Mon, 23 Jan 89 11:29:31 PST <890123112923.20203114@Hamlet.Caltech.Edu> Message-ID: <8901241651.AA02067@toto.cis.ohio-state.edu> Speaking of NIPS versus IJCNN: at least NIPS is pronounceable, even though, as Terry S pointed out, Nabisco already holds it as a trademark. If the international joint conference is to be as lasting a success as, say, IJCAI, then its acronym should smoothly roll off the tongue. Here are some of the alternatives I've just come up with. Minor variations: JINNC (Jink) -- permute the word order; IJCONN -- same name, but include the "ON"; ICONN -- leave out the "Joint" (for a drug-free meeting?); ICONS -- International Conf. on Neural Systems (Hey! This is even a word!). The most elegant name is simply NN "Neural Networks", which can be spoken as either "N Squared", signifying both its size and technical nature, or "Double-N", signifying both the need for a big spread and the yearly "round-up" of research results like cattle...
Of course the search for acronyms usually generates useless debris: NIPSOID -- Neural Information Processing Systems On an International Dimension; MANIC -- Most (of the) Artificially Neural International Community; DNE (sounds like DNA?) -- Dear Neural Enthusiast...; BNANA -- Big Network of Artificial Neural Aficionados; ARTIST -- Adaptive Resonance Theory as International Science and Technology; IBSH -- I Better Stop Here.

From kanderso at BBN.COM Tue Jan 24 13:54:04 1989 From: kanderso at BBN.COM (kanderso@BBN.COM) Date: Tue, 24 Jan 89 13:54:04 -0500 Subject: Weight Decay In-Reply-To: Your message of Mon, 23 Jan 89 20:32:11 -0800. <8901240432.AA18293@garnet.berkeley.edu> Message-ID: Yes, I was confused by this too. Here is what the connection seems to be. Say we are trying to minimize an energy function E(w) of the weight vector for our network. If we add a constraint that also attempts to minimize the length of w, we would add a term kw'w to our energy function. Taking your linear least squares problem, we would have E = (T-IW)'(T-IW) + kW'W, so dE/dW = I'IW - I'T + kW (dropping a common factor of 2); setting dE/dW = 0 gives [I'I + k<1>]W = I'T, i.e., ridge regression: W = [I'I + k<1>]^-1 I'T. The covariance matrix is [I'I + k<1>]^-1, so the effects of increasing k are: (1) it makes the matrix more invertible; (2) it reduces the covariance, so that new training data will have less effect on your weights; (3) you lose some resolution in weight space. I agree that collinearity is probably very important, and I'll be glad to discuss that off line. k
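To make the correspondence in the two preceding messages concrete, here is a small numerical sketch added for illustration (the data, sizes, learning rate, and the value of k are arbitrary). It fits a one-layer linear net by gradient descent on the penalized error E(W) = ||T - IW||^2 + k||W||^2 and compares the result with the closed-form ridge solution [I'I + k<1>]^-1 I'T, using the same I, T, W notation as above.

import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_inputs, n_outputs = 40, 8, 3
k = 0.1                                          # "shrinkage" / weight-decay constant
I = rng.normal(size=(n_patterns, n_inputs))      # input matrix (patterns x input units)
T = rng.normal(size=(n_patterns, n_outputs))     # teacher matrix (patterns x teacher units)

# Closed-form ridge regression solution: [I'I + k<1>] W = I'T.
W_ridge = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)

# Gradient descent on the squared error plus the weight-decay penalty.
W = np.zeros((n_inputs, n_outputs))
lr = 2e-3
for _ in range(5000):
    grad = 2 * (I.T @ (I @ W - T) + k * W)       # dE/dW, decay term included
    W -= lr * grad

print(np.max(np.abs(W - W_ridge)))               # essentially zero: the two solutions coincide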
From jose at tractatus.bellcore.com Wed Jan 25 10:02:09 1989 From: jose at tractatus.bellcore.com (Stephen J Hanson) Date: Wed, 25 Jan 89 10:02:09 EST Subject: Weight Decay Message-ID: <8901251502.AA05090@tractatus.bellcore.com> Actually, I think the connection is more general--ridge regression is a special case of a class of variance-reduction techniques in regression called "biased regression" (including principal components). Biases are introduced in order to remove effects of collinearity, as has been discussed, and to attempt to achieve estimators that may have a lower variance than the theoretical best linear unbiased estimator ("BLUE"); when assumptions of linearity and independence are violated, least squares estimators are not particularly attractive and will not necessarily be BLUE. Consequently, nonlinear regression and ordinary linear least squares regression with collinear variables may be able to achieve lower-variance estimators by entertaining biases. In the nonlinear case a bias term would enter as a "constraint" to be minimized along with the error (y - yhat)^2. This constraint is actually a term that can push weights differentially towards zero--in regression terms a bias, in neural network terms weight decay. Ridge regression is a specific case in linear least squares where the off-diagonal terms of the correlation matrix are given less weight by adding a small constant to the diagonal in order to reduce the collinearity problem. It is still controversial in statistical arenas--not everyone subscribes to the notion of introducing biases, since it is hard to know a priori what bias might be optimal for a given problem. I have a paper with Lori Pratt, given at the last NIPS, that describes this relationship more generally; it should be available soon as a tech report. Steve Hanson
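A short note added here on why the penalty term is called "decay" (eta below denotes a learning rate; neither it nor this algebra appears in the message above). A gradient step on the penalized error E(w) + k w'w is

    w(t+1) = w(t) - eta * [ dE/dw + 2k w(t) ]
           = (1 - 2*eta*k) * w(t) - eta * dE/dw,

so every update first shrinks the current weights by the factor (1 - 2*eta*k), slightly less than one, and weights that the error gradient does not actively support are pushed differentially towards zero, as described above.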
From rui at rice.edu Wed Jan 25 18:34:38 1989 From: rui at rice.edu (Rui DeFigueiredo) Date: Wed, 25 Jan 89 17:34:38 CST Subject: No subject Message-ID: <8901252334.AA01804@zeta.rice.edu> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - In-Reply-To: poggio at wheaties.ai.mit.edu's message of Tue, 17 Jan 89 22:47:17 EST Subject: Kolmogorov's superposition theorem Kolmogorov's theorem and its relation to networks are discussed in Biol. Cyber., 37, 167-186, 1979. (On the representation of multi-input systems: computational properties of polynomial algorithms, Poggio and Reichardt). There are references there to older papers (see especially the two nice papers by H. Abelson). - - - - - - - - - - - - end of message - - - - - - - - - - - - Comment: Poggio and Reichardt's paper, "On the representation of multi-input systems: Computational properties of polynomial algorithms" (Biol. Cyber., 37, 167-186, 1980) appeared not earlier but in the same year as deFigueiredo's, "Implications and applications of Kolmogorov's superposition theorem" (IEEE Trans. on Automatic Control, AC-25, 1227-1231, 1980).

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:55:49 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:55:49 EST Subject: DARPA Program announcement (long, 2 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-04): NEURAL NETWORKS: HARDWARE TECHNOLOGY BASE DEVELOPMENT SOL BAA#89-04 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop hardware system components that capitalize on the inherent massive parallelism and expected robustness of neural network models. The objective of the present effort is to lay the groundwork for future construction of full-scale artificial neural network computing machines through the development of advanced hardware implementation technologies. DARPA does not intend to build full-scale machines at this stage of the program. Areas of interest include modifiable-weight synaptic connections, neuron processing unit devices, and scalable neural net architecture designs. The technologies proposed may be analog or digital, using silicon or other materials, and may be electronic, optoelectronic, optical, or other. The technology should be robust to manufacturing and environmental variability. It should be flexible and modular to accommodate evolving neural network system architectures and to allow for scale-up to large-sized systems through assembly/interconnection of smaller subsystems. It should be appropriate for future compact, low-power systems. It must accommodate the high fan-out/high fan-in properties characteristic of artificial neural network systems with high-density interconnects, and it must have high throughput capability to achieve rapid processing of large volumes of data. Only those proposals that clearly delineate how the objectives enumerated above are to be achieved and that demonstrate extensive prior experience in hardware design and fabrication will be favorably considered. If the proposal addresses a component technology, proposers should provide a detailed description of the interface features required for integration into a working artificial neural network system. Whether the proposed technology is adapted to a specific neural net model or, conversely, is applicable to a broad range of models, the proposer should clearly define the specific features of the proposed hardware that underlie its particular applicability. To the extent that availability of the proposed technology will facilitate the implementation of advanced systems other than artificial neural network systems, that potential impact should be described. Hardware developers are encouraged to work in close coordination with neural network modelers to better understand the range of current and projected architectural requirements. DARPA will also entertain a limited number of proposals to develop near-term prototypes with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research.
Proprietary portions of the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From poggio at wheaties.ai.mit.edu Thu Jan 26 13:01:23 1989 From: poggio at wheaties.ai.mit.edu (Tomaso Poggio) Date: Thu, 26 Jan 89 13:01:23 EST Subject: No subject In-Reply-To: Rui DeFigueiredo's message of Wed, 25 Jan 89 17:34:38 CST <8901252334.AA01804@zeta.rice.edu> Message-ID: <8901261801.AA15158@wheat-chex.ai.mit.edu> ...

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 13:00:34 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 13:00:34 EST Subject: DARPA Program Announcement (long, 3 of 3) Message-ID: Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-03): NEURAL NETWORKS: THEORY AND MODELING SOL BAA#89-03 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L.
Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to develop and analyze new artificial neural network system architectures/structures and training procedures; define the requirements for scale-up to large-sized artificial neural networks; and characterize the properties, limitations, and data requirements of new and existing artificial neural network systems. Proposers are encouraged to submit proposals that deal with, but are not limited to, any combination of the following thrusts within these areas: (1) New artificial neural architectures with one or more of the following features: (a) Potential for addressing real-time sensory data processing and real-time sensorimotor control; (b) Networks that incorporate features of sensory, motor, and perceptual processing in biological systems; (c) Nodal elements with increased processing capability, including sensitivity to temporal variations in synaptic inputs; (d) Modular networks composed of multiple interconnected subnets; (e) Hybrid systems combining neural and conventional information processing techniques; (f) Mechanisms to achieve modifications of network behavior in response to external consequences of initial actions; (g) Mechanisms that exhibit selective attention; (h) Strategies for developing conceptual systems and internal data representations well adapted to specific tasks; (i) Means for recognizing and producing sequences of temporal patterns. (2) Faster, more efficient training procedures that: (a) Are robust to noisy data and able to accommodate delayed feedback; (b) Minimize the need for external intervention for feedback; (c) Identify optimal choices of initial classification features or categories; (d) Generate internal models of the external world to guide appropriate responses to external stimuli. (3) Theoretical analyses that address: (a) Data representations; (b) Scaling properties for new and existing systems; (c) Matching of system complexity to the nature and amount of training data; (d) Tolerance to nodal element and synaptic failure; (e) Stability and convergence of new and existing systems; (f) Relationships between neural networks and conventional approaches. DARPA will also entertain a limited number of proposals to address special applications with high potential for demonstrating the expected power of artificial neural networks. This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research. Proprietary portions of the technical proposal should be specifically identified.
Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From Scott.Fahlman at B.GP.CS.CMU.EDU Thu Jan 26 12:49:35 1989 From: Scott.Fahlman at B.GP.CS.CMU.EDU (Scott.Fahlman@B.GP.CS.CMU.EDU) Date: Thu, 26 Jan 89 12:49:35 EST Subject: DARPA Program announcement (long, 1 of 3) Message-ID: Barbara Yoon at DARPA has apparently been flooded with requests for the three DARPA program announcements in the neural network area. To lighten the load, she asked us to send out the full text of these announcements to members of this mailing list. The text in this and the following two messages is copied verbatim from the Commerce Business Daily. We have resisted the temptation to insert paragraph breaks to improve readability. I apologize for dumping so much text on people who already have copies of the announcements or who are not interested, but this seems the best way to get the word out to a large set of potentially interested people.
Please don't contact us about this program -- the appropriate phone numbers and addresses are listed in the announcements. -- Scott Fahlman, CMU =========================================================================== Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO), 1400 Wilson Blvd., Arlington, VA 22209-2308 A--BROAD AGENCY ANNOUNCEMENT (BAA#89-02): NEURAL NETWORKS: COMPARATIVE PERFORMANCE MEASUREMENTS SOL BAA#89-02 DUE 030189 POC Douglas M. Pollock, Contracts, (202)694-1771; Dr. Barbara L. Yoon, Technical, (202)694-1303. The Defense Advanced Research Projects Agency, Defense Sciences Office, DARPA/DSO, is interested in receiving proposals to construct and test software simulations of artificial neural networks (or software simulations of hybrid systems incorporating artificial neural networks) that perform defined, complex classification tasks in the following application areas: (1) Automatic target recognition; (2) Continuous speech recognition; (3) Sonar signal discrimination; and (4) Seismic signal discrimination. The objectives of this program are to advance the state-of-the-art in application of artificial neural network approaches to classification problems; to investigate the optimal role of artificial neural networks in hybrid classification systems; and to measure the projected performance of artificial neural networks (or hybrid systems containing neural networks) in order to support a comparison with the performance of alternative, competing technologies. DARPA will provide application developers with a standard set of training data, appropriate to the application, to be used as the basis for training (or otherwise developing) their classification systems. The systems developed will then be evaluated independently in classification of standard sets of test data, distinct from the training set. The four application tasks are more fully described below. (1) Automatic target recognition: (a) Given a multi-spectral training set of time-correlated images of up to ten land vehicles (which may be partially obscured and in cluttered environments) with ground truth provided, identify and classify these vehicles in a new set of images (outside the training set); (b) Given images of two or more new land vehicles, recognize these vehicles as distinct from the original set and distinguish them from one another (with no system reprogramming or retraining); (c) Given a new training set of data on air vehicles, with system reprogramming and/or retraining, modify the system to identify and classify this new class of targets. (2) Continuous speech recognition: (a) Given a training set of 2800 spoken English sentences (with a 1000-word vocabulary), transcribe to written text spoken English sentences from a test set (outside the training set); (b) With no system reprogramming or retraining, transcribe to text spoken English sentences using vocabulary outside the initial vocabulary (given only the phonetic spelling of the new words); (c) Given training data on spoken foreign language sentences (with characteristics similar to those of the English sentence data base described in application (2)(a) above), with system reprogramming and/or retraining, modify the system to transcribe to text spoken foreign language sentences.
(3) Sonar signal discrimination: (a) Given a training set of several acoustic signature transients and passive marine acoustic signals (both signal types in noisy environments), detect and classify each signal type in a test set (outside the training set); (b) Given two or more new passive marine acoustic signals, with no system reprogramming or retraining, recognize these signals as distinct from the original set and distinguish them from one another; (c) Given a new training set of data on underwater echoes from active sonar returns, with system reprogramming and/or retraining, modify the system to detect and classify each signal type in this new class of signals and distinguish them from the original set of acoustic signals. (4) Seismic signal discrimination: (a) Given a training set of seismic signals (and associated parameters) from different types of seismic events of varying magnitudes, each event recorded at two or more seismic stations with ground truth provided, classify (as to signal type), locate, and estimate the magnitude of similar events in a test set of seismic signals (outside the training set); (b) Given one or more new types of seismic signals, recognize these signals as distinct from the original set (with no system reprogramming or retraining); (c) Given a new training set of seismic signals from seismic stations located in different geological regions from the original stations, with system reprogramming and/or retraining, modify the system to classify and characterize this new set of signals. The criteria for evaluating the performance of the classification systems will include: (a) Classification accuracy (the appropriate accuracy metric for the task addressed, e.g., percentage of correct detections, identifications, and/or classifications, including false alarms where applicable; or total error rates); (b) System development time (the time required to develop and train the system); (c) Fault tolerance (the percentage of original performance retained when subjected to failure of some of the processing elements); (d) Generality (the accuracy of the system for new input data significantly outside the range of training data); (e) Adaptability (the time and effort required to modify the system to address similar classification problems with different classes of data); (f) Computational efficiency (the projected solution speed when optimally implemented in hardware); (g) Size and power requirements (the projected size and power requirements of the computational hardware); (h) Performance vs. training data (the rate of improvement in performance with increasing size of the training data set). This effort is a part of the DARPA program on Neural Networks, the total funding for which is anticipated to be $33M over a 28-month period. Proposals for projects covering less than 28 months are encouraged. Proposals may be submitted any time through 4PM, March 1, 1989. The proposal must contain the information listed below. (1) The name, address, and telephone number of the individual or organization submitting the proposal; (2) A brief title that clearly identifies the application being addressed, a concise descriptive summary of the proposed research, a supporting detailed statement of the technical approach, and a description of the facilities to be employed in this research. Cooperative arrangements among industries, universities, and other institutions are encouraged whenever this is advantageous to executing the proposed research.
Proprietary portions of the technical proposal should be specifically identified. Such proprietary information will be treated with strict confidentiality; (3) The names, titles, and proposed roles of the principal investigators and other key personnel to be employed in the conduct of this research, with brief resumes that describe their pertinent accomplishments and publications; (4) A cost proposal on SF1411 (or its equivalent) describing total costs, and an itemized list of costs for labor, expendable and non-expendable equipment and supplies, travel, subcontractors, consultants, and fees; (5) A schedule listing anticipated spending rates and program milestones; (6) The signature of the individual (if applying on his own behalf) or of an official duly authorized to commit the organization in business and financial affairs. Proposals should address a single application. The technical content of the proposals is not to exceed a total of 15 pages in length (double-spaced, 8 1/2 x 11 inches), exclusive of figures, tables, references, resumes, and cost proposal. Proposals should contain a statement of validity for at least 150 days beyond the closing date of this announcement. Evaluation of proposals received in response to the BAA will be accomplished through a peer or scientific review. Selection of proposals will be based on the following evaluation criteria, listed in descending order of relative importance: (1) Contribution of the proposed work to the stated objectives of the program; (2) The soundness of the technical approach; (3) The uniqueness and innovative content; (4) The qualifications of the principal and supporting investigators; (5) The institution's capabilities and facilities; and (6) The reasonableness of the proposed costs. Selection will be based primarily on scientific or technical merit, importance to the program, and fund availability. Cost realism and reasonableness will only be significant in deciding between two technically equal proposals. Fifteen copies of proposals should be submitted to: Barbara L. Yoon, DARPA/DSO, 1400 Wilson Blvd., 6th Floor, Arlington, VA 22209-2308. Technical questions should be addressed to Dr. Yoon, telephone (202)694-1303. This CBD notice itself constitutes the Broad Agency Announcement as contemplated in FAR 6.102(d)(2). No additional written information is available, nor will a formal RFP or other solicitation regarding this announcement be issued. Requests for same will be disregarded. The Government reserves the right to select for award all, some, or none of the proposals received in response to this announcement. All responsible sources may submit a proposal which shall be considered by DARPA.

From pwh at ece-csc.ncsu.edu Thu Jan 26 17:31:04 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Thu, 26 Jan 89 17:31:04 EST Subject: No subject Message-ID: <8901262231.AA03761@ece-csc.ncsu.edu> REVISED SUBMISSION DEADLINE FOR IJCNN-89 PAPERS--FEBRUARY 15, 1989 International Joint Conference on Neural Networks June 18-22, 1989 Washington, D.C. DEADLINE FOR SUBMISSION OF PAPERS for IJCNN-89 has been revised to FEBRUARY 15, 1989.
Papers of 8 pages or less are solicited in the following areas: -Real World Applications -Associative Memory -Supervised Learning Theory -Image Analysis -Reinforcement Learning Theory -Self-Organization -Robotics and Control -Neurobiological Models -Optical Neurocomputers -Vision -Speech Processing and Recognition -Electronic Neurocomputers -Neural Network Architectures & Theory -Optimization. FULL PAPERS in camera-ready form (1 original on Author's Kit forms and 5 reduced 8 1/2" x 11" copies) should be submitted to Nomi Feldman, Conference Coordinator, at the address below. For more details, or to request your IEEE Author's Kit, call or write: Nomi Feldman, IJCNN-89 Conference Coordinator, 3770 Tansy Street, San Diego, CA 92121, (619) 453-6222.

From REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU Fri Jan 27 12:55:00 1989 From: REXB%PURCCVM.BITNET at VMA.CC.CMU.EDU (Rex C. Bontrager) Date: Fri, 27 Jan 1989 12:55 EST Subject: INNS membership Message-ID: Who do I contact regarding INNS membership? (More precisely, to whom do I send my money?) Rex C. Bontrager Bitnet: rexb at purccvm Internet: rexb at vm.cc.purdue.edu Phone: (317) 494-1787 ext. 256

From neural!yann Wed Jan 25 15:13:58 1989 From: neural!yann (Yann le Cun) Date: Wed, 25 Jan 89 15:13:58 -0500 Subject: Weight Decay Message-ID: <8901252012.AA00971@neural.UUCP> Consider a single layer linear network with N inputs. When the number of training patterns is smaller than N, the set of solutions (in weight space) is a proper linear subspace. Adding weight decay will select the minimum norm solution in this subspace (if the weight decay coefficient is decreased with time). The minimum norm solution happens to be the solution given by the pseudo-inverse technique (cf. Kohonen), and the solution which optimally cancels out uncorrelated zero mean additive noise on the input. - Yann Le Cun
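A small numerical sketch of the limit Yann Le Cun describes, added for illustration (the data and dimensions are arbitrary): with fewer training patterns than inputs, the penalized solution [I'I + k<1>]^-1 I'T approaches the minimum-norm (pseudo-inverse) solution as the weight-decay constant k shrinks toward zero.

import numpy as np

rng = np.random.default_rng(1)
n_patterns, n_inputs = 5, 12                     # fewer patterns than inputs
I = rng.normal(size=(n_patterns, n_inputs))
T = rng.normal(size=(n_patterns, 1))

W_pinv = np.linalg.pinv(I) @ T                   # minimum-norm solution

for k in (1e-1, 1e-3, 1e-6):
    W_k = np.linalg.solve(I.T @ I + k * np.eye(n_inputs), I.T @ T)
    print(k, np.linalg.norm(W_k - W_pinv))       # the distance shrinks with k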
From reggia at mimsy.umd.edu Fri Jan 27 19:41:19 1989 From: reggia at mimsy.umd.edu (James A. Reggia) Date: Fri, 27 Jan 89 19:41:19 EST Subject: call for papers Message-ID: <8901280041.AA04500@mimsy.umd.edu> CALL FOR PAPERS The 13th Annual Symposium on Computer Applications in Medical Care will have a track this year on applications of neural models (connectionist models, etc.) in medicine. The Symposium will be held in Washington DC, as in previous years, on November 5 - 8, 1989. Submissions are refereed and, if accepted, appear in the Symposium Proceedings. Deadline for submission of manuscripts (six copies, double spaced, max. of 5000 words) is March 3, 1989. For further information and/or a copy of the detailed call for papers, contact: SCAMC Office of Continuing Medical Education George Washington University Medical Center 2300 K Street, NW Washington, DC 20037 The detailed call for papers includes author information sheets that must be returned with a manuscript.

From elman at amos.ling.ucsd.edu Sat Jan 28 01:24:24 1989 From: elman at amos.ling.ucsd.edu (Jeff Elman) Date: Fri, 27 Jan 89 22:24:24 PST Subject: UCSD Cog Sci faculty opening Message-ID: <8901280624.AA11066@amos.ling.ucsd.edu> ASSISTANT PROFESSOR COGNITIVE SCIENCE UNIVERSITY OF CALIFORNIA, SAN DIEGO The Department of Cognitive Science at UCSD expects to receive permission to hire one person for a tenure-track position at the Assistant Professor level. The Department takes a broadly based approach to the study of cognition, including its neurological basis, in individuals and social groups, and machine intelligence. We seek someone whose interests cut across conventional disciplines. Interests in theory, computational modeling (especially PDP), or applications are encouraged. Candidates should send a vita, reprints, a short letter describing their background and interests, and names and addresses of at least three references to: Search Committee Cognitive Science, C-015-E University of California, San Diego La Jolla, CA 92093 Applications must be received prior to March 15, 1989. Salary will be commensurate with experience and qualifications, and will be based upon UC pay schedules. Women and minorities are especially encouraged to apply. The University of California, San Diego is an Affirmative Action/Equal Opportunity Employer.

From Dave.Touretzky at B.GP.CS.CMU.EDU Sat Jan 28 07:14:37 1989 From: Dave.Touretzky at B.GP.CS.CMU.EDU (Dave.Touretzky@B.GP.CS.CMU.EDU) Date: Sat, 28 Jan 89 07:14:37 EST Subject: INNS membership In-Reply-To: Your message of Fri, 27 Jan 89 12:55:00 -0500. Message-ID: <462.601992877@DST.BOLTZ.CS.CMU.EDU> PLEASE: Do not send requests for general information (like how to join INNS) to the CONNECTIONISTS list! This list is intended for serious scientific discussion only. If you need help with an address or something equally trivial, send mail to connectionists-request if you must. Better yet, use the Neuron Digest. Don't waste people's time on CONNECTIONISTS. -- Dave

From norman%cogsci at ucsd.edu Sun Jan 29 13:36:36 1989 From: norman%cogsci at ucsd.edu (Donald A Norman-UCSD Cog Sci Dept) Date: Sun, 29 Jan 89 10:36:36 PST Subject: addendum to UCSD Cog Sci faculty opening Message-ID: <8901291836.AA22314@sdics.COGSCI> Jeff Elman's posting of the job at UCSD in the Cognitive Science Department was legally and technically accurate, but he should have added one important sentence: Get the application -- or at least a letter of interest -- to us immediately. We are very late in getting the word out, and decisions will have to be made quickly. The sooner we know of the pool of applicants, the better. (Actually, I now discover one inaccuracy -- the ad says we "expect to receive permission to hire ..." In fact, we now do have that permission.) If you have future interests -- say you are interested not now, but in a year or two or three -- that too is important for us to know, so tell us. don norman

From ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU Sun Jan 29 22:39:30 1989 From: ST401843%BROWNVM.BITNET at VMA.CC.CMU.EDU (thanasis kehagias) Date: Sun, 29 Jan 89 22:39:30 EST Subject: speech list? Message-ID: Does anyone know of a mailing list where speech questions are discussed? (Not necessarily as related to connectionist methods; just speech questions in general.) Thanks a lot, Thanasis

From pwh at ece-csc.ncsu.edu Mon Jan 30 14:48:25 1989 From: pwh at ece-csc.ncsu.edu (Paul Hollis) Date: Mon, 30 Jan 89 14:48:25 EST Subject: IJCNN Call for Papers Amendment Message-ID: <8901301948.AA25787@ece-csc.ncsu.edu> Amendment to IJCNN call for papers Sorry... Upon reflection, the wording in the IJCNN call for papers did not convey the proper meaning. Perhaps a better way to say it would have been, "IJCNN-89 is replacing both the ICNN and INNS meetings in 1989." The intent was for people to realize that if they planned to submit to either ICNN or INNS or both in 1989, the joint conference is the only opportunity to do so. Part of the reason for extending the deadline is to allow for the short notice (no INNS call for papers had previously been issued, since the merger of the two conferences just occurred). The original text was meant to imply the above and nothing more.
No offense should be taken because none was intended. By the way, I was at last year's NIPS conference and thought it was an excellent conference. I plan to be there again next year. Also there has been some confusion over the revised deadline for paper submissions to IJCNN. The revised deadline STILL STANDS as FEBRUARY 15. P.S. Following the precedent set at the IJCAI, my pronunciation of IJCNN is idge-kin. The acronyms were good though! Wes Snyder, Co-Chairman of the Organization Committee, IJCNN-89 January 30, 1989