From smagt at fwi.uva.nl Wed Oct 2 04:28:55 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Wed, 2 Oct 91 09:28:55 +0100 Subject: reprint announcement Message-ID: <9110020828.AA20879@fwi.uva.nl> I mentioned a paper some time ago about neural robotics control. Popular demand made me decide to make it available by anonymous ftp from neuroprose. --------------------------------------------------------------------- The following reprint is available by ftp from the neuroprose archive at archive.cis.ohio-state.edu: A real-time learning neural robot controller P. Patrick van der Smagt Ben J. A. Kr\"ose Department of Computer Systems University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands ABSTRACT A neurally based adaptive controller for a 6 degrees of freedom (DOF) robot manipulator with only rotary joints and a hand-held camera is described. The task of the system is to place the manipulator directly above an object that is observed by the camera (i.e., 2D hand-eye coordination). The requirement of adaptivity results in a system which does not make use of any inverse kinematics formulas or other detailed knowledge of the plant; instead, it should be self-supervising and adapt on-line. The proposed neural system will directly translate the preprocessed sensory data to joint displacements. It controls the plant in a feedback loop. The robot arm may make a sequence of moves before the target is reached, when in the meantime the network learns from experience. The network is shown to adapt quickly (in only tens of trials) and form a correct mapping from input to output domain. Here's how to get the reprint from neuroprose: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get smagt.rtcontrol.ps.Z ftp> quit unix> uncompress smagt.rtcontrol.ps.Z unix> lpr smagt.rtcontrol.ps (or however you print postscript) Questions or comments can be sent to me at: Patrick van der Smagt Department of Computer Systems University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands email: smagt at fwi.uva.nl fax: +31 20 525 7490 phone: +31 20 525 7524 From LAUTRUP at nbivax.nbi.dk Wed Oct 2 04:05:00 1991 From: LAUTRUP at nbivax.nbi.dk (Benny Lautrup) Date: Wed, 2 Oct 1991 09:05 +0100 (NBI, Copenhagen) Subject: preprint Message-ID: <1F984C7800023236@nbivax.nbi.dk> New preprint Uniqueness of Parisi's Scheme for Replica Symmetry Breaking B. Lautrup Computational Neural Network Center The Niels Bohr Institute Blegdamsvej 17 2100 Copenhagen, Denmark Abstract: Replica symmetry breaking in spin glass models is investigated using elements of the theory of permutation groups. It is shown how the various types of symmetry breaking gives rise to special algebras and that Parisi's scheme may be uniquely characterized by two simple conditions on these algebras, namely transposition symmetry and simple extensibility. An alternative to the Parisi scheme is shown to be unacceptable. The paper may be retrieved by anonymous ftp from nbibel.nbi.dk (129.142.100.11) in the directory pub/neuroprose under the name lautrup.parisi.ps.Z It is a compressed postscript file. Regards Benny Lautrup From mclennan at cs.utk.edu Wed Oct 2 15:10:21 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Wed, 2 Oct 91 15:10:21 -0400 Subject: report available Message-ID: <9110021910.AA12670@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. 
** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (2.07 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Gabor Representations of Spatiotemporal Visual Images Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-144 ABSTRACT: We review Gabor's Uncertainty Principle and the limits it places on the representation of any signal. Representations in terms of Gabor elementary functions (Gaussian-modulated sinusoids), which are optimal in terms of this uncertainty principle, are compared with Fourier and wavelet representations. We also review Daugman's evidence for representations based on two-dimensional Gabor functions in mammalian visual cortex. We suggest three- dimensional Gabor elementary functions as a model for motion selectivity in complex and hypercomplex cells in visual cortex. This model also suggests a computational role for low frequency oscillations (such as the alpha rhythm) in visual cortex. A preliminary version of this paper was presented at the workshop ``Foundational Methods for Behavioral and Computational Neurosci- ences,'' Georgetown University, May 13-15, 1991. ----------------------------------------------------- FTP INSTRUCTIONS Either use the Getps script, or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.gabor.ps.Z ftp> quit unix> uncompress maclennan.gabor.ps.Z unix> lpr -s maclennan.gabor.ps (or however you print postscript) If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From M.Stannett at dcs.sheffield.ac.uk Wed Oct 2 16:30:06 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Wed, 2 Oct 91 16:30:06 BST Subject: Concurrent semantics Message-ID: <9110021530.AA04587@sun5.dcs.sheffield.ac.uk> Dear All, IF THIS MESSAGE ISN'T RELEVANT TO YOU, PLEASE PASS IT TO SOMEONE TO WHOM IT IS. One of my major delights in computer science is the nature of concurrent semantics, and especially the "non-interleaving" models like Mazurkiewicz trace language and their analogues (these are models which represent so-called 'true' concurrency, rather than trying to flatten everything down into sequences of actions). Nonetheless, I readily admit that the more standard "interleaving" models are fascinating in their own right as well. In any case, I'm certain we're all trying to solve the same problems, but merely approaching them from slightly different angles - in ten years time, we'll be wondering what all the disagreement was about .... {{{ CONNECTIONISTS: concurrent semantics is concerned with working out what complex concurrent systems are actually doing, and how properly to represent their behaviour. Applying the standard sequential interpretations to concurrent systems can sometimes lead to misleading results. Consequently, I would argue that finding a deep understanding of the nature of complex networks probably involves exactly the same problems as are currently faced by concurrent semantics theorists. 
It might prove extremely fruitful to see some collaborations between the two fields }}} As far as I can work out, there seems to be only negligible contact between the many groups working in the area. I'd like to see some sort of electronic forum for discussing ideas in the area - even if we can't work together, at least we might be able to exchange ideas rapidly from time to time. Please let me know if you'd be interested in joining in a sort of loosely confederated "concurrency club" or whatever. Obviously, there'd be no funding to speak of, but then, given sufficient enthusiasm, we shouldn't need any. (At least, not yet). Provided the task isn't TOO time-consuming, I'll happily channel messages to interested parties for the time being. Thanks for reading! Mike Stannett ( M.Stannett @ uk.ac.sheffield.dcs )
From et at eng.cam.ac.uk Wed Oct 2 10:31:19 1991 From: et at eng.cam.ac.uk (E. Tzirkel-Hancock) Date: Wed, 2 Oct 91 15:31:19 +0100 Subject: Technical Report Available Message-ID: <24638.9110021431@tw700.eng.cam.ac.uk> The following report has been placed in the neuroprose archives at Ohio State University: STABLE CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORKS Eli Tzirkel-Hancock & Frank Fallside Technical Report CUED/F-INFENG/TR.81 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract A neural network based direct control architecture is presented that achieves output tracking for a class of continuous time nonlinear plants, for which the nonlinearities are unknown. The controller employs neural networks to perform approximate input/output plant linearization. The network parameters are adapted according to a stability principle. The architecture is based on a modification of a method previously proposed by the authors, where the modification comprises adding a sliding control term to the controller. This modification serves two purposes: first, as suggested by Sanner and Slotine, sliding control compensates for plant uncertainties outside the state region where the networks are used, thus providing global stability; second, the sliding control compensates for inherent network approximation errors, hence improving tracking performance. A complete stability and tracking error convergence proof is given and the setting of the controller parameters is discussed. It is demonstrated that as a result of using sliding control, better use of the network's approximation ability can be achieved, and the asymptotic tracking error can be made dependent only on inherent network approximation errors and the frequency range of unmodeled dynamical modes. Two simulations are provided to demonstrate the features of the control method. ************************ How to obtain a copy ************************ a) via FTP: % ftp archive.cis.ohio-state.edu .. Name (archive.cis.ohio-state.edu): anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tzirkel.control_tr81.ps.Z ftp> quit % uncompress tzirkel.control_tr81.ps.Z % lp tzirkel.control_tr81.ps b) via postal mail: Request a hardcopy from Eli Tzirkel, et at eng.cam.ac.uk Speech Laboratory Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ England
From STIVA%IRMKANT.BITNET at vma.cc.cmu.edu Thu Oct 3 11:41:47 1991 From: STIVA%IRMKANT.BITNET at vma.cc.cmu.edu (stefano nolfi) Date: Thu, 03 Oct 91 11:41:47 EDT Subject: Technical Report Available Message-ID: The following technical report is available.
Send request to STIVA at IRMKANT.BITNET DO NOT REPLY TO THIS MESSAGE ------------------------------------------------------------------------ Learning, Behavior, and Evolution Domenico Parisi Stefano Nolfi Federico Cecconi Institute of Psychology CNR - Rome e-mail: stiva at irmkant.Bitnet Abstract We present simulations of evolutionary processes operating on populations of neural networks to show how learning and behavior can influence evolution within a strictly Darwinian framework. Learning can accelerate the evolutionary process both when learning tasks are correlated with the fitness criterion and when random learning tasks are used. Furthermore, an ability to learn a task can emerge and be transmitted evolutionarily for both correlated and uncorrelated tasks. Finally, behavior that allows the individual to self-select the incoming stimuli can influence evolution by becoming one of the factors that determine the observed phenotypic fitness on which selective reproduction is based. For all the effects demonstrated, we advance a consistent explanation in terms of a multidimensional weight space for neural networks, a fitness surface for the evolutionary task, and a performance surface for the learning task. This paper will be presented at ECAL-91 - European Conference on Artificial Life, December 1991, Paris.
From mre1 at it-research-institute.brighton.ac.uk Thu Oct 3 09:20:50 1991 From: mre1 at it-research-institute.brighton.ac.uk (Mark Evans) Date: Thu, 3 Oct 91 09:20:50 BST Subject: IJCNN '91 Singapore - Request to share a room Message-ID: <1583.9110030820@itri.bton.ac.uk> I will be attending IJCNN '91 in Singapore on the 18-21 November where I will be presenting a paper. I would be interested in hearing from anyone who would like to share a twin room for the duration of the conference. (I am about to book myself a room or I could pay you if you have already booked a room.) I am a PhD student at Brighton Polytechnic, UK, working in the field of computer vision and neural networks. Anyone interested? ################################################# # # # M.R. Evans mre1 at itri.bton.ac.uk # # Research Assistant mre1 at itri.uucp # # # # ITRI, # # Brighton Polytechnic, # # Lewes Road, # # BRIGHTON, # # E. Sussex, # # BN2 4AT. # # # # Tel: +44 273 642915/642900 # # Fax: +44 273 606653 # # # #################################################
From kak at max.ee.lsu.edu Thu Oct 3 10:38:55 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Thu, 3 Oct 91 09:38:55 CDT Subject: TR's available Message-ID: <9110031438.AA14174@max.ee.lsu.edu> Please send me a copy of your report. Subhash Kak Professor of Electrical & Computer Engineering Louisiana State University Baton Rouge, LA 70803-5901
From M.Stannett at dcs.sheffield.ac.uk Fri Oct 4 16:12:17 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Fri, 4 Oct 91 16:12:17 BST Subject: concurrent semantics mailing list Message-ID: <9110041512.AA06164@sun5.dcs.sheffield.ac.uk> Hello again! A number of subscribers to CONNECTIONISTS have indicated they haven't come across concurrent semantics (which may explain Chris Tofts' comments below). I'll send you a quick summary of the subject area in a few days' time, and try to show why it's relevant to connectionist researchers. Meanwhile ... two respondents have indicated that appropriate electronic fora already exist for the discussion of concurrent semantics, while others have demonstrated that (like me) they have no information about these fora.
Since there's no point setting up a third system in competition with the other two I now know about, I enclose the details below. (If the others are indeed distinct, perhaps they should consider merging ...) --- Included message #1 --- From: Miranda Mowbray Hello Mike, Yes, this is a very good idea [...] There is already a Concurrency mailing list and archive, specially designed as a forum for rapid exchange of ideas between different groups working in Concurrency. It's been running for some time now and I'm surprised you haven't heard of it. It's run by Albert Meyer at MIT. To join, send a message to concurrency at theory.lcs.mit.edu saying that you'd like to be on the mailing list. You'll get information about archive files available. This is a high quality forum and I recommend joining. I also recommend that you tell anyone else who replies to your message and wants to be in a concurrency club. I don't see why you should go to the trouble of setting up your own separate club when one already exists, unless your version has specific local interests which are not catered for by Albert Meyer's; in any case what you *mustn't* do is set up a second forum which will keep people ignorant of the first, after all the whole point is to get everyone together! Thank you for your public-spiritedness, Yours, Miranda. --- Included message(s) # 2/3 --- From: Chris Tofts Subject: Re: Concurrent semantics Hi Mike, interesting idea, at a symposium on complex systems in the States last year I suggested using ideas from algebraic concurrency theory to a collection of people working in neural nets etc.; they not only seemed remarkably uninterested but failed to see any link. It seems that any connections (sic) will have to be exposed from the theoretical side. There already exists a news group for concurrency which is used, are you suggesting something other than this?? All the best, Chris. From: C.Tofts at uk.ac.bath.gdr I believe it's mail.concurrency, at least that's what it's called in Edinburgh. Ask your local news guru. All the best, Chris. --- End of included messages ---
From wray at ptolemy.arc.nasa.gov Fri Oct 4 19:11:01 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Oct 91 16:11:01 PDT Subject: tree classification code available for comparative studies Message-ID: <9110042311.AA01252@ptolemy.arc.nasa.gov> I've made the following report available on the Neuroprose Archive (cheops.cis.ohio-state.edu) as buntine.treecode.ps.Z not because I think connectionists are "deeply" interested in tree learning research but because I think it would be a handy resource for comparative studies: 1) systems such as CART/C4 are recognised programs for benchmarking supervised learning systems against 2) home-grown reimplementations can be buggy and a timesink 3) if your problem has some inherent structure and a few key indicator variables then trees may be a good thing to try as well 4) trees typically don't work well with purely numeric data or with problems with many variables all giving some minor contribution to the prediction being made The IND Tree Package we developed here incorporates some of early C4, most of the classification trees component of CART (no regression) along with some more recent Bayesian/MDL approaches that sometimes work better. You can obtain LaTeX source for the following introductory report if you email to: ind at kronos.arc.nasa.gov and ask for "About the IND Tree Package".
--------------------------------------- About the IND Tree Package Wray Buntine, RIACS NASA Ames Research Center Mail Stop 269-2 Moffett Field, CA 94035 September 29, 1991 This note introduces the IND Tree Package to prospective procurers and those users/installers looking at IND for the first time. IND does supervised learning using classification trees. IND integrates features from Breiman {\it et al.}'s CART and Quinlan's C4 with newer Bayesian and minimum encoding methods for growing classification trees, and provides an experimental control suite on top. The package comes with a manual, ``man'' entries, and a guide to tree methods and research. Information about obtaining IND, performance statistics, documentation, authorship, copyright, installation, etc., are given. IND is currently under development, although it has been used considerably since late 1989. IND is implemented in C under UNIX. ---------------------------------------- Wray Buntine RIACS (Research Inst. for Advanced Comp. Sc.) NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov
From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:31:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:31 U Subject: Thank's for help. Message-ID: <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU>
From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:33:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:33 U Subject: Thank's for help Message-ID: <01GBEHHAOVDCD7QHLX@BITNET.CC.CMU.EDU>
From tackett at ipla00.dnet.hac.com Sun Oct 6 00:08:02 1991 From: tackett at ipla00.dnet.hac.com (Walter Alden Tackett) Date: Sun, 6 Oct 91 00:08:02 EDT Subject: tree classification code available for comparative studies Message-ID: <9110060708.AA10023@ipla00.ipl.hac.com> Wray Buntine writes: > not because I think connectionists are "deeply" interested in tree learning ...only in *dendritic* trees, maybe? ;-) -wt
From aboulang at BBN.COM Sun Oct 6 11:40:36 1991 From: aboulang at BBN.COM (aboulang@BBN.COM) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: The property (2) is called detailed balance, resulting in a Gibbs distribution for the probability of finding the system in a particular state. The rule (1) is an update procedure for the spin Sk which ensures detailed balance provided that E is an energy. Both principles are fundamental facts of the statistical mechanics of neural networks (or, if you prefer, result from a maximum entropy analysis of neural nets). The book by Hertz, Krogh and Palmer summarizes all that in a nice way. The book title is "Introduction to the Theory of Neural Computation". We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism.
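For readers who did not see the message being replied to (its body is not reproduced in this digest), the two properties referred to as (1) and (2) have the following standard textbook forms; the temperature T and the Glauber form of the single-spin update are assumptions here, not quotations from that message:

\documentclass{article}
\begin{document}
% Standard forms; T is the temperature and Z the partition function.
Detailed balance between states $s$ and $s'$ under transition probabilities $W$:
\[
  P(s)\,W(s \to s') \;=\; P(s')\,W(s' \to s) .
\]
With $E$ an energy, this is satisfied by the Gibbs distribution
\[
  P(s) \;=\; \frac{1}{Z}\,e^{-E(s)/T},
  \qquad
  Z \;=\; \sum_{s} e^{-E(s)/T} .
\]
A single-spin update of the Glauber type,
\[
  \Pr\bigl[S_k \leftarrow +1\bigr]
  \;=\;
  \frac{1}{1 + e^{-\Delta E_k / T}},
  \qquad
  \Delta E_k \;=\; E(S_k{=}{-}1) - E(S_k{=}{+}1),
\]
obeys detailed balance and therefore samples from the Gibbs distribution.
\end{document}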
Regards, Albert Boulanger aboulanger at bbn.com
From M.Stannett at dcs.sheffield.ac.uk Sun Oct 6 00:10:13 1991 From: M.Stannett at dcs.sheffield.ac.uk (Mike Stannett) Date: Sun, 6 Oct 91 00:10:13 BST Subject: summary of concurrent semantics Message-ID: <9110052310.AA15255@dcs.sheffield.ac.uk> ((This message is just over two pages of A4 long)) A very brief (incomplete) summary of concurrent semantics --------------------------------------------------------- (This description reflects my personal bias towards trace models; I apologise in advance to anyone who feels I've given an unbalanced account of the field.) You will recall Russell's demonstration that mathematics early this century was built on very dodgy ground. The search was on, and still is, for a formal theory of mathematics itself - why is it sensible to discuss some sets but not others? This purely mathematical problem led directly to many aspects of computer science that are now taken for granted. For example, Skolem (c. 1934) realised that the derivation of Russell's paradox could be avoided by introducing the notion of definition-by-recursion. Meanwhile, Church was developing the lambda-calculus, Post was working on his production systems, and Turing was introducing his machine models and computational AI. As a result, there is a wealth of structure available for discussing the underlying nature of computational processes themselves. This is essential in some cases. For example, we need to ensure that the code we produce will generate the same behaviour when compiled on two different systems; consequently, we need some way of describing the semantics of this code (i.e. what it's supposed to mean) which is machine-independent. There are several approaches to this problem, with perhaps the most mathematical being 'denotational semantics', under which all programs can be regarded as functional - a program becomes a function which maps abstract 'inputs' to abstract 'outputs'. For concurrent systems, this 'functional' view is insufficient. A standard example concerns the use of shared variables: from a purely sequential point of view, the two programs

    prog1: x=0; x++; x++
    prog2: x=0; x+=2

are identical, since they implement the same overall function. From the concurrent point of view, they are NOT identical, because they can interact with a third process in different ways. For example, if we run first prog1 and then prog2 in the context of

    prog3: x=10;

then the possible values of x on termination of the combined systems are different

    prog1 | prog3 : 2, 10, 11, 12, error
    prog2 | prog3 : 2, 10, 12, error

depending on precisely when prog3 gets executed. Accordingly, much of concurrent semantics is based on the idea that processes should be regarded as active agents which interact with each other. For example, we would reject the notion that the variable x is just a passive entity which is operated upon; instead it becomes an agent in its own right, which interacts with the processes that update it. Many solutions to the problem of correctly representing the semantics of concurrent systems have been developed, and can be roughly divided into two 'schools' - interleaving and non-interleaving. According to the interleaving version, the sequences of activities that might be performed by two systems running concurrently are just the interleavings of the sequences for the systems taken individually. This is the approach adopted in (the standard theories of) CCS and CSP.
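The outcome sets above can be checked mechanically by enumerating the statement-level interleavings. The following C sketch is only an illustration of the idea: the encoding of the statements as atomic operations is an assumption, and at this granularity every interleaving assigns x, so the finer-grained "error" outcome cannot appear.

#include <stdio.h>

typedef enum { SET0, INC, ADD2, SET10 } Op;

int apply(int x, Op op)
{
    switch (op) {
    case SET0:  return 0;        /* x = 0   */
    case INC:   return x + 1;    /* x++     */
    case ADD2:  return x + 2;    /* x += 2  */
    case SET10: return 10;       /* x = 10  */
    }
    return x;
}

/* Enumerate every interleaving of the two statement lists a and b,
   printing the final value of x reached by each one. */
void interleave(const Op *a, int na, const Op *b, int nb, int x)
{
    if (na == 0 && nb == 0) { printf("%d ", x); return; }
    if (na > 0) interleave(a + 1, na - 1, b, nb, apply(x, a[0]));
    if (nb > 0) interleave(a, na, b + 1, nb - 1, apply(x, b[0]));
}

int main(void)
{
    Op prog1[] = { SET0, INC, INC };   /* x=0; x++; x++ */
    Op prog2[] = { SET0, ADD2 };       /* x=0; x+=2     */
    Op prog3[] = { SET10 };            /* x=10          */

    /* The initial value 0 is arbitrary: every interleaving assigns x. */
    printf("prog1 | prog3 : ");
    interleave(prog1, 3, prog3, 1, 0);
    printf("\nprog2 | prog3 : ");
    interleave(prog2, 2, prog3, 1, 0);
    printf("\n");
    return 0;
}

Compiled and run, it prints one final value per interleaving: 10 11 12 2 for prog1 | prog3 and 10 12 2 for prog2 | prog3, in agreement with the sets quoted above (minus the finer-grained error case).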
The non-interleaving school argues that this representation is inappropriate, and indeed unnecessary, since models of 'true' concurrency are easy to develop (e.g. Petri nets). In the middle ground, there are models such as 'Mazurkiewicz trace theory' which consider the behaviour of a concurrent system to be represented by the collection of ALL its possible action-sequences (rather than accepting the notion that any one of these traces will do as a valid representation). Nor is this a complete list of the approaches used; for example, there is a growing tendency to use models based on category theory and general topology, but I can't reasonably include these in a short summary (besides, I don't know enough about them to represent them accurately). The key differences between the different approaches are in the way they treat the relationship between time and causality. Given that we are trying to describe a system based on the possible observations of its behaviour, we have to be careful when we impute relationships that may not exist. It may just happen, for example, that one event in a system is always followed by another - but this doesn't mean that they are causally related. Sometimes this doesn't matter, but problems can arise when we introduce additional processes with which to interact. It becomes very difficult to work out precisely how the models of individual processes should be 'stuck together' to get a valid model of the combined system. Presumably this problem is reflected in difficulties faced by connectionists in deciding what happens when large nets are considered to be made up of more manageable sub-nets. Do you have a general theory yet for deciding * what process is computed by a given net ? * what process is computed by a given combination of smaller nets ? If not, perhaps our two different disciplines could benefit from talking to one another. Some sources ============ Probably the best sources for results in semantics and concurrency are the many volumes of the "Lecture Notes in Computer Science" series from Springer-Verlag. In addition, CCS: The standard text is Robin Milner 1989 Communication and Concurrency Prentice-Hall International CSP: The standard text is C.A.R. (Tony) Hoare 1985 Communicating Sequential Processes Prentice Hall International A good collection of papers that demonstrates the relationships between the many approaches to concurrent semantics is Kwiatkowska M.Z., Shields M.W, and Thomas R.M. (eds) Semantics for concurrency, Leicester 1990 BCS/Springer 'Workshops in Computing' ISBN 3-540-19625-0 I've also got a couple of recent tech. reports concerning generalisations of trace theory for those who want them, but be warned that these are of a highly technical nature, and may not be of much relevance to you just yet. These are Kwiatkowska M.Z. and Stannett M. On transfinite traces CS-91-06 Stannett M. Trace convergence over infinite alphabets CS-91-08 Best wishes, Mike Stannett. From rba at vintage.bellcore.com Mon Oct 7 15:23:15 1991 From: rba at vintage.bellcore.com (Bob Allen) Date: Mon, 7 Oct 91 15:23:15 -0400 Subject: No subject Message-ID: <9110071923.AA12445@vintage.bellcore.com> Subject: Student Travel Grants for NIPS'91 Modest financial support for travel to the Neural Information Processing Systems (NIPS, Denver Dec 2-5, 1991) conference is available to students and other young researchers who are active in neural networks research. 
Those requesting support should send a one-page summary of their background and research interests, a cirriculum vitae, and their email address to: Dr. R.B. Allen NIPS Treasurer Bellcore MRE 2A-367 445 South Street Morristown, NJ 07960-1910 Travel grant check for those receiving awards will be available at the conference registration desk. From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 13:00:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Tue, 8 Oct 91 13:00 U Subject: Thank's Message-ID: <01GBHGORN4ZKD7Q01U@BITNET.CC.CMU.EDU> From aboulang%BBN.COM at CARNEGIE.BITNET Sun Oct 6 11:40:36 1991 From: aboulang%BBN.COM at CARNEGIE.BITNET (aboulang%BBN.COM@CARNEGIE.BITNET) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: <01GBFAUK6QK0D7QISN@BITNET.CC.CMU.EDU> We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. Regrads, Albert Boulanger aboulanger at bbn.com From PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 12:03:11 1991 From: PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU (Pal Ribarics) Date: Tue, 08 Oct 91 16:03:11 GMT Subject: NN Workshop Message-ID: <01GBI4QH7740D7POVG@BITNET.CC.CMU.EDU> ******************************************************************************* Dear Colleague , we would like to remind you of the deadline for sending abstracts to the topical workshop on Neural Networks within the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics Talks will be selected by the Organizing Committee on the basis of a detailed abstract to be submitted before: 15 October, 1991. to the address below. You will also find a registration form which was sent to you in a prior mail. Best regards B. Denby C. Kiesling C. Peterson P. Ribarics ======================================================================== SECOND INTERNATIONAL WORKSHOP ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS FOR HIGH ENERGY AND NUCLEAR PHYSICS 1992 January 13 - 18 L'AGELONDE FRANCE-TELECOM LA LONDE LES MAURES BP 64 F-83250 REGISTRATION NAME: FIRSTNAME: LABORATORY: COUNTRY ADDRESS: TEL: FAX: TELEX: E-MAIL: HOTEL RESERVATION (Number of persons): In the following you are expected to answer with the corresponding number or character from the list above. However if your interest is not mentioned in the list give a full description. WOULD YOU BE INTERESTED TO JOIN A WORKING GROUP OF THE ASTEC PROJECT ? YES/NO GROUP: SUBGROUP: WOULD YOU LIKE TO ATTEND TOPICAL WORKSHOPS OR TUTORIALS ? WORKSHOPS: TUTORIALS: WOULD YOU LIKE TO PRESENT A TALK ? YES/NO TALK TITLE: To be considered by the organizing committee, send an extended abstract before Oct. 15, 1991 to: Michele Jouhet Marie-claude Fert CERN L.A.P.P. - IN2P3 PPE-ADM B.P. 
110 CH-1211 Geneve 23 F-74941 Annecy-Le-Vieux SWITZERLAND FRANCE Tel: (41) 22 767 21 23 Tel: (33) 50 23 32 45 Fax: (41) 22 767 65 55 Fax: (33) 50 27 94 95 Telex: 419 000 Telex: 385 180 F E-mail: jouhet at CERNVM Workshop fee : 700 FFr. Student : 500 FFr. Accommodation : 2000 FFr. Accompanying Person: +1200 FFr. To be paid by check: Title: International Workshop CREDIT LYONNAIS/Agence Internationale Bank: 30002 Guichet: 1000 Account: 909154 V Address: LYON REPUBLIQUE The accommodation includes: hotel room, breakfast, lunch and dinner for 6 days. Tennis, mountain bike and other activities will be available. Denis Perret-Gallix Tel: (41) 22 767 62 93 E-mail: Perretg at CERNVM Fax: (41) 22 782 89 23
From squires at cs.wisc.edu Wed Oct 9 03:22:37 1991 From: squires at cs.wisc.edu (Charles Squires) Date: Wed, 9 Oct 91 02:22:37 -0500 Subject: 3 reports available Message-ID: <9110090722.AA17071@mozzarella.cs.wisc.edu> *** PLEASE DO NOT FORWARD TO OTHER LISTS *** The following three working papers have been placed in the neuroprose archive: -Maclin, R. and Shavlik, J.W., Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning Research Group Working Paper 91-2. Neuroprose file name: maclin.fskbann.ps.Z -Scott, G.M., Shavlik, J.W., and Ray, W.H., Refining PID Controllers using Neural Networks, Machine Learning Research Group Working Paper 91-3. Neuroprose file name: scott.nnpid.ps.Z -Towell, G.G. and Shavlik, J.W., The Extraction of Refined Rules from Knowledge-Based Neural Networks, Machine Learning Research Group Working Paper 91-4. Neuroprose file name: towell.interpretation.ps.Z The abstract of each paper and ftp instructions follow: ---------- Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding Richard Maclin Jude W. Shavlik Computer Sciences Dept. University of Wisconsin - Madison email: maclin at cs.wisc.edu We describe a method for using machine learning to refine algorithms represented as generalized finite-state automata. The knowledge in an automaton is translated into a corresponding artificial neural network, and then refined by applying backpropagation to a set of examples. Our technique for translating an automaton into a network extends the KBANN algorithm, a system that translates a set of propositional, non-recursive rules into a corresponding neural network. The topology and weights of the neural network are set by KBANN so that the network represents the knowledge in the rules. We present the extended system, FSKBANN, which augments the KBANN algorithm to handle finite-state automata. We employ FSKBANN to refine the Chou-Fasman algorithm, a method for predicting how globular proteins fold. The Chou-Fasman algorithm cannot be elegantly formalized using non-recursive rules, but can be concisely described as a finite-state automaton. Empirical evidence shows that the refined algorithm FSKBANN produces is statistically significantly more accurate than both the original Chou-Fasman algorithm and a neural network trained using the standard approach. We also provide extensive statistics on the type of errors each of the three approaches makes and discuss the need for better definitions of solution quality for the protein-folding problem. ---------- Refining PID Controllers using Neural Networks Gary M. Scott (Chemical Engineering) Jude W. Shavlik (Computer Sciences) W.
Harmon Ray (Chemical Engineering) University of Wisconsin The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural networks to refine knowledge that can be written in the form of simple propositional rules. We extend this idea further by presenting the MANNCON (Multivariable Artificial Neural Network Control) algorithm by which the mathematical equations governing a PID (Proportional-Integral-Derivative) controller determine the topology and initial weights of a network, which is further trained using backpropagation. We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically-significant gains in accuracy over both a standard neural network approach and a non-learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initialized with small random numbers. ---------- The Extraction of Refined Rules from Knowledge-Based Neural Networks Geoffrey G. Towell Jude W. Shavlik Department of Computer Science University of Wisconsin E-mail Address: towell at cs.wisc.edu Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques for extracting rules from trained neural networks; (4) are ``human comprehensible.'' Thus, the method demonstrates that neural networks can be an effective tool for the refinement of symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated. ---------- FTP Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclin.fskbann.ps.Z OR... get scott.nnpid.ps.Z OR... get towell.interpretation.ps.Z ftp> quit unix> uncompress maclin.fskbann.ps.Z OR... uncompress scott.nnpid.ps.Z OR... uncompress towell.interpretation.ps.Z unix> lpr maclin.fskbann.ps OR... lpr scott.nnpid.ps OR... lpr towell.interpretation.ps (or use whatever command you use to print PostScript)
From danielg at cogs.sussex.ac.uk Wed Oct 9 07:07:27 1991 From: danielg at cogs.sussex.ac.uk (Daniel Glaser) Date: Wed, 9 Oct 91 12:07:27 +0100 Subject: Restrictions on recurrent learning Message-ID: <29747.9110091107@rsunx.cogs.susx.ac.uk> I have been working on some simple recurrent networks as defined by Jordan(1986) and Elman(1990), and am interested in the class of temporal regularities that they can learn.
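For concreteness, the structural difference between the two architectures can be sketched as follows; the layer sizes, tanh units and the Jordan decay factor are illustrative assumptions, not the exact formulations of the papers cited below.

#include <math.h>
#include <string.h>

#define NIN  2
#define NHID 4
#define NOUT 1
#define NCTX NHID   /* Elman: one context unit per hidden unit.          */
                    /* A pure Jordan net would instead use NCTX == NOUT. */

typedef struct {
    double wih[NHID][NIN];    /* input   -> hidden weights        */
    double wch[NHID][NCTX];   /* context -> hidden weights        */
    double who[NOUT][NHID];   /* hidden  -> output weights        */
    double ctx[NCTX];         /* state carried between time steps */
} SRN;

/* One forward time step; the only structural difference between the two
   architectures is what gets copied into the context units afterwards. */
void srn_step(SRN *n, const double x[NIN], double y[NOUT], int jordan)
{
    double h[NHID];
    for (int j = 0; j < NHID; j++) {
        double s = 0.0;
        for (int i = 0; i < NIN; i++)  s += n->wih[j][i] * x[i];
        for (int c = 0; c < NCTX; c++) s += n->wch[j][c] * n->ctx[c];
        h[j] = tanh(s);
    }
    for (int o = 0; o < NOUT; o++) {
        double s = 0.0;
        for (int j = 0; j < NHID; j++) s += n->who[o][j] * h[j];
        y[o] = tanh(s);
    }
    if (jordan) {
        /* Jordan-style feedback: copy the *outputs* back, here with an
           assumed decay of 0.5 on the previous context value. */
        for (int o = 0; o < NOUT; o++) n->ctx[o] = 0.5 * n->ctx[o] + y[o];
    } else {
        /* Elman-style feedback: copy the *hidden* activations back verbatim. */
        memcpy(n->ctx, h, sizeof h);
    }
}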
In particular, how do they compare with more general back propagation through time defined by the PDP group(1986) and Werbos(1990) ? In the Jordan/Elman nets, activation flows forward in time from `copies' of units from previous cycles, and thus, during learning, error only propagates backwards locally in time. Does anyone know of any theoretical or empirical work on what these different types of network can learn ? If replies are addressed to me personally, I will post a summary in due course. Thanks Daniel. References: Elman, J.~L. (1990). Finding structure in time. {\em Cognitive Science}, {\bf 14}:179--211. Jordan, M.~I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In {\em Proceedings of the Eighth Annual Meeting of the Cognitive Science Society}, Hillsdale, NJ. Erlbaum. Rumelhart, D.~E., McClelland, J.~L., \& Williams, R.~J. (1986). Learning internal representations by error propagation. In D.~E. Rumelhart \& J.~L. McClelland (Eds.), {\em Parallel Distributed Processing: Explorations in the Microstructure of Cognition}, volume~1 chapter~8. Cambridge, MA: MIT Press/Bradford Books. Werbos, P.~J. (1990). Backpropagation through time: What it does and how to do it. {\em Proceedings of the IEEE}, 78(10):1550--1560. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 9 14:05:33 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 09 Oct 91 14:05:33 -0400 Subject: Recurrent Cascade-Correlation Code Message-ID: Simulation code for the Recurrent Cascade-Correlation (RCC) algorithm, previously available only in Common Lisp, has now been translated into C by Conor Doherty of the University College of Dublin (Ireland). This code is a modification of the C program for original Cascade-Correlation, written by Scott Crowder of Carnegie Mellon. My thanks to Conor and Scott for their help in making these programs available to the barbarian hordes who speak only C. For a description of this algorithm, see Scott E. Fahlman, "The Recurrent Cascade-Correlation Architecture" in Advances in Neural Information Processing Systems 3, edited by R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Morgan Kaufmann Publishers, 1991. Alternatively, see the tech report mentioned below. The instructions for accessing any of this code via FTP are included at the end of this message. Scott E. Fahlman School of Computer Science Carnegie Mellon University =========================================================================== Public-domain simulation programs for the Quickprop, Cascade-Correlation, and Recurrent Cascade-Correlation learning algorithms are available via anonymous FTP on the Internet. This code is distributed without charge on an "as is" basis. There is no warranty of any kind by the authors or by Carnegie-Mellon University. Instructions for obtaining the code via FTP are included below. If you can't get it by FTP, contact me by E-mail (sef+ at cs.cmu.edu) and I'll try *once* to mail it to you. Specify whether you want the C or Lisp version. If it bounces or your mailer rejects such a large message, I don't have time to try a lot of other delivery methods. I am maintaining an E-mail list of people using this code so that I can notify them of any changes or problems that occur. I would appreciate hearing about any interesting applications of this code, and will try to help with any problems people run into. 
Of course, if the code is incorporated into any products or larger systems, I would appreciate an acknowledgement of where it came from. If for some reason these programs do not work for you, please contact me and I'll try to help. Common errors: (1) Some people don't notice that the symmetric sigmoid output units in cascor have a range of -0.5 to +0.5 (for reasons that are mostly historical). If you try to force this algorithm to produce an output of +1.0 or +37.3, it isn't going to work. (2) Note that quickprop (which is used inside of Cascade-Correlation) is designed to update the weights after every epoch, and it assumes that all the epochs are identical. If you try to run this code updating after every training case, you will lose badly. If you want to change the training set, it is important to zero out the PREV-SLOPES and DELTAS vectors, and also to re=build the caches in Cascade-Correlation. HOW TO GET IT: For people (at CMU, MIT, and soon some other places) with access to the Andrew File System (AFS), you can access the files directly from directory "/afs/cs.cmu.edu/project/connect/code". This file system uses the same syntactic conventions as BSD Unix: case sensitive names, slashes for subdirectories, no version numbers, etc. The protection scheme is a bit different, but that shouldn't matter to people just trying to read these files. For people accessing these files via FTP: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu". The internet address of this machine is 128.2.254.155, for those who need it. 2. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. 3. Change remote directory to "/afs/cs/project/connect/code". NOTE: you must do this in a single operation. 4. At this point FTP should be able to get a listing of files in this directory with DIR and fetch the ones you want with GET. (The exact FTP commands you use depend on your local FTP server.) Current contents: quickprop1.lisp Original Common Lisp version of Quickprop. quickprop1.c C version by Terry Regier, U. Cal. Berkeley. cascor1.lisp Original Common Lisp version of Cascade-Correlation. cascor1.c C version by Scott Crowder, Carnegie Mellon rcc1.lisp Common Lisp version of Recurrent Cascade-Correlation. rcc1.c C version, trans. by Conor Doherty, Univ. Coll. Dublin vowel.c Code for Tony Robinson's vowel benchmark. am4.tar.Z Aspirin/Migraine code from MITRE. backprop.lisp Overlay for quickprop1.lisp. Turns it into backprop. --------------------------------------------------------------------------- Tech reports describing these algorithms can also be obtained via FTP. These are Postscript files, processed with the Unix compress/uncompress program. Follow the steps for FTP access as above, but cd to directory unix> ftp pt.cs.cmu.edu (or 128.2.254.155) Name: anonymous Password: ftp> cd /afs/cs/project/connect/tr ftp> binary ftp> get filename.ps.Z ftp> quit unix> uncompress filename.ps.Z unix> lpr filename.ps (or however you print postscript files) For "filename", sustitute the following: cascor-tr Cascade-Correlation paper. qp-tr Paper on Quickprop and other backprop speedups. rcc-tr Recurrent Cascade-Correlation paper. precision Hoehfeld-Fahlman paper on Cascade-Correlation with limited numerical precision. 
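The per-epoch update convention flagged in the common errors above can be summarised with a simplified sketch. This is not the distributed code: the EPSILON and MU values are assumed, and the handling of slope sign changes and the weight-decay term of the real rule are omitted (see the qp-tr report or quickprop1.c for the actual algorithm).

#include <math.h>

#define EPSILON 0.55   /* plain gradient-descent contribution (assumed value) */
#define MU      1.75   /* maximum growth factor (assumed value)               */

/* slope[i]      : dE/dw_i accumulated over the *whole* epoch
   prev_slope[i] : the same quantity from the previous epoch
   delta[i]      : the step taken on the previous epoch
   Called once per epoch; per-pattern calls would violate the assumptions
   mentioned in the message above. */
void quickprop_update(double *w, double *slope, double *prev_slope,
                      double *delta, int n)
{
    for (int i = 0; i < n; i++) {
        double step;
        if (delta[i] != 0.0) {
            /* Jump toward the minimum of the parabola implied by the two
               slopes, but never grow the step by more than a factor of MU. */
            step = slope[i] / (prev_slope[i] - slope[i]) * delta[i];
            if (fabs(step) > MU * fabs(delta[i]))
                step = (step > 0 ? 1.0 : -1.0) * MU * fabs(delta[i]);
        } else {
            /* No previous step: fall back to ordinary gradient descent. */
            step = -EPSILON * slope[i];
        }
        w[i]         += step;
        delta[i]      = step;
        prev_slope[i] = slope[i];
        slope[i]      = 0.0;   /* re-accumulate from zero on the next epoch */
    }
}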
From B344DSL at UTARLG.UTA.EDU Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a location to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact optimal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive categorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be required to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferences -- on Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of acceptance will be given December 15, 1991. We will not be holding parallel sessions, so there are limitations on the number of speakers. However, individuals who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598
From bessiere at imag.fr Thu Oct 10 12:48:37 1991 From: bessiere at imag.fr (Pierre Bessiere) Date: Thu, 10 Oct 1991 17:48:37 +0100 Subject: 4 reports available Message-ID: <9110101648.AA09388@imag.imag.fr> The following four papers/reports have been placed in the neuroprose archive: - Bessiere, P.; "Toward a synthetic cognitive paradigm: Probabilistic Inference"; Conference COGNITIVA90, Madrid, Spain, 1990 Neuroprose file name: bessiere.cognitiva90.ps.Z - Talbi, E-G. & Bessiere, P.; "A parallel genetic algorithm for the graph partitioning problem"; ACM-ICS91 (Conference on Super Computing), Cologne, Germany, 1991 Neuroprose file name: bessiere.acm-ics91.ps.Z - Bessiere, P., Chams, A.
& Muntean, T.; "A virtual machine model for artificial neural network programming"; INNC90 (International Neural Networks Conference), Paris, France, 1990 Neuroprose file name: bessiere.innc90.ps.Z - Bessiere, P., Chams, A. & Chol, P.; "MENTAL: a virtual machine approach to artificial neural networks programming"; ESPRIT B.R.A. project NERVES (3049), final report, 1991 The abstract of each paper and ftp instructions follow: ---------- TOWARD A SYNTHETIC COGNITIVE PARADIGM: PROBABILISTIC INFERENCE Cognitive science is a very active field of scientific interest. It turns out to be a "melting pot" of ideas coming from very different areas. One of the principal hopes is that some synthetic cognitive paradigms will emerge from this interdisciplinary "brain storming". The goal of this paper is to answer the question: "Given the state of the art, are there any hints indicating the emergence of such synthetic paradigms?" The main thesis of the paper is that there is a good candidate, namely, the probabilistic inference paradigm. In support of the above thesis the structure of the paper is as follows: - in a first part, we identify five criteria to qualify as a synthetic cognitive paradigm (validity, self consistency, competence, feasibility and mimetic power); - in the second paragraph, the principles of probabilistic inference are reviewed and justifications of validity and self consistency of this paradigm are given (Marr's computational level); - then, the competence criterion is discussed, considering the efficiency of probabilistic inference for dealing with the different classical cognitive riddles and analyzing the relationships of probabilistic inference with several of the usual connexionist formalisms (Marr's algorithmic level); - the criteria of feasibility (condition of computer implementation) and mimetic power (adequation with what is known of the architecture of the nervous system) are finally considered in the fourth part (Marr's implementation level). As a conclusion, it will appear that probabilistic inference is at least a very interesting framework to get a synthetic overview of a number of works in the area and to identify and formalize the most puzzling questions. Some of these questions will be listed. In fact, probabilistic inference will appear finally to be able to play the same role for computational cognitive science that formal logic has played for classical symbolic Artificial Intelligence: a sound mathematical foundation serving as a guide line, as a constant reference and as a source of inspiration. ---------- A PARALLEL GENETIC ALGORITHM FOR THE GRAPH PARTITIONING PROBLEM Genetic algorithms are stochastic search and optimization techniques which can be used for a wide range of applications. This paper addresses the application of genetic algorithms to the graph partitioning problem. Standard genetic algorithms with large populations suffer from lack of efficiency (quite high execution time). A massively parallel genetic algorithm is proposed, an implementation on a SuperNode of Transputers and results of various benchmarks are given. The parallel algorithm shows a superlinear speed-up, in the sense that when multiplying the number of processors by p, the time spent to reach a solution with a given score is divided by kp (k>1). A comparative analysis of our approach with hill-climbing algorithms and simulated annealing is also presented.
The experimental measures show that our algorithm gives better results concerning both the quality of the solution and the time needed to reach it. ---------- A VIRTUAL MACHINE MODEL FOR ARTIFICIAL NEURAL NETWORK PROGRAMMING This paper introduces the model of a virtual machine for A.N.N. (Artificial Neural Networks). The context of this work is a collaborative project to study new V.L.S.I. implementations and new architectures for neuronal machines. The work consists in the specification and a prototype implementation of a description language for A.N.N., of the associated virtual machine, of the compiler between them and of the compilers mapping the virtual machine on different highly parallel computers. In this short paper we present the virtual machine model, which combines the features of various parallel programming paradigms. Our model allows, in particular, the same A.N.N. program to run on both synchronous and asynchronous types of machine. In this framework a parallel architecture (S.M.A.R.T.) and a dynamically reconfigurable parallel machine of Transputers (SuperNode) are considered as target machines. ---------- MENTAL: A VIRTUAL MACHINE APPROACH TO ARTIFICIAL NEURAL NETWORKS PROGRAMMING (ATTENTION: 100 pages) This report treats (extensively) the same subject as the short paper described just above. Some parts are extracted from the three previously presented papers. ---------- These reports may be obtained by FTP from either the neuroprose archives or from my own server (IMAG): How to get files from the Neuroprose archives? ______________________________________________ Anonymous ftp on: - archive.cis.ohio-state.edu (128.146.8.52) mymachine>ftp archive.cis.ohio-state.edu Name: anonymous Password: yourname at youraddress ftp>cd pub/neuroprose ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z How to get files from IMAG? ___________________________ Anonymous ftp on: - 129.88.32.1 mymachine>ftp 129.88.32.1 Name: anonymous Password: yourname at youraddress ftp>cd pub/SYMPA/NNandGA ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z -- Pierre BESSIERE *************** IMAG/LGI phone: BP 53X Work: 33/76.51.45.72 38041 Grenoble Cedex Home: 33/76.51.16.15 FRANCE Fax: 33/76.44.66.75 Telex:UJF 980 134 F E-Mail: bessiere at imag.imag.fr It is to the modern scientist, more than to anyone else, that Kipling's austere advice applies: "If you can watch the work of your life suddenly collapse and set yourself back to work, if you can suffer, struggle and die without complaint, you will be a man, my son." Only in the work of science can one love what one destroys, continue the past by denying it, and venerate one's master by contradicting him. GASTON BACHELARD
From gary at cs.UCSD.EDU Thu Oct 10 13:26:04 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Thu, 10 Oct 91 10:26:04 PDT Subject: Restrictions on recurrent learning Message-ID: <9110101726.AA24233@desi.ucsd.edu> Fu-Sheng Tsung and I showed there were problems that a hidden-recurrent (Elman-style) net can learn that an output-recurrent Jordan net can't in our 1989 paper in IJCNN: Tsung, Fu-Sheng and Cottrell, G. (1989) A sequential adder using recurrent networks. In \fIProceedings of the International Joint Conference on Neural Networks\fP, Washington, D.C. A similar paper with some state space analysis is in: Cottrell, G. and Fu-sheng Tsung (1991). Learning simple arithmetic procedures. In J.A. Barnden & J.B.
Pollack (Eds), \fIAdvances in connectionist and neural computation theory, Vol 1: High-level connectionist models\fP, Norwood: Ablex. There are simple logical arguments that show that hidden-recurrent nets are more powerful than output-recurrent nets. The bottom line is that if there is a problem where the teaching signal forces "forgetting" of the input, then a Jordan-style output-recurrent network cannot respond to things that require remembering it. Hal White also believes Elman nets are strictly more powerful than Jordan nets, but I'm not sure he has a proof. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From ECONEC at vax.oxford.ac.uk Fri Oct 11 11:39:00 1991 From: ECONEC at vax.oxford.ac.uk (ECONEC@vax.oxford.ac.uk) Date: Fri, 11 Oct 91 11:39 BST Subject: REQUEST FOR INFORMATION: NNs AND ECONOMICS Message-ID: REQUEST FOR INFORMATION I am studying for an MLitt/DPhil at the Oxford University and would be very grateful for some help. This message is being transmitted to several relevant lists and please feel free to forward it to anyone who might be interested. Apologies in advance to anyone who gets fed up with seeing it! 1) REQUEST: I am interested in references and names for work broadly in the area of AI techniques applied to economics. To narrow this down, I am interested in AI as a tool for developing alternative models of economic behaviour than the traditional view of man as a perfectly informed calculating machine! Because of the behavioural aspect and my preference for economic theory I am hoping to avoid work that simply uses AI techniques to solve traditional models faster. (GAs as function optimisers for instance.) Similarly I am not seeking information on decision support or Expert Systems unless they make some attempt (or claim) to emulate human decision making behaviour. (Default Logics? Frames?) Please err on the side of completeness! 2) OFFER: Obviously I can provide summaries of my findings to various lists in the usual way. (Perhaps you could say where you saw my post so I can keep the summaries relevant to each list.) What I would also like to do is find out whether there is any interest in an adhoc email list of people working in this area. Or if there is one already I would very much like to hear about it. I'm sure such things have been going for years in the US but information here in the UK seems very sparse. I would be quite happy to "maintain" an unofficial bulletin board or mailing list if one does not exist. Many thanks in advance for any help and please feel free to contact me on any aspect of this posting. Edmund Chattoe SNAIL: LADY MARGARET HALL OXFORD OXON OX2 6QA From lyn at dcs.exeter.ac.uk Fri Oct 11 13:56:49 1991 From: lyn at dcs.exeter.ac.uk (Lyn Shackleton) Date: Fri, 11 Oct 91 13:56:49 BST Subject: special deal for Connection Science Message-ID: <11273.9110111256@castor.dcs.exeter.ac.uk> ********** CONNECTION SCIENCE SPECIAL ISSUE ****************** CONNECTIONIST MODELLING OF PSYCHOLOGICAL PROCESSES VOLUME 3.2 (out now) EDITOR Noel Sharkey SPECIAL BOARD Jim Anderson Andy Barto Thomas Bever Glyn Humphreys Walter Kintsch Dennis Norris Kim Plunkett Ronan Reilly Dave Rumelhart Antony Sanford CONTENTS J R Levenick:NAPS: a connectionist implementation of cognitive maps. A Pouget & S J Thorpe: Connectionist models of orientation identification. 
D R Shanks: A connectionist account of base-rate biases in categorization. A J O'Toole, K Deffenbacher, H Abdi & J Bartlett: Simulating the "Other-race effect" as a problem in perceptual learning. S Kaplan, M Sonntag & E Chown: Tracing recurrent activity of cognitive elements (TRACE): a model of temporal dynamics in a cell assembly. Research Notes: A H Kawamoto & S N Kitzis: Time course of regular and irregular pronunciations. A VERY SPECIAL DEAL FOR MEMBERS OF THE CONNECTIONISTS MAILING. Prices for members of this list will now be: North America 44 US Dollars (reduced from 126 dollars) Elsewhere and U.K. 22 pounds sterling. (Sterling checks must be drawn on a UK bank) These rates start from 1st January 1992 (volume 4). Conditions: 1. Personal use only (i.e. non-institutional). 2. Must subscribe from your private address. You can receive a subscription form by emailing direct to the publisher: email: carfax at ibmpcug.co.uk Say for the attention of David Green and say CONNECTIONISTS MAILING LIST. noel From mclennan at cs.utk.edu Fri Oct 11 17:43:22 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Fri, 11 Oct 91 17:43:22 -0400 Subject: report: contin. symbol systems Message-ID: <9110112143.AA01451@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. ** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (1.82 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Continuous Symbol Systems The Logic of Connectionism Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-145 ABSTRACT: It has been long assumed that knowledge and thought are most naturally represented as _discrete_symbol_systems_ (calculi). Thus a major contribution of connectionism is that it provides an alternative model of knowledge and cognition that avoids many of the limitations of the traditional approach. But what idea serves for connectionism the same unifying role that the idea of a calculus served for the traditional theories? We claim it is the idea of a _continuous_symbol_system_. This paper presents a preliminary formulation of continuous sym- bol systems and indicates how they may aid the understanding and development of connectionist theories. It begins with a brief phenomenological analysis of the discrete and continuous; the aim of this analysis is to directly contrast the two kinds of symbols systems and identify their distinguishing characteristics. Next, based on the phenomenological analysis and on other observations of existing continuous symbol systems and connectionist models, I sketch a mathematical characterization of these systems. Finally the paper turns to some applications of the theory and to its implications for knowledge representation and the theory of com- putation in a connectionist context. Specific problems addressed include decomposition of connectionist spaces, representation of recursive structures, properties of connectionist categories, and decidability in continuous formal systems. A preliminary version of this paper was presented at the workshop "Neural Networks for Knowledge Representation, Fourth Annual Workshop of the Metroplex Institute for Neural Dynamics (MIND)," Westlake TX, October 4-6, 1990. 
Also presented at "ConnectFest 1990," sponsored by Indiana University Center for Research in Concepts and Cognition, November 3-4, 1990. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps maclennan.css.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.css.ps.Z ftp> quit unix> uncompress maclennan.css.ps.Z unix> lpr -s maclennan.css.ps (or however you print postscript) Note that the postscript version is missing three (nonessential) figures that have been pasted into the hardcopy version. If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From david at cns.edinburgh.ac.uk Sun Oct 13 17:20:34 1991 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Sun, 13 Oct 91 17:20:34 BST Subject: Computational Neureoscientist post Message-ID: <4519.9110131620@subnode.cns.ed.ac.uk> UNIVERSITY OF OXFORD MRC Centre in Brain and Behaviour The Medical Research Council has awarded a 7-year grant to establish a Research Centre in Brain and Behaviour, based at the University of Oxford, and also involving scientists from other universities including Birmingham, Cambridge, Durham, Edinburgh and London. The main theme of the Research Centre is the organisation, function, development and disorders of the cerebral cortex, and central to this theme is the exploration of the cortex as an instrument of computation. To this end, the Centre carries out research involving many different methodologies, in the areas of sensory systems, learning and memory, and motor control. Applications are invited for the post of Computational Neuroscientist to work on theoretical aspects of learning and memory. The post will be based at the University of Edinburgh, where the post-holder will be expected to spend 80% of his/her time. The remaining time will be spent in linking with complementary work being carried out by other participants of the Centre, particularly at the universities of Oxford and Cambridge. A range of projects is available, and prospective applicants are encouraged to discuss their plans with Dr David Willshaw of the University of Edinburgh. Two possibilities which are compatible with present work are: 1) Development of a model of the mammalian hippocampal formation as an associative memory; 2) Investigation of associative and error-correcting models of cerebellar function as implemented in a biologically realistic form. This appointment, which is available from January 1992 for 2 years in the first instance and potentially renewable for a further 4 years, will be made on the RS1A scale (currently 11,969-19,073 pounds p.a. with a discretionary scale rising to 21,391 pounds p.a.). Applications (including the name and address of two referees) should be sent to Ms Catherine Greasley, Administrative Secretary, MRC Research Centre in Brain and Behaviour, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD (telephone (0865) 271364 - mornings only) no later than Friday 8 November 1991. 
The University of Oxford is an Equal Opportunities Employer David Willshaw Centre for Cognitive Science 2 Buccleuch Place Edinburgh EH8 9LW UK Tel: (+44) 31 650 4404 Fax: (+44) 31 650 4587 Email: d.willshaw at edinburgh.ac.uk From harnad at Princeton.EDU Sun Oct 13 19:51:05 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 13 Oct 91 19:51:05 EDT Subject: Newell's Unified Theories of Cognition: BBS Call for Book Reviewers Message-ID: <9110132351.AA08163@psycho> Below is the abstract of a book that will be accorded multiple book review in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. To be considered as a commentator on this book, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you are selected as a commentator. ____________________________________________________________________ BBS Multiple Book Review of: UNIFIED THEORIES OF COGNITION (Harvard University Press, 1990) Allen Newell School of Computer Science Carnegie-Mellon University This book presents the case that cognitive science should turn its attention to developing theories of human cognition that cover the full range of human perceptual, cognitive, and action phenomena. Cognitive science has now produced a massive number of high quality regularities with many microtheories that reveal important mechanisms. The need for integration is pressing and will continue to increase. Equally important, cognitive science now has the theoretical concepts and tools to support serious attempts at unified theories. The argument is made entirely by presenting an exemplar unified theory of cognition both to show what a real unified theory would be like and to provide convincing evidence that such theories are feasible. The exemplar is Soar, a cognitive architecture realized as a software system. After a detailed discussion of the architecture and its properties, with its relation to the constraints on cognition in the real world and to existing ideas in cognitive science, Soar is used as a theory for a wide range of cognitive phenomena: immediate responses (stimulus-response compatibility and the Sternberg phenomena); discrete motor skills (transcription typing); memory and learning (episodic memory and the acquisition of skill through practice); problem solving (cryptarithmetic puzzles and syllogistic reasoning); language (sentence verification and taking instructions); and development (transitions in the balance beam task). The treatments vary in depth and adequacy, but they clearly reveal a single, highly specific, operational theory that works over the entire range of human cognition. Soar is presented as an exemplar unified theory, not as the sole candidate. Cognitive science is not ready yet for a single theory -- there must be multiple attempts. But cognitive science must begin to work towards such unified theories. From kamil at apple.com Mon Oct 14 19:41:34 1991 From: kamil at apple.com (Kamil A. 
Grajski) Date: Mon, 14 Oct 91 16:41:34 -0700 Subject: batch-mode parallel implementations Message-ID: <9110142341.AA19545@apple.com> Hi folks, In reviewing some implementations of back-prop type algorithms on parallel machines, it is apparent that several such implementations obtain their high performance because of batch-mode training. What this means is that one operates on N independent training patterns simultaneously and then collects all the weight update information and reestimates once per N samples. Example where this has been used (among others) are the GF-111, MasPar, CM-2, Warp (I think, at least for a self-org feature map implementation), etc. In many papers, I have read passing references to the fact that real-time learning is preferred (in practice) over the theoretically indicated batch-mode (so-called "true gradient") learning. Some of the arguments given include "faster" convergence and "better" generalization. Are the convergence and generalization arguments linked at some deeper level of analysis? (You could have fast convergence which generalizes poorly, etc.) I have played with this just a little bit on small speech and other datasets without reaching any conclusive results. I am wondering whether there have been some definitive studies, theoretical and/or practical which really confront this issue? How big an issue is this for people? For example, would you NOT look at a parallel design which assumes batch-mode training? Kamil P.S. If this is a dead issue and I missed the funeral, I apologize. ================ Kamil A. Grajski Apple Computer (408) 974-1313 kamil at apple.com ================ From B344DSL at UTARLG.UTA.EDU Mon Oct 14 14:14:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 14 Oct 1991 13:14 CDT Subject: Announcement of talk by Pribram at Georgetown, Oct. 18 Message-ID: <01GBQK3RGXY80003LS@utarlg.uta.edu> From: IN%"PRUEITT at guvax.georgetown.edu" "Paul S. Prueitt" 14-OCT-1991 12:29:48.05 To: IN%"kpribram at ruacad.ac.runet.edu" "kpribram" CC: IN%"duziakm at isnet.inmos.com" "duziakm", IN%"pwerbos at note.nsf.gov" "pwerbos", IN%"liwu at aic.nrl.navy.mil" "liwu", IN%"kugler at rucs2.sunlab.cs.runet.edu" "kugler", IN%"medsker at AUVM.BITNET" "medsker", IN%"b344dsl at UTARLG.UTA.EDU" "b344dsl", IN%"prueitt Subj: Pribram's Talk on Friday From PRUEITT at guvax.georgetown.edu Mon Oct 14 14:15:00 1991 From: PRUEITT at guvax.georgetown.edu (Paul S. Prueitt) Date: 14 Oct 91 13:15:00 EST Subject: Pribram's Talk on Friday Message-ID: <01GBQIANVZ28000315@utarlg.uta.edu> Please Communicate within your group *********************Please post and forward on E-mail******************* ******************** Georgetown University Physics Department and Neural Network Research Facility 1991-92 Colloquium Series on Behavioral and Computational Neuroscience Friday, October 18th 4:00 P.M. to 6:00 P.M. Auditorium Room 112 Reiss Building, Georgetown University Refreshments at 3:30 P.M. in Room 505 Dr. Karl Pribram **************** Center for Brain Research and Informational Sciences, Radford University Brain and Perception, Holonomy and Structure in Figural Processing Dr. Pribram will discuss topics from his new book; Brain and Perception, Holonomy and Structure in Figural Processing. A one hour prepared lecture is to be followed by a one hour discussion. The book is now available from Dr. Edward J. Finn, Chairman of the G.U. Physics Department or from Lawrence Erlbaum Associates. Professor Pribram will autograph copies of the book after the Colloquium. 
*************************************************************************

For additional information please call Edward Finn at 202-687-6231.
Parking: Use Georgetown Univ. Entrance One from Reservoir Road (Northern Boundary)

********************

From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 15 01:23:47 1991
From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU)
Date: Tue, 15 Oct 91 01:23:47 -0400
Subject: batch-mode parallel implementations
In-Reply-To: Your message of Mon, 14 Oct 91 16:41:34 -0800. <9110142341.AA19545@apple.com>
Message-ID:

I don't recall seeing any studies that claim better generalization for per-sample or continuous updating than for per-epoch or batch updating. Can you supply some citations? The only reason I can think of for better generalization in the per-sample case would be a weak sort of simulated-annealing effect, with the random variation among individual training samples helping to jiggle the system out of small local minima in the vicinity of the best answer.

As for speed of convergence, continuous updating clearly beats per-epoch updating if the training set is highly redundant. To see this, imagine taking a small set of training cases, duplicating that set 1000 times, and presenting the resulting huge set as the training set. Per-sample updating would probably have converged on a good set of weights before the first per-epoch weight adjustment is ever made.

Also, in some cases it just is not practical to use per-epoch updating. There may be a stream of ever-changing data going by, and it may be impractical to store a large set of samples from this data stream for repeated use.

On the other hand, it is rather dangerous to use continuous updating with high learning rates or with techniques that adjust the learning rate based on some sort of second-derivative estimate. If you are not very careful, a few atypical cases in a row can accelerate you right out of the solar system. Some techniques, such as quickprop and most of the conjugate gradient methods, depend on the ability to look at the same set of training examples more than once, so they inherently are per-epoch models.

In my opinion, the best solution in most situations is probably to use one of the accelerated convergence methods and to update the weights after an "epoch" whose size is chosen by the experimenter. It must be sufficiently large to give a reasonably stable picture of the overall gradient, but not so large that the gradient is computed many times over before a weight-update cycle occurs. However, I am sure that this view is not universally accepted: some people seem to believe that per-sample updating is superior in all cases.

-- Scott Fahlman

From castillo at eel.upc.es Tue Oct 15 11:16:28 1991
From: castillo at eel.upc.es (Francisco Castillo Cobo)
Date: Tue, 15 Oct 1991 11:16:28 UTC+0100
Subject: add to NEURAS-LIST
Message-ID: <"114*/S=castillo/OU=eel/O=upc/PRMD=iris/C=es/"@MHS>

Hi,

I am currently compiling a list of incremental (or growing) neural networks. I have some already identified, including RCE and Tiling. I am interested in receiving additional references on the matter and would be glad to summarize the responses and send them to anyone who might be interested.

Thanx!
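To make the per-sample versus per-epoch distinction in the batch-mode thread above concrete, here is a minimal sketch for a single linear unit trained with squared error. It is illustrative only: the function, the variable names, and the toy data are assumptions of this sketch (written in Python/NumPy), not code from any of the implementations mentioned in the thread.

import numpy as np

def train(X, y, mode="per-sample", lr=0.01, epochs=50):
    """Gradient descent on one linear unit with squared error.

    mode="per-sample": the weights change after every training case (on-line).
    mode="per-epoch":  the gradient is summed over the whole set and the
                       weights change once per pass (batch).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        if mode == "per-sample":
            for x_i, y_i in zip(X, y):
                err = np.dot(w, x_i) - y_i
                w -= lr * err * x_i                   # update immediately
        else:
            grad = np.zeros_like(w)
            for x_i, y_i in zip(X, y):
                grad += (np.dot(w, x_i) - y_i) * x_i  # accumulate only
            w -= lr * grad                            # one update per epoch
    return w

# A highly redundant training set, as in the duplication argument above:
# three distinct cases repeated 1000 times each.  The exact solution is
# w = [2, -1].
X = np.tile(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), (1000, 1))
y = np.tile(np.array([2.0, -1.0, 1.0]), 1000)
print(train(X, y, mode="per-sample"))                   # close to [2, -1]
print(train(X, y, mode="per-epoch", lr=0.01 / len(X)))  # still far from it

The per-epoch run scales the learning rate by the set size so that the summed gradient does not blow up; even so, after the same number of passes it has made 50 weight updates where the per-sample run has made 150,000, which is the redundancy argument above. Nothing in this sketch bears on the generalization question.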
F.Castillo From ecal at cgref.cemagref.fr Tue Oct 15 08:06:13 1991 From: ecal at cgref.cemagref.fr (European Conference on Artificial Life) Date: Tue, 15 Oct 91 12:06:13 GMT Subject: ECAL91 programme Message-ID: <9110151206.AA11528@cgref> Please find enclosed an E-mail version of ECAL91 programme (more up-to-date than the paper programme). You can use the registration form enclosed, granted that you send your payment by regular mail at the given address. =====================CUT HERE=====================CUT HERE====================== 1st European Conference on Artificial Life ________________________________________________________________________________ PROGRAMME - PROGRAMME - PROGRAMME- PROGRAMME ________________________________________________________________________________ EEEEEEE CCCCCC AA LL 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EEEEE CC AAAAAAAAA LL 99999 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EEEEEEE CCCCC AA AA LLLLLLLL 9999 11 ________________________________________________________________________________ To be held on December 11-13 1991 in Centre des Congres de la Villette Salle Laser cite des Sciences et de l'Industrie Paris, France Publisher : MIT Press / Bradford Books Sponsors : la Cite CEMAGREF Banque de France CNR Fondation de France AFCET Electricite de France CREA Offilib ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Artificial life: a new scientific field Artificial life embodies a recent and important conceptual step in modern science: asserting that the core of intelligence and cognitive abilities is the same as the capacity for living. Metaphorically, artificial life would see in the modest insect rather than in the symbolic abilities of an expert the best prototype for intelligence . What needs to be understood and characterized is the class of processes that endow living creatures with their characteric autonomy, key properties such as viability, abduction and adaptability. The autonomy of the living beings is understood here both with regards to their actions and to the way in which they shape their world into significance. This exploration goes hand in hand with the theory, design and construction of simple autonomous agents. The recent surge of interest in 'artificial life' has to be understood in the context of the long tradition inaugurated with cybernetics, seeking common basis for the living and the artificial. Artificial life can take advantage of the years of research in the tradition of symbolic computation that still characterizes most of the research in artificial intelligence, as well as the more recent explosive development of neural networks and connectionist approaches. Artificial life also induces a renewal of a whole range of engineering traditions, such as control theory and robotics, beyond classical notions of goal and planning, into biologically inspired notions of viability and adaptation, situatedness and operational closure, thus putting evolutionary processes at the very center of the stage. The first European meeting intends to highlight the practice of such autonomous systems in all their forms, by hosting the presentation and discussion of the most recent research in the area. 
Beyond research results, another main intention of the meeting is to engage researchers and philosophers to examine the epistemological basis of this new trend. Only a sustained analysis of the main concepts and ideas can provide a fertile ground for important advances and a change of research paradigm. Conference Chairs : Paul Bourgine and Francisco Varela Programme Committee : H. Bersini, B Ch. G. Langton, USA R. Brooks, USA J. A. Meyer, F J. Demongeot, F H.Schwefel, FRG B. Goodwin, UK D. Parisi, I S. Kauffman, USA Organizing Committee : I. Alvarez V. Douzal L. Bochereau T. Fuhs G. Deffuant ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Wednesday December 11 8:00 REGISTRATION 9:30 WELCOME ADDRESS Paul BOURGINE, CEMAGREF - (F), Francisco VARELA, CREA - (F) 9:45 AUTONOMOUS ROBOTS (I) Invited lecture: Rodney BROOKS, MIT - (USA) "Robots and artificial life" Uwe SCHNEPF, GMD - (FRG), Mukesh J. PATEL, University of Sussex - (UK) "Concept formation as Emergent Phenomena" Rolf PFEIFER, Free University of Brussels - (B), Paul VERSCHURE, Univ. of California, Santa Cruz (USA) "Distributed adaptive control : a paradigm for autonomous agents" Break / refreshments Tim SMITHERS, University of Edimburgh - (UK) "Taking eliminative materialism seriously : a methodology for autonomous systems research" Leslie P. KAELBLING, Brown University - (USA) "An adaptable mobile robot" Pattie MAES, MIT - (USA) "Learning behavior networks from experience" 13:15 LUNCH 14:30 SWARM INTELLIGENCE Invited lecture: Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Swarm-made architecture" Alberto COLORNI, Marco DORIGO, Vittorio MANIEZZO, Politecnico di Milano - (I) "Distributed optimization by ant colonies" Andrew M. ASSAD, Univ. of Illinois - (USA), Norman H. PACKARD, Inst. for Scientific Interchange - (I) "Emergent colonization in an artificial ecology" Gerardo BENI, Susan HACKWOOD, Univ. of California, Riverside - (USA) "The maximum entropy principle and sensing in swarm intelligence" Break / refreshments 17:00 EPISTEMOLOGICAL ISSUES Stefan HELMREICH, Stanford University - (USA) "The historical and epistemological ground of von Neumann's theory of self-reproducing automata and theory of games" Jean-Luc DORMOY, EDF - (F), Sylvie KORNMAN, LAFORIA - (F) "Meta-knowledge, autonomy and (artificial) evolution : some lessons learnt so far" 18:00 POSTERS AND DEMOS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Thursday December 12 9:00 EPISTEMOLOGICAL ISSUES (Continued) R. Allen GARDNER, Beatrix T. 
GARDNER, University of Nevada - (USA) "A feedforward model of animal learning" Bernard MANDERICK, Free University of Brussels - (B) "Selectionist systems as cognitive systems" Break / refreshments 10:15 AUTONOMOUS ROBOTS (II) Ian HORSWILL, MIT - (USA) "Characterizing adaptation by constraint" Didier KEYMEULEN, Jo DECUYPER, Free University of Brussels - (B) "On the self-organizing properties of topological maps" Piet SPIESSENS, Jan TORREELE, Free University of Brussels - (B) "Massively parallel evolution of recurrent networks : an approach to temporal processing" Dave CLIFF, University of Sussex - (UK) "Neural networks for visual tracking in an artificial fly" 12:45 LUNCH 14:15 LEARNING AND EVOLUTION Invited lecture: Domenico PARISI, Stefano NOLFI, Federico CECCONI, CNR - (I) "Learning, behaviour, and evolution" Hugues BERSINI, Free University of Brussels - (B) "Immune network and adaptive control" Franck HOFFMEISTER, Thomas BACK , University of Dortmund - (FRG) "Genetic self-learning" Heinz MUHLENBEIN, GMD - (FRG) "Darwin's continent cycle theory and its simulation by the Prisoner's dilemna" Break / refreshments Melanie MITCHELL, John H. HOLLAND, University of Michigan - (USA), Stephanie FORREST, University of New Mexico - (USA) "The royal road for genetic algorithms : fitness landscapes and GA performance" Brad FULLMER, Risto MIIKKULAINEN, University of Texas - (USA) "Evolving finite state behaviour using marker-based genetic encoding of neural networks" 18:00 Invited lecture: Stuart KAUFMANN , University of Pennsylvania - (USA) "Waiting for Carnot" 20:30 DINNER ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Friday December 13 9:30 ADAPTIVE AND EVOLUTIONARY MECHANISMS Barry McMULLIN, Dublin City University - (UK) "The Holland alpha-Universes revisited" Robert J. COLLINS, David R. JEFFERSON, University of California - (USA) "The evolution of sexual selection and female choice" Filippo MENCZER, Domenico PARISI, CNR - (I) "A model for the emergence of sex in evolving networks : adaptive advantage or random drift ?" Break / refreshments Inman HARVEY, University of Sussex - (UK) "Species adaptation genetic algorithms : a basis for a continuing SAGA" Jakob SKIPPER, Niels Bohr Institute - (Dk) "The complete zoo evolution in a box" Jeffrey HORN, University of Illinois - (USA) "Measuring the evolving complexity of stimulus-response organisms" 13:15 LUNCH 14:30 CONCEPTUAL FOUNDATIONS Hugues BERSINI, Free University of Brussels - (B) "Animat's I" Claus EMMECHE, Institute of Computer and Systems Sciences - (Dk) "Life as an abstract phenomenon : is Artificial Life possible ?" John STEWART - Paris (F) "Life=cognition : the epistemological and ontological signifance of Artificial Life" Break / refreshments Peter CARIANI, Boston - (USA) "Some epistemological implications of devices which construct their own sensors and effectors" Mark A. 
BEDAU, Reed College - (USA) "Philosophical aspects of Articial Life" 17:30 CONCLUDING REMARKS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ POSTER SESSION Petr KURKA, Charles University - (Cz) "Natural Selection in a population of automata" Thomas BACK, University of Dortmund - (FRG) "Self-adaptation in genetic algorithms" Robert DAVIDGE, University of Sussex - (UK) "Looking at life" Hugo de GARIS, Free University of Brussels - (B) "Streerable GenNets : the genetic programming of controllable behaviors in GenNets" Bruno MARCHAL, Free University of Brussels - (B) "Amoeba, planaria and dreaming machines" Alexis DROGOUL, Jacques FERBER, LAFORIA - (F) "A behavioural simulation model for the study of emergent social structures" Antonio RIZZO, CNR - (I), Neil BURGESS, University of Manchester - (UK) "Action based neural network for adaptive control : the tank case" John R. KOZA, Stanford University - (USA) "Evolving emergent wall following robotic behavior using the genetic programming paradigm" Bruno GAS, Rene NATOWICZ, ESIEE - (F) "A non-supervised continuous learning model of neural network for temporal sequence recognition" Eric DEDIEU, Emmanuel MAZER, IMAG - (F) "The SWALLOW modeler : an approach to sensory relevance" Gilles VENTURINI, ESIEE - (F) "Characterizing the adaptation abilities of a class of genetic base machine learning algorithms" Barbara WEBB, Tim SMITHERS, University of Edimburgh - (UK) "The connection between AI and biology in the study of behaviour" Ulrich NEHMZOW, Tim SMITHERS, University of Edimburgh - (UK) "Using motor actions for location recognition" Stephen TODD, Wiliam LATHAM, IBM - (UK) "Artificial life or surreal art?" R.C. PATON , H. S. NWANA, M. J. SHAVE, T. J. BENCH-CAPON, University of Liverpool - (UK) "Computing at the tissue/organ level (with particular reference to the liver)" Pierre BESSIERE, IMAG - (F) "Genetic Algorithms applied to formal neural networks : parallel genetic implementation of a Boltzmann machine and associated robotic experimentations" Karl SIMS, Thinking Machines Corp. - (USA) "Interactive evolution of dynamical systems" Nicolas MEULEAU, CEMAGREF - (F) "Co-evolution and mimetism : a program simulating road traffic" Christian NOTTOLA, Frederic LEROY, Banque de France - (F) "Dynamics of artificial markets M. SNAITH, 0.HOLLAND, TAG - (UK) "Application of the temporal difference learning to the neural control of quadrupede locomotion" Simon GOOS, Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Harvesting by a group of robots" ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Registration Form Name : ...................... First name : ....................... Firm :.............................................................. Address : ............................................................ ...................................................................... Zip code : ............. City : ..................................... Country : ................................ Phone : ............ Fax : ............... Email : .................................. Invoice to be sent to : ................................ 
Registration fees Before 20/11/91 After 20/11/91 ________________________________________________________________________________ Students* o FF 750 o FF 750 University Members o FF 1500 o FF 1750 Others o FF 2200 o FF 2500 ________________________________________________________________________________ * Student status proof required These fees include all refreshments and lunches. Payment (in french francs only, foreign cheques accepted): o Cheques (to be sent to ECAL 91) please note that all charges, if any, must be at the participants' expense. o Banker's draft to the order of ECAL: Credit Lyonnais, bank account 30002 08948 0000079087X 55 Versailles StLouis, F-78000. PLease ask your bank to arrange the transfer at no cost for the beneficiary. Bank charges, if any, will be at the participants' expense. Travel Please, send me o Domestic railway discount ticket SNCF (20%) o Domestic flight discount ticket Air Inter (35%) Cancellations Refunds of 50 % will be made if a written request is received before November 30. No refunds will be made for cancellations received after this date. In case of conference cancellation beyond its control, ECAL organizing committee limits its liability to the registration fees already paid. Date Signature Send this form to : ECAL 91 17 allee Gabrielle d'Estrees F-75019 Paris FRANCE Further information concerning registration : Fax : (+33) 1 40 96 60 80 Voice : (+33) 1 40 96 61 79 E-mail : ecal at cemagref.fr ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ General Information ___________________ Language The conference will be conducted in English. Accommodation Hotel Forest Hill La Villette *** (5-minutes walk ) 26 av. Corentin Cariou, Paris. Tel : +33 1 44 72 15 30, fax: 33 1 44 72 15 80. Single or double rooms: 480FF, special price for ECAL participants. Hotel Arcade La Villette ** (5-minutes walk) Tel : +33 1 40 38 04 04 Single: 390FF, double room: 420FF. Please reserve at least 30 days in advance. Hotel Campanile Pantin ** (10-minutes walk) Tel : +33 1 48 91 32 76 Single or double rooms: 335FF. Please reserve at least 45 days in advance. Tel : +33 (1) 48 91 32 76 Reservation centers (other hotels): Tel: 33 1 47 27 15 15 (500 to 700FF rooms). Tel: 33 1 43 59 12 12. (Elysee 12 12). Tel: 33 1 42 56 30 00, fax 33 1 42 89 42 97 (Paris Sejour Reservations) Tourist information : 33 1 47 23 61 72 Cheaper accomodations are available at: Centre de sejour Eugene Henaff Tel 33 (1) 48 39 19 05 Entry visas ___________ For non European Community members, please check with the french consulate whether you need a Visa. Access to Paris cite des Sciences et de l'Industrie ___________________________________________________ La cite des Sciences et de l'Industrie is located in northeast Paris, at La Villette Park, 30, avenue Corentin Cariou, 75019 Paris. It is 40 minutes from Roissy and Orly airports. You can reach the Cite: by car: Circular highway, Porte de la Villette exit. Parking available at quai de la Charente and Boulevard Macdonald; by metro: Line 7, Porte de la Villette station; by bus: lines 150-152-250A-PC. For information about the cite des Sciences, call 33 1 46 42 13 13 (round-the-clock), or by Minitel: 3615 code Villette. 
From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Tue Oct 15 10:22:41 1991
From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu)
Date: Tue, 15 Oct 91 09:22:41 EST
Subject: Paper
Message-ID:

Following is the abstract of a paper accepted by IJCNN'91-SINGAPORE. The main purpose of this paper was to attack the problems of slow convergence, local minima, incapability of learning (under certain preset criteria), and other problems associated with the original back-propagation neural nets from an alternative viewpoint ---- topology ---- instead of the learning algorithm and the response characteristics of the units. It was shown in this paper that the topology is a very important factor limiting the performance of back-propagation neural networks, besides the already studied factors such as the learning algorithm and the unit characteristics. All comments are welcome.

PPNN: A Faster Learning and Better Generalizing Neural Net

Bo Xu, Indiana University
Liqing Zheng, Purdue University

Abstract----It was pointed out in this paper that the planar topology of the current back-propagation neural network (BPNN) limits its ability to overcome the slow convergence rate, local minima, and other problems associated with BPNN. The parallel probabilistic neural network (PPNN), using a new neural network topology, stereotopology, was proposed to overcome these problems. The learning ability and the generalization ability of BPNN and PPNN were compared on several problems. The simulation results show that PPNN was capable of learning all kinds of problems much faster than BPNN and generalized better than BPNN too. The analysis showed that the faster, more universal learnability of PPNN was due to the parallel characteristic of PPNN's stereotopology, and the better generalization ability came from the probabilistic characteristic of PPNN's memory retrieval rule.

Bo Xu
Indiana University
itgt500 at indycms.iupui.edu

From xiru at Think.COM Tue Oct 15 11:35:55 1991
From: xiru at Think.COM (xiru Zhang)
Date: Tue, 15 Oct 91 11:35:55 EDT
Subject: batch-mode parallel implementations
In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Tue, 15 Oct 91 01:23:47 -0400 <9110151441.AA10584@chaos.cs.brandeis.edu>
Message-ID: <9110151535.AA02757@yangtze.think.com>

From jcp at vaxserv.sarnoff.com Wed Oct 16 12:03:09 1991
From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385)
Date: Wed, 16 Oct 91 12:03:09 EDT
Subject: batch-mode parallel implementations
Message-ID: <9110161603.AA09000@sarnoff.sarnoff.com>

Xiru Zhang stated:

>From the point of view of implementation, if a network is not large, there
>is not much you can parallelize if you do per-sample training.

Even in per-sample training one may be able to efficiently exploit a parallel machine. Each processor simulates the same network but has a different set of initial weights. The convergence time and performance of a trained network can be very dependent on the initial weights. I would appreciate being sent references that discuss this last statement.

John Pearson
David Sarnoff Research Center
CN5300
Princeton, NJ 08543
609-734-2385
jcp at as1.sarnoff.com

From gary at cs.UCSD.EDU Wed Oct 16 13:05:51 1991
From: gary at cs.UCSD.EDU (Gary Cottrell)
Date: Wed, 16 Oct 91 10:05:51 PDT
Subject: batch-mode parallel implementations
Message-ID: <9110161705.AA27497@desi.ucsd.edu>

I tried implementing Elman's simple recurrent nets on an Intel Hypercube using data parallelism (a copy of the net at each node, each getting a part of the training set).
I found it was as fast as a bat out of h**l, but as many times faster as it was, it was also as many times SLOWER at converging, leading to a net gain of 0!

g.

PS I did not try conjugate gradient, or back-propping more steps in time, which probably would have helped convergence lots.

From orilex at crl.ucsd.edu Wed Oct 16 15:33:44 1991
From: orilex at crl.ucsd.edu (Roy Higginson)
Date: Wed, 16 Oct 91 12:33:44 PDT
Subject: address for Sanger
Message-ID: <9110161933.AA21258@crl.ucsd.edu>

Can someone give me an e-mail address for Dennis Sanger (AT&T Bell/Univ of CO at Boulder)?

Thanks,
Higginson

From ajr at eng.cam.ac.uk Wed Oct 16 17:48:31 1991
From: ajr at eng.cam.ac.uk (Tony Robinson)
Date: Wed, 16 Oct 91 17:48:31 BST
Subject: TR available: Phoneme recognition with recurrent networks
Message-ID: <16687.9110161648@dsl.eng.cam.ac.uk>

***Do not forward to other bboards***

I've recently completed a technical report on connectionist phoneme recognition which I would like to make available to interested researchers. It describes a series of changes which have been made to tidy up a previously published system. Copies of the technical report may be obtained courtesy of Jordan Pollack by anonymous ftp from archive.cis.ohio-state.edu in the directory /pub/neuroprose as file robinson-tr82.ps.Z. If this option is not available to you, or if you would like a reprint of the background article, please send me email giving your full address.

Tony [Robinson]
Cambridge University Engineering Department, Trumpington Street, Cambridge, UK

------------------------------------------------------------------------------

Several Improvements to a Recurrent Error Propagation Network Phone Recognition System

Tony Robinson
ajr at eng.cam.ac.uk
CUED/F-INFENG/TR.82
30 September 1991

Recurrent Error Propagation Networks have been shown to give good performance on the speaker-independent phone recognition task in comparison with other methods [Robinson and Fallside, Computer Speech and Language, July 1991]. This short report describes several recent improvements made to the existing recogniser for the TIMIT database. The improvements are: an addition to the preprocessor to represent voicing information; use of histogram normalisation on the input channels of the network; normalisation of the output channels to enforce unity sum; a change in the cost function to give equal weighting to each target symbol; a change in the representation of the outputs to reduce quantisation errors; retraining on the complete TIMIT training set; and better estimation of the HMM phone models. Most of these changes decrease the number of arbitrary parameters used and allow for the integration of the system with standard HMM techniques. The result of these changes is a decrease in the number of errors by about 16% (from 36.5% to 30.7% when all 61 TIMIT phones are used, and from 30.2% to 25.0% on a reduced 39-phone set).

From shams at maxwell.hrl.hac.com Wed Oct 16 17:23:42 1991
From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com)
Date: Wed, 16 Oct 91 14:23:42 PDT
Subject: batch-mode parallel implementations
Message-ID: <9110162123.AA08260@maxwell.hrl.hac.com>

We have exploited the "epoch" training method for implementing back-prop on a 2-D systolic array processor at Hughes [1,2]. There are two basic problems with this approach. First, there are only a limited number of models that allow for epoch training (e.g. back-prop).
Second, this type of parallelism is not useful during the recall or classification cycle, since there is only a single input pattern to be evaluated (unless the input data rate exceeds the processor throughput, enabling the input data to be buffered for batch processing). As the number of neurons used in real-world applications continues to increase, there will be enough computation to keep all the processors busy without having to use epoch parallelism.

[1] S. Shams and K. W. Przytula, "Mapping of Neural Networks onto Programmable Parallel Machines," Proceedings of the Intern. Symp. on Circuits and Systems, New Orleans, LA, Vol. 4, pp. 2613-2617, 1990.

[2] S. Shams and K. W. Przytula, "Implementation of Multilayer Neural Networks on Parallel Programmable Digital Computers," in Parallel Algorithms and Architectures for DSP Applications, ed. M. Bayoumi, Kluwer Academic Publishers, pp. 225-253, 1991.

Soheil Shams
Hughes Research Labs.

From karunani at CS.ColoState.EDU Wed Oct 16 22:23:31 1991
From: karunani at CS.ColoState.EDU (n karunanithi)
Date: Wed, 16 Oct 91 20:23:31 MDT
Subject: HowtoScale
Message-ID: <9110170223.AA05027@zappa>

Dear Connectionists,

Some time back I posted the following problem in this newsgroup and many people responded with suggestions and references. I am thankful to all of them. I have summarized their responses and am posting the summary here for others who might find it interesting. For completeness' sake I have included my original posting as well.

******Issue raised:

Background:
-----------
I have been using neural network models (both Feed-Forward Nets and Recurrent Nets) in a prediction application and I am getting pretty good results. In fact the neural network approach outperformed many well-known analytic models. Similar results have been reported by many researchers in (chaotic) time series prediction.

Suppose that X is the independent variable and Y is the dependent variable. Let (x(i),y(i)) represent a sequence of actual input/output values observed at time i = 0,1,2,..,t of a temporal process. Assume further that both the input and the output variables are one-dimensional and can take on a sequence of positive integers up to a maximum of 2000. Once we train a network with the history of the system up to time "t" we can use the network to predict outputs y(t+h), h=1,..,n for any future input x(t+h). In my application I already have the complete sequence and hence I know the maximum values of x and y. Using these maxima I normalized both X and Y over a 0.1 to 0.9 range. (Here I call such normalization "scaled representation".) Since I have the complete sequence, it is possible for me to evaluate how good the network's predictions are.

Now some basic issues:
---------------------
1) How to represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that linear units can be used at the output layer to get rid of scaling. If so, how do I represent the input variable? The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed to 1.0) when the sum is >= 14. However, one may suggest that changing the output range of the sigmoidal can help to get rid of the saturation effect. Is it a correct approach?

2) In such prediction applications, people (including me) compare the predictive accuracy of neural networks with that of parametric models (that are based on analytical reasoning).
But one main advantage of the parametric models is that their parameters can be calculated using any of the following parameter estimation techniques: least squares, maximum likelihood, Bayesian methods, Genetic Algorithms, or any other method. These parameter estimation techniques do not require any scaling, and hence there is no need to guess the maximum values in advance. However, with the scaled representation in neural networks one cannot proceed without making guesses about the maximum (or a future) input and/or output. In many real-life situations such guesses are infeasible or dangerous. How do we address this situation?

____________________________________________________________________________
N. KARUNANITHI                       E-Mail: karunani at CS.ColoState.EDU
Computer Science Dept,
Colorado State University,
Collins, CO 80523.
____________________________________________________________________________

******Responses Received:

1) Dr. Huang at CMU

Date: Thu, 26 Sep 1991 11:40-EDT
From: Xuedong.Huang at SPEECH2.CS.CMU.EDU

I have several papers addressing the issues you raised. See for example:

[1] Huang, X.: "A Study on Speaker-Adaptive Speech Recognition," DARPA Speech and Language Workshop, Feb. 1991, pp. 278-283.

[2] Huang, X., K. Lee and A. Waibel: "Connectionist speaker normalization and its applications to speech recognition," IEEE Workshop on NNSP, Princeton, Sept. 1991.

X.D. Huang, PhD
Research Computer Scientist
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
School of Computer Science           Tel: (412) 268 2329
Carnegie Mellon University           Fax: (412) 681 5739
Pittsburgh, PA 15213                 Email: xdh at cs.cmu.edu

=============================================================================

2) From Alexander at CUNY

Date: Thu, 26 Sep 91 14:45 EDT
From: TWOMBLY%JHUBARD.BITNET at CUNYVM.CUNY.EDU

In response to your question about scaling for sigmoidal units.....

I ran into the same problem of not knowing the maximum value that my input/output data would take at any particular time. There were no a priori bounds that could be reasonably set, so the solution (in this case) was to get rid of the sigmoidal activation function and replace it with one that did not require any scaling. The function I used was a clipped linear function - that is, f(x) = 0. for x<0., and f(x) = x for x>0. For my data this activation function worked as well as the sigmoidal units (in some cases better) because the hidden units never took advantage of the non-linearity in the upper range of the sigmoid function.

The only difficulty with this function is that it does not have a continuous derivative at 0. You can get around this problem by tacking on a 1/x type function for x<0 that drops off very quickly. This will provide a well-behaved, non-zero derivative for all parts of the activation function while adding a negligible value to the output for x<0. The actual function I use is:

f(x) = x;                      x > 0.
f(x) = 1/(10**2 - x*10**4);    x < 0.

I hope this helps.

-Alexander

=============================================================================

3) Dr. Fahlman at CMU

Date: Thu, 26 Sep 91 22:20:14 -0400
From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU

> 1) How to represent these variables if we don't know in advance what the
> maximum values are? Scaled representation presupposes the existence of a
> maximum value. Some may suggest that linear units can be used at the
> output layer to get rid of scaling.

Right, I was about to suggest that.

> If so, how do I represent the input variable?
> The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed
> to 1.0) when the sum is >= 14. However, one may suggest that changing the
> output range of the sigmoidal can help to get rid of the saturation
> effect. Is it a correct approach?

For a non-recurrent network, the first layer of weights can and usually will scale the inputs for you. You save some learning time and possible traps if the inputs are in some reasonable range, but it really isn't essential. I'd advise adding a small constant (0.1 works well) to the derivative of the sigmoid for all units so that you can recover if the unit gets pinned to an extreme value.

I don't understand your second point, so I won't try to reply to it.

Scott Fahlman
Carnegie Mellon University

=============================================================================

4) Ian Fitchet at Birmingham University

Date: Fri, 27 Sep 91 03:43:40 +0100
From: Ian Fitchet

I'm no expert, but how about having two outputs: one is a control and has a (mostly) fixed value; the other is the output y(i), which is adjusted such that the one divided by the other gives the required result. Off the top of my head, have the control output 0.9 most of the time; when the value of y(i) goes above unity, have y(i) = 0.9 and let the control decrease, so that if the control equalled 0.45, say, then the real value of the output would be 0.9/0.45 = 2.0. Of course the question is then, how do I train the network to set the value of the control? But I leave that as an exercise... :-)

Cheers,
Ian

--
Ian Fitchet                    I.D.Fitchet at cs.bham.ac.uk
School of Computer Science
Univ. of Birmingham, UK, B15 2TT
"You run and you run to catch up with the sun, but it's sinking"  Pink Floyd

=============================================================================

5) From Dermot O'Brien at the University of Edinburgh

Date: Fri, 27 Sep 91 10:32:31 WET DST
Sender: dob at castle.edinburgh.ac.uk

You may be interested in the following references (if you haven't read them already):

@techreport{Lapedes:87,
  Author = "Alan S. Lapedes and Robert M. Farber",
  Title = "Nonlinear signal processing using neural networks: prediction and system modelling",
  Institution = "Los Alamos National Laboratory",
  Year = 1987,
  Number = "LA-UR-87-2662"}

@incollection{Lapedes:88,
  Author = "Alan S. Lapedes and Robert M. Farber",
  Title = "How Neural Nets Work",
  BookTitle = "Evolution, Learning, and Cognition",
  Pages = {331--346},
  Editor = "Y.C. Lee",
  Year = 1988,
  Publisher = "World Scientific",
  Address = "Singapore"}

The above papers analyse the behaviour of feed-forward neural networks applied to the problem of time series prediction, and make an interesting analogy with Fourier decomposition.

Cheers,

Dermot O'Brien
Physics Department
University of Edinburgh
The King's Buildings
Mayfield Road
Edinburgh EH9 3JZ
Scotland

=============================================================================

6) From: Tony Robinson

Date: Fri, 27 Sep 91 12:23:23 BST

My immediate advice is: Don't put the input through a nonlinearity at the start of the network. Use linear output units. Allow a linear path through the system so that if a linear solution to the problem is possible then this is a possible network solution. Then you will have no problems with maximum values.

Tony [Robinson]

=============================================================================

End of summary.
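For readers who want to try response 2 above, here is a minimal sketch of the clipped-linear activation and its derivative, using exactly the constants quoted in that reply (Python/NumPy; the function names and the test values are assumptions of this sketch, not part of the original message).

import numpy as np

def clipped_linear(x):
    # f(x) = x for x > 0; for x <= 0, the small, rapidly decaying
    # 1/x-type tail quoted in response 2, so the output stays tiny
    # but the derivative never vanishes.
    return np.where(x > 0.0, x, 1.0 / (1e2 - x * 1e4))

def clipped_linear_deriv(x):
    # Derivative: 1 for x > 0; 1e4 / (1e2 - 1e4*x)**2 for x <= 0,
    # which approaches 1 as x -> 0 from below and stays non-zero.
    return np.where(x > 0.0, 1.0, 1e4 / (1e2 - x * 1e4) ** 2)

# The positive branch is unbounded, so no output maximum has to be
# guessed in advance; the negative branch contributes at most 0.01.
xs = np.array([-2.0, -0.1, 0.0, 0.5, 3.0, 50.0])
print(clipped_linear(xs))
print(clipped_linear_deriv(xs))

Note that the tail gives 0.01 at 0 rather than 0, a small jump in the function value that the reply describes as negligible.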
____________________________________________________________________________
N. KARUNANITHI                       E-Mail: karunani at CS.ColoState.EDU
Computer Science Dept,
Colorado State University,
Collins, CO 80523.
____________________________________________________________________________

From thomasp at informatik.tu-muenchen.dbp.de Mon Oct 14 05:17:00 1991
From: thomasp at informatik.tu-muenchen.dbp.de (Thomas)
Date: 14 Oct 91 10:17 +0100
Subject: report available
Message-ID: <91Oct14.101724met.34256(a)gshalle1.informatik.tu-muenchen.de>

From khosla at latcs1.lat.oz.au Thu Oct 17 04:00:31 1991
From: khosla at latcs1.lat.oz.au (Rajiv Khosla)
Date: Thu, 17 Oct 91 18:00:31 +1000
Subject: Spatial crosstalk and modular NN architecture
Message-ID: <9110170800.AA00862@latcs1.lat.oz.au>

Dear Connectionists,

Can anyone enlighten me on the following? I have to model a problem with 28 discrete inputs (1's and 0's) and 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by 5 normalized continuous outputs. Now, I have no problem modelling it as a 28-11-5 network using Scott Fahlman's quickprop. However, I get into all sorts of problems when I have to model a 28-?-26 network (? stands for any number of hidden units; I tried up to 104).

Some time back, I read a paper on modular NN architectures which suggested that, because of spatial crosstalk, one should have dedicated or independent links between hidden units and each output unit. This would result in faster training and better generalization. I tried this architecture by making suitable changes in the quickprop algorithm, but to no avail. There is no improvement over the standard architecture vis-a-vis training. In fact, things seemed to get slightly worse. I tried with 2, 3, and 4 sets of hidden units per output unit (that is, 52, 78, and 104 hidden units in all, respectively). I gave up after about 5000 epochs as I couldn't see any significant improvement in the total error.

Has anyone used the modular architecture in a similar situation, with a large number of output nodes, with positive results? Am I doing something wrong? Is there any other solution except making the outputs continuous and reducing the number of output nodes?

I have only recently started reading this group, so please excuse the naivety of the questions, if any. Please e-mail your replies to khosla at latcs1.lat.oz.au

Thanks in advance,
Rajiv

From neural!lamoon.neural!yann at att.att.com Thu Oct 17 10:46:39 1991
From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com)
Date: Thu, 17 Oct 91 10:46:39 -0400
Subject: batch-mode parallel implementations
Message-ID: <9110171446.AA19788@lamoon>

Several years ago, Steve Nowlan and I implemented a "batch-mode" vectorized backprop on a Cray. Just as in Gary Cottrell's story, the raw CUPS rate was high, but because batch mode converges so much more slowly than on-line, the net gain was 0. I think Patrick Haffner and Alex Waibel had a similar experience with their implementations of TDNNs on the Alliant.

Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. Also, if you need a very high accuracy solution (for function approximation, for example), a second-order batch technique will probably be better than on-line.

Sadly, almost all speedup techniques for backprop only apply to batch (or semi-batch) mode. That includes conjugate gradient, delta-bar-delta, most Newton or Quasi-Newton methods (BFGS...), etc...
I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. I think there are two interesting challenges here:
1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument).
2 - Find more speedup methods that work with on-line training.

-- Yann Le Cun

From kamil at apple.com Thu Oct 17 12:59:21 1991
From: kamil at apple.com (Kamil A. Grajski)
Date: Thu, 17 Oct 91 09:59:21 -0700
Subject: batch & on-line training
Message-ID: <9110171659.AA23721@apple.com>

The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUPS statistics, but convergence is slow enough that the net gain is 0. It is still difficult to make a scientific judgement, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other."

It has been observed that several algorithms for accelerating convergence are designed for (semi-)batch mode. Were these to be seriously evaluated, would the net gain still be 0? On the other hand, with more work, could on-line methods widen their apparent superiority?

I don't think that we're splitting hairs by addressing this issue. One trend on the implementation side of NNs is to aim for the highest MCUPS performance. In several instances, this is achieved using mappings/architectures which rest on batch-mode training. I think that one might design a neurocomputer differently depending on which training mode is to be used, e.g., the communication vs. computation curves are different. So, at the moment, in certain instances, we've actually put the cart before the horse: we have fast batch implementations. Do we make batch-mode training better, or can we make on-line training so fast, and design a machine so optimally, that the issue is moot? (I'm ignoring the (possibly substantial) conflicting requirements between training & recognition modes here.)

In any event, it seems that folks are having success doing either in different situations. However, there doesn't seem to be a compelling argument for preferring one or the other IN PRINCIPLE.

Cheers,
Kamil

From dlukas at park.bu.edu Thu Oct 17 12:58:42 1991
From: dlukas at park.bu.edu (David Lukas)
Date: Thu, 17 Oct 91 12:58:42 -0400
Subject: Graduate study in Cognitive & Neural Systems at Boston University
Message-ID: <9110171658.AA15628@park.bu.edu>

(please post)

***********************************************
*                                             *
*               DEPARTMENT OF                 *
*     COGNITIVE AND NEURAL SYSTEMS (CNS)      *
*            AT BOSTON UNIVERSITY             *
*                                             *
***********************************************

Stephen Grossberg, Chairman

The Boston University Department of Cognitive and Neural Systems offers comprehensive advanced training in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems.

Applications for Fall 1992 admissions and financial aid are now being accepted for both the MA and PhD degree programs.
To obtain a brochure describing the CNS Program and a set of application materials, write or telephone: Department of Cognitive & Neural Systems Boston University 111 Cummington Street, Room 240 Boston, MA 02215 (617) 353-9481 or send a mailing address to: kellyd at cns.bu.edu Applications for admission and financial aid should be received by the Graduate School Admissions Office no later than January 15. Applicants are required to submit undergraduate (and, if applicable, graduate) transcripts, three letters of recommendation, and Graduate Record Examination (GRE) scores. The Advanced Test should be in the candidate's area of departmental specialization. GRE scores may be waived for MA candidates and, in exceptional cases, for PhD candidates, but absence of these scores may decrease an applicant's chances for admission and financial aid. Description of the CNS Department: The Department of Cognitive and Neural Systems (CNS) provides advanced training and research experience for graduate students interested in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Students are trained in a broad range of areas concerning cognitive and neural systems, including vision and image processing; speech and language understanding; adaptive pattern recognition; cognitive information processing; self-organization; associative learning and long-term memory; cooperative and competitive network dynamics and short-term memory; reinforcement, motivation, and attention; adaptive sensory-motor control and robotics; and biological rhythms; as well as the mathematical and computational methods needed to support advanced modeling research and applications. The CNS Department awards MA, PhD, and BA/MA degrees. The CNS Department embodies a number of unique features. It has developed a core curriculum that consists of ten interdisciplinary graduate courses each of which integrates the psychological, neurobiological, mathematical, and computational information needed to theoretically investigate fundamental issues concerning mind and brain processes and the applications of neural networks to technology. Additional advanced courses, including research seminars, are also offered. Each course is typically taught once a week in the evening to make the program available to qualified students, including working professionals, throughout the Boston area. Students develop a coherent area of expertise by designing a program that includes courses in areas such as Biology, Computer Science, Engineering, Mathematics, and Psychology, in addition to courses in the CNS core curriculum. The CNS Department prepares students for thesis research with scientists in one of several Boston University research centers or groups, and with Boston-area scientists collaborating with these centers. The unit most closely linked to the department is the Center for Adaptive Systems. The Center for Adaptive Systems is also part of the Boston Consortium for Behavioral and Neural Studies, a Boston-area multi-institutional Congressional Center of Excellence. Another multi-institutional Congressional Center of Excellence focused at Boston University is the Center for the Study of Rhythmic Processes. 
Other research resources include distinguished research groups in neurophysiology, neuroanatomy, and neuropharmacology at the Medical School and the Charles River campus; in sensory robotics, biomedical engineering, computer and systems engineering, and neuromuscular research within the Engineering School; in dynamical systems within the mathematics department; in theoretical computer science within the Computer Science Department; and in biophysics and computational physics within the Physics Department. 1991 FACULTY and STAFF of CNS and CAS: Daniel H. Bullock Nancy Kopell Gail A. Carpenter John W.L. Merrill Michael A. Cohen Ennio Mingolla H. Steven Colburn Alan Peters Paolo Gaudiano Adam Reeves Stephen Grossberg James T. Todd Thomas G. Kincaid Allen Waxman From MURTAGH at SCIVAX.STSCI.EDU Thu Oct 17 15:29:25 1991 From: MURTAGH at SCIVAX.STSCI.EDU (MURTAGH@SCIVAX.STSCI.EDU) Date: Thu, 17 Oct 1991 15:29:25 -0400 (EDT) Subject: Workshop: Par. Prob. Solving: Applns. in Statistics & Economics Message-ID: <911017152925.28c128fa@SCIVAX.STSCI.EDU> Workshop Announcement and Call for Papers: "Parallel Problem Solving From Nature: Applications in Statistics & Economics". ------------------------------------------------------------------------------- Interdisciplinary Project Center for Supercomputing, ETH, Zurich, Switzerland. December 10-11, 1991. Support/Sponsorship: DOSES/Statistical Office of the European Communities; IPS, ETH Zurich; Konjunkturforschungsstelle, ETH Zurich; MasPar Distributor AG Zurich; PAR, Schweizerische Informatiker Gesellschaft; Parsytec GmbH, Aachen; QT optec AG, Zug; Schweizerischer Bankverein, Basel, IBM Switzerland. Program Committee: J. Frain (Central Bank of Ireland), K. Kirchmayr (Schweizerischer Bankverein, Basel), F. Murtagh (Munotec Systems, Munich and Dublin), P. Van Nypelseer (DOSES/EUROSTAT, Luxembourg), U. Reimer (Rentenanstalt Zuerich), M.M. Richter (DFKI Kaiserslautern), W. Roth (Konjunkturforschungsstelle ETH, Zurich), D. Wuertz (IPS, ETH Zurich), and H.G. Zimmermann (Siemens, Munich). Invited Speakers: J. Bernasconi (ABB Corp. Research, Baden), A. Colin (Citibank, London), F. Fogelman-Soulie (MIMETICS, Chatenay Malabry), J. Frain (Central Bank of Ireland), H. Horner (Universitaet Heidelberg), H. Muehlenbein (GMD, Sankt Augustin, Bonn), F. Murtagh (Munotec Syst., Munich), M.B. Priestley (UMIST Manchester), R. Rohwer (CSTR University of Edinburgh), C. Schaefer (Rowland Inst. of Science, Cambridge MA), P. Treleaven (University College London), A. Varfis (Joint Research Center, Ispra), H.-M. Wallmeier (IBM Scientific Center, Heidelberg), D. Weers (Aspen Intellect, Zug), A. Weigend (Stanford University) D. Wuertz (IPS, ETH Zurich), H.G. Zimmermann (Siemens, Munich). Registration: SFr 400 for those from profit-making companies; otherwise SFr 150. A limited fund will be available to support younger participants who would not otherwise be able to attend. Late registration, after November 1, additional SFr 50. Remittance (only Swiss Francs) to: PASE-Workshop - Dr. Diethelm Wuertz, Schweizerischer Bankverein, Zurich. Acccount number: P0-206066.0. Accommodation requests: directly to: Verkehrsverein Zurich (VVZ), Kongressbuero, Postfach, CH-8023 Zurich, Switzerland (Tel: + 41 1 211-1256). Contact Point: Dr. Diethelm Wuertz, IPS ETH Zurich, ETH Zentrum, CLU B3, CH-8092 Zurich, Switzerland. Fax: + 41 1 252-0185. Email: wuertz at ips.ethz.ch or the undersigned. Abstract: 1 page, by November 1. F.D. 
Murtagh murtagh at scivax.stsci.edu From dominic at DEBUSSY.CODA.CS.CMU.EDU Thu Oct 17 16:21:08 1991 From: dominic at DEBUSSY.CODA.CS.CMU.EDU (Chioccioli) Date: Thu, 17 Oct 91 14:21:08 -0600 Subject: No subject Message-ID: <9110172021.AA24272@debussy.cs.colostate.edu> This posting briefly describes my interest in parallel learning algorithms for neural networks. Currently I am investigating the following two aspects of parallel reinforcement learning algorithms for sequential decision tasks: 1) Multiple nets on multiple task simulations. Our goal here is to combine multiple simultaneous experiences to reduce the wall-clock time required to learn a task. 2) Multiple nets on a single task simulation. This paradigm assumes that multiple simulations cannot be run; however, parallel search of the (single) experience space obtained from running a single simulation can be used to reduce the total number of trials (i.e. simulated experiences) required for learning. Several different algorithms will be attempted for both of the above tasks. I am interested in hearing from others who may also be doing research in parallel learning algorithms for neural networks. Pointers to relevant publications or references will be most helpful. Thanks in advance for any responses. I will post a summary of any references I receive, provided that this is not a repeated request and that sufficient response is forthcoming. Regards, Steve Dominic dominic at debussy.cs.colostate.edu Colorado State University Computer Science Dept. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Thu Oct 17 16:01:33 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Thu, 17 Oct 91 15:01:33 EST Subject: Paper Message-ID: Two days ago I posted the abstract of our paper "PPNN: A Faster Learning and Better Generalizing Neural Net". Because the paper will appear in the proceedings of IJCNN'91-SINGAPORE, I thought it would not be necessary to place it in neuroprose. However, since the posting, I have received a large number of messages requesting a copy of the paper, and requests are still coming in. Because I was not prepared for this, I was unable to answer all of the messages in time. Please excuse me for any possible delay and errors in replying to your requests. Thanks to many colleagues' suggestions, I am going to place the paper in the neuroprose archive. I will provide the procedures for retrieving it at cheops of Ohio State when it is ready. I will be happy to send hardcopy to those having no access to FTP. Bo Xu Indiana University itgt500 at indycms.iupui.edu From khosla at latcs1.lat.oz.au Thu Oct 17 20:49:38 1991 From: khosla at latcs1.lat.oz.au (khosla@latcs1.lat.oz.au) Date: Fri, 18 Oct 91 10:49:38 +1000 Subject: Pl. Ignore Message-ID: <9110180049.AA28265@latcs2.lat.oz.au> This is a test From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 02:10:29 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 02:10:29 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0500. <9110171446.AA19788@lamoon> Message-ID: Yann LeCun writes: Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. I think that it may be misleading to lump together "large" and "redundant" as if they were the same thing, or as if they were inseparable.
I agree that for highly redundant datasets, continuous updating has an advantage. I also agree that for small datasets, we don't care much about speed. But it seems to me that it is possible to have a large, not-very-redundant data set, and that accelerated batch methods should have an advantage for these. I guess you could measure redundancy by seeing if some subset of the training data set produces essentially the same gradient vector as the full set. Probably statisticians have good ways of talking about this redundancy business -- unfortunately, I don't know the right vocabulary. In a data set with noise, you need a big enough training set to raise relatively rare but real features above the level of the random background noise. If you have roughly that much data, I bet fast batch techniques would win; if you have a training set that is several times this minimal size, then continuous updating would win. That's my suspicion, anyway. I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. Well, if my hypothesis above is right, we could demonstrate this by finding a dataset that is large enough to make you happy, but not highly redundant. I guess that we could create this by taking any large dataset, measuring its redundancy, and trimming it down to minimal size (assuming that the result still can be classified as large). Do you know of any big sets that would qualify? It should preferably a relatively "pure" N-input data-classification problem, without all the additional issues (e.g. translation invariance) that are present in image-processing and speech-processing tasks. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. I have a hunch that if we work hard enough on speeding up online training, we'll end up with something whose NET EFFECT is equivalent to the following: 1. Accumulate gradient data for a length of time that is adaptively chosen: Large enough for the gradients to be stable and accurate, but not large enough to be redundant. 2. Use something equivalent to one of the batch-processing acceleration techniques on this smoothed gradient. That's not to say that the technique will necessary do this in an obvious way -- it may be twiddling the weights each time a sample goes by -- but I suspect this kind of accumulation, smoothing, and acceleration will be present at some level. As I said, for now this is just a hunch. -- Scott Fahlman P.S. I avoid using the term "on-line" for what I call "per-sample" or "continuous" updating of weights. For me, "online" means something else. At this moment, I am sitting at my workstation watching one of my batch-updating algorithms running "on-line" in front of me. From smagt at fwi.uva.nl Fri Oct 18 09:23:07 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Fri, 18 Oct 91 14:23:07 +0100 Subject: Spatial crosstalk and modular NN architechture Message-ID: <9110181323.AA28643@fwi.uva.nl> > I have to model a problem with 28 discrete inputs(1's and 0's) and >26 discrete outputs. Infact, these 26 discrete outputs can be represented by >5 normalized continous outputs also. If one would want to model any kind of function, why go for the least obvious solution via a neural network first? 
Since your problem is binary, too, I would first try a much simpler method such as k-nearest-neighbour or some binning approach, which would enable one to gain an understanding of the data and the overlap. Ten years ago this would have been a more standard approach, instead of using a black box (aka neural network). The reason that I would _not_ immediately grab a network to do some function approximation is that I have seen too many people choke on the fact that they do not understand their data, or the complexity of the data, a reasonable ratio of #degrees of freedom to #learning samples, etc. Patrick van der Smagt From xiru at Think.COM Fri Oct 18 10:42:04 1991 From: xiru at Think.COM (xiru Zhang) Date: Fri, 18 Oct 91 10:42:04 EDT Subject: batch & on-line training In-Reply-To: "Kamil A. Grajski"'s message of Thu, 17 Oct 91 09:59:21 -0700 <9110171659.AA23721@apple.com> Message-ID: <9110181442.AA03133@yangtze.think.com> Date: Thu, 17 Oct 91 09:59:21 -0700 From: "Kamil A. Grajski" The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is enough slower that the net gain is 0. It is difficult to make a scientific judgement still, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." I have used per-epoch training on an auto-association network, to extract "features" of protein local structures, using as few hidden units as possible. I spent a lot of time fine-tuning the training process, such as using different learning rates at different stages of training, different momentum terms, different ranges of random weights at the beginning, how large each "batch" is, etc. At the end I got a pretty good convergence rate. (Maybe I did not spend enough effort to fine-tune the per-sample training.) My feeling is that training a large network with lots of examples is still an art. You can almost always improve it if you spend time on it. Per-epoch training may have somewhat different behavior than per-sample training, so a different training schedule is often needed, and it takes time to figure out what a good one is. It also critically depends on the particular problem you want to solve. Besides the issue of convergence rate, I wonder if people have compared networks trained by a per-epoch schedule and a per-sample schedule, to see if they have the same level of generalization. One thing I noticed in my work is that per-sample training tends to make certain weights much larger than in per-epoch training. But I am not sure if this is true in general. - Xiru Zhang From neural!lamoon.neural!yann at att.att.com Fri Oct 18 11:08:03 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Fri, 18 Oct 91 11:08:03 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 02:10:29 -0400. Message-ID: <9110181508.AA00547@lamoon> Scott Fahlman writes: >I avoid using the term "on-line" for what I call "per-sample" or >"continuous" updating of weights. I personally prefer the phrase "stochastic gradient" to all of these. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set.
Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. -- Yann From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 12:38:38 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 12:38:38 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0500. <9110181508.AA00547@lamoon> Message-ID: Original-From: Yann le Cun I personally prefer the phrase "stochastic gradient" to all of these. That's a fine term, but it seems to me that it refers to one of the effects of per-sample updating, and not to the mechanism itself. You might get a "stochastic gradient" because you are updating after every randomly chosen sample, but you might also get it from noise in the samples themselves. So if you want to refer to the choice of updating mechanism, and not to the quality of the gradient, I think it's better to use a term like "per-sample updating" that is nearly impossible for the reader to misunderstand. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. I think we may be talking about two different things here. Let's assume that there is some underlying distribution that we are trying to model, and that we take some number of samples from this distribution to use as a training set. It is clearly true that there must be some "redundancy" in the underlying distribution if it is to be worth modelling. In this case, I'm using the term "redundancy" to mean that there's some sort of regular statistical structure that is stable enough to be of predictive value. Put another way, the distribution must not be totally random-looking; it has less than the maximum possible information per sample. However, given one of these redundant underlying distributions, we want to choose a training set that is large enough to be representative of the distribution (and to separate signal from noise), but not so large as to be redundant itself. This training set is what I was referring to in my earlier message. 
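Fahlman's subset-gradient check can be made concrete with a short sketch. The linear-model gradient and the toy data below are stand-ins invented for illustration; any network's gradient routine could be substituted.

import numpy as np

def mse_grad(w, X, y):
    # gradient of mean squared error for the linear model y ~ X @ w (a stand-in)
    return 2.0 * X.T @ (X @ w - y) / len(y)

def subset_agreement(grad_fn, w, X, y, frac, rng):
    # cosine similarity between the gradient on a random subset and on the full set
    m = max(1, int(frac * len(X)))
    idx = rng.choice(len(X), size=m, replace=False)
    g_full = grad_fn(w, X, y)
    g_sub = grad_fn(w, X[idx], y[idx])
    return float(g_full @ g_sub /
                 (np.linalg.norm(g_full) * np.linalg.norm(g_sub) + 1e-12))

rng = np.random.RandomState(0)
X = rng.randn(2000, 20)
y = X @ rng.randn(20) + 0.1 * rng.randn(2000)
w = np.zeros(20)
for frac in (0.02, 0.05, 0.1, 0.3):
    print(frac, subset_agreement(mse_grad, w, X, y, frac, rng))

If the agreement is already close to 1 at small subset fractions, accumulating gradients over the rest of the set adds little new information before a weight update, which is the operational sense of redundancy being debated here.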
I think it is quite possible for the training set to be large, not internally redundant, and interesting in the sense that it models an predictable (redundant) underlying distribution. And this is the kind of case where I think that batch-updating has an advantage. -- Scott Fahlman From english at sun1.cs.ttu.edu Fri Oct 18 14:20:19 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Fri, 18 Oct 91 13:20:19 CDT Subject: batch-mode parallel implementations Message-ID: <9110181820.AA00593@sun1.cs.ttu.edu> Scott Fahlman remarked, > As for speed of convergence, continuous updating clearly beats per-epoch > updating if the training set is highly redundant. Another important factor is the autocorrelation of the training sequence. Consider a (highly redundant) training sequence that starts with 1000 examples of A and ends with 1000 examples of B. With continuous updating, there is a good chance that learning the B examples will cause the learned response to A examples to be lost. The obvious answer, in this contrived case, is to alternate presentations of A and B examples. Now for an uncontrived case: Suppose we are training a recurrent net for speaker-independent speech recognition, and that inputs to the net are power spectra extracted from the speech signal at fixed intervals. There are relatively long intervals in which the speech sound (spectrum) does not change much. There are even longer intervals in which the speaker does not change. Reordering the spectra for an utterance is clearly not an option, and continuous updating seems imprudent even though the redundancy of the training set is high. I'm sure there are plenty of nonstationary time series, other than speech, which present the same problems. In response to Scott's remark on the batch size used with an accelerated convergence procedure, > It must be sufficiently large to give a reasonably stable picture of > the overall gradient, but not so large that the gradient is computed > many times over before a weight-update cycle occurs. I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English english at sun1.cs.ttu.edu From nowlan at helmholtz.sdsc.edu Fri Oct 18 14:30:16 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Fri, 18 Oct 91 11:30:16 MST Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0400. Message-ID: <9110181830.AA14145@bose> A couple of clarifications with regards to Yann's post: i) The dataset used in the comparison had a high degree of redundancy. ii) The "batch-mode" back-prop was vanilla fixed-step gradient descent, not a second order method. The issue of "batch" versus "on-line" is still a very open one. For relatively small problems (for me < ~5000 cases) I prefer conjugate gradient because of accuracy and no need to tune parameters. 
These techniques are also very easy to parallelize over cases. I have also implemented on a Cray a BP simulator that vectorized over connections rather than cases, and could implement on-line or batch techniques with ease. My experience here suggested that speed-ups could be obtained when the network had as few as a few thousand connections. - Steve From yoshua at psyche.mit.edu Sat Oct 19 12:55:19 1991 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Sat, 19 Oct 91 12:55:19 EDT Subject: online parallel implementation Message-ID: <9110191655.AA12225@psyche.mit.edu> This message concerns an attempt to apply some parallelism to online back-propagation. I recently had access to N = 20 to 40 NeXT workstations on which I could perform learning experiments with back-propagation. My training database was huge (TIMIT, more than half a million patterns, but organized in sequences - sentences - of about 100 'frames' each), so I did not want to use a batch-based method. The idea I attempted to implement was the following: Split the database into N parts. Run N versions of the network, one on each of the N parts (on the N machines). Share weights _asynchronously_ among the networks, after 1 or more sequences. A 'server' program running on a separate machine received requests from any of the other machines to collect its contribution and return to it the current global moving average of the weights. Since I was running backpropagation through time, the weight update was performed only after each sequence even in the single-machine implementation, hence the update was not much less 'online' in the parallel implementation. Unfortunately, I no longer have access to these machines - because I have moved to a new institution - and I didn't have time to perform enough experiments and compare this approach with others. Yoshua Bengio MIT From honavar at iastate.edu Sat Oct 19 13:30:33 1991 From: honavar at iastate.edu (honavar@iastate.edu) Date: Sat, 19 Oct 91 12:30:33 CDT Subject: redundancy (was Re: batch-mode implementations) In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0400. <9110181508.AA00547@lamoon> Message-ID: <9110191730.AA07387@iastate.edu> Scott Fahlman wrote: >>I guess you could measure redundancy by seeing if some subset of the >>training data set produces essentially the same gradient vector as the full >>set. Yann Le Cun responded: > Hmmm, I think any dataset for which you expect good generalization is redundant. > Train your net on 30% of the dataset, and measure how many of the remaining > 70% you get right. If you get a significant portion of them right, then > accumulating gradients on these examples (without updating the weights) would > be little more than a waste of time. It is probably useful to distinguish between redundancy WITHIN the training set and the redundancy BETWEEN the training and test sets (or, redundancy in the combined training and test sets). I suspect Scott Fahlman was referring to the redundancy (R1) within the training set while Le Cun was referring to the redundancy (R2) in the set formed by the union of training set and test set (please correct me if I am wrong). I would expect the relationship between generalization and R1 to be quite different from the relationship between generalization and R2. Whether the two measures of redundancy will be the same or not will almost certainly depend on the method(s) (e.g., sampling procedures, sample size reduction techniques) used to arrive at the data actually given to the network during training.
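Le Cun's operational test quoted above (train on 30% of the data and see how much of the remaining 70% is already predicted correctly) can be sketched as follows. A 1-nearest-neighbour classifier and a two-cluster toy set stand in for "your net" and the real data; both are assumptions made purely for illustration.

import numpy as np

def holdout_redundancy(X, labels, train_frac=0.3, seed=0):
    # train on a fraction of the data, score accuracy on the held-out remainder
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, ho = idx[:n_train], idx[n_train:]
    # classify each held-out pattern by its nearest training pattern
    dists = ((X[ho][:, None, :] - X[tr][None, :, :]) ** 2).sum(axis=2)
    pred = labels[tr][dists.argmin(axis=1)]
    return float((pred == labels[ho]).mean())

# toy usage: two well-separated Gaussian clusters
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(500, 5) + 3.0, rng.randn(500, 5) - 3.0])
labels = np.array([0] * 500 + [1] * 500)
print("held-out accuracy:", holdout_redundancy(X, labels))

A held-out score near 1.0 means most of the set can be predicted from a fraction of it, i.e. the set is redundant in the sense Le Cun uses above.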
In fact, if a training set T (obtained say, by random sampling from some underlying distribution) were to be preprocessed in some fashion (e.g., using statistical techniques) and reduced training set T' was obtained from T after eliminating the "redundant" samples, clearly the redundancy (R1') within the reduced training set T' will be much smaller than the redundancy (R1) in the original training set T although the overall redundancy (R2) in the set formed by the union of T and the test data may be more or less equal to the redundancy (R2') in the set formed by the union of T' and the test data. My guess is that the generalization on the test data will be more or less the same irrespective of whether T or T' is used for training the network. Vasant Honavar honavar at iastate.edu From nowlan at helmholtz.sdsc.edu Sat Oct 19 15:05:24 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Sat, 19 Oct 91 12:05:24 MST Subject: Paper Announcement (Neuroprose) Message-ID: <9110191905.AA15742@bose> ** Paper available via Neuroprose *************************************** ** Please do not forward to other mailing lists or boards. Thank you. ** The following paper has been placed in the Neuroprose archives at Ohio State. The file is nowlan.soft-share.ps.Z Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Networks by Soft Weight-Sharing Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 5800 San Diego, CA 92186-5800 Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Canada M5S 1A4 ABSTRACT: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of non-zero weights. We propose a more complicated penalty term in which the distribution of weight values is modelled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability densities under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps nowlan.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get nowlan.soft-share.ps.Z ftp> quit unix> uncompress nowlan.soft-share.ps.Z unix> lpr -s nowlan.soft-share.ps (or however you print postscript) Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. 
Box 85800 San Diego, CA 92186-5800 Work Phone: 619-453-4100 X463 e-mail: nowlan at helmholtz.sdsc.edu From tgd at guard.berkeley.edu Sat Oct 19 17:09:06 1991 From: tgd at guard.berkeley.edu (Tom Dietterich) Date: Sat, 19 Oct 91 14:09:06 -0700 Subject: batch-mode parallel implementations In-Reply-To: Tom English's message of Fri, 18 Oct 91 13:20:19 CDT <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: <9110192109.AA04626@guard.berkeley.edu> There has been a fair amount of work in decision-tree learning on the issue of breaking large training sets into smaller batches. In 1980, Quinlan introduced a method called "windowing" in which a small sample (or window) of the training data is initially drawn at random. The algorithm is trained on this window and then tested on the remainder of the data (that was excluded from the window). Then, some fraction of the misclassified examples (possibly all of them) are added to the window. Generally speaking, in noise-free domains, windowing works quite well. A very high-performing decision tree can be learned with a relatively small window. However, for noisy data, the general experience has been that the window eventually grows to include the entire training set. Jason Catlett (Sydney U) recently completed his dissertation on testing windowing and various other related tricks on datasets of roughly 100K examples (straight classification problems). I recommend his papers and thesis. His main conclusion is that if you want high performance, you need to look at all of the data. --Tom From ross at psych.psy.uq.oz.au Sat Oct 19 19:50:16 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 20 Oct 1991 09:50:16 +1000 Subject: batch & on-line training Message-ID: <9110192350.AA02282@psych.psy.uq.oz.au> On the topic of batch versus on-line training, Kamil at apple.com writes: > ... there doesn't seem to be a > compelling argument for preferring one or the other IN PRINCIPLE. I would like to turn the dichotomy into a trichotomy and argue that there is an 'in principle' reason for a preference. I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). One-shot learning becomes critical if you are trying to perform 'cognitive' tasks - when you learn the route to a new office you don't need hundreds or thousands of exposures to get it right. Obviously, one-shot learning can't be expected to happen in all circumstances: you have to be working in a constrained problem domain that can support it and the learner has to have the background knowledge that will support what is to be learned. Most of the work that is done with backprop and its relatives starts with near to a tabula rasa and all the time and effort goes into creating the universe from only the input data. Obviously, techniques do exist for one-shot learning: e.g. simple delta rule with a learning rate of 1. The problem is that they fail on the problems that people regard as interesting - inputs non-orthogonal and hidden units required. The challenge is to find a one-shot learning algorithm that can work on interesting problems. I believe that this will require strong architectural and problem data constraints. 
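As a toy illustration of the one-shot case Gayler mentions (a simple delta rule with a learning rate of 1), the sketch below stores five associations in a single pass when the input patterns are orthonormal, and shows the interference that appears once the inputs overlap. The code and its numbers are invented for illustration.

import numpy as np

def one_shot_train(patterns, targets, lr=1.0):
    # one pass, one delta-rule update per pattern, on a single linear layer
    W = np.zeros((targets.shape[1], patterns.shape[1]))
    for x, t in zip(patterns, targets):
        W += lr * np.outer(t - W @ x, x)      # delta rule: error times input
    return W

rng = np.random.RandomState(0)
targets = rng.randn(5, 3)

ortho = np.eye(8)[:5]                         # 5 orthonormal input patterns
W = one_shot_train(ortho, targets)
print("orthonormal inputs, max recall error:",
      float(np.max(np.abs(W @ ortho.T - targets.T))))       # 0: stored in one pass

overlap = np.eye(8)[:5] + 0.6                 # overlapping, non-orthogonal patterns
overlap /= np.linalg.norm(overlap, axis=1, keepdims=True)
W = one_shot_train(overlap, targets)
print("overlapping inputs, max recall error:",
      float(np.max(np.abs(W @ overlap.T - targets.T))))     # clearly nonzero: interference

The failure on the overlapping inputs is exactly the "inputs non-orthogonal" limitation noted above; the challenge Gayler poses is to get one-shot behaviour without that restriction.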
I see the current heavy use of gradient-descent techniques as analogous to the period in the history of AI when researchers looked for general problem solving techniques that were universally applicable. General techniques worked on toy problems but rapidly bogged down on real problems. In BP, we have a technique for learning arbitrary mappings, and we pay for it with excruciatingly slow learning. To summarise: IF you want to perform cognitive tasks THEN 'in principle' one shot learning is the only training regime that is acceptable (although slower learning may be required to get the net to the point where it can learn in one shot). All you have to do is invent a good one-shot learning scheme :-). Ross Gayler ross at psych.psy.uq.oz.au From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Oct 20 11:08:11 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 20 Oct 91 11:08:11 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 13:20:19 -0600. <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. This prevents to quadratic part of the algorithm from trying to draw a parabola between two slopes that are not closely related. If you don't do this, that one step can badly mess up the weights you've laboriously accumulated so far. Of course, if you do this after every sample, the quadratic acceleration never kicks in and you end up with nothing more than plain old backprop without momentum. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message) and still experienced instability, then I would say that the batches, even though they are fairly large, are not large enough to provide a fair representation of the underlying distribution. -- Scott From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Sun Oct 20 19:55:51 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Sun, 20 Oct 91 18:55:51 EST Subject: One-shot learning Message-ID: Ross Gayler wrote: >To summarise: IF you want to perform cognitive tasks THEN 'in principle' one >shot learning is the only training regime that is acceptable (although slower >learning may be required to get the net to the point where it can learn in >one shot). All you have to do is invent a good one-shot learning scheme :-). 
Although one-shot (one-trial) learning may not be the only mode of learning in our cognitive processes, it is true that learning in our cognitive processes does not take as many repetitions (epochs) as current BPNNs take. One-shot learning can serve as a goal and a criterion for learning schemes, both in cognitive learning processes and in learning systems for practical applications. Our work on PPNN (I posted the abstract several days ago) was originally driven by one-trial learning. Although PPNN has not reached one-trial learning, it has stepped closer to it. In order to isolate the topological effect, we constrained PPNN to be the same as BPNN in all aspects except the topology. It was shown that the stereotopology alone can improve the training time (epochs) by several orders of magnitude (due to the characteristics of PPNN's stereotopology, we used the average training time instead of epochs to measure the rate of convergence). It was found that the more difficult the problem is, the larger this gain is. This topological speedup lies in the fact that there is a cause of slowness in the original planar topology of BPNN that cannot be accounted for by the learning algorithm or unit characteristics (no matter what learning algorithm is used or what unit response characteristics are employed, this cause of slow learning always exists; it is inherent to the planar topology of BPNN). Bo Xu Indiana University itgt500 at indycms.iupui.edu From mmoller at daimi.aau.dk Mon Oct 21 08:13:06 1991 From: mmoller at daimi.aau.dk (mmoller@daimi.aau.dk) Date: Mon, 21 Oct 91 13:13:06 +0100 Subject: Batch methods versus stochastic methods... Message-ID: <9110211213.AA13826@sinope.daimi.aau.dk> --- Concerning the discussion about batch-update versus stochastic update. For about the last 6 months we have been working on online versus batch problems. A preprint of a paper, which tries to describe why the stochastic methods are in some instances better than the deterministic batch methods, will soon be available via the neuroprose archive. The paper also introduces a new algorithm which combines the good properties of the stochastic methods as well as the batch methods. Our results so far can be summarized as follows: The redundancy of the training set plays, as has been mentioned before, a very important role. It is not clear, however, how to define this redundancy in a proper way. The usual definition of redundancy taken from information theory can give a hint about the redundancy but cannot in any obvious way provide a precise definition, because this would involve the information content of the training set as well as the internal dynamics (the structure) of the network. So when we discuss the concept of redundancy we should be aware that redundancy in the context of learning in feedforward networks is not very well defined. Another issue, which I think is even more important than the concept of redundancy, is the structure of the error surface. The "true" error surface, which is given by the whole training set, is, as we know, often characterized by a large number of flat regions and very steep, narrow ravines. Batch methods operate in the true but very complex error surface, while stochastic methods operate in partial error surfaces which are only approximations to the true error surface. So stochastic methods make a noisy, stochastic search in the true error surface, which can help them through the flat regions.
One can think of the stochastic search as a kind of "simulated annealing" approach in which an increase of the error is also allowed. The algorithm we propose is based on a combination of the good properties of stochastic and batch algorithms. The main idea is to use a conjugate gradient algorithm on blocks of data (block-update or semi-batch update). Because the conjugate gradient algorithm updates weights with variable (and sometimes large) step sizes, a validation scheme is used to control the updates. Through a simple sampling technique we estimate the probability that an update will decrease the total error. This probability is then used to decide whether to update or not. The number of patterns needed in each block-update is variable and controlled by an adaptive optimization scheme during training. We have done some experiments with this approach on the nettalk problem. Our results so far show that the approach decreases the error faster per epoch than stochastic backpropagation. More computation is, however, needed per epoch. An interesting observation is that the number of blocks needed to make an update grows during learning, so that after a certain number of epochs the block size is equal to the number of patterns. When this happens the algorithm is equivalent to a traditional batch-mode algorithm and no validation is needed anymore. In order to be able to draw some definite conclusions we need a few more experiments on different training sets. Unfortunately, we do not have any datasets of the proper size. So I would appreciate it if anyone could tell me where to find big datasets that are publicly available. -- Martin M ----------------------------------------------------------------------- Martin F. Moller email: mmoller at daimi.aau.dk Computer Science Department phone: +45 86202711 5223 Aarhus University fax: +45 86135725 Ny Munkegade, Building 540 8000 Aarhus C Denmark ---------------------------------------------------------------------- From giles at research.nec.com Mon Oct 21 09:15:03 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 21 Oct 91 09:15:03 EDT Subject: Announcement of NIPS Workshop Message-ID: <9110211315.AA19197@fuzzy.nec.com> Announcement of NIPS Workshop: ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations (no more than one page in length) related to the theme of theory or applications of recurrent networks.
Subject to the number of received proposals, we envisage a two day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From DOW_ERNST at LILLY.COM Mon Oct 21 10:16:00 1991 From: DOW_ERNST at LILLY.COM (Ernst Dow, 276-9916) Date: Mon, 21 Oct 1991 09:16 EST Subject: one-shot learning Message-ID: <01GC03SM0RHC0000EE@GATEWAY.LILLY.COM> Ross Gayler writes: I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). etc. If it was a big enough event in your life, you will have memorized the event. If it was not so monumental, you can help your memory by replaying the event in your mind. But in this case, we are talking memorization, not generalization. You may be able to identify the painting you saw before, but could you make the leap to recognizing all other abstract paintings? Ernst Dow ernst at lilly.com From: DOW ERNST (MCVAX0::TC64566) From mike at psych.ualberta.ca Mon Oct 21 12:15:37 1991 From: mike at psych.ualberta.ca (Mike R. W. Dawson) Date: Mon, 21 Oct 1991 10:15:37 -0600 Subject: Open position in cognitive psychology Message-ID: <9110211613.AA01542@psych.ualberta.ca> I'd like to bring the following open position in cognitive psychology to the attention of anyone who might be modeling cognitive processes with their networks: ======================================================================= Cognitive or Developmental Psychologists The Department of Psychology, University of Alberta, invites applications for one and, subject to budgetary considerations, possibly two tenure track positions at the level of beginning Assistant Professor, salary range: $38,955-$55,755. Candidates with research expertise in either COGNITIVE PSYCHOLOGY or DEVELOPMENTAL PSYCHOLOGY will be considered. The position in Cognitive is open with respect to area of specialization. The position in Developmental is also open with respect to area, but there is some preference for individuals with interests in language development, conceptual development, mathematical cognition, reading, scientific reasoning, spelling, or writing. Current Developmental faculty conduct research on emergent literacy, reading, and arithmetic skill. Decisions will be made on the basis of demonstrated research excellence, interactions with colleagues, and teaching ability. 
Applications should include a curriculum vita, three letters of recommendation, and reprints or recent publications. These materials should be sent, as appropriate, to Cognitive Search Chair, Dr. Peter Dixon, or Developmental Search Chair, Dr. Jeffrey Bisanz, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9. To receive full consideration, all materials must be received by January 1, 1992. The University of Alberta is committed to the principle of equity in employment. The University encourages applications from aboriginal persons, disabled persons, members of visible minorities and women. ======================================================================== Michael R. W. Dawson email: mike at psych.ualberta.ca Department of Psychology University of Alberta Edmonton, Alberta Tel: +1 403 492 5175 T6G 2E9, Canada Fax: +1 403 492 1768 From bap at james.psych.yale.edu Mon Oct 21 13:41:35 1991 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Mon, 21 Oct 91 13:41:35 -0400 Subject: Paper Announcement (Neuroprose) In-Reply-To: "Steven J. Nowlan"'s message of Sat, 19 Oct 91 12:05:24 MST <9110191905.AA15742@bose> Message-ID: <9110211741.AA03347@james.psych.yale.edu> The following paper has not been placed in the Neuroprose archives at Ohio State. The file is not pearlmutter.soft-share.soft-share.ps.Z. Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Network Soft Weight-Sharing Measures by Soft Weight-Measure Soft Weight Sharing Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 ABSTRACT: It has been shown by Nowlan and Hinton (1991) that it is advantagious to construct weight complexity measures for use in weight regularization through the use of EM, instead of relying on some a-priori complexity measure, or even worse, neglecting regularization by assuming a uniform distribution. Their work can be regarded as a generalization of the "Optimal Brain Damage" of Le Cunn et al (1990), in which the distribution of weights is estimated with a histogram, a peculiar functional form for a distribution. Nowlan and Hinton assume a much simpler functional form for the distribution, avoiding overfitting and therefore overregularization. However, they disregard the issue of regularization of the regularizer itself. Just as certain weights might be considered a-priori quite unlikely, certain distributions of weights may be considered a-priori quite unlikely. To solve this problem, we introduce a regularization term on the parameters of the weight distribution being estimated. This regularization term is itself determined by a distribution over these distributional parameters. In this light, Nowlan and Hinton (1991) make the uniform distributional parameter distribution assumption. Here, we estimate the distribution of distributions by running an ensemble of networks, with EM used to estimate the weight distribtion of each network (following Nowlan and Hinton), but we then use EM to estimate the distribution of distributions across networks. Of course, each estimated distribution is used to regularize the parameters over which that distribution is defined, leading to regularization of the individual network regularizers. 
We do not consider how to estimate the a-priori distribution which might be used to regularize the distribution being used to regularize the distribution being used to regularize the weights being estimated from the data, which will be the explored in a future paper. ----------------------------------------------------- FTP INSTRUCTIONS Either use "getps pearlmutter.soft-share.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get pearlmutter.soft-share.soft-share.ps.Z ftp> quit unix> uncompress pearlmutter.soft-share.soft-share.ps.Z unix> lpr -s pearlmutter.soft-share.soft-share.ps Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 Work Phone: 203 432-7011 From ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu Mon Oct 21 15:46:00 1991 From: ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu (ANDERSON%BROWNCOG.BITNET@mitvma.mit.edu) Date: Mon, 21 Oct 91 14:46 EST Subject: Technical Report Announcement Message-ID: Technical Report 91-3 available from: Department of Cognitive and Linguistic Sciences Box 1978, Brown University, Providence, RI 02912 A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network James A. Anderson, Kathryn T. Spoehr, and David J. Bennett Department of Cognitive and Linguistic Sciences Box 1978 Brown University Providence, RI 02912 Abstract There are only a few hundred well-defined facts in elementary arithmetic, but humans find them hard to learn and hard to use. One reason for this difficulty is that the structure of elementary arithmetic lends itself to severe associative interference. If a neural network corresponds in any sense to brain-style computation, then we should expect similar difficulties teaching elementary arithmetic to a neural network. We find this observation is correct for a simple network that was taught the multiplication tables. We can enhance learning of arithmetic by forming a hybrid coding for the representation of number that contains a powerful analog or "sensory" component as well as a more abstract component. When the simple network uses a hybrid representation, many of the effects seen in human arithmetic learning are reproduced, including overall error patterns and response time patterns for false products. An extension of the arithmetic network is capable of being flexibly programmed to correctly answer questions involving terms such as "bigger" or "smaller." Problems can be answered correctly, even if the particular comparisons involved had not been learned previously. Such a system is genuinely creative and flexible, though only in a limited domain. It remains to be seen if the computational limitations of this approach are coincident with the limitations of human cognition. A version of this report will appear as a chapter in: "Neural Networks for Knowledge Representation and Inference" Edited by Daniel S. 
Levine and Manuel Aparicio, IV To be published by Lawrence Erlbaum Associates, Hillsdale, New Jersey Copies can be obtained by sending an email message to: LI700008 at brownvm.BITNET or to: anderson at browncog.BITNET From english at sun1.cs.ttu.edu Mon Oct 21 17:12:09 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 21 Oct 91 16:12:09 CDT Subject: batch-mode parallel implementations Message-ID: <9110212112.AA01265@sun1.cs.ttu.edu> With regard to my earlier posting on problems I encountered in applying Quickprop, Scott Fahlman has replied: Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message).... Well, I was not aware of what others were doing in practice. Scott's original tech report on Quickprop gave results only for the case of once-per-epoch weight updates. I apologize for referring to my implementation with once-per-batch weight updates and no zeroing between batches as "Fahlman's Quickprop." What I *did* understand was that Quickprop's attempt to approximate the error surface with a paraboloid was going to be fouled-up if the "pictures" of the error surface gleaned from different batches were substantially different. Training for multiple iterations with one batch, and then resetting the variables used in estimating the shape of the error surface before going on to the next batch would certainly eliminate the problem I described. The prospect of choosing the number of iterations per batch does not thrill me, however. In general, I hate parameter tweaking. From my perspective, the worst thing about parameter tweaking is that we don't really know how it affects the quality of the final network obtained. Also, exploring the effects of different parameter settings takes too much of *my* time. I want a procedure that does not require tweaking and that runs at a reasonable fraction of the speed of a "well-tuned" stochastic gradient descent procedure for a wide range of problems. (I haven't experimented with conjugate gradient descent yet, but it seems to fit my bill.) --Tom english at sun1.cs.ttu.edu From giles at research.nec.com Tue Oct 22 15:51:28 1991 From: giles at research.nec.com (Lee Giles) Date: Tue, 22 Oct 91 15:51:28 EDT Subject: Announcement of NIPS (Neural Information Processing Systems) Workshop Message-ID: <9110221951.AA21064@fuzzy.nec.com> Announcement of NIPS (Neural Information Processing Systems) Workshop: Dec 6-7, Vail, Colorado. ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. 
Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations ( no more than one page in length) related to the theme of theory or applications of recurrent networks. Subject to the number of received proposals, we envisage a two day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From thsspxw at iitmax.iit.edu Tue Oct 22 19:10:57 1991 From: thsspxw at iitmax.iit.edu (Peter Wohl) Date: Tue, 22 Oct 91 18:10:57 CDT Subject: batch-mode parallel implementations In-Reply-To: <8431.688104706@B.GP.CS.CMU.EDU>; from "Connectionist_Research_Group@B.GP.CS.CMU.EDU" at Oct 22, 91 12:11 (midnight) Message-ID: <9110222311.AA09935@iitmax.iit.edu> Dear connectionists, I have some comments on several of these, so I decided not to include all the history of this discussion in my reply (you read it anyway). So here I go: 1. Given per-sample training, one still faces the problem of how to deal with really large networks (thousands of neurons and hundreds of thousands connections) on a parallel machine that has far fewer processors. What has been proposed: a) SIMD (don't cry for unused processors, as long as you can communicate fast enough); b) MIMD with clustering neurons somehow together, to increase granularity (SIMD also needs some), problem here being dependence on VERY particular nets (usually layers with powers of 2 neurons); c) re-writing the communication of the algorithm (see for example my paper this coming Nov at ICTAI'91). 2. I agree that epoch-training is probably desirable. How large is a "typical" epoch for a "large" net (thousands of neurons, fraction of million connections at least) ? Tens of vectors, hundreds ? I would say, no more than few hundreds. 3. "Recall" (forward propagation with no weight update) is far easier to parallelize, since there is no end-of-epoch bottleneck (barrier synch). In some results (to be published next year), we achieved (on 32 BBN Butterfly processors) almost 2 million connec-presen/sec with backprop., but over 5 million at recall. (2.5 million if you "adjust" forward-only by dividing by two, to match the backprop figure more closely). To summarize, I think the real problem of parallelizing ANNs applies when at least one of net-size or training-epoch-size is large (and thus slow when run sequentially). And don't forget: net architecture could change during training (e.g. 
cascade corr), and still keep it parallel. Thanks for your patience, Peter Wohl thsspxw at iitmax.iit.edu From spotter at darwin.bio.uci.edu Tue Oct 22 19:17:52 1991 From: spotter at darwin.bio.uci.edu (Steve Potter) Date: Tue, 22 Oct 91 16:17:52 PDT Subject: Continuous vs. Batch learning Message-ID: <9110222317.AA22627@sanger.bio.uci.edu> It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique, as we can verify for humans by remembering something that we only saw (or heard, etc.) once. One-trial learning paradigms abound in the behavioral literature. I cant think of any biological examples of batch learning, in which sensory data are saved until a certain number of them can be somehow averaged together and conclusions made and remembered. Any ideas? Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Steve Potter UC Irvine Psychobiology dept. Irvine, CA 92717 spotter at darwin.bio.uci.edu From jbower at cns.caltech.edu Wed Oct 23 00:47:51 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 22 Oct 91 21:47:51 PDT Subject: CNS*92 Message-ID: <9110230447.AA01301@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Tuesday, July 26 through Sunday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. 
Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From palmer at world.std.com Wed Oct 23 02:25:10 1991 From: palmer at world.std.com (Kent D Palmer) Date: Wed, 23 Oct 91 02:25:10 -0400 Subject: THINKNET NEWSLETTER ANNOUNCEMENT Message-ID: <9110230625.AA18459@world.std.com> ===========================START=OF=THINKNET=FILE============================ ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| PLEASE POST ----- NEWSLETTER ANNOUNCEMENT |||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| /| ....... .. .. . . . . .==|........ ... .. .... . .... .. ._____. . * . . / ===|_ _. ..______________________________...... | | | | |\ | / ======== |\ ...| .... |.THINKNET:An Electronic.... | |---| | | \ |< ========== |. \ .|---- . |.Journal Of Philosophy,... | | | | | \| \ ======== |... \| ..... |.Meta-Theory, And Other.. | | | | | | \ ====== |.... |____.. |.Thoughtful Discussions.... .==| ........ .. .... .. ... .. . \| .... ... .. .. . . .. . . ----------------------------------------------------------------------------- OCTOBER 1991 ISSUE 001 VOLUME 1 NUMBER 1 ----------------------------------------------------------------------------- This is an announcement for Thinknet, an on-line magazine forum dedicated to thoughtfulness in the cybertime environment. Thinknet covers philosophy, systems theory, and meta-theoretical discussions within disciplines. It is your interdisciplinary window on to what significant information sources are available to foster thought provoking discussion. *CONTENTS* Publication Data Scope of newsletter. Rationale for newsletter. 
Subscriptions and Submittals address. Bulletin Boards where it may be found. Services offered by newsletter. Staff of this edition. Coda: call for participation. About Thinknet Discussion of goals of Thinknet Newsletter. Prospect for Philosophy and Systems Theory in Cybertime Is there a possibility for a renaissance for philosophy? The Philosophy Category on GEnie Review by Gordon Swobe with list of topics. Philosophy on the WELL Review by Jeff Dooley with list of topics. Origin Conference on the WELL Review by Bruce Schuman with list of topics Internet Philosophy Mailing Lists A review of all know philosophy oriented mailing lists by Stephen Clark. Books Of Note THE MATRIX !%@:: A DIRECTORY OF ELECTRONIC MAIL ADDRESSING & NETWORKS Other Publications BOARDWATCH MAGAZINE SOFTWARE ENGINEERING FOUNDATIONS [a work in progress] Books, Electronic Newsletters, and Cyber-Artifacts Received ARTCOM NEWSLETTER FACTSHEET FIVE Protocols for Meaningful Discussions: ARTICLE by Kent Palmer A consideration of how philosophy discussions might be made more useful and their history accessible by using a voluntary protocol. Thoughtful Communications: EDITORIAL Closing remarks. <<<<<<<<<<<>>>>>>>>>>>> ----------------------------------------------------------------------------- HOW TO GET YOUR COPY kdp ----------------------------------------------------------------------------- *Price* The electronic form is FREE. Hardcopies cost money for reproduction, postage, and handling. *Subscriptions* Send an e-mail message to the following address: thinknet at world.std.com Your message should be of the following form: SEND THINKNET TO YourFullName AT YourEmailAddress Some mailing lists do not include your return mailing address if you use the reply function of your mail reader so you must make sure your return e-mail address is in the body of your message. Thinknet file is long, about 1113 lines; 7136 words; 51795 bytes. You will be added to the thinknet subscription list. You will get all further issues unless you unsubscribe. *Bulletin Boards* Thinknet will be posted in the WELL philosophy conference in a topic. The WELL 27 Gate Five Road, Sausalito, CA 94965 modem 415-332-6106 voice 415-332-4335 Also on GEnie in the Philosophy category under the Religion and Ethics Bulletin Board. GEnie Client Services 1-800-638-9636 *PHILOS-L Listserver* You will eventually be able to get the thinknet newsletter from a listserver. Send the message 'GET THINKNET DOC' to 'LISTSERV at LIVERPOOL.AC.UK'. If you get an error message try the regular thinknet address. *Or if all else fails* THINKNET PO BOX 8383 ORANGE CA 92664-8383 UNITED STATES ==============================END=THINKNET=FILE============================= From ross at psych.psy.uq.oz.au Wed Oct 23 04:23:43 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Wed, 23 Oct 1991 18:23:43 +1000 Subject: one-shot learning Message-ID: <9110230823.AA28466@psych.psy.uq.oz.au> Ernst Dow (ernst at lilly.com) writes (in the context of one-shot or one-trial learning): >But in this case, we are talking memorization, not generalization. You may >be able to identify the painting you saw before, but could you make the >leap to recognizing all other abstract paintings? My interest is in analogical retrieval and not one-trial learning (except to the extent that it is necessary for 'truly cognitive' capabilities). The literature on analogy stresses the role that goals play in determining the apparent similarity (and hence generalisation) of entities. 
That is, in analogy the generalisation pattern emerges at recall time rather than being completely determined at storage time. For such a (post-hoc) generaliser it makes sense to attempt to memorise everything. This contrasts with the approach of most BP work where the system learns an internal representation (read that as set of hidden units and weights) that supports a particular pre-specified pattern of generalisation. I realise that there is more to life than analogical recall and some generalisation is based on literal similarity etc, but I am just stating the extreme position for simplicity. Ross Gayler ross at psych.psy.uq.oz.au From pluto at cs.UCSD.EDU Mon Oct 21 19:29:59 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 16:29:59 PDT Subject: Redundancy Message-ID: <9110212329.AA12326@tournesol.ucsd.edu> Scott Fahlman writes: :: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Tom Dietterich writes: :: :: There has been a fair amount of work in decision-tree learning on the :: issue of breaking large training sets into smaller batches. In 1980, :: Quinlan introduced a method called "windowing" in which a small sample :: (or window) of the training data is initially drawn at random. The :: algorithm is trained on this window and then tested on the remainder of :: the data (that was excluded from the window). Then, some fraction of :: the misclassified examples (possibly all of them) are added to the :: window. :: :: Generally speaking, in noise-free domains, windowing works quite well. :: A very high-performing decision tree can be learned with a relatively :: small window. However, for noisy data, the general experience has :: been that the window eventually grows to include the entire training set. :: Jason Catlett (Sydney U) recently completed his dissertation on :: testing windowing and various other related tricks on datasets of :: roughly 100K examples (straight classification problems). I recommend :: his papers and thesis. :: :: His main conclusion is that if you want high performance, you need to :: look at all of the data. 
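In outline, the windowing loop described above looks like the sketch below; the fit/predict routines, the initial window size, and the fraction of misclassified examples moved into the window are placeholders rather than Quinlan's or Catlett's exact settings.

import numpy as np

def windowing_fit(X, y, fit, predict, init_size=100, add_frac=1.0, max_rounds=20):
    # Train on a small random window, test on everything outside it, then
    # move (a fraction of) the misclassified examples into the window and
    # retrain, stopping when nothing outside the window is misclassified.
    rng = np.random.default_rng(0)
    order = rng.permutation(len(X))
    window, outside = list(order[:init_size]), list(order[init_size:])
    model = None
    for _ in range(max_rounds):
        model = fit(X[window], y[window])
        wrong = [i for i in outside if predict(model, X[i]) != y[i]]
        if not wrong:
            break
        moved = wrong[:max(1, int(add_frac * len(wrong)))]
        window += moved
        outside = [i for i in outside if i not in moved]
    return model, window

On noisy data the list of misclassified outside examples rarely empties, which is exactly how the window ends up growing to cover the whole training set.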
Could you provide a reference to the work demonstrating the performance of windowing on clean data? And could you provide an e-mail address for Jason Catlett? I am in the process of setting up benchmarking experiments for the technique I mentioned above. Although I consider the more general task of fitting arbitrary functional mappings, these works seem relevant. Thanks, ================= == Mark Plutowski Computer Science and Engineering 0114 University of California, San Diego La Jolla, CA ----------- REFERENCES: ----------- Box,G., and N.Draper. 1987. {\bf Empirical Model-Building and Response Surfaces.} Wiley, New York. Khuri, A.I., and J.A.Cornell. 1987. {\bf Response Surfaces (Designs and Analyses)}. Marcel Dekker, Inc., New York. Myers, Raymond H., and A.I. Khuri, W.H. Carter, Jr. 1989. ``Response Surface Methodology: 1966-1988.'' {\em Technometrics}. vol.31, no.2. Plutowski, Mark E., and Halbert White. 1991. ``Active selection of training examples for network learning in noiseless environments.'' Technical Report No. CS91-180, Department of Computer Science and Engineering, The University of California, San Diego. 92093-0114. Accepted pending revision by IEEE Transactions on Neural Networks. ---- Here are some other related works: -------- Cohn, David, Les Atlas, and Richard Ladner. 1990. ``Training connectionist networks with queries and selective sampling.'' {\em Advances in Neural Information Processing Systems 2,} Proc. of the Neural Information Processing Systems Conference. Morgan Kaufmann, San Mateo, California. Hwang, Jenq-Neng, J.J. Choi, Seho Oh, and Robert J. Marks III. 1990. ``Query learning based on boundary search and gradient computation of trained multilayer perceptrons. '' {\em Proc. IJCNN 1990, San Diego. The International Joint Conference on Neural Networks.} IEEE press. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 21 21:27:08 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 21 Oct 91 21:27:08 -0400 Subject: Redundancy In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. <9110212329.AA12326@tournesol.ucsd.edu> Message-ID: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. 
Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Thanks for the references. This is a useful beginning, but doesn't seem to address the problem we were discussing. In many real-world problems, the following constraints hold: 1. We do not have direct access to "the entire set". In fact, this set may well be infinite. All we can do is collect some number of samples, and there is usually a cost for obtaining each sample. 2. Rather than hand-crafting a training set by choosing all its elements, we want to choose an appropriate "n" and then pick "n" samples at random from the set we are trying to model. Of course, if collecting samples is cheap and network training is expensive, you might throw some samples away and not use them in the training set. I don't *think* that this would ever improve generalization, but it might lead to faster training without hurting generalization. 3. The data may not be "clean". The structure we are trying to model may be masked by a lot of random noise. Do you know of any work on how to pick an optimal "n" under these conditions? I would guess that this sort of problem is already well-studied in statistics; if not, it seems like a good research topic for someone with the proper background. -- Scott Fahlman From pluto at cs.UCSD.EDU Mon Oct 21 21:54:29 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 18:54:29 PDT Subject: Redundancy Message-ID: <9110220154.AA12390@tournesol.ucsd.edu> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ..in response to your message, included here: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= :: To: Mark Plutowksi :: Cc: connectionists at CS.CMU.EDU :: Subject: Re: Redundancy :: In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. :: <9110212329.AA12326 at tournesol.ucsd.edu> :: Date: Mon, 21 Oct 91 21:27:08 -0400 :: From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU :: :: :: I guess you could measure redundancy by seeing if some subset of the :: :: training data set produces essentially the same gradient vector as the full :: :: set. Probably statisticians have good ways of talking about this :: :: redundancy business -- unfortunately, I don't know the right vocabulary. :: :: Indeed they do; however, they begin from a more general perspective: :: for a particular "n", where "n" is the number of exemplars we are going to :: train on, call a set of "n" exemplars optimal if better generalization can :: not be obtained by training on any other set of "n" exemplars. :: This criterion is called "Integrated Mean Squared Error." :: See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. :: :: Using appropriate approximations, we can use this to obtain what you suggest. :: Results for the case of clean data are currently available in :: Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD :: CSE department (see [Plutowski & White, 1991].) Basically, given a set of :: candidate training examples, we select a subset which if trained upon :: give a gradient highly correlated with the gradient obtained by :: training upon the entire set. This results in a concise set of exemplars :: representative (in a precise sense) of the entire set. 
:: Preliminary empirical results indicate that the end result is what we :: originally desired: training upon this well chosen subset results in :: generalization close to that obtained by training upon the entire set. :: :: Thanks for the references. This is a useful beginning, but doesn't seem to :: address the problem we were discussing. In many real-world problems, the :: following constraints hold: :: :: 1. We do not have direct access to "the entire set". In fact, this set may :: well be infinite. All we can do is collect some number of samples, and :: there is usually a cost for obtaining each sample. :: :: 2. Rather than hand-crafting a training set by choosing all its elements, :: we want to choose an appropriate "n" and then pick "n" samples at random :: from the set we are trying to model. Of course, if collecting samples is :: cheap and network training is expensive, you might throw some samples away :: and not use them in the training set. I don't *think* that this would ever :: improve generalization, but it might lead to faster training without :: hurting generalization. :: :: 3. The data may not be "clean". The structure we are trying to model may :: be masked by a lot of random noise. :: :: Do you know of any work on how to pick an optimal "n" under these :: conditions? I would guess that this sort of problem is already :: well-studied in statistics; if not, it seems like a good research topic for :: someone with the proper background. :: :: -- Scott Fahlman :: I don't know of a feasible way of choosing such an "n". Instead, I obtain a greedy approximation to it. What we do (as reported in the tech report by Plutowski & White) is sequentially grow the training set, first finding an "optimal" training set of size 1, then fitting the network to this training set, appending the training set with a new exemplar selected from the set of available candidates, obtaining a training set of size 2 which is "approximately optimal", fitting this set, appending a third exemplar, etc, continuing the process until the network fit obtained by training over the exemplars fits the rest of the available examples within the desired tolerance. I have no idea as to how close the resulting training sets are to being truly IMSE-optimal. But, they are much more concise than the original set - and so far, at least on the toy problems I have tried so far, it has resulted in a computational benefit, apparently because training on the smaller set of exemplars provides an informative gradient at much lower cost than is required to obtain a gradient over all of the available examples. The more the redundancy in the data, the more the computational benefit. Of course, more extensive testing is required (and in progress.) = Mark Plutowski From 72247.2225 at CompuServe.COM Mon Oct 21 23:05:00 1991 From: 72247.2225 at CompuServe.COM (Larry Fast) Date: 21 Oct 91 23:05:00 EDT Subject: Backprop Feedback Gain Message-ID: <911022030500_72247.2225_EHL25-1@CompuServe.COM> I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to compensate for the following problem: As Backprop passes the error back thru multiple layers, the gradient has a built in tendency to decay. At the output the maximum slope of the 1/( 1 + e(-sum)) activation function is 0.5. Each successive layer multiplies this slope by a maximum of 0.5. 
The maximum gains at various layers (where n is the output layer) are:
max slope at layer n = 0.5
max slope at layer n-1 = 0.25
max slope at layer n-2 = 0.125
max slope at layer n-3 = 0.0625
max slope at layer n-4 = 0.03125
....
It has been suggested (by a couple of sources) that an attempt should be made to have each layer learn at the same rate. To this end, I'm installing a gain factor on the error being backpropagated. The new error function is: errorPropGain * act * (1 - act) The nominal value that makes sense is 2 (or more). This would allow at least the maximum learning rate to propagate unattenuated. Has anyone else tried this, or any other method of flattening out the learning rate in deep layers? Any info regarding more recent releases of PDP or a users' group would also be helpful. Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From max.coltheart at mrc-apu.cam.ac.uk Mon Oct 21 23:04:38 1991 From: max.coltheart at mrc-apu.cam.ac.uk (max.coltheart@mrc-apu.cam.ac.uk) Date: Tue, 22 Oct 1991 11:04:38 +0800 Subject: redundancy and generalization Message-ID: <18650.9110221006@sirius.mrc-apu.cam.ac.uk> Consider the eight words PAT PAD CAT CAD POT POD COT COD. Give a net the task of translating these from letters to phonemes. Choose any subset of, say, four items as the training set and, after training to asymptote, test performance on the other four. Even with a training set that contains all the information needed for the test set (e.g. PAT POD CAT COD exemplifies every letter-phoneme pairing twice), the various architectures we have been trying score 0% on the generalization set (in this example, the net learns nothing about the third letter, so in the generalisation test it translates PAD as "pat", POT as "pod", COT as "cod" and CAD as "cat"). Is this problem, trivial for rule-learning algorithms, insoluble for any system that learns by error-correction? Tom Dietterich writes: >Generally speaking, in noise-free domains, windowing works quite well. >A very high-performing decision tree can be learned with a relatively >small window. However, for noisy data, the general experience has >been that the window eventually grows to include the entire training set. >Jason Catlett (Sydney U) recently completed his dissertation on >testing windowing and various other related tricks on datasets of >roughly 100K examples (straight classification problems). I recommend >his papers and thesis. > >His main conclusion is that if you want high performance, you need to >look at all of the data. "The window eventually grows to include the entire training set" = "the system is incapable of generalizing accurately". Note that noise isn't the problem. In my example, there's no noise, and no generalization. Max Coltheart max.coltheart at mrc-apu.cam.ac.uk From ahg at eng.cam.ac.uk Tue Oct 22 05:20:21 1991 From: ahg at eng.cam.ac.uk (A.H. Gee) Date: Tue, 22 Oct 91 10:20:21 +0100 Subject: No subject Message-ID: <22398.9110220920@tw700.eng.cam.ac.uk> ************** PLEASE DO NOT FORWARD TO OTHER NEWSGROUPS **************** The following technical report has been placed in the neuroprose archives at Ohio State University: NEURAL NETWORKS AND COMBINATORIAL OPTIMIZATION PROBLEMS - THE KEY TO A SUCCESSFUL MAPPING Andrew Gee, Sreeram Aiyer and Richard Prager Technical Report CUED/F-INFENG/TR 77 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract For several years now there has been much research interest in the use of Hopfield networks to solve combinatorial optimization problems.
Although initial results were disappointing, it has since been demonstrated how modified network dynamics and better problem mapping can greatly improve the solution quality. The aim of this paper is to build on this progress by presenting a new analytical framework in which problem mappings can be evaluated without recourse to purely experimental means. A linearized analysis of the Hopfield network's dynamics forms the main theory of the paper, followed by a series of experiments in which some problem mappings are investigated in the context of these dynamics. In all cases the experimental results are compatible with the linearized theory, and observed weaknesses in the mappings are fully explained within the framework. What emerges is a largely analytical technique for evaluating candidate problem mappings, without having to resort to the more usual trial and error. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get gee.opt_map.ps.Z ftp> quit unix> uncompress gee.opt_map.ps.Z unix> lpr gee.opt_map.ps (or however you print PostScript) Please note that a couple of the figures in the paper were produced on an Apple Mac, and the resulting PostScript is not quite standard. People using an Apple LaserWriter should have no problems though. b) Via postal mail: Request a hardcopy from Andrew Gee, Speech Laboratory, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. or email me: ahg at eng.cam.ac.uk From dlb at ukc.ac.uk Wed Oct 23 08:10:16 1991 From: dlb at ukc.ac.uk (dlb@ukc.ac.uk) Date: Wed, 23 Oct 91 13:10:16 +0100 Subject: Research Fellowship (UK) Message-ID: Research Fellowship in Neural Networks: Investigation of Digitally Implemented Neural Networks Based on Novel Goal-Seeking Principles UNIVERSITY OF KENT AT CANTERBURY Electronic Engineering Laboratories Applications are invited for a Research Fellowship in the Electronic Engineering Laboratories at the University of Kent to work on an SERC-funded project on digitally implemented neural networks. The project, part of an on-going programme of work in neural networks, will investigate the properties and applications of novel artificial neural networks based on Boolean processing nodes and embodying local low-level goal-seeking principles. Applicants should have a good Honours degree in electronic engineering or computer science/engineering and should preferably hold a Ph.D. degree in an appropriate area. Applicants with previous experience in the field of neural networks or image analysis would be especially welcome. The Digital Systems Research Group in the Electronic Engineering Laboratories have a very strong research programme in computational architectures for pattern processing, with a particular emphasis on neural network architectures. Extensive facilities to support this work are available, including both central and in-house computing systems, and a dedicated workstation will be available for this project. Technician support will also be provided. The appointment is for a three year period and is available from 1st January 1992. The salary is on the scale 11969 - 14170 pounds. informal enquiries may be made to Dr. Michael Fairhurst or Dr. 
David Bisset on +44 227-764000, or by e-mail to dlb at ukc.ac.uk Further particulars and application forms are available from The Personnel Office, The University of Kent at Canterbury, Canterbury, Kent, CT2 7NZ, England, quoting reference A92/13. Telephone +44 227 475482 or 764000 x3915. The closing date is 1st November 1991. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 23 11:23:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 23 Oct 91 11:23:19 -0400 Subject: Continuous vs. Batch learning In-Reply-To: Your message of Tue, 22 Oct 91 16:17:52 -0800. <9110222317.AA22627@sanger.bio.uci.edu> Message-ID: It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique... Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Sure, but with a totally different technology. Give me 10^9 processors, 10^13 active, complex connections, and 3-D packing, and make short-term memory scarce, slow, and unreliable, and I'd pick continuous learning as well. And it wouldn't even take me a billion years to make the decision. -- Scott Fahlman From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 14:13:31 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 13:13:31 EST Subject: Paper Message-ID: Because this is the first time I place paper into neuroprose, I have brought lots of troubles to Jordan Pollack of Ohio State. We don't know whether it's due to my postscript file's problem (I generated the ps file on MacWrite II by pressing and holding the command key and the "F" or "K" key together before clicking the "OK" button in the print dialogue menu) or not, the ps file cannot be printed at Jordan's place. We retried it several times, and he still cannot see it after processing it. However, the ps file inside the Inbox can be traced from UNIX. So we decide to leave the paper inside the Inbox subdirectory and announce it with a caveat that it may not work. I am sorry for this delay and inconvenience, and I will be very glad to know more methods to generate ps files from MacWrite II which will have a good behavior at neuroprose archive. Thanks in advance. The procedure to get the ps file from the Inbox is as follows: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose/Inbox ftp> binary ftp> get ppnn.ps6 ftp> quit unix> lpr ppnn.ps6 I want to thank Jordan for his great help since last week. I appreciate very much his instructions and patience in retrying different versions of ps files I sent to him. Bo Xu Indiana University itgt500 at indycms.iupui.edu From steck at spock.wsu.ukans.edu Wed Oct 23 15:11:40 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Wed, 23 Oct 91 14:11:40 -0500 Subject: Batch Mode Parallel Implementations Message-ID: <9110231911.AA01043@spock.wsu.UKans.EDU> S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). 
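In rough outline, a per-pattern update of this kind can be sketched as exponentially-weighted recursive least squares for a single linear output unit. This is only an illustration of the general idea, not the Kollias-Anastassiou algorithm itself; the forgetting factor lam and the initialization of P are assumptions.

import numpy as np

def rls_step(w, P, x, target, lam=0.99):
    # One recursive least-squares update for a linear unit y = w.x:
    # information from all earlier training pairs is carried in w and P,
    # and is discounted by the forgetting factor lam at every new pair.
    err = target - w @ x
    Px = P @ x
    k = Px / (lam + x @ Px)        # gain vector
    w = w + err * k
    P = (P - np.outer(k, Px)) / lam
    return w, P

# Typical start: w = np.zeros(n), P = np.eye(n) / delta with delta small.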
This is essentially learning somewhere in between "batch learning" and "on line learning". As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck From wray at ptolemy.arc.nasa.gov Wed Oct 23 18:33:42 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Wed, 23 Oct 91 15:33:42 PDT Subject: Paper Announcement (Neuroprose) In-Reply-To: Barak Pearlmutter's message of Mon, 21 Oct 91 13:41:35 -0400 <9110211741.AA03347@james.psych.yale.edu> Message-ID: <9110232233.AA17716@ptolemy.arc.nasa.gov> > Simplifying Neural Network > Soft Weight-Sharing Measures > by > Soft Weight-Measure > Soft Weight Sharing > > Barak Pearlmutter > Department of Psychology > P.O. Box 11A Yale Station I enjoyed this take-off immensely. Determining good regularisers (or priors) is a major problem facing feed-forward network research (and related representations), so I also enjoyed the original Nowlan-Hinton paper. Dramatic performance improvements can be got by careful choice of regulariser/prior (I know this from my tree research), and its a bit of a black art right now, though I have some good directions. Nowlan & Hinton suggest a strong theoretical basis exists for their approach (see their section 8), so perhaps we'll see more of this style, and "cleaner" versions to keep the theoreticians happy. By the way, at CLNL in Berkeley in August I expressed the view that this problem: i.e. Regularizers ------------ for a given network/activation-function configuration, what are suitable parameterised families of regularizes, and how might the parameters be set from the knowledge of the particular application being addressed NB. the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's equation (1) has several fairly elegant and practical solutions along with: Training -------- decision-theoretic/bounded-rationality approaches to batch vs. block (sub-batch) vs. pattern updates during gradient descent (i.e. of back-prop.) (i.e. the Fahlman-LeCunn-English-Grajski-et-al. discussion, or the batch update vs. stochastic update problem) and subsequent addition of second-order gradient methods as two of the most pressing problems to make feed-forward networks a "mature" technology that will then supercede many earlier non-neural methods. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov PS.thanks also to Martin Moller for adding some meat to the Training problem: > An interesting observation is that the number of blocks needed > to make an update is growing during learning so that after a certain > number of epochs the blocksize is equal to the number of patterns. > When this happens the algorithm is equal to a traditional batch-mode > algorithm and no validation is needed anymore. When explaining batch update vs. stochastic update to people, I always use this behaviour as an example of what a decision-theoretic training scheme **should** do, so I'm glad you've confirmed it experimentally. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 20:46:29 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 19:46:29 EST Subject: Paper Message-ID: A moment ago I received a message from Jordan telling me that he can see the ppnn.ps6 file now and he has put it into neuroprose subdirectory named xu.ppnn.ps.Z. 
I am very glad to hear this news and also sorry for possible inconvenience to you. Please don't follow the procedure for ppnn.ps6 in Inbox (ppnn.ps6 may not be there anymore). Instead, following is the procedure to get the paper "PPNN: A Faster Learning and Better Generalizing Neural Net": unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get xu.ppnn.ps.Z ftp> quit unix> uncompress xu.ppnn.ps.Z unix> lpr xu.ppnn.ps (or however you print postscript) Thanks to Jordan again for his continuing efforts. Bo Xu Indiana University itgt500 at indycms.iupui.edu From karit at spine.hut.fi Thu Oct 24 05:53:53 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Thu, 24 Oct 91 11:53:53 +0200 Subject: Speech recognition research job in Switzerland (REPOST) Message-ID: <9110240953.AA01337@spine.hut.fi.hut.fi> ---------------------------------------------------------------------------- RESEARCH POSITIONS AVAILABLE IN SPEECH PROCESSING (repost) The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the area of automatic speech recognition. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Researchers are expected to begin activity in the beginning of 1992. IDIAP is supported by the Dalle Molle Foundation along with public-sector partners at the local and federal levels (in Switzerland). IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute maintains close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. Applications for a research position at IDIAP should include the following elements: - a curriculum vitae - sample publications or technical reports - a brief description of the research programme that the candidate wishes to pursue - a list of personal references. Applications are due by December 1, 1991 and may be sent to the address below: Daniel Osherson IDIAP Case Postale 609 CH-1920 Martigny SWITZERLAND For further information by e-mail, contact: osherson at idiap.ch (Daniel Osherson, director) or karit at idiap.ch (Kari Torkkola, researcher) Please use the latter email address only for inquiries concerning speech recognition research. From prechelt at ira.uka.de Thu Oct 24 11:16:36 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Thu, 24 Oct 91 16:16:36 +0100 Subject: Terminology (was: batch-mode parallel implementations) Message-ID: I noticed a lot of inconsistent use of terminology concerning the frequency of weight update in Backprop learning. I would like to make a suggestion for the meaning of certain terms, that is not based on the democratic aspect of what is used most often, but on investigations in a dictionary: There are three cases: (a) update after only ONE single example has been seen (b) update after ALL of the examples have been seen (c) something in between The terms used are epoch, block, batch, sample, continuous, on-line. An EPOCH is (thus saith my dictionary) not only a section of time or history (an "era"), but also a turning point. 
This should make EPOCH the preferred term for case (b), because the end of the training set clearly is such a turning point. A BATCH is a set of some size, a pile of things or so; with some inherent need for the information about its size. Thus it is a good candidate for case (c) and there should always be some indication of the size either as an absolute number, as a fraction of training set size or by some qualitative criterion. BLOCK could be a perhaps even better word for the same, for computer scientists, because blocks are always groups of a certain number of similar objects and the word does not have the danger of misunderstanding that stems from the term "batch-processing" from the early days of data processing, where everything was being executed completely, before you received the results. Unfortunately, for reasons of other connotations, confusion of Block with Epoch is nevertheless very likely. A SAMPLE is a part picked from a whole, usually for test purposes. Although it is not absolutely clear, that a sample is just a single object, in my ears the word tends to sound so. Thus it should be indicating case (a). CONTINOUS is a bad term to use, because the individual examples are not cut into parts, so BP is always discrete. ON-LINE usually means something like "available without physical action, merely by execution of software" and is of course completely inappropriate to learning, except perhaps where there is an infinite training set constantly floating through the machine. SUMMARY: -------- Let us use 'Epoch' for (b), 'Batch' for (c) and 'Sample' for (a). Let us avoid 'continous', 'on-line' and 'block' as much as possible. I think as scientists we should exercise some discipline in the use of language, especially when confusion is as close as in the area of learning systems... :-> Please direct all comments and flames to me. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From oden at herky.cs.uiowa.edu Thu Oct 24 12:11:12 1991 From: oden at herky.cs.uiowa.edu (Gregg Oden) Date: Thu, 24 Oct 91 11:11:12 -0500 Subject: Batch mode in nature? Message-ID: <9110241611.AA26933@herky.cs.uiowa.edu> Steve Potter asks > I cant think of any biological examples of batch learning, in which > sensory data are saved until a certain number of them can be somehow > averaged together and conclusions made and remembered. Any ideas? If by 'sensory data' you mean the most peripheral, unanalyzed input representations, then probably not. Otherwise, yes: it has been a long-term recurring theme in the psychological literature on the development of concepts that exemplars are remembered with a great deal of specific detail until a sufficient corpus of them have been acquired to support the abstraction of a general concept. (Subsequently, idiosyncratic details may be lost/suppressed through assimilation to the encompassing category.) This notion is supported by the intuitive experience of reflective recognition of regularities; i. e., insight. In recent years, it has also gained empirical support from experimental work, most notably by Lee Brooks and his colleagues. Some of this was briefly discussed in my chapter in the Annual Review of Psychology, 1987. 
(See also Oden & Lopes, "On the internal structure of fuzzy subjective categories" in Recent Developments in Fuzzy Set and Possibility Theory, R. Yager, ed., 1982.) Gregg Oden Psychology & Computer Science U. of Iowa From huyser at mithril.stanford.edu Thu Oct 24 18:27:42 1991 From: huyser at mithril.stanford.edu (Karen Huyser) Date: Thu, 24 Oct 91 15:27:42 PDT Subject: learning and memory Message-ID: <9110242227.AA27923@mithril.stanford.edu> It seems to me people are confusing very different things in the recent discussion of learning (one-shot, generalization, etc). A posting from Ross Gayler quotes Ernst Dow as saying (in the context of one-shot learning): > You may be able to identify the painting you saw before, but could you > make the leap to recognizing all other abstract paintings? To have the experience of seeing a painting and to be able to recall the memory of the experience is one kind of learning and memory. To be told by someone that the painting is of a type called "abstract" is to add a category label, another kind of learning and memory. However, to recognize another painting as abstract or imitate the painting style one must form a sufficiently rich concept to be able to make a category with the label "abstract" and the original painting as one member of the class. For most humans, this involves questions, insightful answers, and many more examples of paintings. As a completely separate conceptual skill, consider the learning and concept-formation task that goes on while doing research. How does it come about that one day we look at a set of phenomena in a new way, with new concepts and categories? There are many different skills that appear under the labels "learning" and "memory". Karen Huyser huyser at mojave.stanford.edu From bill at nsma.arizona.edu Thu Oct 24 23:04:21 1991 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 24 Oct 91 20:04:21 MST Subject: Continuous vs. Batch learning Message-ID: <9110250304.AA07667@nsma.arizona.edu> >It is pretty clear to me that biological neural networks have all adapted >to prefer the continuous learning technique, as we can verify for humans >by remembering something that we only saw (or heard, etc.) once. One-trial >learning paradigms abound in the behavioral literature. I cant think of >any biological examples of batch learning, in which sensory data are >saved until a certain number of them can be somehow averaged together >and conclusions made and remembered. Any ideas? David Marr's theory of the hippocampus proposed that it (the hippocampus) is an intermediate-term memory storage device, performing one-shot learning of experiences and then holding them for a period of days or weeks until they can be evaluated for significance and then gradually moved into the neocortex for permanent storage. In my humble opinion this is still the best available theory of what the hippocampus does. Some of the details have changed, but the basic idea still makes sense. Patrick Lynn has recently been exploring a more abstract version of Marr's idea, using a "buffer" of example patterns to train a recurrent back-prop net, with new patterns going into the buffer, hanging around for a while, then dropping out. He has found that under certain conditions buffering gives better performance than learning each pattern only when it is presented. (Reference: "Simple memory: a theory for archicortex." D. Marr, 1971, Phil Trans Roy Soc B 262: 23-81.) 
-- Bill Skaggs From gary at cs.UCSD.EDU Fri Oct 25 21:59:28 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Fri, 25 Oct 91 18:59:28 PDT Subject: Seminar abstract: The Sanguine Algorithm Message-ID: <9110260159.AA09259@desi.ucsd.edu> SEMINAR New approaches to learning in Connectionist Networks Garrison W. Cottrell Richard K. Belew Institute for Neural Declamation Condominium Community College of Southern California Previous approaches to learning in recurrent networks often involve batch learning: A large amount of effort is expended in deciding which way to move in weight space, then a little step is taken. We propose a new algorithm for learning in large networks which is orders of magnitude more efficient than batch learning. Based on the realization that many nearby points in weight space are worse than where we are now, we propose the sanguine algorithm. The basic idea is to become more happy with where we are, rather than going to all the work of moving. Hence the approach is quite simple: Randomly sample a nearby point in weight space. Compute the error functional based on that point. If it is better than the current point, repeat until we find a nearby point that is worse. Now, here's the real trick: Once we find a point worse off than where we are now, we stay where we are and increment a "happiness function". That is, we search until we find a place that we can "look down on" in weight space[1]. Now, in order to remain happy with where we are may involve a certain amount of minor work to keep this point in weight space looking good. For example, we could change the error functional until this point looks better than most other points we find. Towards this end, we can apply recent techniques (Nowlan & Hinton, 1991) to make the error functional soft and flabby. Then we can stretch the error any way we like. This approach can also be extended to replace computationally expensive "weight-sharing" techniques. If we make the weights soft and flabby, then lifting them becomes much easier since part of the weight always remains on the ground, and sharing the burden of large weights becomes unnecessary. Note that this can be done completely locally. We have applied this novel learning procedure to the problem of time series prediction. Using the Mackey-Glass equations with dimension 3.5, we give the network values at 0, 6, 12, and 18 time units back in time to predict the value of the time series 6 time units into the future. Using the Sanguine Algorithm, a network with only two hidden units rapidly converges to a soft error functional. Of course, the network has no idea of what value will come next; however, the happiness function shows it is quite blissful in its ignorance. We propose that this technique will have wide application in Republican approaches to government. ____________________ [1]Thus the pet name for our algorithm is the "Nyah Nyah Algo- rithm". From steck at spock.wsu.ukans.edu Sat Oct 26 13:49:10 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Sat, 26 Oct 91 12:49:10 -0500 Subject: Batch Learning and Parallel Implementation Message-ID: <9110261749.AA04481@spock.wsu.UKans.EDU> Regarding Parallel implementations of Batch and online learning.... S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). 
This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is basically a type of learning somewhere inbetween "batch" learning and "on line" learning. As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck Wichita State University From todd at galadriel.stanford.edu Fri Oct 25 17:50:47 1991 From: todd at galadriel.stanford.edu (todd@galadriel.stanford.edu) Date: Fri, 25 Oct 91 14:50:47 PDT Subject: MUSIC AND CONNECTIONISM Book Announcement Message-ID: <9110252150.AA02708@galadriel.stanford.edu> BOOK ANNOUNCEMENT: MUSIC AND CONNECTIONISM edited by Peter M. Todd and D. Gareth Loy MUSIC AND CONNECTIONISM is now available from MIT Press. This 280-pp. book contains a wide variety of recent research in the applications of neural networks and other connectionist methods to the problems of musical listening and understanding, performance, composition, and aesthetics. It consists of a core of articles that originally appeared in the Computer Music Journal, along with several new articles by Kohonen, Mozer, Bharucha, and others, and new addenda to the original articles describing the authors' most recent work. Topics covered range from models of psychological processing of pitches, chords, and melodies, to algorithmic composition and performance factors. A wide variety of connectionist models are employed as well, including back-propagation in time, Kohonen feature maps, ART networks, and Jordan- and Elman-style networks. We've also included a discussion generated by the Computer Music Journal articles on the use and place of connectionist systems in artistic endeavors. A more detailed description of the book is provided below (from the jacket text), along with the complete table of contents. We hope this book will be of use to a wide variety of readers, including neural network researchers interested in a broad, challenging, and fun new area of application, cognitive scientists and music psychologists looking for robust new models of musical behavior, and artists seeking to learn more about a potentially very useful technology. MUSIC AND CONNECTIONISM can be found in bookstores that carry MIT Press publications, or can be purchased directly from MIT Press by calling their toll-free order number, 1-800-356-0343, and giving the operator this catalog number: 1CSAT 503, and this book code: TODMH. By phone and mail-order, the price is $39.95; in stores, it will probably be $45 (there is some confusion with the publisher on this point, so I wanted to give out the detailed information for phone orders to save people some money). Please drop me a line if you have any questions, and especially if you take up the gauntlet and pursue research or applications in this area! cheers, peter todd ***************************************************************************** Music and Connectionism edited by Peter M. Todd and D. Gareth Loy As one of our highest expressions of thought and creativity, music has always been a difficult realm to capture, model, and understand. 
The connectionist paradigm, now beginning to provide insights into many realms of human behavior, offers a new and unified viewpoint from which to investigate the subtleties of musical experience. \fIMusic and Connectionism\fP provides a fresh approach to both fields, using techniques of connectionism and parallel distributed processing to look at a wide range of topics in music research, from pitch perception to chord fingering to composition. The contributors, leading researchers in both music psychology and neural networks, address the challenges and opportunities of musical applications of network models. The result is a current and thorough survey that advances our understanding of musical perception, cognition, composition, and performance and of the design and analysis of networks. Music and Connectionism is based on a core of articles originally appearing as two special issues of the Computer Music Journal. These have been augmented with addenda covering more recent research by the authors. The book opens with tutorial chapters introducing neural networks in a musical context and relevant aspects of previous computer music research, making this a self-contained text. There are many new chapters, along with new section introductions, summaries of related work, and a final debate on the artistic implications of connectionist methods. Peter M. Todd is a doctoral candidate in the PDP Research Group of the Psychology Department at Stanford University. Gareth Loy DMA is an award-winning composer, member of the Board of Directors of the Computer Music Association, lecturer in the Music Department of UC San Diego, and member of the technical staff of Frox Inc. Contents: Preface and Introduction Peter M. Todd and D. Gareth Loy Part 1: Background Machine Tongues XII: Neural Networks Mark Dolson Connectionism and Musiconomy D. Gareth Loy Part 2: Perception and Cognition A Neural Net Model for Pitch Perception Hajime Sano and B. Keith Jenkins Connectionist Models for Tonal Analysis Don L. Scarborough, Ben O. Miller, and Jacqueline A. Jones The Representation of Pitch in a Neural Net Model of Chord Classification Bernice Laden and Douglas H. Keefe Pitch, Harmony, and Neural Nets: A Psychological Perspective Jamshed J. Bharucha The Ontogenesis of Tonal Semantics: Results of a Computer Study Marc Leman Modeling the Perception of Tonal Structure with Neural Nets Jamshed J. Bharucha and Peter M. Todd Using Connectionist Models to Explore Complex Musical Patterns Robert O. Gjerdingen The Quantization of Musical Time: A Connectionist Approach Peter Desain and Henkjan Honing Part 3: Applications A Connectionist Approach to Algorithmic Composition Peter M. Todd Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints Michael C. Mozer Creation By Refinement and the Problem of Algorithmic Music Composition J.P. Lewis A Nonheuristic Automatic Composing Method Teuvo Kohonen, Pauli Laine, Kalev Tiits, and Kari Torkkola Fingering for String Instruments with the Optimum Path Paradigm Samir I. Sayegh Part 4: Conclusions Letter from Otto Laske Responses to Laske by Todd and Loy Further Research and Directions Peter M. 
Todd List of Author Addresses From white at teetot.acusd.edu Fri Oct 25 19:49:14 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 25 Oct 91 16:49:14 -0700 Subject: No subject Message-ID: <9110252349.AA27577@teetot.acusd.edu> Larry Fast writes: > I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to > compensate for the following problem: > As Backprop passes the error back thru multiple layers, the gradient has > a built in tendency to decay. At the output the maximum slope of > the 1/( 1 + e(-sum)) activation function is 0.5. > Each successive layer multiplies this slope by a maximum of 0.5. ..... > It has been suggested (by a couple of sources) that an attempt should be > made to have each layer learn at the same rate. ... > The new error function is: errorPropGain * act * (1 - act) This suggests to me that we are too strongly wedded to precisely f(sum) = 1/( 1 + e(-sum)) as the squashing function. That function certainly does have a maximum slope of 0.25. A nice way to increase that maximum slope is to choose a slightly different squashing function. For example f(sum) = 1/( 1 + e(-4*sum)) would fill the bill, or if you'd rather have your output run from -1 to +1, then tanh(sum) would work. I think that such changes in the squashing function should automatically improve the maximum-slope situation, essentially by doing the "errorPropGain" bookkeeping for you. Such solutions are static fixes. I suggested a dynamic adjustment of the learning parameter for recurrent backprop at IJCNN - 90 in San Diego (The Learning Rate in Back-Propagation Systems: an Application of Newton's Method, IJCNN 90, vol I, p 679). The method amounts to dividing the learning rate parameter by the square of the gradient of the output function (subject to an empirical minimum divisor). One should be able to do something similar with feedforward systems, perhaps on a layer by layer basis. - Ray White (white at teetot.acusd.edu) Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU Sun Oct 27 14:17:00 1991 From: BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU (BUTUROVIC%BUEF78%yubgef51.bitnet@BITNET.CC.CMU.EDU) Date: Sun, 27 Oct 1991 21:17 +0200 Subject: forward propagation Message-ID: <2B147310A0000F63@yubgef51.bitnet> I am interested in training a multi-layer perceptron without using back-propagation (BP) of the error. MLP training by means of the back-propagation (BP) algorithm is in fact minimization of the criterion function using the ordinary gradient-descent minimization algorithm. For this, the computation of derivatives is necessary. Now, it is of course possible to optimize a multi-variable function without computing derivatives. One effective algorithm of this type is the simplex algorithm [1], so it seems logical to utilize it for MLP training. There are two advantages in avoiding derivatives: first, the transfer functions of the individual neurons may be non-differentiable. Second, BP utilizes a criterion function that must be written in the form of the average squared difference between target and actual outputs (there are variants to this, but, for the purpose of this discussion, they vary insignificantly), and the derivative of this function with respect to the weights must be computable. Using simplex, i.e. not using derivatives, this limitation can be avoided, as long as the function to be minimized can be measured.
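As a concrete illustration of the idea just described, here is a minimal sketch of derivative-free MLP training using the Nelder-Mead simplex routine from SciPy. This is not Buturovic and Citkusev's code; the toy XOR task, the network size, and every name below are assumptions chosen only for the example.

import numpy as np
from scipy.optimize import minimize

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)        # toy XOR targets

N_IN, N_HID, N_OUT = 2, 3, 1
N_W = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT     # total number of weights

def unpack(w):
    # split the flat weight vector into layer matrices and bias vectors
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = w[i:i + N_OUT]
    return W1, b1, W2, b2

def forward(w, x):
    W1, b1, W2, b2 = unpack(w)
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def criterion(w):
    # any measurable cost would do; the optimizer never asks for derivatives
    return float(np.mean((forward(w, X) - T) ** 2))

rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.5, size=N_W)
res = minimize(criterion, w0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
print("final error:", res.fun)
print("network outputs:", forward(res.x, X).ravel())

Note that Nelder-Mead maintains a simplex of N+1 points in N dimensions, which is essentially the N*N storage cost discussed below.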
This can be important for applications in control where we are sometimes not able to express the criterion function as a function of the network parameters. There is one serious limitation regarding this algorithm, and it is spatial complexity. It requires roughly N*N memory locations, where N is the number of variables (network weights). In practice, this limits the size of the network to a couple of thousand weights. In order to verify the behavior of the algorithm, I performed extensive experiments with Ljubomir Citkusev of Boston University. We trained an MLP to perform classification tasks on three data sets. In short, the results obtained indicate that training of the network using simplex can be done successfully. However, BP is more effective, regarding both classification accuracy (i.e., function approximation accuracy) and computational complexity (number of iterations). We have not yet verified the ability of the algorithm to train networks with non-differentiable transfer functions or criterion functions that cannot be computed analytically. It is puzzling that in [2] Minsky and Papert claimed the training of perceptrons with hidden layers to be impossible, while at that time (1969) an effective algorithm for precisely that task was already available. While BP was shown to be superior in our experiments, they could have done quite satisfactory training of multi-layer networks when they wrote the book. I tried to talk to Minsky about this, but I couldn't do it. I would like to hear people's opinions on this idea. Also, it would be beneficial to know if anyone is aware of similar work. Thanks, Ljubomir Buturovic, University of Belgrade References [1] Nelder, J. A., and Mead, R. 1965, Computer Journal, vol. 7, p. 308. [2] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969. From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Mon Oct 28 09:49:58 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Mon, 28 Oct 91 09:49:58 JST Subject: Robinson's vowel dataset Message-ID: <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Does anyone have any NEW results on Robinson's vowel dataset? I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. Please send me mail, thanks Nitin Indurkhya (nitin at crl.hitachi.co.jp) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 28 00:10:20 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 28 Oct 91 00:10:20 EST Subject: Announcement of NIPS Workshop Message-ID: The Neural Information Processing Systems Conference will be followed by a program of workshops in Vail, Colorado on December 6 and 7, 1991. The following one-day workshop will be offered on December 6: Constructive and Destructive Learning Algorithms Workshop Leader: Scott E. Fahlman School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Internet: fahlman at cs.cmu.edu Most existing neural network learning algorithms work by adjusting connection weights in a fixed network. Recently we have seen the emergence of new learning algorithms that alter the network's topology as they learn. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both.
These algorithms eliminate the need to guess in advance what network topology will best fit a given problem. In addition, some of these algorithms claim significant improvements in learning speed and generalization. A successful two-day workshop on this topic was presented at the NIPS-90 conference. A number of algorithms were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area, so a second workshop on this topic seems appropriate. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. I would like to hear from people who are interested in presenting new algorithms or results at this workshop. I would particularly like to hear from people with application results or comparative studies using algorithms of this kind. The tentative plan, depending on the response we get, is allow 15-20 minutes for each presentation, with ample time for discussion. If you would like to present something, please send a short description to Scott Fahlman, at the internet address listed above. For Cascade-Correlation fans, I will be presenting a new variation called "Cascade 2" that performs better than the original in a number of situations, especially in problems with continuous analog outputs. From tesauro at watson.ibm.com Mon Oct 28 11:41:58 1991 From: tesauro at watson.ibm.com (Gerald Tesauro) Date: Mon, 28 Oct 91 11:41:58 EST Subject: Program information: NIPS91 Workshops Message-ID: The NIPS91 post-conference workshops will take place Dec. 5-7, 1991, at the Marriott Mark Resort Hotel in Vail, Colorado. The following message gives information on the program schedule and local arrangements, and is organized as follows: I. Summary schedule II. Workshop schedule III. Arrangements information IV. Workshop abstracts I. Summary Schedule: Thursday, Dec. 5th 5:00 pm Registration Open 7:00 pm Orientation Meeting 8:00 pm Reception Friday, Dec. 6th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 7:00 pm Banquet Saturday, Dec. 7th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 6:30 - 7:00 pm Wrap-up 7:30 pm Barbecue Dinner (optional) II. Workshop schedule: Friday, Dec. 6th: Character recognition Projection pursuit and neural networks Constructive and destructive learning algorithms II Modularity in connectionist models of cognition VLSI neural networks and neurocomputers (1st day) Recurrent networks: theory and applications (1st day) Active learning and control (1st day) Self-organization and unsupervised learning in vision (1st day) Developments in Bayesian methods for neural networks (1st day) Saturday, Dec. 7th: Oscillations and correlations in neural information processing Optimization of neural network architectures for speech recognition Genetic algorithms and neural networks Complexity issues in neural computation and learning Computer vision vs. network vision VLSI neural networks and neurocomputers (2nd day) Recurrent networks: theory and applications (2nd day) Active learning and control (2nd day) Self-organization and unsupervised learning in vision (2nd day) Developments in Bayesian methods for neural networks (2nd day) III. 
Arrangements information: Accommodations: The conference sessions will be held in the banquet area at Marriott Mark Resort, at Vail CO, 90 miles west of Denver. For accommodations, call the Marriott at (303)-476-4444. Our room rate is $74 (single or double). Condos for larger groups can be arranged through Destination Resorts, at (303)-476-1350. Registration: Registration fee for the workshops is $100 ($50 for students). Transportation: CME (Colorado Mountain Express) will be running special shuttles from the Sheraton in Denver up to the Marriott in Vail Thursday afternoon at a price of $31.00 per person. Call them at 1-800-525-6363, at least 24 hours in advance, to reserve and give a credit card number for prepayment. CME also runs shuttles down from Vail to the Denver airport, same price, on Sunday at many convenient times. The earlier you call CME, the more vans will be made available for our use. Be sure to mention our special group code "NIPS". Hertz has a desk in the Sheraton, and will rent cars at a weekend rate for the trip up to Vail and back to the airport in Denver. This is an unlimited mileage rate; prices start at $60 (three days, plus tax). To make reservations call the Sheraton at 1-800-552-7030 and ask for Kevin Kline at the Hertz desk. Skiing: Skiing at Vail can be expensive. The lift tickets this year were slated to rise to $40 per day. The conference has negotiated very attractive group rates for tickets bought in advance: $56 for a 2-day ticket, $84 for a 3-day ticket, $108 for a 4-day ticket. You can purchase these by sending a check to the conference registration office: NIPS*91 Registration, Siemens Research Center, 755 College Road East, Princeton, NJ 08540. The tickets will be printed for us, and available when we get to Vail on Thursday evening. There are several sources for rental boots and skis in Vail. The rental shop at the lifts and Banner Sports (located in the Marriott) are offering the following packages to those who identify themselves as NIPS attendees:
                         skis, boots, poles    skis, poles
    standard package          $8 / day           $6 / day
    performance package      $11 / day           $9 / day
Banner will, as extra incentives, stay open for us after the Thursday orientation meeting, and give a 10% discount on anything else in the store. Optional Gourmet barbecue dinner(!): Finally, besides the conference banquet, included in the registration fee, there will be an optional dinner on Saturday night at Booco's Station, a few miles outside of Vail and world famous for its barbecued meats and special sauces. Dinner will include transportation (if you need it), appetizers, all-you-can-eat barbecue, cornbread, vegetables, dessert, and more than 40 kinds of beer at the cash bar. Tickets will be on sale at the Sheraton and at the Marriott. Price: $27. IV. Workshop Abstracts: ========================================================================= Modularity in Connectionist Models of Cognition Organizer: Jordan Pollack, Ohio State Univ. Speakers: Michael Mozer, Univ of Colorado Robert Jacobs, MIT John Barnden, New Mexico State University Rik Belew, UCSD Abstract: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections.
However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents of modular connectionist architectures to discuss research direction, recent progress, and long-term challenges. ========================================================================= Character Recognition Organizers: C. L. Wilson and M. D. Garris, National Institute of Standards and Technology Speakers: Jon Hull, SUNY Buffalo Tom Vogl, ERIM Jim Keeler, MCC Chris Schofield, Nestor C. L. Wilson, NIST R. G. Casey, IBM Abstract: This workshop will consider issues related to present and future testing needs for character recognition including: 1) What is user experience in using the NIST and other publicly available databases? 2) What types of databases will be required in the future? 3) What are future testing needs, such as x-y coordinate stream or gray level data? 4) How can the evaluation of current research problems, such as segmentation, be enhanced through carefully designed databases, standard testing procedures, and automated evaluation methodologies. 5) Is the incorporation of context important in testing? 6) What other issues face the research and development of large scale recognition systems? The target audience includes those interested in and/or working on hand print recognition and developers who wish to include character recognition as part of systems to recognize documents. ========================================================================= Genetic Algorithms and Neural Networks Organizer: Rik Belew, Univ. of Calif. at San Diego Speakers: Rik Belew and Dave Rogers Abstract: This workshop will examine theoretical and algorithmic interactions between GA and NNet techniques, as well as models of the evolutionary constraints on nervous systems. Specific topics include: 1) Comparison and composition of global GA sampling techniques with the local (gradient) search of NNet methods. 2) Use of the GA to evolve additional higher-order function approximation terms (``hidden units''). 3) The dis/advantages of GA recombination and its impact on appropriate representations for NNets. 4) Trade-offs between NNet training time and GA generational time. 5) Parallel implementations of GAs that facilitate NNet simulation. 6) A role for ontogenesis between GA evolution and NNet learning. 7) The role optimality (doesn't!) play in evolution ========================================================================= Projection Pursuit and Neural Networks Organizers: Ying Zhao, Chris Atkeson and Peter Huber, MIT Speakers: R.Douglas Martin, University of Washington John Moody, Yale University Ying Zhao, MIT Andrew R. Barron, University of Illinois Nathan Intrator, Brown University Trevor Hastie, Bell Labs Abstract: Projection Pursuit is a nonparametric statistical technique to find "interesting" low dimensional projections of high dimensional data sets. 
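As a toy illustration of that basic idea (an editorial sketch, not any of the speakers' methods), the snippet below hunts for an "interesting" one-dimensional projection by crude random search, scoring each candidate direction with a simple departure-from-Gaussianity index; the data set and every constant in it are invented for the example.

import numpy as np

rng = np.random.default_rng(1)
n = 1000
# toy data: five nominally Gaussian coordinates, with a bimodal signal hidden along axis 2
hidden = np.where(rng.random(n) < 0.5, -3.0, 3.0) + rng.normal(size=n)
data = rng.normal(size=(n, 5))
data[:, 2] += hidden

def interestingness(z):
    # absolute excess kurtosis of a standardized 1-D projection (zero for a Gaussian)
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z ** 4) - 3.0)

best_dir, best_score = None, -np.inf
for _ in range(5000):
    d = rng.normal(size=data.shape[1])
    d /= np.linalg.norm(d)                 # random unit direction
    score = interestingness(data @ d)
    if score > best_score:
        best_dir, best_score = d, score

print("best index value:", round(best_score, 3))
print("best direction:", np.round(best_dir, 2))   # should load heavily on axis 2

Real projection pursuit replaces the random search with numerical optimization of a carefully chosen index, and projection pursuit regression then fits a smooth ridge function along each selected direction.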
We hope to improve our understanding of neural networks and projection pursuit by discussing issues such as fast training algorithms based on PP, duality with kernel approximation, possible avoidance of the "curse of dimensionality", and the sample complexity for PP. ========================================================================= Constructive and Destructive Learning Algorithms II Organizer: Scott E. Fahlman, Carnegie Mellon University Speakers: TBA Abstract: Recently we have seen the emergence of new learning algorithms that alter the network's topology. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. In a two-day workshop on this topic at NIPS-90, a number of learning algorithms that modify network topology were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. ========================================================================= Oscillations and Correlations in Neural Information Processing Organizer: Ernst Niebur, Caltech Speakers: Bard Ermentrout, U. of Pittsburgh Hennric Jokeit, U. of Munich Marius Usher, Weizmann Institute Ernst Niebur, Caltech Abstract: This workshop will address models proposed for tasks like tieing together the different parts of one object in the visual field or for binding the different representations of an object in different cortical areas. Both oscillation-based models as well as alternative models based on phase coherence (correlations) will be considered in the light of the latest experimental findings. ========================================================================= Optimization of Neural Network Architectures for Speech Recognition Organizers: Uli Bodenhausen, Universitaet Karlsruhe Alex Waibel, Carnegie Mellon University Speakers: Kenichi Iso, NEC Corporation, Japan Patrich Haffner, CNET, France Mike Franzini, Telefonica I + D, Spain Abstract: A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The proposed workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participating researchers interested in speech recognition are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. 
The workshop could also be of interest to researchers working on constructive/destructive learning algorithms because the relevance of different architectural parameters should be considered for the design of these algorithms. ========================================================================= SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION Organizer: Jonathan A. Marshall, Univ. of North Carolina Speakers: Suzanna Becker, University of Toronto Irving Biederman, University of Southern California Thomas H. Brown, Yale University Joachim M. Buhmann, Lawrence Livermore National Laboratory Heinrich Bulthoff, Brown University Edward Callaway, Duke University Allan Dobbins, McGill University Gillian Einstein, Duke University Charles Gilbert, The Rockefeller Universty John E. Hummel, UCLA Daniel Kersten, University of Minnesota David Knill, University of Minnesota Laurence T. Maloney, New York University Jonathan A. Marshall, University of North Carolina at Chapel Hill Paul Munro, University of Pittsburgh Albert L. Nigrin, American University Alice O'Toole, The University of Texas at Dallas Jurgen Schmidhuber, University of Colorado Nicol Schraudolph, University of California at San Diego Michael P. Stryker, University of California at San Francisco Patrick Thomas, Technische Universitat Muenchen Rich Zemel, University of Toronto Abstract: This workshop considers the role that unsupervised learning procedures (e.g. Hebb-type rules) may play in the self-organization of cortical structures involved in the processing of visual information. Researchers in visual neuroscience, visual psychophysics and neural network modeling will be brought together to address head-on the key issue of how animal visual systems got the way they are. We hope that this will lead to a better understanding of the factors that shape the structure of animal visual systems, as well as better models of the neurophysiological processes underlying vision. ========================================================================= Developments in Bayesian methods for neural networks Organizers: David MacKay, Caltech Steve Nowlan, Salk Institute Abstract: The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks, including issues such as Monte Carlo clustering, data selection, active query learning, prediction of generalisation, missing inputs, unlabelled data and discriminative training, Discussion will be moderated by John Bridle. Speakers: Radford Neal Jurgen Schmidhuber John Moody David Haussler + Michael Kearns Sara Solla + Esther Levin Steve Renals Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints are available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) login: anonymous password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu ========================================================================= Active Learning and Control Organizers: David Cohn, Univ. of Washington Don Sofge, MIT Speakers: C. Atkeson, MIT A. Barto, Univ. of Massachussetts, Amherst J. Hwang, Univ. of Washington M. Jordan, MIT A. Moore, MIT J. 
Schmidhuber, University of Colorado, Boulder R. Sutton, GTE S. Thrun, Carnegie-Mellon University Abstract: An "active" learning system is one that is not merely a passive observer of its environment, but instead plays an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain safely and efficiently, and planning future actions based on one's model. In this workshop, our main focus will be on addressing key unsolved problems which may be holding up progress, rather than on presenting polished, finished results. Our hope is that unsolved problems in one field may draw on insights from research in other fields. ========================================================================= Computer Vision vs Network Vision Organizers: John Mayhew and Terry Sejnowski Speakers: TBA Abstract: Computer vision has developed a methodology based on sound engineering practice: 1. Break the problem down into well-defined subproblems and mathematically analyze each part; 2. Develop efficient algorithms for each module; 3. Implement each algorithm with the best available technology. These are Marr's three levels: computational, algorithmic, and implementational. In contrast, proponents of neural networks have developed a different methodology: 1. Find a good representation for the input data that makes explicit the features needed to solve the problem; 2. Use learning algorithms to cluster and categorize the data; 3. Glue together networks that solve different parts of the problem with more learning. Networks are memory intensive and constraints from the hardware level are as important as constraints from the computational level. This workshop is intended to provoke a lively and free-wheeling discussion of the central issues in vision. ========================================================================= Complexity Issues in Neural Computation and Learning Organizers: Kai-Yeung Siu and Vwani Roychowdhury, Stanford Univ. Speakers: TBA Abstract: The goal of this workshop is to address recent developments in understanding the capabilities and limitations of various models for neural computation and learning. Topics will include: 1) circuit complexity of neural networks, 2) capacity of neural networks, and 3) complexity issues in learning algorithms. ========================================================================= RECURRENT NETWORKS: THEORY AND APPLICATIONS Organizers: Luis Borges de Almeida, INESC C. Lee Giles, NEC Research Institute Richard Rohwer, Edinburgh University Speakers: TBA Abstract: Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains an important open issue. Training algorithms are very inefficient in terms of memory and computational demands. Little is known about convenient architectures. The number of known successful applications is very limited. This is true even for static applications (operation in the "fixed point mode").
The first day of this two-day workshop will focus on the outstanding theoretical issues in recurrent neural networks, and the second day will examine existing and potential real-world applications. ========================================================================= VLSI Neural Networks and Neurocomputers Organizers: Clifford Lau, Office of Naval Research Jim Burr, Stanford University Speakers: TBA Abstract: This two-day workshop will address the latest advances in VLSI implementations of neural nets, and the design of high performance neurocomputers. We will present an updated list of currently available neurochips, and discuss a wide range of issues, including: 1) Design issues: Advantage and disadvantage of analog and digital approaches; how much arithmetic precision is necessary; which algorithms have been implemented; importantance of on-chip learning; neurochip design in existing CAD environment. 2) Performance issues: Critical factors to achieve robust performance; Tradeoffs between capacity and performance; scaling limits to constructing large neural networks. 3) Use of neurochips: What input/output devices are necessary; what programming support environment is necessary. 4) Application areas for supercomputing neurocomputers From zeiden at cs.wisc.edu Mon Oct 28 10:30:14 1991 From: zeiden at cs.wisc.edu (zeiden@cs.wisc.edu) Date: Mon, 28 Oct 91 09:30:14 CST Subject: tech report available in NEUROPROSE Message-ID: <9110281530.AA29229@ai.cs.wisc.edu> I have placed the following tech report in the NEUROPROSE ftp archive at Ohio State, under the name zeidenberg.containment.ps.Z Implementing Spatial Relations in Neural Nets: The Case of Figure/Ground and Containment Matthew Zeidenberg zeiden at cs.wisc.edu A neural network system that computes the relation of containment between objects in a retina-like input array is described. This system is multi-layer, and operates by recognizing and segmenting the objects in the input to place them in separated arrays. The figure of each object, that is, the set of all pixels on the perimeter of or contained in the object, is computed for each object, using a method that involves a connectionist implementation of a standard algorithm using parity networks. These figures are then used to compute containment relations between the objects in the input. ftp Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get zeidenberg.containment.ps.Z ftp> quit unix> uncompress zeidenberg.containment.ps.Z unix> lpr zeidenberg.containment.ps (or other command to print postscript) From black at seismo.CSS.GOV Mon Oct 28 12:01:00 1991 From: black at seismo.CSS.GOV (Mike Black) Date: Mon, 28 Oct 91 12:01:00 EST Subject: What is current technology in Analog Neural Nets? Message-ID: <9110281701.AA21092@beno.CSS.GOV> I've seen little discussion and have found no references to work in analog neural networks. If you can provide some references or indicate what your current work is I'll summarize. These are the goals for my current research: Given an analog data source (e.g. pulse generator): 1. Recognize pulses (for example a single shot square wave) and reject "noise" (i.e. triangular wave) at rates of at least 10MHz (that is, it should be able to deal with a minimum 100ns pulse width). 2. Provide the trigger for an external digitizer to grab the resultant "good" pulses. 3. Be software controllable (hardware should be able to be updated by remote control). 
Please forward any current work or capability in this area to: black at beno.css.gov >> ------------------------------------------------------------------------------- >> : usenet: black at beno.CSS.GOV : land line: 407-494-5853 : I want a computer: >> : real home: Melbourne, FL : home line: 407-242-8619 : that does it all!: >> ------------------------------------------------------------------------------- From lissie!botsec7!botsec1!dcl at uunet.UU.NET Mon Oct 28 13:54:54 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Mon, 28 Oct 91 13:54:54 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110281854.AA20399@botsec1.bot.COM> Dear Connectionists: Has anyone tried to implement the Resource Allocation Network of John Platt (NIPS 3 and Neural Computation V3 #2)? I have a first cut at an implementation, and so far I have not been able to approach his published results. I'd be very interested in corresponding with anyone who has tried this algorithm. Also, if anyone has a means of reaching John Platt, I'd love to hear about it. I've been calling Synaptics in San Jose for over a week now, and there don't seem to be any humans that work there...only voice mail. Thanks David Lambert dcl at object.com or dcl at panix.com From khosla at latcs1.lat.oz.au Mon Oct 28 22:32:44 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Tue, 29 Oct 91 14:32:44 +1100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110290332.AA18704@latcs1.lat.oz.au> Dear Connectionists, This is regarding my problem of making a 28-11-26, binary input/output neural network work. Thanks to everyone who sent me the replies. Its working nice and kicking. Best results are achieved by connecting the input layer to the output layer. Thanks once again Rajiv From terry at jeeves.UCSD.EDU Tue Oct 29 02:35:07 1991 From: terry at jeeves.UCSD.EDU (Terry Sejnowski) Date: Mon, 28 Oct 91 23:35:07 PST Subject: Continuous vs. Batch learning Message-ID: <9110290735.AA01748@jeeves.UCSD.EDU> There is evidence that the hippocampus is doing something like batch mode teaching for neocortex. The hippocampus is needed for one-shot learning, also called declarative or episodic learning. It seems to be storing up a lot of examples and over a period of months transfers this informaiton to cortex, where it is stored in a more categorical representation. Terry ----- From smieja at jargon.gmd.de Tue Oct 29 05:14:40 1991 From: smieja at jargon.gmd.de (Frank Smieja) Date: Tue, 29 Oct 91 11:14:40 +0100 Subject: Batch methods versus stochastic methods... In-Reply-To: mmoller@daimi.aau.dk's message of Mon, 21 Oct 91 13:13:06 +0100 Message-ID: <9110291014.AA24169@jargon.gmd.de> -) Unfortunately, we do not have any datasets of the proper size. -) So I would appreciate if anyone could inform me about where to find big -) datasets that are public available. -) -) -- Martin M -) -) ----------------------------------------------------------------------- -) Martin F. 
Moller email: mmoller at daimi.aau.dk -) Computer Science Department phone: +45 86202711 5223 -) Aarhus University fax: +45 86135725 -) Ny Munkegade, Building 540 -) 8000 Aarhus C -) Denmark -) ---------------------------------------------------------------------- I demonstrated in my paper "MLP Solutions, Generalization and Hidden Unit Representations" in the DANIP (Distributed And Neural Information Processing) conference in Bonn, Germany, April 1989 (ed: Kindermann & Linden, pub: Oldenbourg Verlag), how one might "synthetically" construct a training set of inputs/outputs of any size that may be generalized, insofar as the "regularities" beloved by our networks are guaranteed to exist, since they are used to generate the training set pairs, but not visible to the network until the examples are seen, and the learning results in "emergent generalization". I used this method in the paper to study a small diagnosis problem, but scaling up is no problem. If you cannot get hold of this book, and would like to see the paper, I can make it available in the neuroprose archive (unfortunately without figures, but they are not needed to explain the method). If this is also difficult, I will send hard copies to interested parties. Please send such requests directly to me (smieja at gmdzi.uucp) and I will either reply directly or to the bboard. -Frank Smieja From joachim at gmdzi.gmd.de Tue Oct 29 12:57:47 1991 From: joachim at gmdzi.gmd.de (Joachim Diederich) Date: Tue, 29 Oct 91 16:57:47 -0100 Subject: New Paper Message-ID: <9110291557.AA14221@gmdzi.gmd.de> The following paper has been placed in the Neuroprose archives at Ohio State. The file is "diederich.hybrid.ps.Z." See ftp instructions below. Efficient Question Answering in a Hybrid System Joachim Diederich (1,2) & Debra L. Long (2) (1) German National Research Center for Computer Science (GMD) Schloss Birlinghoven, P.O. Box 1240 D-5205 St.Augustin 1, Germany (2) Department of Psychology University of California, Davis Davis, CA 95616, U.S.A. ABSTRACT: A connectionist model for answering open-class questions in the context of text processing is presented. The system answers questions from different question categories, such as "How," "Why," and "Consequence" questions. These question categories have been identified in several empirical studies (Graesser & Clark, 1985; Graesser, 1990). The system responds to a question by generating a set of possible answers that are weighted according to their plausibility. Search is performed by means of a massively parallel, directed spreading activation process. The search process operates on several knowledge sources (i.e., connectionist networks) that are learned or explicitly built-in. Spreading activation involves the use of signature messages (Lange & Dyer, 1989). Signature messages are numeric values that are propagated throughout the networks and identify a particular question category (this makes the system hybrid). Binder units that gate the flow of activation between textual units receive these signatures and change their states. That is, the binder units either block the spread of activation or allow the flow of activation in a certain direction. The process results in a pattern of activation that represents a set of candidate answers based on available knowledge sources. This paper will appear in the IJCNN-91 Singapore Proceedings.
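To make the gating mechanism easier to picture, here is a toy sketch of signature-gated spreading activation in the spirit of the abstract; the little graph, the signature values, and the decay constant are all invented for this illustration and are not the authors' networks.

# two invented question categories, identified by numeric signatures
HOW, WHY = 1, 2

# directed edges: (source, target, signatures allowed to pass through this binder)
edges = [
    ("ignition-fails", "battery-dead",    {WHY}),
    ("ignition-fails", "turn-the-key",    {HOW}),
    ("battery-dead",   "lights-left-on",  {WHY}),
    ("turn-the-key",   "check-gearshift", {HOW}),
]

def spread(source, signature, steps=3, decay=0.5):
    # spread activation from the question node; binder units pass it only
    # when the propagated signature matches their allowed set
    act = {source: 1.0}
    for _ in range(steps):
        new = dict(act)
        for src, dst, allowed in edges:
            if signature in allowed and act.get(src, 0.0) > 0.0:
                new[dst] = max(new.get(dst, 0.0), decay * act[src])
        act = new
    # candidate answers, strongest first, excluding the question node itself
    return sorted(((n, a) for n, a in act.items() if n != source),
                  key=lambda p: -p[1])

print("WHY question:", spread("ignition-fails", WHY))
print("HOW question:", spread("ignition-fails", HOW))

Blocking an edge simply means its binder never copies activation across, so the same network yields different candidate answer sets for different question categories.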
unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get diederich.hybrid.ps.Z ftp> quit unix> uncompress diederich.hybrid.ps.Z unix> lpr diederich.hybrid.ps Joachim Diederich German National Research Center for Computer Science (GMD) P.O. Box 1240 D-5205 St. Augustin 1 Germany From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 15:23:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 15:23:46 EST Subject: Robinson's vowel dataset In-Reply-To: Your message of Mon, 28 Oct 91 09:49:58 +0200. <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Message-ID: Does anyone have any NEW results on Robinson's vowel dataset? I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. I don't know of any more recent publications on this problem. I got some rather good results using Cascade-Correlation: (train 300 300 25)) SigOff 0.10, WtRng 1.00, WtMul 1.00 OMu 2.00, OEps 1.00, ODcy 0.0300, OPat 12, OChange 0.010 IMu 2.00, IEps 10.00, IDcy 0.0300, IPat 8, IChange 0.030 Utype :SIGMOID, Otype :SIGMOID, RawErr NIL, Pool 32
Trial 0: 181 of 462 cases wrong, 281 right, 60.82% @ 23 hidden
Trial 1: 174 of 462 cases wrong, 288 right, 62.34% @ 11 hidden
Trial 2: 193 of 462 cases wrong, 269 right, 58.23% @ 24 hidden
Trial 3: 183 of 462 cases wrong, 279 right, 60.39% @ 15 hidden
Trial 4: 180 of 462 cases wrong, 282 right, 61.04% @ 24 hidden
Trial 5: 186 of 462 cases wrong, 276 right, 59.74% @ 17 hidden
Trial 6: 188 of 462 cases wrong, 274 right, 59.31% @ 11 hidden
Trial 7: 174 of 462 cases wrong, 288 right, 62.34% @ 15 hidden
Trial 8: 173 of 462 cases wrong, 289 right, 62.55% @ 13 hidden
Trial 9: 170 of 462 cases wrong, 292 right, 63.20% @ 18 hidden
Avg: 180 of 462 cases wrong, 282 right, 61.03% @ 17 hidden
The test set was run after each output training phase and the best value obtained is the one reported. The best results obtained by Robinson were 260 right (56%) for nearest neighbor, and 253 right (55%) for 528 Gaussian nodes or 88 square nodes. Backprop with 88 sigmoids never got better than 234 (51%). I've never published these results, because I think they are a bit of a cheat. The problem is that I played around with the decay factor and other parameters until I got good results on the test set. It's not clear that the same setting would give equally good performance on a new test set that I had never seen. Also, in all cases the algorithm obtained a solid level of 59% or so, but then wandered up and down, in no particular pattern, as new units were added. I can get a good number -- up to 63% -- by grabbing the best point on this random walk, but I don't honestly believe that the network at that point would give equally good results on new test data drawn from the same distribution. What we really need is a much larger data set for this problem. Then we could split the set into training data (a larger set, offering much better generalization), cross-validation data (used to determine when training should stop), and final test data, never used in training. The current set is so small that it's not possible to split things up this way. -- Scott Fahlman From kak at max.ee.lsu.edu Tue Oct 29 16:36:52 1991 From: kak at max.ee.lsu.edu (Dr. S.
Kak) Date: Tue, 29 Oct 91 15:36:52 CST Subject: No subject Message-ID: <9110292136.AA01849@max.ee.lsu.edu> CALL FOR PAPERS Special Issue On NETWORKS FOR NEURAL PROCESSING Circuits, Systems, and Signal Processing Guest Editors: W.A. Porter, University of Alabama, Huntsville S.C. Kak, Louisiana State University, Baton Rouge Papers are solicited on the theoretical foundations, challenging applications and efficient parallel architectures for neural computing. Suggested topics include: training for generalization, use of higher order moments, rapid training algorithms, nonbinary design, optimization networks, and mapping networks. Papers which critique and/or compare recent developments in neural computation are also of interest. Papers should be prepared according to the Information for Contributors on the inside back cover of Circuits, Systems, and Signal Processing. Papers should be submitted in triplicate by January 20, 1992 in care of: Professor William A. Porter Department of Electrical and Computer Engineering The Univesity of Alabama in Huntsville Huntsville, AL 35899 [Tel. (205) 895-6858] For further information contact Professor S.C. Kak at kak at max.ee.lsu.edu or contact Professor W.A. Porter. From dlukas at PARK.BU.EDU Tue Oct 29 13:34:49 1991 From: dlukas at PARK.BU.EDU (David Lukas) Date: Tue, 29 Oct 91 13:34:49 -0500 Subject: Faculty position in Cognitive & Neural Systems at Boston University Message-ID: <9110291834.AA29864@cns.bu.edu> Assistant Professor Cognitive and Neural Systems Boston University Boston University seeks to hire a tenure track assistant professor starting in Fall 1992 for its graduate Department of Cognitive and Neural Systems. The Department offers an integrated curriculum offering the full range of psychological, neurobiological, and computational concepts, models, and methods in the fields of neural networks, computational neuroscience, parallel distributed processing, and biological information processing, in which Boston University is a leader. Candidates should have extensive analytic or computational research experience in modelling nonlinear neural networks, especially in one or more of the areas: learning, speech and language processing, adaptive pattern recognition, cognitive information processing, and adaptive sensory-motor control. Send a complete curriculum vitae and three letters of recommendation to Stephen Grossberg, Chairman, Search Committee, Department of Cognitive and Neural Systems, Room 240, 111 Cummington Street, Boston University, Boston, MA 02215, no later than January 1, 1992. Boston University is an Equal Opportunity/Affirmative Action employer. If you have questions or require further information, please reply to Carol Jefferson---caroly at cns.bu.edu. From demers at cs.UCSD.EDU Tue Oct 29 16:48:36 1991 From: demers at cs.UCSD.EDU (David DeMers) Date: Tue, 29 Oct 91 13:48:36 PST Subject: Generalization Message-ID: <9110292148.AA15810@beowulf.ucsd.edu> A short while back there was a discussion of generalization; I recall contributions by Wolpert and Goldfarb, among others. I didn't save the exchanges, however I'd like to look at them now. Unfortunately, I can't seem to connect up to the archive to retrieve the mailings. If anyone has most of the discussion still lying around, I'd appreciate it if you could mail it to me; also, I'd appreciate anyone's opinion on "what is generalization" in 250 words or less :-) I do have most of David Wolpert's papers, so don't need another copy of them... 
Thanks for any help, Dave From hcard at ee.UManitoba.CA Wed Oct 30 15:14:07 1991 From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA) Date: Wed, 30 Oct 91 14:14:07 CST Subject: batch learning Message-ID: <9110302014.AA00760@card.ee.umanitoba.ca> In the PDP books batch learning accumulates error derivatives from each pattern rather than simply their contributions to the total error, before making weight changes. It seems that gradient descent ought to add all the errors before taking any derivatives. Any comments? Howard Card From petsche at learning.siemens.com Wed Oct 30 15:33:38 1991 From: petsche at learning.siemens.com (Thomas Petsche) Date: Wed, 30 Oct 91 15:33:38 EST Subject: NIPS travel (limited cheap airfare) Message-ID: <9110302033.AA12077@learning.siemens.com> FYI: United has a special fare program available until tomorrow. We just booked a round trip from Newark to Denver (leave Monday morning and return Sunday morning) for $250. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 22:59:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 22:59:19 EST Subject: Resource Allocation Network (RAN) In-Reply-To: Your message of Mon, 28 Oct 91 13:54:54 -0500. <9110281854.AA20399@botsec1.bot.COM> Message-ID: Have you tried E-mail? I exchanged some mail with him a month or so ago: John Platt -- Scott From ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU Wed Oct 30 04:55:26 1991 From: ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU (ruizdeangulo%ispgro.cern.ch@BITNET.CC.CMU.EDU) Date: Wed, 30 Oct 91 10:55:26 +0100 Subject: batch-continous-one shot Message-ID: <9110300955.AA03462@dxmint.cern.ch> Referring to the batch-continuous-one-shot learning discussion, in the reference below we describe an algorithm that can be labeled as one-shot learning. I think it fits well with the Plutowski and White method described recently. >What we do (as reported in the tech report by Plutowski & White) >is sequentially grow the training set, first finding >an "optimal" training set of size 1, then fitting the network to this >training set, appending the training set with a new exemplar selected from >the set of available candidates, obtaining a training set of size 2 which >is "approximately optimal", fitting this set, appending a third exemplar, etc, >continuing the process until the network fit obtained by training over the >exemplars fits the rest of the available examples within the desired tolerance. The MDL (Minimal Disturbance Learning) algorithm introduces a new exemplar by minimizing an estimate of the loss function (error increment) over the old patterns. It makes a little search for this optimization, but whatever the stopping point (for this search), perfect recall of the new exemplar is obtained. The network is not forced to assume any special kind of local representation. Ruiz de Angulo, V., and Torras, C. (1991). Minimally Disturbing Learning. In Proceedings of IWANN 91. Springer Verlag. From edelman at wisdom.weizmann.ac.il Thu Oct 31 04:08:00 1991 From: edelman at wisdom.weizmann.ac.il (Shimon Edelman) Date: Thu, 31 Oct 91 11:08+0200 Subject: Resource Allocation Network (RAN) In-Reply-To: <9110281854.AA20399@botsec1.bot.COM> Message-ID: <19911031090807.2.EDELMAN@YAD.weizmann.ac.il> A similar technique of RBF center allocation, in conjunction with other modifications of RBF learning, was successful in replicating human performance in the difficult visual task of hyperacuity vernier discrimination.
See AI Memo 1271, "Synthesis of visual modules from examples: learning hyperacuity", by T. Poggio, M. Fahle and S. Edelman (January 1991). Center allocation is discussed there on p.7. -Shimon Edelman edelman at wisdom.weizmann.ac.il From dfausett at zach.fit.edu Thu Oct 31 09:48:43 1991 From: dfausett at zach.fit.edu ( Donald W. Fausett) Date: Thu, 31 Oct 91 09:48:43 -0500 Subject: What is current technology in Analog Neural Nets? Message-ID: <9110311448.AA02454@zach.fit.edu> Prof. Bernard Widrow at Stanford University (EE Dept) would be a likely source to steer you in the right direction. Locally, you might try Prof. Hal Brown at FIT (EE Dept). Good luck. -- Don Fausett From lissie!botsec7!botsec1!dcl at UUNET.uu.net Thu Oct 31 10:13:26 1991 From: lissie!botsec7!botsec1!dcl at UUNET.uu.net (David Lambert) Date: Thu, 31 Oct 91 10:13:26 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110311513.AA24956@botsec1.bot.COM> Hi. Thanks to all respondents concerning my RAN question. I managed to get in touch with John Platt, and he was most helpful. John Platt writes: > Someone forwarded me your posting on the connectionist mailing list.. > Could you please follow up, and say that you have successfully used > RAN? It would be nice to leave an impression of a working algorithm... My sincere apologies for being lax in my courtesies, John. You're right, of course. I got RAN working just fine, and it works as well as (if not better than) advertised. To those who asked for a copy of the resulting code, I'll probably release it sometime soon, through one mechanism or another. Thanks again. David Lambert dcl at object.com or dcl at panix.com From B344DSL at UTARLG.UTA.edu Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.edu (B344DSL@UTARLG.UTA.edu) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a location to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact optimal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive categorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be required to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferences -- on Motivation, Emotion, and Goal Direction in Neural Networks and Neural Networks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates.
Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of acceptance will be given December 15, 1991. We will not be holding parallel sessions, so there are limitations on the number of speakers. However, individuals who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598
From M.Stannett at dcs.sheffield.ac.uk Wed Oct 2 16:30:06 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Wed, 2 Oct 91 16:30:06 BST Subject: Concurrent semantics Message-ID: <9110021530.AA04587@sun5.dcs.sheffield.ac.uk> Dear All, IF THIS MESSAGE ISN'T RELEVANT TO YOU, PLEASE PASS IT TO SOMEONE TO WHOM IT IS.
One of my major delights in computer science is the nature of concurrent semantics, and especially the "non-interleaving" models like Mazurkiewicz trace language and their analogues (these are models which represent so-called 'true' concurrency, rather than trying to flatten everything down into sequences of actions). Nonetheless, I readily admit that the more standard "interleaving" models are fascinating in their own right as well. In any case, I'm certain we're all trying to solve the same problems, but merely approaching them from slightly different angles - in ten years time, we'll be wondering what all the disagreement was about .... {{{ CONNECTIONISTS: concurrent semantics is concerned with working out what complex concurrent systems are actually doing, and how properly to represent their behaviour. Applying the standard sequential interpretations to concurrent systems can sometimes lead to misleading results. Consequently, I would argue that finding a deep understanding of the nature of complex networks probably involves exactly the same problems as are currently faced by concurrent semantics theorists. It might prove extremely fruitful to see some colloborations between the two fields }}} As far as I can work out, there seems to be only negligible contact between the many groups working in the area. I'd like to see some sort of elecronic forum for discussing ideas in the area - even if we can't work together, at least we might be able to exchange ideas rapidly from time to time. Please let me know if you'd be interested in joining in a sort of loosely confederated "concurrency club" or whatever. Obviously, there's be no funding to speak of, but then, given sufficient enthusiasm, we shouldn't need any. (At least, not yet). Provided the task isn't TOO time-consuming, I'll happily channel messages to interested parties for the time-being. Thanks for reading! Mike Stannett ( M.Stannett @ uk.ac.sheffield.dcs ) From et at eng.cam.ac.uk Wed Oct 2 10:31:19 1991 From: et at eng.cam.ac.uk (E. Tzirkel-Hancock) Date: Wed, 2 Oct 91 15:31:19 +0100 Subject: Technical Report Available Message-ID: <24638.9110021431@tw700.eng.cam.ac.uk> The following report has been placed in the neuroprose archives at Ohio State University: STABLE CONTROL OF NONLINEAR SYSTEMS USING NEURAL NETWORKS Eli Tzirkel-Hancock & Frank Fallside Technical Report CUED/F-INFENG/TR.81 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract A neural network based direct control architecture is presented, that achieves output tracking for a class of continuous time nonlinear plants, for which the nonlinearities are unknown. The controller employs neural networks to perform approximate input/output plant linearization. The network parameters are adapted according to a stability principle. The architecture is based on a modification of a method previously proposed by the authors, where the modification comprises adding a sliding control term to the controller. This modification serves two purposes: first, as suggested by Sanner and Slotine, sliding control compensates for plant uncertainties outside the state region where the networks are used, thus providing global stability; second, the sliding control compensates for inherent network approximation errors, hence improving tracking performance. A complete stability and tracking error convergence proof is given and the setting of the controller parameters is discussed. 
It is demonstrated that as a result of using sliding control, better use of the network's approximation ability can be achieved, and the asymptotic tracking error can be made dependent only on inherent network approximation errors and the frequency range of unmodeled dynamical modes. Two simulations are provided to demonstrate the features of the control method. ************************ How to obtain a copy ************************ a) via FTP: % ftp archive.cis.ohio-state.edu .. Name (archive.cis.ohio-state.edu): anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get tzirkel.control_tr81.ps.Z ftp> quit % uncompress tzirkel.control_tr81.ps.Z % lp tzirkel.control_tr81.ps b) via postal mail: Request a hardcopy from Eli Tzirkel, et at eng.cam.ac.uk Speech Laboratory Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ England From STIVA%IRMKANT.BITNET at vma.cc.cmu.edu Thu Oct 3 11:41:47 1991 From: STIVA%IRMKANT.BITNET at vma.cc.cmu.edu (stefano nolfi) Date: Thu, 03 Oct 91 11:41:47 EDT Subject: Technical Report Available Message-ID: The following technical report is available. Send request to STIVA at IRMKANT.BITNET DO NOT REPLAY TO THIS MESSAGE ------------------------------------------------------------------------ Learning, Behavior, and Evolution Domenico Parisi Stefano Nolfi Federico Cecconi Institute of Psychology CNR - Rome e-mail: stiva at irmkant.Bitnet Abstract We present simulations of evolutionary processes operating on populations of neural networks to show how learning and behavior can influence evolution within a strictly Darwinian framework. Learning can accelerate the evolutionary process both when learning tasks correlated with the fitness criterion and when random learning tasks are used. Furthermore, an ability to learn a task can emerge and be transmitted evolutionarily for both correlated and uncorrelated tasks. Finally, behavior that allows the individual to self-select the incoming stimuli can influence evolution by becoming one of the factors that determine the observed phenotypic fitness on which selective reproduction is based. For all the effects demonstrated, we advance a consistent explanation in terms of a multidimensional weight space for neural networks, a fitness surface for the evolutionary task, and a performance surface for the learning task. This paper will be presented at ECAL-91 - European Conference on Artificial Life, December 1991, Paris. From mre1 at it-research-institute.brighton.ac.uk Thu Oct 3 09:20:50 1991 From: mre1 at it-research-institute.brighton.ac.uk (Mark Evans) Date: Thu, 3 Oct 91 09:20:50 BST Subject: IJCNN '91 Singapore - Request to share a room Message-ID: <1583.9110030820@itri.bton.ac.uk> I will be attending IJCNN '91 in Singapore on the 18-21 November where I will be presenting a paper. I would be interested in hearing from anyone who would like to share a twin room for the duration of the conference. (I am about to book myself a room or I could pay you if you have already booked a room.) I am PhD student at Brighton Polytechnic, UK working in the field of computer vision and neural networks. Anyone interested ? ################################################# # # # M.R. Evans mre1 at itri.bton.ac.uk # # Research Assistant mre1 at itri.uucp # # # # ITRI, # # Brighton Polytechnic, # # Lewes Road, # # BRIGHTON, # # E. Sussex, # # BN2 4AT. 
# # # # Tel: +44 273 642915/642900 # # Fax: +44 273 606653 # # # ################################################# From kak at max.ee.lsu.edu Thu Oct 3 10:38:55 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Thu, 3 Oct 91 09:38:55 CDT Subject: TR's available Message-ID: <9110031438.AA14174@max.ee.lsu.edu> Please send me a copy of your report. Subhash Kak Professor of Electrical & Computer Engineering Louisiana State University Baton Rouge, LA 70803-5901 From M.Stannett at dcs.sheffield.ac.uk Fri Oct 4 16:12:17 1991 From: M.Stannett at dcs.sheffield.ac.uk (M.Stannett@dcs.sheffield.ac.uk) Date: Fri, 4 Oct 91 16:12:17 BST Subject: concurrent semantics mailing list Message-ID: <9110041512.AA06164@sun5.dcs.sheffield.ac.uk> Hello again! A number of subscribers to CONNECTIONISTS have indicated they haven't come across concurrent semantics (which may explain Chris Tofts' comments below). I'll send you a quick summary of the subject area in a few days' time, and try to show why it's relevant to connectionist researchers. Meanwhile ... two respondents have indicated that appropriate electronic fora already exist for the discussion of concurrent semantics, while others have demonstrated that (like me) they have no information about these fora. Since there's no point setting up a third system in competition with the other two I now know about, I enclose the details below. (If the other are indeed distinct, perhaps they should consider merging ...) --- Included message #1 --- From: Miranda Mowbray Hello Mike, Yes, this is a very good idea [...] There is already a Concurrency mailing list and archive, specially designed as a forum for rapid exchange of ideas between different groups working in Concurrency. It's been running for some time now and I'm surprised you haven't heard of it. It's run by Albert Meyer at MIT. To join, send a message to concurrency at theory.lcs.mit.edu saying that you'd like to be on the mailing list. You'll get information about archive files available. This is a high quality forum and I recommend joining. I also recommend that you tell anyone else who replies to your message and wants to be in a concurrency club. I don't see why you should go to the trouble of setting up your own separate club when one already exists, unless your version has specific local interests which are not catered for by Albert Meyer's; in any case what you *mustn't* do is set up a second forum which will keep people ignorant of the first, after all the whole point is to get everyone together! Thankyou for your public-spiritedness, Yours, Miranda. --- Included message(s) # 2/3 --- From: Chris Tofts Subject: Re: Concurrent semantics Hi Mike, interesting idea, at a symposium on complex systems in the states last year I suggested using ideas from algebraic concurrency theory to a collection of people working in neural nets etc, they not only seemed remarkable uniterested but failed to see any link. It seems that any connections (sic) will have to be exposed from the theoretical side. There already exists a news group for concurrency which is used, are you suggesting something other than this?? All the best, Chris. From: C.Tofts at uk.ac.bath.gdr I believe its mail.concurrency, at least that's what its called in edinburgh. Ask your local news guru, All the best, Chris. 
--- End of included messages --- From wray at ptolemy.arc.nasa.gov Fri Oct 4 19:11:01 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Fri, 4 Oct 91 16:11:01 PDT Subject: tree classification code available for comparative studies Message-ID: <9110042311.AA01252@ptolemy.arc.nasa.gov> I've made the following report available on the Neuroprose Archive (cheops.cis.ohio-state.edu) as buntine.treecode.ps.Z not because I think connectionists are "deeply" interested in tree learning research but because I think it would be a handy resource for comparative studies: 1) systems such as CART/C4 are recognised programs for benchmarking supervised learning systems against 2) home-grown reimplementations can be buggy and a timesink 3) if your problem has some inherent structure and a few key indicator variables then trees may be a good thing to try as well 4) trees typically don't work well with purely numeric data or with problems with many variables all giving some minor contribution to the prediction being made The IND Tree Package we developed here incorporates some of early C4, most of the classification trees component of CART (no regression) along with some more recent Bayesian/MDL approaches that sometimes work better. You can obtain LaTeX source for the following introductory report if you email to: ind at kronos.arc.nasa.gov and ask for "About the IND Tree Package". --------------------------------------- About the IND Tree Package Wray Buntine, RIACS NASA Ames Research Center Mail Stop 269-2 Moffet Field, CA 94035 September 29, 1991 This note introduces the IND Tree Package to prospective procurers and those users/installers looking at IND for the first time. IND does supervised learning using classification trees. IND integrates features from Breiman {\it et al.}'s CART and Quinlan's C4 with newer Bayesian and minimum encoding methods for growing classification trees, and provides an experimental control suite on top. The package comes with a manual, ``man'' entries, and a guide to tree methods and research. Information about obtaining IND, performance statistics, documentation, authorship, copyright, installation, etc., are given. IND is currently under development, although it has been used considerably since late 1989. IND is implemented in C under UNIX. ---------------------------------------- Wray Buntine RIACS (Research Inst. for Advanced Comp. Sc.) NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:31:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:31 U Subject: Thank's for help. Message-ID: <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Sun Oct 6 09:33:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Sun, 6 Oct 91 09:33 U Subject: Thank's for help Message-ID: <01GBEHHAOVDCD7QHLX@BITNET.CC.CMU.EDU> From tackett at ipla00.dnet.hac.com Sun Oct 6 00:08:02 1991 From: tackett at ipla00.dnet.hac.com (Walter Alden Tackett) Date: Sun, 6 Oct 91 00:08:02 EDT Subject: tree classification code available for comparative studies Message-ID: <9110060708.AA10023@ipla00.ipl.hac.com> Wray Buntine writes: > not because I think connectionists are "deeply" interested in tree learning ...only in *dendritic* trees, maybe? 
;-) -wt From aboulang at BBN.COM Sun Oct 6 11:40:36 1991 From: aboulang at BBN.COM (aboulang@BBN.COM) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: The property (2) is called detailed balance, resulting in a Gibbs distribution for the probability to find the system in a particular state. The rule (1) is an update procedure for the spin Sk which ensures detailed balance provided that E is an energy. Both principles are fundamental facts of statistical mechanics of neural networks (or, if you prefer, result from a maximum entropy analysis of neural nets). The book by Hertz, Krogh and Palmer summarizes all that in a nice way. The book title is "Introduction to the Theory of Neural Computation". We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramirez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. Regards, Albert Boulanger aboulanger at bbn.com From M.Stannett at dcs.sheffield.ac.uk Sun Oct 6 00:10:13 1991 From: M.Stannett at dcs.sheffield.ac.uk (Mike Stannett) Date: Sun, 6 Oct 91 00:10:13 BST Subject: summary of concurrent semantics Message-ID: <9110052310.AA15255@dcs.sheffield.ac.uk> ((This message is just over two pages of A4 long)) A very brief (incomplete) summary of concurrent semantics --------------------------------------------------------- (This description reflects my personal bias towards trace models; I apologise in advance to anyone who feels I've given an unbalanced account of the field.) You will recall Russell's demonstration that mathematics early this century was built on very dodgy ground. The search was on, and still is, for a formal theory of mathematics itself - why is it sensible to discuss some sets but not others? This purely mathematical problem led directly to many aspects of computer science that are now taken for granted. For example, Skolem (c. 1934) realised that the derivation of Russell's paradox could be avoided by introducing the notion of definition-by-recursion. Meanwhile, Church was developing the lambda-calculus, Post was working on his production systems, and Turing was introducing his machine models and computational AI. As a result, there is a wealth of structure available for discussing the underlying nature of computational processes themselves. This is essential in some cases. For example, we need to ensure that the code we produce will generate the same behaviour when compiled on two different systems; consequently, we need some way of describing the semantics of this code (i.e. what it's supposed to mean) which is machine-independent. There are several approaches to this problem, with perhaps the most mathematical being 'denotational semantics', under which all programs can be regarded as functional - a program becomes a function which maps abstract 'inputs' to abstract 'outputs'. For concurrent systems, this 'functional' view is insufficient.
A standard example concerns the use of shared variables: from a purely sequential point of view, the two programs

    prog1: x=0; x++; x++
    prog2: x=0; x+=2

are identical, since they implement the same overall function. From the concurrent point of view, they are NOT identical, because they can interact with a third process in different ways. For example, if we run first prog1 and then prog2 in the context of

    prog3: x=10;

then the possible values of x on termination of the combined systems are different

    prog1 | prog3 : 2, 10, 11, 12, error
    prog2 | prog3 : 2, 10, 12, error

depending on precisely when prog3 gets executed. Accordingly, much of concurrent semantics is based on the idea that processes should be regarded as active agents which interact with each other. For example, we would reject the notion that the variable x is just a passive entity which is operated upon; instead it becomes an agent in its own right, which interacts with the processes that update it.

Many solutions to the problem of correctly representing the semantics of concurrent systems have been developed, and can be roughly divided into two 'schools' - interleaving and non-interleaving. According to the interleaving version, the sequences of activities that might be performed by two systems running concurrently are just the interleavings of the sequences for the systems taken individually. This is the approach adopted in (the standard theories of) CCS and CSP. The non-interleaving school argues that this representation is inappropriate, and indeed unnecessary, since models of 'true' concurrency are easy to develop (e.g. Petri nets). In the middle ground, there are models such as 'Mazurkiewicz trace theory' which consider the behaviour of a concurrent system to be represented by the collection of ALL its possible action-sequences (rather than accepting the notion that any one of these traces will do as a valid representation). Nor is this a complete list of the approaches used; for example, there is a growing tendency to use models based on category theory and general topology, but I can't reasonably include these in a short summary (besides, I don't know enough about them to represent them accurately).

The key differences between the different approaches are in the way they treat the relationship between time and causality. Given that we are trying to describe a system based on the possible observations of its behaviour, we have to be careful when we impute relationships that may not exist. It may just happen, for example, that one event in a system is always followed by another - but this doesn't mean that they are causally related. Sometimes this doesn't matter, but problems can arise when we introduce additional processes with which to interact. It becomes very difficult to work out precisely how the models of individual processes should be 'stuck together' to get a valid model of the combined system. Presumably this problem is reflected in difficulties faced by connectionists in deciding what happens when large nets are considered to be made up of more manageable sub-nets. Do you have a general theory yet for deciding

    * what process is computed by a given net ?
    * what process is computed by a given combination of smaller nets ?

If not, perhaps our two different disciplines could benefit from talking to one another.

Some sources
============
Probably the best sources for results in semantics and concurrency are the many volumes of the "Lecture Notes in Computer Science" series from Springer-Verlag.
In addition, CCS: The standard text is Robin Milner 1989 Communication and Concurrency Prentice-Hall International CSP: The standard text is C.A.R. (Tony) Hoare 1985 Communicating Sequential Processes Prentice Hall International A good collection of papers that demonstrates the relationships between the many approaches to concurrent semantics is Kwiatkowska M.Z., Shields M.W, and Thomas R.M. (eds) Semantics for concurrency, Leicester 1990 BCS/Springer 'Workshops in Computing' ISBN 3-540-19625-0 I've also got a couple of recent tech. reports concerning generalisations of trace theory for those who want them, but be warned that these are of a highly technical nature, and may not be of much relevance to you just yet. These are Kwiatkowska M.Z. and Stannett M. On transfinite traces CS-91-06 Stannett M. Trace convergence over infinite alphabets CS-91-08 Best wishes, Mike Stannett. From rba at vintage.bellcore.com Mon Oct 7 15:23:15 1991 From: rba at vintage.bellcore.com (Bob Allen) Date: Mon, 7 Oct 91 15:23:15 -0400 Subject: No subject Message-ID: <9110071923.AA12445@vintage.bellcore.com> Subject: Student Travel Grants for NIPS'91 Modest financial support for travel to the Neural Information Processing Systems (NIPS, Denver Dec 2-5, 1991) conference is available to students and other young researchers who are active in neural networks research. Those requesting support should send a one-page summary of their background and research interests, a cirriculum vitae, and their email address to: Dr. R.B. Allen NIPS Treasurer Bellcore MRE 2A-367 445 South Street Morristown, NJ 07960-1910 Travel grant check for those receiving awards will be available at the conference registration desk. From 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 13:00:00 1991 From: 7923509%TWNCTU01.BITNET at BITNET.CC.CMU.EDU (7923509%TWNCTU01.BITNET@BITNET.CC.CMU.EDU) Date: Tue, 8 Oct 91 13:00 U Subject: Thank's Message-ID: <01GBHGORN4ZKD7Q01U@BITNET.CC.CMU.EDU> From aboulang%BBN.COM at CARNEGIE.BITNET Sun Oct 6 11:40:36 1991 From: aboulang%BBN.COM at CARNEGIE.BITNET (aboulang%BBN.COM@CARNEGIE.BITNET) Date: Sun, 6 Oct 91 11:40:36 EDT Subject: Detailed Balance In-Reply-To: 7923509%TWNCTU01.BITNET@bitnet.cc.cmu.edu's message of Sun, 6 Oct 91 09:31 U <01GBEGUMTIJKD7QHLX@BITNET.CC.CMU.EDU> Message-ID: <01GBFAUK6QK0D7QISN@BITNET.CC.CMU.EDU> We really should be saying that detailed balance in sampling implies a Gibbs distribution, but that the Gibbs distribution does not imply the use of a sampling procedure with detailed balance. There is some new work on this: J. Marroquin & A. Ramerez "Stochastic Cellular Automata with Gibbsian Invariant Measures" IEEE Trans Information Theory May(*), 1991 * I can't find the paper so I may have the month wrong. This is potentially good news to people trying to get annealing-type algorithms to work for fine-grained MIMD parallelism. 
Regrads, Albert Boulanger aboulanger at bbn.com From PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU Tue Oct 8 12:03:11 1991 From: PAR%DM0MPI11.BITNET at BITNET.CC.CMU.EDU (Pal Ribarics) Date: Tue, 08 Oct 91 16:03:11 GMT Subject: NN Workshop Message-ID: <01GBI4QH7740D7POVG@BITNET.CC.CMU.EDU> ******************************************************************************* Dear Colleague , we would like to remind you of the deadline for sending abstracts to the topical workshop on Neural Networks within the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics Talks will be selected by the Organizing Committee on the basis of a detailed abstract to be submitted before: 15 October, 1991. to the address below. You will also find a registration form which was sent to you in a prior mail. Best regards B. Denby C. Kiesling C. Peterson P. Ribarics ======================================================================== SECOND INTERNATIONAL WORKSHOP ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS FOR HIGH ENERGY AND NUCLEAR PHYSICS 1992 January 13 - 18 L'AGELONDE FRANCE-TELECOM LA LONDE LES MAURES BP 64 F-83250 REGISTRATION NAME: FIRSTNAME: LABORATORY: COUNTRY ADDRESS: TEL: FAX: TELEX: E-MAIL: HOTEL RESERVATION (Number of persons): In the following you are expected to answer with the corresponding number or character from the list above. However if your interest is not mentioned in the list give a full description. WOULD YOU BE INTERESTED TO JOIN A WORKING GROUP OF THE ASTEC PROJECT ? YES/NO GROUP: SUBGROUP: WOULD YOU LIKE TO ATTEND TOPICAL WORKSHOPS OR TUTORIALS ? WORKSHOPS: TUTORIALS: WOULD YOU LIKE TO PRESENT A TALK ? YES/NO TALK TITLE: To be considered by the organizing committee, send an extended abstract before Oct. 15, 1991 to: Michele Jouhet Marie-claude Fert CERN L.A.P.P. - IN2P3 PPE-ADM B.P. 110 CH-1211 Geneve 23 F-74941 Annecy-Le-Vieux SWITZERLAND FRANCE Tel: (41) 22 767 21 23 Tel: (33) 50 23 32 45 Fax: (41) 22 767 65 55 Fax: (33) 50 27 94 95 Telex: 419 000 Telex: 385 180 F E-mail: jouhet at CERNVM Workshop fee : 700 FFr. Student : 500 FFr. Accommodation : 2000 FFr. Accompagning Person: +1200 FFr. To be paid by check: Title: International Workshop CREDIT LYONNAIS/Agence Internationale Bank: 30002 Guichet: 1000 Account: 909154 V Address: LYON REPUBLIQUE The accommodation includes: hotel-room, breakfast, lunch and dinner for 6 days. Tennis, mountain bike and other activities will be available. Denis Perret-Gallix Tel: (41) 22 767 62 93 E-mail: Perretg at CERNVM Fax: (41) 22 782 89 23 From squires at cs.wisc.edu Wed Oct 9 03:22:37 1991 From: squires at cs.wisc.edu (Charles Squires) Date: Wed, 9 Oct 91 02:22:37 -0500 Subject: 3 reports available Message-ID: <9110090722.AA17071@mozzarella.cs.wisc.edu> *** PLEASE DO NOT FORWARD TO OTHER LISTS *** The following three working papers have been placed in the neuroprose archive: -Maclin, R. and Shavlik, J.W., Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding, Machine Learning Research Group Working Paper 91-2. Neuroprose file name: maclin.fskbann.ps.Z -Scott, G.M., Shavlik, J.W., and Ray, W.H., Refining PID Controllers using Neural Networks, Machine Learning Research Group Working Paper 91-3. Neuroprose file name: scott.nnpid.ps.Z -Towell, G.G. and Shavlik, J.W., The Extraction of Refined Rules from Knowledge-Based Neural Networks, Machine Learning Research Group Working Paper 91-4. 
Neuroprose file name: towell.interpretation.ps.Z The abstract of each paper and ftp instructions follow: ---------- Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding Richard Maclin Jude W. Shavlik Computer Sciences Dept. University of Wisconsin - Madison email: maclin at cs.wisc.edu We describe a method for using machine learning to refine algorithms represented as generalized finite-state automata. The knowledge in an automaton is translated into a corresponding artificial neural network, and then refined by applying backpropagation to a set of examples. Our technique for translating an automaton into a network extends the KBANN algorithm, a system that translates a set of propositional, non- recursive rules into a corresponding neural network. The topology and weights of the neural network are set by KBANN so that the network represents the knowledge in the rules. We present the extended system, FSKBANN, which augments the KBANN algorithm to handle finite-state automata. We employ FSKBANN to refine the Chou-Fasman algorithm, a method for predicting how globular proteins fold. The Chou-Fasman algorithm cannot be elegantly formalized using non-recursive rules, but can be concisely described as a finite-state automaton. Empirical evidence shows that the refined algorithm FSKBANN produces is statistically significantly more accurate than both the original Chou-Fasman algorithm and a neural network trained using the standard approach. We also provide extensive statistics on the type of errors each of the three approaches makes and discuss the need for better definitions of solution quality for the protein- folding problem. ---------- Refining PID Controllers using Neural Networks Gary M. Scott (Chemical Engineering) Jude W. Shavlik (Computer Sciences) W. Harmon Ray (Chemical Engineering) University of Wisconsin The KBANN (Knowledge-Based Artificial Neural Networks) approach uses neural networks to refine knowledge that can be written in the form of simple propositional rules. We extend this idea further by presenting the MANNCON (Multivariable Artif- icial Neural Network Control) algorithm by which the mathematical equations governing a PID (Proportional-Integral-Derivative) con- troller determine the topology and initial weights of a network, which is further trained using backpropagation. We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically- significant gains in accu- racy over both a standard neural network approach and a non- learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initial- ized with small random numbers. ---------- The Extraction of Refined Rules from Knowledge-Based Neural Networks Geoffrey G. Towell Jude W. Shavlik Department of Computer Science University of Wisconsin E-mail Address: towell at cs.wisc.edu Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. 
In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques for extracting rules from trained neural networks; (4) are ``human comprehensible.'' Thus, the method demonstrates that neural networks can be an effective tool for the refinement of symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated. ---------- FTP Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclin.fskbann.ps.Z OR... get scott.nnpid.ps.Z OR... get towell.interpretation.ps.Z ftp> quit unix> uncompress maclin.fskbann.ps.Z OR... uncompress scott.nnpid.ps.Z OR... uncompress towell.interpretation.ps.Z unix> lpr maclin.fskbann.ps OR... lpr scott.nnpid.ps OR... lpr towell.interpretation.ps (or use whatever command you use to print PostScript) From danielg at cogs.sussex.ac.uk Wed Oct 9 07:07:27 1991 From: danielg at cogs.sussex.ac.uk (Daniel Glaser) Date: Wed, 9 Oct 91 12:07:27 +0100 Subject: Restrictions on recurrent learning Message-ID: <29747.9110091107@rsunx.cogs.susx.ac.uk> I have been working on some simple recurrent networks as defined by Jordan(1986) and Elman(1990), and am interested in the class of temporal regularities that they can learn. In particular, how do they compare with more general back propagation through time defined by the PDP group(1986) and Werbos(1990) ? In the Jordan/Elman nets, activation flows forward in time from `copies' of units from previous cycles, and thus, during learning, error only propagates backwards locally in time. Does anyone know of any theoretical or empirical work on what these different types of network can learn ? If replies are addressed to me personally, I will post a summary in due course. Thanks Daniel. References: Elman, J.~L. (1990). Finding structure in time. {\em Cognitive Science}, {\bf 14}:179--211. Jordan, M.~I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In {\em Proceedings of the Eighth Annual Meeting of the Cognitive Science Society}, Hillsdale, NJ. Erlbaum. Rumelhart, D.~E., McClelland, J.~L., \& Williams, R.~J. (1986). Learning internal representations by error propagation. In D.~E. Rumelhart \& J.~L. McClelland (Eds.), {\em Parallel Distributed Processing: Explorations in the Microstructure of Cognition}, volume~1 chapter~8. Cambridge, MA: MIT Press/Bradford Books. Werbos, P.~J. (1990). Backpropagation through time: What it does and how to do it. {\em Proceedings of the IEEE}, 78(10):1550--1560. 
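To make the architectural contrast behind the question concrete: an Elman net copies the previous hidden state into its context units, a Jordan net copies the previous output (usually with a decay term) into its state units, and backpropagation through time unrolls the network so that gradients flow through those copies instead of stopping at them. The forward passes below are a minimal sketch under assumed weight names and logistic units, not code taken from any of the papers cited above.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elman_step(x, h_prev, Wxh, Whh, Why):
    # Context units hold a copy of the previous HIDDEN state.
    h = sigmoid(Wxh @ x + Whh @ h_prev)
    return h, sigmoid(Why @ h)

def jordan_step(x, s_prev, y_prev, Wxh, Wsh, Why, decay=0.5):
    # State units hold a decayed copy of the previous OUTPUT.
    s = decay * s_prev + y_prev
    h = sigmoid(Wxh @ x + Wsh @ s)
    return s, h, sigmoid(Why @ h)

The training difference is where the gradient stops: the Elman and Jordan schemes treat h_prev and s_prev as frozen inputs, so error propagates only one step back in time, whereas backpropagation through time differentiates through them over the whole unrolled sequence (at the cost of storing the activation history), which is the natural place to look for differences in what the two families can learn.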
From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 9 14:05:33 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 09 Oct 91 14:05:33 -0400 Subject: Recurrent Cascade-Correlation Code Message-ID: Simulation code for the Recurrent Cascade-Correlation (RCC) algorithm, previously available only in Common Lisp, has now been translated into C by Conor Doherty of the University College of Dublin (Ireland). This code is a modification of the C program for original Cascade-Correlation, written by Scott Crowder of Carnegie Mellon. My thanks to Conor and Scott for their help in making these programs available to the barbarian hordes who speak only C. For a description of this algorithm, see Scott E. Fahlman, "The Recurrent Cascade-Correlation Architecture" in Advances in Neural Information Processing Systems 3, edited by R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Morgan Kaufmann Publishers, 1991. Alternatively, see the tech report mentioned below. The instructions for accessing any of this code via FTP are included at the end of this message. Scott E. Fahlman School of Computer Science Carnegie Mellon University =========================================================================== Public-domain simulation programs for the Quickprop, Cascade-Correlation, and Recurrent Cascade-Correlation learning algorithms are available via anonymous FTP on the Internet. This code is distributed without charge on an "as is" basis. There is no warranty of any kind by the authors or by Carnegie-Mellon University. Instructions for obtaining the code via FTP are included below. If you can't get it by FTP, contact me by E-mail (sef+ at cs.cmu.edu) and I'll try *once* to mail it to you. Specify whether you want the C or Lisp version. If it bounces or your mailer rejects such a large message, I don't have time to try a lot of other delivery methods. I am maintaining an E-mail list of people using this code so that I can notify them of any changes or problems that occur. I would appreciate hearing about any interesting applications of this code, and will try to help with any problems people run into. Of course, if the code is incorporated into any products or larger systems, I would appreciate an acknowledgement of where it came from. If for some reason these programs do not work for you, please contact me and I'll try to help. Common errors: (1) Some people don't notice that the symmetric sigmoid output units in cascor have a range of -0.5 to +0.5 (for reasons that are mostly historical). If you try to force this algorithm to produce an output of +1.0 or +37.3, it isn't going to work. (2) Note that quickprop (which is used inside of Cascade-Correlation) is designed to update the weights after every epoch, and it assumes that all the epochs are identical. If you try to run this code updating after every training case, you will lose badly. If you want to change the training set, it is important to zero out the PREV-SLOPES and DELTAS vectors, and also to re=build the caches in Cascade-Correlation. HOW TO GET IT: For people (at CMU, MIT, and soon some other places) with access to the Andrew File System (AFS), you can access the files directly from directory "/afs/cs.cmu.edu/project/connect/code". This file system uses the same syntactic conventions as BSD Unix: case sensitive names, slashes for subdirectories, no version numbers, etc. The protection scheme is a bit different, but that shouldn't matter to people just trying to read these files. 
For people accessing these files via FTP: 1. Create an FTP connection from wherever you are to machine "pt.cs.cmu.edu". The internet address of this machine is 128.2.254.155, for those who need it. 2. Log in as user "anonymous" with your own ID as password. You may see an error message that says "filenames may not have /.. in them" or something like that. Just ignore it. 3. Change remote directory to "/afs/cs/project/connect/code". NOTE: you must do this in a single operation. 4. At this point FTP should be able to get a listing of files in this directory with DIR and fetch the ones you want with GET. (The exact FTP commands you use depend on your local FTP server.) Current contents: quickprop1.lisp Original Common Lisp version of Quickprop. quickprop1.c C version by Terry Regier, U. Cal. Berkeley. cascor1.lisp Original Common Lisp version of Cascade-Correlation. cascor1.c C version by Scott Crowder, Carnegie Mellon rcc1.lisp Common Lisp version of Recurrent Cascade-Correlation. rcc1.c C version, trans. by Conor Doherty, Univ. Coll. Dublin vowel.c Code for Tony Robinson's vowel benchmark. am4.tar.Z Aspirin/Migraine code from MITRE. backprop.lisp Overlay for quickprop1.lisp. Turns it into backprop. --------------------------------------------------------------------------- Tech reports describing these algorithms can also be obtained via FTP. These are Postscript files, processed with the Unix compress/uncompress program. Follow the steps for FTP access as above, but cd to directory unix> ftp pt.cs.cmu.edu (or 128.2.254.155) Name: anonymous Password: ftp> cd /afs/cs/project/connect/tr ftp> binary ftp> get filename.ps.Z ftp> quit unix> uncompress filename.ps.Z unix> lpr filename.ps (or however you print postscript files) For "filename", sustitute the following: cascor-tr Cascade-Correlation paper. qp-tr Paper on Quickprop and other backprop speedups. rcc-tr Recurrent Cascade-Correlation paper. precision Hoehfeld-Fahlman paper on Cascade-Correlation with limited numerical precision. From B344DSL at UTARLG.UTA.EDU Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb. conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a loca- tion to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact opti- mal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive cat- egorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. 
Speakers will not be re- quired to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferen- ces -- on Motivation, Emotion, and Goal Direction in Neural Networks and NeuralNetworks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of ac- ceptance will be given December 15, 1991. We will not be holding parallel ses- sions, so there are limitations on the number of speakers. However, individu- als who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598 From bessiere at imag.fr Thu Oct 10 12:48:37 1991 From: bessiere at imag.fr (Pierre Bessiere) Date: Thu, 10 Oct 1991 17:48:37 +0100 Subject: 4 reports available Message-ID: <9110101648.AA09388@imag.imag.fr> The following four papers/reports have been placed in the neuroprose archive: - Bessiere, P.; "Toward a synthetic cognitive paradigm: Probabilistic Inference"; Conference COGNITIVA90, Madrid, Spain, 1990 Neuroprose file name: bessiere.cognitiva90.ps.Z - Talbi, E-G. & Bessiere, P.; "A parallel genetic algorithm for the graph partitioning problem"; ACM-ICS91 (Conference on Super Computing), Cologne, Germany, 1991 Neuroprose file name: bessiere.acm-ics91.ps.Z - Bessiere, P., Chams, A. & Muntean, T.; "A virtual machine model for artificial neural network programming"; INNC90 (International Neural Networks Conference), Paris, France, 1990 Neuroprose file name: bessiere.innc90.ps.Z - Bessiere, P., Chams, A. & Chol, P.; "MENTAL: a virtual machine approach to artificial neural networks programming"; ESPRIT B.R.A. project NERVES (3049), final report, 1991 The abstract of each paper and ftp instructions follow: ---------- TOWARD A SYNTHETIC COGNITIVE PARADIGM: PROBABILISTIC INFERENCE Cognitive science is a very active field of scientific interest. It turns out to be a "melting pot" of ideas coming from very different areas. One of the principal hopes is that some synthetic cognitive paradigms will emerge from this interdisciplinary "brain storming". The goal of this paper is to answer the question: "Given the state of the art, is there any hints indicating the emergence of such synthetic paradigms?" The main thesis of the paper is that there is a good candidate, namely, the probabilistic inference paradigm. 
In support of the above thesis the structure of the paper is as follows: - in a first part, we identify five criteria to qualify as a synthetic cognitive paradigm (validity, self consistency, competence, feasibility and mimetic power); - in the second paragraph, the principles of probabilistic inference are reviewed and justifications of validity and self consistency of this paradigm are given (Marr's computational level); - then, the competence criterion is discussed, considering the efficiency of probabilistic inference for dealing with the different classical cognitive riddles and analyzing the relationships of probabilistic inference with several of the usual connexionist formalisms (Marr's algorithmic level); - the criteria of feasibility (condition of computer implementation) and mimetic power (adequation with what is known of the architecture of the nervous system) are finally considered in the fourth part (Marr's implementation level). As a conclusion, it will appear that probabilistic inference is at least a very interesting framework to get a synthetic overview of a number of works in the area and to identify and formalize the most puzzling questions. Some of these questions will be listed. In fact, probabilistic inference will appear finally to be able to play the same role for computational cognitive science that formal logic has played for classical symbolic Artificial Intelligence: a sound mathematical foundation serving as a guide line, as a constant reference and as a source of inspiration. ---------- A PARALLEL GENETIC ALGORITHM FOR THE GRAPH PARTITIONING PROBLEM Genetic algorithms are stochastic search and optimization techniques which can be usedf for a wide range of applications. This paper addresses the application of genetic algorithms to the graph partitioning problem. Standard genetic algorithms with large populations suffer from lack of efficiency (quite high execution time). A massively parallel genetic algorithm is proposed, an implementation on a SuperNode( of Transputers( and results of various benchmarks are given. The parallel algorithm shows a superlinear speed-up, in the sense that when multiplying the number of processors by p, the time spent to reach a solution with a given score, is divided by kp (k>1). A comparative analysis of our approach with hill-climbing algorithms and simulated annealing is also presented. The experimental measures show that our algorithm gives better results concerning both the quality of the solution and the time needed to reach it. ---------- A VIRTUAL MACHINE MODEL FOR ARTIFICIAL NEURAL NETWORK PROGRAMMING This paper introduces the model of a virtual machine for A.N.N. (Artificial Neural Networks). The context of this work is a collaborative project to study new V.L.S.I. implementations and new architectures for neuronal machines. The work consists in the specification and a prototype implementation of a description language for A.N.N., of the associated virtual machine, of the compiler between them and of the compilers mapping the virtual machine on different highly parallel computers. In this short paper we present the virtual machine model which combines the features of various parallel programming paradigms. Our model allows, in particular, to have the same A.N.N. program running on both synchronous or asynchronous type of machines. In this framework a parallel architecture (S.M.A.R.T.) and a dynamically reconfigurable parallel machine of Transputers (SuperNode) are considered as target machines. 
---------- MENTAL: A VIRTUAL MACHINE APPROACH TO ARTIFICIAL NEURAL NETWORKS PROGRAMMING (ATTENTION: 100 pages) This report treats (extensively) the same subject than the short paper described just above. Some parts are extracted from the three previouly presented papers. ---------- These reports may be FTP from either neuroprose archives or from my own server (IMAG): How to get files from the Neuroprose archives? ______________________________________________ Anonymous ftp on: - archive.cis.ohio-state.edu (128.146.8.52) mymachine>ftp archive.cis.ohio-state.edu Name: anonymous Password: yourname at youradress ftp>cd pub/neuroprose ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z How to get files from IMAG? ___________________________ Anonymous ftp on: - 129.88.32.1 mymachine>ftp 129.88.32.1 Name: anonymous Password: yourname at youradress ftp>cd pub/SYMPA/NNandGA ftp>binary ftp>get bessiere.foo.ps.Z ftp>quit mymachine>uncompress bessiere.foo.ps.Z -- Pierre BESSIERE *************** IMAG/LGI phone: BP 53X Work: 33/76.51.45.72 38041 Grenoble Cedex Home: 33/76.51.16.15 FRANCE Fax: 33/76.44.66.75 Telex:UJF 980 134 F E-Mail: bessiere at imag.imag.fr C'est au savant moderne que convient, plus qu'a tout autre, l'austere conseil de Kipling: "Si tu peux voir s'ecrouler soudain l'ouvrage de ta vie, et te remettre au travail, si tu peux souffrir, lutter, mourrir sans murmurer, tu seras un homme , mon fils." Dans l'oeuvre de la science seulement on peut aimer ce qu on detruit, on peut continuer le passe en le niant, on peut venerer son maitre en le contredisant. GASTON BACHELARD From gary at cs.UCSD.EDU Thu Oct 10 13:26:04 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Thu, 10 Oct 91 10:26:04 PDT Subject: Restrictions on recurrent learning Message-ID: <9110101726.AA24233@desi.ucsd.edu> Fu-Sheng Tsung and I showed there were problems that a hidden-recurrent (Elman-style) net can learn that an output-recurrent Jordan net can't in our 1989 paper in IJCNN: Tsung, Fu-Sheng and Cottrell, G. (1989) A sequential adder using recurrent networks. In \fIProceedings of the International Joint Conference on Neural Networks\fP, Washington, D.C. A similar paper with some state space analysis is in: Cottrell, G. and Fu-sheng Tsung (1991). Learning simple arithmetic procedures. In J.A. Barnden & J.B. Pollack (Eds), \fIAdvances in connectionist and neural computation theory, Vol 1: High-level connectionist models\fP, Norwood: Ablex. There are simple logical arguments that show that hidden-recurrent nets are more powerful than output-recurrent nets. The bottom line is that if there is a problem where the teaching signal forces "forgetting" of the input, then a Jordan-style output-recurrent network cannot respond to things that require remembering it. Hal White also believes Elman nets are strictly more powerful than Jordan nets, but I'm not sure he has a proof. gary cottrell 619-534-6640 Sec'y: 619-534-5288 FAX: 619-534-7029 Computer Science and Engineering C-014 UCSD, La Jolla, Ca. 92093 gary at cs.ucsd.edu (INTERNET) {ucbvax,decvax,akgua,dcdwest}!sdcsvax!gary (USENET) gcottrell at ucsd.edu (BITNET) From ECONEC at vax.oxford.ac.uk Fri Oct 11 11:39:00 1991 From: ECONEC at vax.oxford.ac.uk (ECONEC@vax.oxford.ac.uk) Date: Fri, 11 Oct 91 11:39 BST Subject: REQUEST FOR INFORMATION: NNs AND ECONOMICS Message-ID: REQUEST FOR INFORMATION I am studying for an MLitt/DPhil at the Oxford University and would be very grateful for some help. 
This message is being transmitted to several relevant lists and please feel free to forward it to anyone who might be interested. Apologies in advance to anyone who gets fed up with seeing it! 1) REQUEST: I am interested in references and names for work broadly in the area of AI techniques applied to economics. To narrow this down, I am interested in AI as a tool for developing alternative models of economic behaviour than the traditional view of man as a perfectly informed calculating machine! Because of the behavioural aspect and my preference for economic theory I am hoping to avoid work that simply uses AI techniques to solve traditional models faster. (GAs as function optimisers for instance.) Similarly I am not seeking information on decision support or Expert Systems unless they make some attempt (or claim) to emulate human decision making behaviour. (Default Logics? Frames?) Please err on the side of completeness! 2) OFFER: Obviously I can provide summaries of my findings to various lists in the usual way. (Perhaps you could say where you saw my post so I can keep the summaries relevant to each list.) What I would also like to do is find out whether there is any interest in an adhoc email list of people working in this area. Or if there is one already I would very much like to hear about it. I'm sure such things have been going for years in the US but information here in the UK seems very sparse. I would be quite happy to "maintain" an unofficial bulletin board or mailing list if one does not exist. Many thanks in advance for any help and please feel free to contact me on any aspect of this posting. Edmund Chattoe SNAIL: LADY MARGARET HALL OXFORD OXON OX2 6QA From lyn at dcs.exeter.ac.uk Fri Oct 11 13:56:49 1991 From: lyn at dcs.exeter.ac.uk (Lyn Shackleton) Date: Fri, 11 Oct 91 13:56:49 BST Subject: special deal for Connection Science Message-ID: <11273.9110111256@castor.dcs.exeter.ac.uk> ********** CONNECTION SCIENCE SPECIAL ISSUE ****************** CONNECTIONIST MODELLING OF PSYCHOLOGICAL PROCESSES VOLUME 3.2 (out now) EDITOR Noel Sharkey SPECIAL BOARD Jim Anderson Andy Barto Thomas Bever Glyn Humphreys Walter Kintsch Dennis Norris Kim Plunkett Ronan Reilly Dave Rumelhart Antony Sanford CONTENTS J R Levenick:NAPS: a connectionist implementation of cognitive maps. A Pouget & S J Thorpe: Connectionist models of orientation identification. D R Shanks: A connectionist account of base-rate biases in categorization. A J O'Toole, K Deffenbacher, H Abdi & J Bartlett: Simulating the "Other-race effect" as a problem in perceptual learning. S Kaplan, M Sonntag & E Chown: Tracing recurrent activity of cognitive elements (TRACE): a model of temporal dynamics in a cell assembly. Research Notes: A H Kawamoto & S N Kitzis: Time course of regular and irregular pronunciations. A VERY SPECIAL DEAL FOR MEMBERS OF THE CONNECTIONISTS MAILING. Prices for members of this list will now be: North America 44 US Dollars (reduced from 126 dollars) Elsewhere and U.K. 22 pounds sterling. (Sterling checks must be drawn on a UK bank) These rates start from 1st January 1992 (volume 4). Conditions: 1. Personal use only (i.e. non-institutional). 2. Must subscribe from your private address. You can receive a subscription form by emailing direct to the publisher: email: carfax at ibmpcug.co.uk Say for the attention of David Green and say CONNECTIONISTS MAILING LIST. 
noel From mclennan at cs.utk.edu Fri Oct 11 17:43:22 1991 From: mclennan at cs.utk.edu (mclennan@cs.utk.edu) Date: Fri, 11 Oct 91 17:43:22 -0400 Subject: report: contin. symbol systems Message-ID: <9110112143.AA01451@maclennan.cs.utk.edu> ** Please do not forward to other boards. Thank you. ** The following technical report has been placed in the Neuroprose archives at Ohio State. Ftp instructions follow the abstract. N.B. The uncompressed file is long (1.82 MB), so you may have to use the -s (symbolic link) option on lpr to print it. ----------------------------------------------------- Continuous Symbol Systems The Logic of Connectionism Bruce MacLennan Computer Science Department University of Tennessee Knoxville, TN 37996 maclennan at cs.utk.edu Technical Report CS-91-145 ABSTRACT: It has been long assumed that knowledge and thought are most naturally represented as _discrete_symbol_systems_ (calculi). Thus a major contribution of connectionism is that it provides an alternative model of knowledge and cognition that avoids many of the limitations of the traditional approach. But what idea serves for connectionism the same unifying role that the idea of a calculus served for the traditional theories? We claim it is the idea of a _continuous_symbol_system_. This paper presents a preliminary formulation of continuous sym- bol systems and indicates how they may aid the understanding and development of connectionist theories. It begins with a brief phenomenological analysis of the discrete and continuous; the aim of this analysis is to directly contrast the two kinds of symbols systems and identify their distinguishing characteristics. Next, based on the phenomenological analysis and on other observations of existing continuous symbol systems and connectionist models, I sketch a mathematical characterization of these systems. Finally the paper turns to some applications of the theory and to its implications for knowledge representation and the theory of com- putation in a connectionist context. Specific problems addressed include decomposition of connectionist spaces, representation of recursive structures, properties of connectionist categories, and decidability in continuous formal systems. A preliminary version of this paper was presented at the workshop "Neural Networks for Knowledge Representation, Fourth Annual Workshop of the Metroplex Institute for Neural Dynamics (MIND)," Westlake TX, October 4-6, 1990. Also presented at "ConnectFest 1990," sponsored by Indiana University Center for Research in Concepts and Cognition, November 3-4, 1990. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps maclennan.css.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get maclennan.css.ps.Z ftp> quit unix> uncompress maclennan.css.ps.Z unix> lpr -s maclennan.css.ps (or however you print postscript) Note that the postscript version is missing three (nonessential) figures that have been pasted into the hardcopy version. 
If you need hardcopy, then send your request to: library at cs.utk.edu Bruce MacLennan Department of Computer Science 107 Ayres Hall The University of Tennessee Knoxville, TN 37996-1301 (615)974-0994/5067 FAX: (615)974-4404 maclennan at cs.utk.edu From david at cns.edinburgh.ac.uk Sun Oct 13 17:20:34 1991 From: david at cns.edinburgh.ac.uk (David Willshaw) Date: Sun, 13 Oct 91 17:20:34 BST Subject: Computational Neureoscientist post Message-ID: <4519.9110131620@subnode.cns.ed.ac.uk> UNIVERSITY OF OXFORD MRC Centre in Brain and Behaviour The Medical Research Council has awarded a 7-year grant to establish a Research Centre in Brain and Behaviour, based at the University of Oxford, and also involving scientists from other universities including Birmingham, Cambridge, Durham, Edinburgh and London. The main theme of the Research Centre is the organisation, function, development and disorders of the cerebral cortex, and central to this theme is the exploration of the cortex as an instrument of computation. To this end, the Centre carries out research involving many different methodologies, in the areas of sensory systems, learning and memory, and motor control. Applications are invited for the post of Computational Neuroscientist to work on theoretical aspects of learning and memory. The post will be based at the University of Edinburgh, where the post-holder will be expected to spend 80% of his/her time. The remaining time will be spent in linking with complementary work being carried out by other participants of the Centre, particularly at the universities of Oxford and Cambridge. A range of projects is available, and prospective applicants are encouraged to discuss their plans with Dr David Willshaw of the University of Edinburgh. Two possibilities which are compatible with present work are: 1) Development of a model of the mammalian hippocampal formation as an associative memory; 2) Investigation of associative and error-correcting models of cerebellar function as implemented in a biologically realistic form. This appointment, which is available from January 1992 for 2 years in the first instance and potentially renewable for a further 4 years, will be made on the RS1A scale (currently 11,969-19,073 pounds p.a. with a discretionary scale rising to 21,391 pounds p.a.). Applications (including the name and address of two referees) should be sent to Ms Catherine Greasley, Administrative Secretary, MRC Research Centre in Brain and Behaviour, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD (telephone (0865) 271364 - mornings only) no later than Friday 8 November 1991. The University of Oxford is an Equal Opportunities Employer David Willshaw Centre for Cognitive Science 2 Buccleuch Place Edinburgh EH8 9LW UK Tel: (+44) 31 650 4404 Fax: (+44) 31 650 4587 Email: d.willshaw at edinburgh.ac.uk From harnad at Princeton.EDU Sun Oct 13 19:51:05 1991 From: harnad at Princeton.EDU (Stevan Harnad) Date: Sun, 13 Oct 91 19:51:05 EDT Subject: Newell's Unified Theories of Cognition: BBS Call for Book Reviewers Message-ID: <9110132351.AA08163@psycho> Below is the abstract of a book that will be accorded multiple book review in Behavioral and Brain Sciences (BBS), an international, interdisciplinary journal that provides Open Peer Commentary on important and controversial current research in the biobehavioral and cognitive sciences. Commentators must be current BBS Associates or nominated by a current BBS Associate. 
To be considered as a commentator on this book, to suggest other appropriate commentators, or for information about how to become a BBS Associate, please send email to: harnad at clarity.princeton.edu or harnad at pucc.bitnet or write to: BBS, 20 Nassau Street, #240, Princeton NJ 08542 [tel: 609-921-7771] To help us put together a balanced list of commentators, please give some indication of the aspects of the topic on which you would bring your areas of expertise to bear if you are selected as a commentator. ____________________________________________________________________ BBS Multiple Book Review of: UNIFIED THEORIES OF COGNITION (Harvard University Press, 1990) Allen Newell School of Computer Science Carnegie-Mellon University This book presents the case that cognitive science should turn its attention to developing theories of human cognition that cover the full range of human perceptual, cognitive, and action phenomena. Cognitive science has now produced a massive number of high quality regularities with many microtheories that reveal important mechanisms. The need for integration is pressing and will continue to increase. Equally important, cognitive science now has the theoretical concepts and tools to support serious attempts at unified theories. The argument is made entirely by presenting an exemplar unified theory of cognition both to show what a real unified theory would be like and to provide convincing evidence that such theories are feasible. The exemplar is Soar, a cognitive architecture realized as a software system. After a detailed discussion of the architecture and its properties, with its relation to the constraints on cognition in the real world and to existing ideas in cognitive science, Soar is used as a theory for a wide range of cognitive phenomena: immediate responses (stimulus-response compatibility and the Sternberg phenomena); discrete motor skills (transcription typing); memory and learning (episodic memory and the acquisition of skill through practice); problem solving (cryptarithmetic puzzles and syllogistic reasoning); language (sentence verification and taking instructions); and development (transitions in the balance beam task). The treatments vary in depth and adequacy, but they clearly reveal a single, highly specific, operational theory that works over the entire range of human cognition. Soar is presented as an exemplar unified theory, not as the sole candidate. Cognitive science is not ready yet for a single theory -- there must be multiple attempts. But cognitive science must begin to work towards such unified theories. From kamil at apple.com Mon Oct 14 19:41:34 1991 From: kamil at apple.com (Kamil A. Grajski) Date: Mon, 14 Oct 91 16:41:34 -0700 Subject: batch-mode parallel implementations Message-ID: <9110142341.AA19545@apple.com> Hi folks, In reviewing some implementations of back-prop type algorithms on parallel machines, it is apparent that several such implementations obtain their high performance because of batch-mode training. What this means is that one operates on N independent training patterns simultaneously and then collects all the weight update information and reestimates once per N samples. Example where this has been used (among others) are the GF-111, MasPar, CM-2, Warp (I think, at least for a self-org feature map implementation), etc. In many papers, I have read passing references to the fact that real-time learning is preferred (in practice) over the theoretically indicated batch-mode (so-called "true gradient") learning. 
Some of the arguments given include "faster" convergence and "better" generalization. Are the convergence and generalization arguments linked at some deeper level of analysis? (You could have fast convergence which generalizes poorly, etc.) I have played with this just a little bit on small speech and other datasets without reaching any conclusive results. I am wondering whether there have been some definitive studies, theoretical and/or practical which really confront this issue? How big an issue is this for people? For example, would you NOT look at a parallel design which assumes batch-mode training? Kamil P.S. If this is a dead issue and I missed the funeral, I apologize. ================ Kamil A. Grajski Apple Computer (408) 974-1313 kamil at apple.com ================ From B344DSL at UTARLG.UTA.EDU Mon Oct 14 14:14:00 1991 From: B344DSL at UTARLG.UTA.EDU (B344DSL@UTARLG.UTA.EDU) Date: Mon, 14 Oct 1991 13:14 CDT Subject: Announcement of talk by Pribram at Georgetown, Oct. 18 Message-ID: <01GBQK3RGXY80003LS@utarlg.uta.edu> From: IN%"PRUEITT at guvax.georgetown.edu" "Paul S. Prueitt" 14-OCT-1991 12:29:48.05 To: IN%"kpribram at ruacad.ac.runet.edu" "kpribram" CC: IN%"duziakm at isnet.inmos.com" "duziakm", IN%"pwerbos at note.nsf.gov" "pwerbos", IN%"liwu at aic.nrl.navy.mil" "liwu", IN%"kugler at rucs2.sunlab.cs.runet.edu" "kugler", IN%"medsker at AUVM.BITNET" "medsker", IN%"b344dsl at UTARLG.UTA.EDU" "b344dsl", IN%"prueitt Subj: Pribram's Talk on Friday From PRUEITT at guvax.georgetown.edu Mon Oct 14 14:15:00 1991 From: PRUEITT at guvax.georgetown.edu (Paul S. Prueitt) Date: 14 Oct 91 13:15:00 EST Subject: Pribram's Talk on Friday Message-ID: <01GBQIANVZ28000315@utarlg.uta.edu> Please Communicate within your group *********************Please post and forward on E-mail******************* ******************** Georgetown University Physics Department and Neural Network Research Facility 1991-92 Colloquium Series on Behavioral and Computational Neuroscience Friday, October 18th 4:00 P.M. to 6:00 P.M. Auditorium Room 112 Reiss Building, Georgetown University Refreshments at 3:30 P.M. in Room 505 Dr. Karl Pribram **************** Center for Brain Research and Informational Sciences, Radford University Brain and Perception, Holonomy and Structure in Figural Processing Dr. Pribram will discuss topics from his new book; Brain and Perception, Holonomy and Structure in Figural Processing. A one hour prepared lecture is to be followed by a one hour discussion. The book is now available from Dr. Edward J. Finn, Chairman of the G.U. Physics Department or from Lawrence Erlbaum Associates. Professor Pribram will autograph copies of the book after the Colloquium. ************************************************************************* For additional information please call Edward Finn at 202-687-6231. Parking: Use Georgetown Univ. Entrance One from Reservoir Road (Northern Boundary) ******************** From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 15 01:23:47 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 15 Oct 91 01:23:47 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Mon, 14 Oct 91 16:41:34 -0800. <9110142341.AA19545@apple.com> Message-ID: I don't recall seeing any studies that claim better generalization for per-sample or continuous updating than for per-epoch or batch updating. Can you supply some citations? 
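For concreteness, here is a minimal sketch of the two update regimes being compared in this thread -- per-sample ("on-line" or continuous) updating versus per-epoch ("batch") updating. It is only an illustration, written in Python on a toy single linear unit with squared error; all names and values are illustrative and are not drawn from any of the implementations mentioned in this thread.
-----------------------------------------------------
# Per-sample ("on-line") versus per-epoch ("batch") gradient descent on a
# toy problem: a single linear unit trained with squared error.
import numpy as np

def gradient(w, x, y):
    # Gradient of 0.5 * (w.x - y)^2 with respect to w.
    return (w @ x - y) * x

def train_online(w, data, lr=0.01, epochs=10):
    # Weights change after every pattern, so a redundant training set
    # yields many small steps per pass through the data.
    for _ in range(epochs):
        for x, y in data:
            w = w - lr * gradient(w, x, y)
    return w

def train_batch(w, data, lr=0.01, epochs=10):
    # Gradients are accumulated over the whole set; one step per pass.
    for _ in range(epochs):
        g = sum(gradient(w, x, y) for x, y in data)
        w = w - lr * g / len(data)
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
data = [(x, true_w @ x) for x in rng.normal(size=(200, 2))]
print("on-line:", train_online(np.zeros(2), data))
print("batch:  ", train_batch(np.zeros(2), data))
-----------------------------------------------------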
The only reason I can think of for better generalization in the per-sample case would be a weak sort of simulated-annealing effect, with the random variation among individual training samples helping to jiggle the system out of small local minima in the vicinity of the best answer. As for speed of convergence, continuous updating clearly beats per-epoch updating if the training set is highly redundant. To see this, imagine taking a small set of training cases, duplicating that set 1000 times, and presenting the resulting huge set as the training set. Per-sample updating would probably have converged on a good set of weights before the first per-epoch weight adjustment is ever made. Also, in some cases it just is not practical to use per-epoch updating. There may be a stream of ever-changing data going by, and it may be impractical to store a large set of samples from this data stream for repeated use. On the other hand, it is rather dangerous to use continuous updating with high learning rates or with techniques that adjust the learning rate based on some sort of second-derivative estimate. If you are not very careful, a few atypical cases in a row can accelerate you right out of the solar system. Some techniques, such as quickprop and most of the conjugate gradient methods, depend on the ability to look at the same set of training examples more than once, so they inherently are per-epoch models. In my opinion, the best solution in most situations is probably to use one of the accelerated convergence methods and to update the weights after an "epoch" whose size is chosen by the experimenter. It must be sufficiently large to give a reasonably stable picture of the overall gradient, but not so large that the gradient is computed many times over before a weight-update cycle occurs. However, I am sure that this view is not universally accepted: some people seem to believe that per-sample updating is superior in all cases. -- Scott Fahlman
From castillo at eel.upc.es Tue Oct 15 11:16:28 1991 From: castillo at eel.upc.es (Francisco Castillo Cobo) Date: Tue, 15 Oct 1991 11:16:28 UTC+0100 Subject: add to NEURAS-LIST Message-ID: <"114*/S=castillo/OU=eel/O=upc/PRMD=iris/C=es/"@MHS> Hi, I am currently compiling a list of incremental (or growing) neural networks; I have some already identified, including RCE and Tiling. I am interested in receiving additional references on the matter and would be glad to summarize the responses and send them to anyone who might be interested. Thanx! F.Castillo
From ecal at cgref.cemagref.fr Tue Oct 15 08:06:13 1991 From: ecal at cgref.cemagref.fr (European Conference on Artificial Life) Date: Tue, 15 Oct 91 12:06:13 GMT Subject: ECAL91 programme Message-ID: <9110151206.AA11528@cgref> Please find enclosed an E-mail version of the ECAL91 programme (more up-to-date than the paper programme). You can use the registration form enclosed, provided that you send your payment by regular mail to the given address.
=====================CUT HERE=====================CUT HERE====================== 1st European Conference on Artificial Life ________________________________________________________________________________ PROGRAMME - PROGRAMME - PROGRAMME- PROGRAMME ________________________________________________________________________________ EEEEEEE CCCCCC AA LL 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EE CC AA AA LL 99 99 11 EEEEE CC AAAAAAAAA LL 99999 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EE CC AA AA LL 99 11 EEEEEEE CCCCC AA AA LLLLLLLL 9999 11 ________________________________________________________________________________ To be held on December 11-13 1991 in Centre des Congres de la Villette Salle Laser cite des Sciences et de l'Industrie Paris, France Publisher : MIT Press / Bradford Books Sponsors : la Cite CEMAGREF Banque de France CNR Fondation de France AFCET Electricite de France CREA Offilib ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Artificial life: a new scientific field Artificial life embodies a recent and important conceptual step in modern science: asserting that the core of intelligence and cognitive abilities is the same as the capacity for living. Metaphorically, artificial life would see in the modest insect rather than in the symbolic abilities of an expert the best prototype for intelligence . What needs to be understood and characterized is the class of processes that endow living creatures with their characteric autonomy, key properties such as viability, abduction and adaptability. The autonomy of the living beings is understood here both with regards to their actions and to the way in which they shape their world into significance. This exploration goes hand in hand with the theory, design and construction of simple autonomous agents. The recent surge of interest in 'artificial life' has to be understood in the context of the long tradition inaugurated with cybernetics, seeking common basis for the living and the artificial. Artificial life can take advantage of the years of research in the tradition of symbolic computation that still characterizes most of the research in artificial intelligence, as well as the more recent explosive development of neural networks and connectionist approaches. Artificial life also induces a renewal of a whole range of engineering traditions, such as control theory and robotics, beyond classical notions of goal and planning, into biologically inspired notions of viability and adaptation, situatedness and operational closure, thus putting evolutionary processes at the very center of the stage. The first European meeting intends to highlight the practice of such autonomous systems in all their forms, by hosting the presentation and discussion of the most recent research in the area. Beyond research results, another main intention of the meeting is to engage researchers and philosophers to examine the epistemological basis of this new trend. Only a sustained analysis of the main concepts and ideas can provide a fertile ground for important advances and a change of research paradigm. Conference Chairs : Paul Bourgine and Francisco Varela Programme Committee : H. Bersini, B Ch. G. Langton, USA R. Brooks, USA J. A. Meyer, F J. Demongeot, F H.Schwefel, FRG B. Goodwin, UK D. Parisi, I S. Kauffman, USA Organizing Committee : I. Alvarez V. Douzal L. Bochereau T. Fuhs G. 
Deffuant ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Wednesday December 11 8:00 REGISTRATION 9:30 WELCOME ADDRESS Paul BOURGINE, CEMAGREF - (F), Francisco VARELA, CREA - (F) 9:45 AUTONOMOUS ROBOTS (I) Invited lecture: Rodney BROOKS, MIT - (USA) "Robots and artificial life" Uwe SCHNEPF, GMD - (FRG), Mukesh J. PATEL, University of Sussex - (UK) "Concept formation as Emergent Phenomena" Rolf PFEIFER, Free University of Brussels - (B), Paul VERSCHURE, Univ. of California, Santa Cruz (USA) "Distributed adaptive control : a paradigm for autonomous agents" Break / refreshments Tim SMITHERS, University of Edimburgh - (UK) "Taking eliminative materialism seriously : a methodology for autonomous systems research" Leslie P. KAELBLING, Brown University - (USA) "An adaptable mobile robot" Pattie MAES, MIT - (USA) "Learning behavior networks from experience" 13:15 LUNCH 14:30 SWARM INTELLIGENCE Invited lecture: Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Swarm-made architecture" Alberto COLORNI, Marco DORIGO, Vittorio MANIEZZO, Politecnico di Milano - (I) "Distributed optimization by ant colonies" Andrew M. ASSAD, Univ. of Illinois - (USA), Norman H. PACKARD, Inst. for Scientific Interchange - (I) "Emergent colonization in an artificial ecology" Gerardo BENI, Susan HACKWOOD, Univ. of California, Riverside - (USA) "The maximum entropy principle and sensing in swarm intelligence" Break / refreshments 17:00 EPISTEMOLOGICAL ISSUES Stefan HELMREICH, Stanford University - (USA) "The historical and epistemological ground of von Neumann's theory of self-reproducing automata and theory of games" Jean-Luc DORMOY, EDF - (F), Sylvie KORNMAN, LAFORIA - (F) "Meta-knowledge, autonomy and (artificial) evolution : some lessons learnt so far" 18:00 POSTERS AND DEMOS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Thursday December 12 9:00 EPISTEMOLOGICAL ISSUES (Continued) R. Allen GARDNER, Beatrix T. GARDNER, University of Nevada - (USA) "A feedforward model of animal learning" Bernard MANDERICK, Free University of Brussels - (B) "Selectionist systems as cognitive systems" Break / refreshments 10:15 AUTONOMOUS ROBOTS (II) Ian HORSWILL, MIT - (USA) "Characterizing adaptation by constraint" Didier KEYMEULEN, Jo DECUYPER, Free University of Brussels - (B) "On the self-organizing properties of topological maps" Piet SPIESSENS, Jan TORREELE, Free University of Brussels - (B) "Massively parallel evolution of recurrent networks : an approach to temporal processing" Dave CLIFF, University of Sussex - (UK) "Neural networks for visual tracking in an artificial fly" 12:45 LUNCH 14:15 LEARNING AND EVOLUTION Invited lecture: Domenico PARISI, Stefano NOLFI, Federico CECCONI, CNR - (I) "Learning, behaviour, and evolution" Hugues BERSINI, Free University of Brussels - (B) "Immune network and adaptive control" Franck HOFFMEISTER, Thomas BACK , University of Dortmund - (FRG) "Genetic self-learning" Heinz MUHLENBEIN, GMD - (FRG) "Darwin's continent cycle theory and its simulation by the Prisoner's dilemna" Break / refreshments Melanie MITCHELL, John H. 
HOLLAND, University of Michigan - (USA), Stephanie FORREST, University of New Mexico - (USA) "The royal road for genetic algorithms : fitness landscapes and GA performance" Brad FULLMER, Risto MIIKKULAINEN, University of Texas - (USA) "Evolving finite state behaviour using marker-based genetic encoding of neural networks" 18:00 Invited lecture: Stuart KAUFMANN , University of Pennsylvania - (USA) "Waiting for Carnot" 20:30 DINNER ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Friday December 13 9:30 ADAPTIVE AND EVOLUTIONARY MECHANISMS Barry McMULLIN, Dublin City University - (UK) "The Holland alpha-Universes revisited" Robert J. COLLINS, David R. JEFFERSON, University of California - (USA) "The evolution of sexual selection and female choice" Filippo MENCZER, Domenico PARISI, CNR - (I) "A model for the emergence of sex in evolving networks : adaptive advantage or random drift ?" Break / refreshments Inman HARVEY, University of Sussex - (UK) "Species adaptation genetic algorithms : a basis for a continuing SAGA" Jakob SKIPPER, Niels Bohr Institute - (Dk) "The complete zoo evolution in a box" Jeffrey HORN, University of Illinois - (USA) "Measuring the evolving complexity of stimulus-response organisms" 13:15 LUNCH 14:30 CONCEPTUAL FOUNDATIONS Hugues BERSINI, Free University of Brussels - (B) "Animat's I" Claus EMMECHE, Institute of Computer and Systems Sciences - (Dk) "Life as an abstract phenomenon : is Artificial Life possible ?" John STEWART - Paris (F) "Life=cognition : the epistemological and ontological signifance of Artificial Life" Break / refreshments Peter CARIANI, Boston - (USA) "Some epistemological implications of devices which construct their own sensors and effectors" Mark A. BEDAU, Reed College - (USA) "Philosophical aspects of Articial Life" 17:30 CONCLUDING REMARKS ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ POSTER SESSION Petr KURKA, Charles University - (Cz) "Natural Selection in a population of automata" Thomas BACK, University of Dortmund - (FRG) "Self-adaptation in genetic algorithms" Robert DAVIDGE, University of Sussex - (UK) "Looking at life" Hugo de GARIS, Free University of Brussels - (B) "Streerable GenNets : the genetic programming of controllable behaviors in GenNets" Bruno MARCHAL, Free University of Brussels - (B) "Amoeba, planaria and dreaming machines" Alexis DROGOUL, Jacques FERBER, LAFORIA - (F) "A behavioural simulation model for the study of emergent social structures" Antonio RIZZO, CNR - (I), Neil BURGESS, University of Manchester - (UK) "Action based neural network for adaptive control : the tank case" John R. 
KOZA, Stanford University - (USA) "Evolving emergent wall following robotic behavior using the genetic programming paradigm" Bruno GAS, Rene NATOWICZ, ESIEE - (F) "A non-supervised continuous learning model of neural network for temporal sequence recognition" Eric DEDIEU, Emmanuel MAZER, IMAG - (F) "The SWALLOW modeler : an approach to sensory relevance" Gilles VENTURINI, ESIEE - (F) "Characterizing the adaptation abilities of a class of genetic base machine learning algorithms" Barbara WEBB, Tim SMITHERS, University of Edimburgh - (UK) "The connection between AI and biology in the study of behaviour" Ulrich NEHMZOW, Tim SMITHERS, University of Edimburgh - (UK) "Using motor actions for location recognition" Stephen TODD, Wiliam LATHAM, IBM - (UK) "Artificial life or surreal art?" R.C. PATON , H. S. NWANA, M. J. SHAVE, T. J. BENCH-CAPON, University of Liverpool - (UK) "Computing at the tissue/organ level (with particular reference to the liver)" Pierre BESSIERE, IMAG - (F) "Genetic Algorithms applied to formal neural networks : parallel genetic implementation of a Boltzmann machine and associated robotic experimentations" Karl SIMS, Thinking Machines Corp. - (USA) "Interactive evolution of dynamical systems" Nicolas MEULEAU, CEMAGREF - (F) "Co-evolution and mimetism : a program simulating road traffic" Christian NOTTOLA, Frederic LEROY, Banque de France - (F) "Dynamics of artificial markets M. SNAITH, 0.HOLLAND, TAG - (UK) "Application of the temporal difference learning to the neural control of quadrupede locomotion" Simon GOOS, Jean-Louis DENEUBOURG, Free University of Brussels - (B) "Harvesting by a group of robots" ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ Registration Form Name : ...................... First name : ....................... Firm :.............................................................. Address : ............................................................ ...................................................................... Zip code : ............. City : ..................................... Country : ................................ Phone : ............ Fax : ............... Email : .................................. Invoice to be sent to : ................................ Registration fees Before 20/11/91 After 20/11/91 ________________________________________________________________________________ Students* o FF 750 o FF 750 University Members o FF 1500 o FF 1750 Others o FF 2200 o FF 2500 ________________________________________________________________________________ * Student status proof required These fees include all refreshments and lunches. Payment (in french francs only, foreign cheques accepted): o Cheques (to be sent to ECAL 91) please note that all charges, if any, must be at the participants' expense. o Banker's draft to the order of ECAL: Credit Lyonnais, bank account 30002 08948 0000079087X 55 Versailles StLouis, F-78000. PLease ask your bank to arrange the transfer at no cost for the beneficiary. Bank charges, if any, will be at the participants' expense. Travel Please, send me o Domestic railway discount ticket SNCF (20%) o Domestic flight discount ticket Air Inter (35%) Cancellations Refunds of 50 % will be made if a written request is received before November 30. No refunds will be made for cancellations received after this date. 
In case of conference cancellation beyond its control, ECAL organizing committee limits its liability to the registration fees already paid. Date Signature Send this form to : ECAL 91 17 allee Gabrielle d'Estrees F-75019 Paris FRANCE Further information concerning registration : Fax : (+33) 1 40 96 60 80 Voice : (+33) 1 40 96 61 79 E-mail : ecal at cemagref.fr ================================================================================ 1st European Conference on Artificial Life ________________________________________________________________________________ General Information ___________________ Language The conference will be conducted in English. Accommodation Hotel Forest Hill La Villette *** (5-minutes walk ) 26 av. Corentin Cariou, Paris. Tel : +33 1 44 72 15 30, fax: 33 1 44 72 15 80. Single or double rooms: 480FF, special price for ECAL participants. Hotel Arcade La Villette ** (5-minutes walk) Tel : +33 1 40 38 04 04 Single: 390FF, double room: 420FF. Please reserve at least 30 days in advance. Hotel Campanile Pantin ** (10-minutes walk) Tel : +33 1 48 91 32 76 Single or double rooms: 335FF. Please reserve at least 45 days in advance. Tel : +33 (1) 48 91 32 76 Reservation centers (other hotels): Tel: 33 1 47 27 15 15 (500 to 700FF rooms). Tel: 33 1 43 59 12 12. (Elysee 12 12). Tel: 33 1 42 56 30 00, fax 33 1 42 89 42 97 (Paris Sejour Reservations) Tourist information : 33 1 47 23 61 72 Cheaper accomodations are available at: Centre de sejour Eugene Henaff Tel 33 (1) 48 39 19 05 Entry visas ___________ For non European Community members, please check with the french consulate whether you need a Visa. Access to Paris cite des Sciences et de l'Industrie ___________________________________________________ La cite des Sciences et de l'Industrie is located in northeast Paris, at La Villette Park, 30, avenue Corentin Cariou, 75019 Paris. It is 40 minutes from Roissy and Orly airports. You can reach the Cite: by car: Circular highway, Porte de la Villette exit. Parking available at quai de la Charente and Boulevard Macdonald; by metro: Line 7, Porte de la Villette station; by bus: lines 150-152-250A-PC. For information about the cite des Sciences, call 33 1 46 42 13 13 (round-the-clock), or by Minitel: 3615 code Villette. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Tue Oct 15 10:22:41 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Tue, 15 Oct 91 09:22:41 EST Subject: Paper Message-ID: Following is the abstract of a paper accepted by IJCNN'91-SINGAPORE. The main purpose of this paper was to attack the problems of slow rate of convergence, local minima, and incapability of learning (under certain preset criteria) etc problems associated with the original back-propagation neural nets from an alternative viewpoint ---- topology ---- instead of the learning algorithm and units responsive characteristics. It was shown in this paper that the topology is a very important factor limiting the performances of back-propagation neural networks besides the already studied factors such as the learning algorithm and the units characteristics. All comments are welcome. PPNN: A Faster Learning and Better Generalizing Neural Net Bo Xu Indiana University Liqing Zheng Purdue University Abstract----It was pointed out in this paper that the planar topology of current back-propagation neural network (BPNN) sets limits to solve the slow convergence rate problem, local minima, and other problems associated with BPNN. 
The parallel probabilistic neural network (PPNN) using a new neural network topology, stereotopology, was proposed to overcome these problems. The learning ability and the generalization ability of BPNN and PPNN were compared for several problems. The simulation results show that PPNN was capable of learning any kinds of problems much faster than BPNN and generalized better than BPNN too. It was analyzed that the faster, universal learnability of PPNN was due to the parallel characteristic of PPNN's stereotopology, and the better generalization ability came from the probabilistic characteristic of PPNN's memory retrieval rule. Bo Xu Indiana University itgt500 at indycms.iupui.edu From xiru at Think.COM Tue Oct 15 11:35:55 1991 From: xiru at Think.COM (xiru Zhang) Date: Tue, 15 Oct 91 11:35:55 EDT Subject: batch-mode parallel implementations In-Reply-To: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU's message of Tue, 15 Oct 91 01:23:47 -0400 <9110151441.AA10584@chaos.cs.brandeis.edu> Message-ID: <9110151535.AA02757@yangtze.think.com> From jcp at vaxserv.sarnoff.com Wed Oct 16 12:03:09 1991 From: jcp at vaxserv.sarnoff.com (John Pearson W343 x2385) Date: Wed, 16 Oct 91 12:03:09 EDT Subject: batch-mode parallel implementations Message-ID: <9110161603.AA09000@sarnoff.sarnoff.com> Xiru Zhang stated: >From the point of view of implementation, if a network is not large, there >is not much you can parallelize if you do per-sample training. Even in per-sample training one may be able to efficiently exploit a parallel machine. Each processor simulates the same network but has a different set of initial weights. The convergence time and performance of a trained network can be very dependent on the initial weights. I would appreciate being sent references that discuss this last statement. John Pearson David Sarnoff Research Center CN5300 Princeton, NJ 08543 609-734-2385 jcp at as1.sarnoff.com From gary at cs.UCSD.EDU Wed Oct 16 13:05:51 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Wed, 16 Oct 91 10:05:51 PDT Subject: batch-mode parallel implementations Message-ID: <9110161705.AA27497@desi.ucsd.edu> I tried implementing Elman's simple recurrent nets on an Intel Hypercube using data parallelism (a copy of the net at each node, each getting a part of the training set). I found is was as fast as a bat out of h**l, but as many times faster as it was, it was also as many times SLOWER at converging, leading to a net gain of 0! g. PS I did not try conjugate gradient, or back propping more steps in time, which probably would have helped convergence lots. From orilex at crl.ucsd.edu Wed Oct 16 15:33:44 1991 From: orilex at crl.ucsd.edu (Roy Higginson) Date: Wed, 16 Oct 91 12:33:44 PDT Subject: address for Sanger Message-ID: <9110161933.AA21258@crl.ucsd.edu> Can someone give me an e-mail address for Dennis Sanger AT&T Bell/Univ of CO at Boulder? Thanks, Higginson From ajr at eng.cam.ac.uk Wed Oct 16 17:48:31 1991 From: ajr at eng.cam.ac.uk (Tony Robinson) Date: Wed, 16 Oct 91 17:48:31 BST Subject: TR available: Phoneme recognition with recurrent networks Message-ID: <16687.9110161648@dsl.eng.cam.ac.uk> ***Do not forward to other bboards*** I've recently completed a technical report on connectionist phoneme recognition which I would like to make available to interested researchers. It describes a series of changes which have been made to tidy up a previously published system. 
Copies of the technical report may be obtained courtesy of Jordan Pollack by anonymous ftp from archive.cis.ohio-state.edu in the directory /pub/neuroprose as file robinson-tr82.ps.Z. If this option is not available to you, or if you would like a reprint of the background article, please send me email giving your full address. Tony [Robinson] Cambridge University Engineering Department, Trumpington Street, Cambridge, UK ------------------------------------------------------------------------------ Several Improvements to a Recurrent Error Propagation Network Phone Recognition System Tony Robinson ajr at eng.cam.ac.uk CUED/F-INFENG/TR.82 30 September 1991 Recurrent Error Propagation Networks have been shown to give good performance on the speaker independent phone recognition task in comparison with other methods [Robinson and Fallside, Computer Speech and Language, July 1991]. This short report describes several recent improvements made to the existing recogniser for the TIMIT database. The improvements are: an addition to the preprocessor to represent voicing information; use of histogram normalisation on the input channels of the network; normalisation of the output channels to enforce unity sum; a change in the cost function to give equal weighting to each target symbol; a change in the representation of the outputs to reduce quantisation errors; retraining on the complete TIMIT training set; and the better estimation of HMM phone models. Most of these changes decrease the number of arbitrary parameters used and allow for the integration of the system with standard HMM techniques. The result of these changes is a decrease in the number of errors by about 16% (from 36.5% to 30.7% when all 61 TIMIT phones are used and from 30.2% to 25.0% on a reduced 39 phone set). From shams at maxwell.hrl.hac.com Wed Oct 16 17:23:42 1991 From: shams at maxwell.hrl.hac.com (shams@maxwell.hrl.hac.com) Date: Wed, 16 Oct 91 14:23:42 PDT Subject: batch-mode parallel implementations Message-ID: <9110162123.AA08260@maxwell.hrl.hac.com> We have exploited the "epoch" training method for implementing back-prop on a 2-D systolic array processor of Hughes [1,2]. There are two basic problems with this approach. First, there are only a limited number of models that allow for epoch training (e.g. back-prop). Second, this type of parallelism is not useful during recall or classification cycle since there is only a single input pattern to be evaluated (unless the input data rate exceeds the processor throughput enabling the input data to be buffered for batch processing). As the number of neurons used in real-world applications continue to increase, there would be enough computation to keep all the processors busy without having to use epoch parallelism. [1] S. Shams and K. W. Przytula, "Mapping of Neural Networks onto Programmable Parallel Machines," Proceedings of the Intern. Symp. on Circuits and Systems, New Orleans, LA, Vol. 4, pp. 2613-2617, 1990. [2] S. Shams and K. W. Przytula. "Implementation of Multilayer Neural Networks on Parallel Programmable Digital Computers." In Parallel Algorithms and Architectures for DSP Applications. Ed. M. Bayoumi, Kluwer Academic Publishers, pp. 225-253, 1991. Soheil Shams Hughes Research Labs. 
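Several postings in this thread describe the same data-parallel "epoch" scheme: a copy of the network on every processor, each working on its own share of the training set, with the partial gradients combined into one weight update per epoch. The following is a minimal sketch of that scheme, assuming a toy single linear unit with squared error; the worker loop stands in for what would run on separate processors, and all names are illustrative rather than taken from any of the implementations mentioned above.
-----------------------------------------------------
# Data-parallel "epoch" training: every worker holds the same weights,
# computes gradients on its own shard, and the partial gradients are summed
# before a single update per epoch.
import numpy as np

def shard_gradient(w, shard):
    # Gradient of 0.5 * sum((w.x - y)^2) accumulated over one worker's shard.
    g = np.zeros_like(w)
    for x, y in shard:
        g += (w @ x - y) * x
    return g

def epoch_parallel_train(w, data, n_workers=4, lr=0.01, epochs=20):
    shards = [data[i::n_workers] for i in range(n_workers)]
    for _ in range(epochs):
        # On a real parallel machine each call runs on its own processor and
        # the partial gradients are combined with a global sum (reduce).
        partials = [shard_gradient(w, shard) for shard in shards]
        w = w - lr * sum(partials) / len(data)
    return w

rng = np.random.default_rng(1)
true_w = np.array([1.5, -0.5, 2.0])
data = [(x, true_w @ x) for x in rng.normal(size=(400, 3))]
print(epoch_parallel_train(np.zeros(3), data))
-----------------------------------------------------
The recall-time limitation noted above is visible in this sketch as well: with a single input pattern there is only one gradient (or one forward pass) to compute, so the data-parallel dimension disappears.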
From karunani at CS.ColoState.EDU Wed Oct 16 22:23:31 1991 From: karunani at CS.ColoState.EDU (n karunanithi) Date: Wed, 16 Oct 91 20:23:31 MDT Subject: HowtoScale Message-ID: <9110170223.AA05027@zappa>
Dear Connectionist, Some time back I posted the following problem in this newsgroup and many people responded with suggestions and references. I am thankful to all of them. I have summarized their responses and am posting them here for others who might find them interesting. For completeness' sake I have included my original posting as well.
******Issue raised: Background: ----------- I have been using neural network models (both Feed-Forward Nets and Recurrent Nets) in a prediction application and I am getting pretty good results. In fact the neural network approach outperformed many well known analytic models. Similar results have been reported by many researchers in (chaotic) time series prediction. Suppose that X is the independent variable and Y is the dependent variable. Let (x(i),y(i)) represent a sequence of actual input/output values observed at time i = 0,1,2,..,t of a temporal process. Assume further that both the input and the output variables are one-dimensional and can take on a sequence of positive integers up to a maximum of 2000. Once we train a network with the history of the system up to time "t" we can use the network to predict outputs y(t+h), h=1,..,n for any future input x(t+h). In my application I already have the complete sequence and hence I know what the maximum values of x and y are. Using these maxima I normalized both X and Y over a 0.1 to 0.9 range. (Here I call such normalization "scaled representation".) Since I have the complete sequence it is possible for me to evaluate how good the networks' predictions are.
Now some basic issues: --------------------- 1) How do we represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that linear units can be used at the output layer to get rid of scaling. If so, how do I represent the input variable? The standard sigmoidal unit (with temp = 1.0) gets saturated (or railed to 1.0) when the sum is >= 14. However, one may suggest that changing the output range of the sigmoid can help to get rid of the saturation effect. Is this a correct approach? 2) In such prediction applications, people (including me) compare the predictive accuracy of neural networks with that of parametric models (which are based on analytical reasoning). But one main advantage of the parametric models is that their parameters can be calculated using any of the following parameter estimation techniques: least squares, maximum likelihood, Bayesian methods, Genetic Algorithms, or any other method. These parameter estimation techniques do not require any scaling, and hence there is no need for guessing the maximum values in advance. However, with the scaled representation in neural networks one cannot proceed without making guesses about the maximum (or a future) input and/or output. In many real-life situations such guesses are infeasible or dangerous. How do we address this situation? ____________________________________________________________________________ N. KARUNANITHI E-Mail: karunani at CS.ColoState.EDU Computer Science Dept, Colorado State University, Collins, CO 80523.
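Before the responses, here is a minimal sketch of the "scaled representation" described in the posting above and of the difficulty it raises: the mapping into the 0.1 to 0.9 range is tied to a presumed maximum, so any later observation above that maximum falls outside the range the network was trained on. The maximum of 2000 matches the figure quoted above; the sketch is in Python and the function names are illustrative.
-----------------------------------------------------
# "Scaled representation": map raw values into [0.1, 0.9] using a presumed
# maximum, and map network outputs back. The guess of the maximum is the
# weak point: values above it land outside the training range.

def scale(v, v_max, lo=0.1, hi=0.9):
    return lo + (hi - lo) * v / v_max

def unscale(s, v_max, lo=0.1, hi=0.9):
    return (s - lo) / (hi - lo) * v_max

v_max = 2000.0                 # the presumed maximum
print(scale(500, v_max))       # 0.3  -- well inside the training range
print(scale(2000, v_max))      # 0.9  -- exactly at the presumed maximum
print(scale(3000, v_max))      # 1.3  -- outside the range once the guess fails
print(unscale(0.5, v_max))     # 1000.0 -- inverse mapping for network outputs
-----------------------------------------------------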
____________________________________________________________________________ ******Responses Received: 1) Dr Huang at CMU Date: Thu, 26 Sep 1991 11:40-EDT From: Xuedong.Huang at SPEECH2.CS.CMU.EDU I have several papers addressing the issues you raised. See for example: [1] Huang, X : A Study on Speaker-Adaptive Speech Recognition" DARPA Speech and Language Workshop, Feb , 1991, pp278-283 [2] Huang, X, K. Lee and A. Waibel: Connectionist speaker normlization and its applications to speech recognition", IEEE Workshop on NNSP, Princeton, Sept. 1991 X.D. Huang, PhD Research Computer Scientist ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ School of Computer Science Tel: (412) 268 2329 Carnegie Mellon University Fax: (412) 681 5739 Pittsburgh, PA 15213 Email: xdh at cs.cmu.edu ============================================================================= 2) From Alexander at CUNY Date: Thu, 26 Sep 91 14:45 EDT From: TWOMBLY%JHUBARD.BITNET at CUNYVM.CUNY.EDU In response to your question about scaling for sigmoidal units..... I ran into the same problem of not knowing the maximum value that my input/output data would take at any particular time. There were no a priori bounds that could be reasonably set, so the solution (in this case) was to get rid of the sigmoidal activation function and replace it with one that did not require any scaling. The function I used was a clipped linear function - that is, f(x) = 0. for x<0., and f(x) = x for x>0. For my data this activation function worked as well as the sigmoidal units (in some cases better) because the hidden units never took advantage of the non-linearity in the upper range of the sigmoid function. The only difficulty with this function is that it does not have a continuous derivative at 0. You can get around this problem by tacking on a 1/x type function for x<0 that drops off very quickly. This will provide a well behaved, non-zero derivative for all parts of the activation function while adding a negligable value to the output for x<0. The actual function I use is: f(x) = x; x > 0. f(x) = 1/(10**2 - x*10**4); x < 0. I hope this helps. -Alexander ============================================================================= 3) Dr. Fahlman at CMU Date: Thu, 26 Sep 91 22:20:14 -0400 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU 1) How to represent these variables if we don't know in advance what the maximum values are? Scaled representation presupposes the existence of a maximum value. Some may suggest that a linear units can be used at the output layer to get rid of scaling. Right, I was about to suggest that. If so how do I represent the input variable? The standard sigmoidal unit(with temp = 1.0) gets saturated(or railed to 1.0) when the sum is >= 14. However one may suggest that changing the output range of the sigmoidal can help to get rid of saturation effect. Is it a correct approach? For a non-recurrent network, the first layer of weights cand and usually will scale the inputs for you. You save some learning time and possible traps if the inputs are in some reasonable range, but it really isn't essential. I'd advise adding a small constant (0.1 works well) to the derivative of the sigmoid for all units so that you can recover if the unit gets pinned to an extreme value. I don't understand your second point, so I won't try to reply to it. 
Scott Fahlman Carnegie Mellon University ============================================================================= 4) Ian Fitchet at Birmingham University Date: Fri, 27 Sep 91 03:43:40 +0100 From: Ian Fitchet I'm no expert, but how about having two outputs: one is a control and has a (mostly) fixed value; the other is the output y(i) which is adjusted such that the one divided by the other gives the required result. Off the top of my head, have the control output 0.9 most of the time, when the value of y(i) goes above unity have y(i) = 0.9 and the control decrease, so that if the control equalled 0.45, say, then the real value of the output would be 0.9/0.45 = 2.0 . Of course the question is then, how do I train the nextwork to set the value of the control? But I leave that as an exercise... :-) Cheers, Ian -- Ian Fitchet I.D.Fitchet at cs.bham.ac.uk School of Computer Science Univ. of Birmingham, UK, B15 2TT "You run and you run to catch up with the sun, but it's sinking" Pink Floyd ============================================================================= 5) From Dermot O'Brien at the University of Edinburgh Date: Fri, 27 Sep 91 10:32:31 WET DST Sender: dob at castle.edinburgh.ac.uk You may be interested in the following references (if you havn't read them already): @techreport{Lapedes:87, Author = "Alan S. Lapedes and Robert M. Farber", Title = "Nonlinear signal processing using neural networks: prediction and system modelling", Institution = "Los Alamos National Laboratory", Year = 1987, Number = "LA-UR-87-2662"} @incollection{Lapedes:88, Author = "Alan S. Lapedes and Robert M. Farber", Title = "How Neural Nets Work", BookTitle = "Evolution, Learning, and Cognition", Pages = {331--346}, Editor = "Y.C Lee", Year = 1988, Publisher = "World Scientific", Address = "Singapore"} The above papers analyse the behaviour of feed-forward neural networks applied to the problem of time series prediction, and make an interesting analogy with Fourier decomposition. Cheers, Dermot O'Brien Physics Department University of Edinburgh The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland ============================================================================= 6) From: Tony Robinson Date: Fri, 27 Sep 91 12:23:23 BST My immediate advice is: Don't put the input through a nonlinearity at the start of the network. Use linear output units. Allow a linear path through the system so that if a linear solution to the problem is possible then this is a possible network solution. Then you will have no problems with maximum values. Tony [Robinson] ============================================================================= End of summary. ____________________________________________________________________________ N. KARUNANITHI E-Mail: karunani at CS.ColoState.EDU Computer Science Dept, Colorado State University, Collins, CO 80523. ____________________________________________________________________________ From thomasp at informatik.tu-muenchen.dbp.de Mon Oct 14 05:17:00 1991 From: thomasp at informatik.tu-muenchen.dbp.de (Thomas) Date: 14 Oct 91 10:17 +0100 Subject: report available Message-ID: <91Oct14.101724met.34256(a)gshalle1.informatik.tu-muenchen.de> From khosla at latcs1.lat.oz.au Thu Oct 17 04:00:31 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Thu, 17 Oct 91 18:00:31 +1000 Subject: Spatial crosstalk and modular NN architechture Message-ID: <9110170800.AA00862@latcs1.lat.oz.au> Dear Connectionists, Can anyone enlighten me on the following. 
I have to model a problem with 28 discrete inputs (1's and 0's) and 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by 5 normalized continuous outputs. Now, I have no problem modelling it as a 28-11-5 network using Scott Fahlman's quickprop. However, I get into all sorts of problems when I have to model a 28-?-26 network (? stands for any number of hidden units; I tried up to 104). Some time back, I read a paper on modular NN architectures which suggested that because of spatial crosstalk one should have dedicated or independent links between hidden units and each output unit. This would result in faster training and better generalization. I tried this architecture by making suitable changes in the quickprop algorithm, but to no avail. There is no improvement over the standard architecture vis-a-vis training. In fact, things seemed to get slightly worse. I tried with 2, 3, and 4 sets (that is, 52, 78, and 104 hidden units respectively) of hidden units per output unit. I gave up after about 5000 epochs as I couldn't see any significant improvement in the total error. Has anyone used the modular architecture in a similar situation with a large number of output nodes, with positive results? Am I doing something wrong? Is there any other solution except making the outputs continuous and reducing the number of output nodes? I have only recently started reading this group, so please excuse the naivety of the questions, if any. Please e-mail your replies to khosla at latcs1.lat.oz.au Thanks in advance, Rajiv
From neural!lamoon.neural!yann at att.att.com Thu Oct 17 10:46:39 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Thu, 17 Oct 91 10:46:39 -0400 Subject: batch-mode parallel implementations Message-ID: <9110171446.AA19788@lamoon>
Several years ago, Steve Nowlan and I implemented a "batch-mode" vectorized backprop on a Cray. Just as in Gary Cottrell's story, the raw CUPS rate was high, but because batch mode converges so much slower than on-line, the net gain was 0. I think Patrick Haffner and Alex Waibel had a similar experience with their implementations of TDNNs on the Alliant. Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. Also, if you need a very high accuracy solution (for function approximation for example), a second-order batch technique will probably be better than on-line. Sadly, almost all speedup techniques for backprop only apply to batch (or semi-batch) mode. That includes conjugate gradient, delta-bar-delta, most Newton or Quasi-Newton methods (BFGS...), etc... I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. -- Yann Le Cun
From kamil at apple.com Thu Oct 17 12:59:21 1991 From: kamil at apple.com (Kamil A.
Grajski) Date: Thu, 17 Oct 91 09:59:21 -0700 Subject: batch & on-line training Message-ID: <9110171659.AA23721@apple.com> The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is slower enough that the net gain is 0. It is difficult to make a scientific judgement still, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." It has been observed that several algorithms for accelerating convergence are designed for (semi-)batch mode. Were these to be seriously evaluated, would the net gain 0 still occur? On the other hand, with more work could on-line methods widen their apparent superiority? I don't think that we're splitting hairs by addressing this issue. One trend in the implementations side of NNs is to have the highest MCUPS performance. In several instances, this is achieved using mappings/architectures which rest on batch-mode training. I think that one might design a neurocomputer differently depending on which training mode is to be used, e.g., the communication vs computation curves are different. So, at the moment, in certain instances, we've actually put the cart before the horse. We have fast batch implemen- tations. Do we make batch-mode training better, or can we make on-line so fast and so optimally design a machine that the issue is moot? (I'm ignoring the (possibly substantial) conflicting requirements between training & recognition modes, here.) In any event, it seems that folks are having success doing either in different situations. However, there doesn't seem to be a compelling argument for preferring one or the other IN PRINCIPLE. Cheers, Kamil From dlukas at park.bu.edu Thu Oct 17 12:58:42 1991 From: dlukas at park.bu.edu (David Lukas) Date: Thu, 17 Oct 91 12:58:42 -0400 Subject: Graduate study in Cognitive & Neural Systems at Boston University Message-ID: <9110171658.AA15628@park.bu.edu> (please post) *********************************************** * * * DEPARTMENT OF * * COGNITIVE AND NEURAL SYSTEMS (CNS) * * AT BOSTON UNIVERSITY * * * *********************************************** Stephen Grossberg, Chairman The Boston University Department of Cognitive and Neural Systems offers comprehensive advanced training in the neural and computational principles, mechanisms, and architectures that underly human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Applications for Fall, 1992 admissions and financial aid are now being accepted for both the MA and PhD degree programs. To obtain a brochure describing the CNS Program and a set of application materials, write or telephone: Department of Cognitive & Neural Systems Boston University 111 Cummington Street, Room 240 Boston, MA 02215 (617) 353-9481 or send a mailing address to: kellyd at cns.bu.edu Applications for admission and financial aid should be received by the Graduate School Admissions Office no later than January 15. Applicants are required to submit undergraduate (and, if applicable, graduate) transcripts, three letters of recommendation, and Graduate Record Examination (GRE) scores. The Advanced Test should be in the candidate's area of departmental specialization. 
GRE scores may be waived for MA candidates and, in exceptional cases, for PhD candidates, but absence of these scores may decrease an applicant's chances for admission and financial aid. Description of the CNS Department: The Department of Cognitive and Neural Systems (CNS) provides advanced training and research experience for graduate students interested in the neural and computational principles, mechanisms, and architectures that underlie human and animal behavior, and the application of neural network architectures to the solution of outstanding technological problems. Students are trained in a broad range of areas concerning cognitive and neural systems, including vision and image processing; speech and language understanding; adaptive pattern recognition; cognitive information processing; self-organization; associative learning and long-term memory; cooperative and competitive network dynamics and short-term memory; reinforcement, motivation, and attention; adaptive sensory-motor control and robotics; and biological rhythms; as well as the mathematical and computational methods needed to support advanced modeling research and applications. The CNS Department awards MA, PhD, and BA/MA degrees. The CNS Department embodies a number of unique features. It has developed a core curriculum that consists of ten interdisciplinary graduate courses each of which integrates the psychological, neurobiological, mathematical, and computational information needed to theoretically investigate fundamental issues concerning mind and brain processes and the applications of neural networks to technology. Additional advanced courses, including research seminars, are also offered. Each course is typically taught once a week in the evening to make the program available to qualified students, including working professionals, throughout the Boston area. Students develop a coherent area of expertise by designing a program that includes courses in areas such as Biology, Computer Science, Engineering, Mathematics, and Psychology, in addition to courses in the CNS core curriculum. The CNS Department prepares students for thesis research with scientists in one of several Boston University research centers or groups, and with Boston-area scientists collaborating with these centers. The unit most closely linked to the department is the Center for Adaptive Systems. The Center for Adaptive Systems is also part of the Boston Consortium for Behavioral and Neural Studies, a Boston-area multi-institutional Congressional Center of Excellence. Another multi-institutional Congressional Center of Excellence focused at Boston University is the Center for the Study of Rhythmic Processes. Other research resources include distinguished research groups in neurophysiology, neuroanatomy, and neuropharmacology at the Medical School and the Charles River campus; in sensory robotics, biomedical engineering, computer and systems engineering, and neuromuscular research within the Engineering School; in dynamical systems within the mathematics department; in theoretical computer science within the Computer Science Department; and in biophysics and computational physics within the Physics Department. 1991 FACULTY and STAFF of CNS and CAS: Daniel H. Bullock Nancy Kopell Gail A. Carpenter John W.L. Merrill Michael A. Cohen Ennio Mingolla H. Steven Colburn Alan Peters Paolo Gaudiano Adam Reeves Stephen Grossberg James T. Todd Thomas G. 
Kincaid Allen Waxman From MURTAGH at SCIVAX.STSCI.EDU Thu Oct 17 15:29:25 1991 From: MURTAGH at SCIVAX.STSCI.EDU (MURTAGH@SCIVAX.STSCI.EDU) Date: Thu, 17 Oct 1991 15:29:25 -0400 (EDT) Subject: Workshop: Par. Prob. Solving: Applns. in Statistics & Economics Message-ID: <911017152925.28c128fa@SCIVAX.STSCI.EDU> Workshop Announcement and Call for Papers: "Parallel Problem Solving From Nature: Applications in Statistics & Economics". ------------------------------------------------------------------------------- Interdisciplinary Project Center for Supercomputing, ETH, Zurich, Switzerland. December 10-11, 1991. Support/Sponsorship: DOSES/Statistical Office of the European Communities; IPS, ETH Zurich; Konjunkturforschungsstelle, ETH Zurich; MasPar Distributor AG Zurich; PAR, Schweizerische Informatiker Gesellschaft; Parsytec GmbH, Aachen; QT optec AG, Zug; Schweizerischer Bankverein, Basel, IBM Switzerland. Program Committee: J. Frain (Central Bank of Ireland), K. Kirchmayr (Schweizerischer Bankverein, Basel), F. Murtagh (Munotec Systems, Munich and Dublin), P. Van Nypelseer (DOSES/EUROSTAT, Luxembourg), U. Reimer (Rentenanstalt Zuerich), M.M. Richter (DFKI Kaiserslautern), W. Roth (Konjunkturforschungsstelle ETH, Zurich), D. Wuertz (IPS, ETH Zurich), and H.G. Zimmermann (Siemens, Munich). Invited Speakers: J. Bernasconi (ABB Corp. Research, Baden), A. Colin (Citibank, London), F. Fogelman-Soulie (MIMETICS, Chatenay Malabry), J. Frain (Central Bank of Ireland), H. Horner (Universitaet Heidelberg), H. Muehlenbein (GMD, Sankt Augustin, Bonn), F. Murtagh (Munotec Syst., Munich), M.B. Priestley (UMIST Manchester), R. Rohwer (CSTR University of Edinburgh), C. Schaefer (Rowland Inst. of Science, Cambridge MA), P. Treleaven (University College London), A. Varfis (Joint Research Center, Ispra), H.-M. Wallmeier (IBM Scientific Center, Heidelberg), D. Weers (Aspen Intellect, Zug), A. Weigend (Stanford University) D. Wuertz (IPS, ETH Zurich), H.G. Zimmermann (Siemens, Munich). Registration: SFr 400 for those from profit-making companies; otherwise SFr 150. A limited fund will be available to support younger participants who would not otherwise be able to attend. Late registration, after November 1, additional SFr 50. Remittance (only Swiss Francs) to: PASE-Workshop - Dr. Diethelm Wuertz, Schweizerischer Bankverein, Zurich. Acccount number: P0-206066.0. Accommodation requests: directly to: Verkehrsverein Zurich (VVZ), Kongressbuero, Postfach, CH-8023 Zurich, Switzerland (Tel: + 41 1 211-1256). Contact Point: Dr. Diethelm Wuertz, IPS ETH Zurich, ETH Zentrum, CLU B3, CH-8092 Zurich, Switzerland. Fax: + 41 1 252-0185. Email: wuertz at ips.ethz.ch or the undersigned. Abstract: 1 page, by November 1. F.D. Murtagh murtagh at scivax.stsci.edu From dominic at DEBUSSY.CODA.CS.CMU.EDU Thu Oct 17 16:21:08 1991 From: dominic at DEBUSSY.CODA.CS.CMU.EDU (Chioccioli) Date: Thu, 17 Oct 91 14:21:08 -0600 Subject: No subject Message-ID: <9110172021.AA24272@debussy.cs.colostate.edu> This posting briefly describes my interest in parallel learning algorithms for neural networks. Currently I am investigating the following two aspects of parallel reinforcement learning algorithms for sequential decision tasks: 1) Multiple nets on multiple task simulations. Our goal here is to combine multiple-simultaneous experiences to reduce the wall-clock time required to learn a task. 2) Multiple nets on single task simulation. 
This paradigm assumes that multiple simulations cannot be run; however, parallel search of the (single) experience space obtained from running a single simulation can be used to reduce the total number of trials (i.e. simulated experiences) required for learning. Several different algorithms will be attempted for both of the above tasks. I am interested in hearing from others who may also be doing research in parallel learning algorithms for neural networks. Pointers to relevant publications or references will be most helpful. Thanks in advance for any responses. I will post a summary of any references I receive, provided that this is not a repeated request and that sufficient response is forthcoming. Regards, Steve Dominic dominic at debussy.cs.colostate.edu Colorado State University Computer Science Dept. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Thu Oct 17 16:01:33 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Thu, 17 Oct 91 15:01:33 EST Subject: Paper Message-ID: Two days ago I posted the abstract of our paper "PPNN: A Faster Learning and Better Generalizing Neural Net". Because the paper will appear in the proceedings of IJCNN'91-SINGAPORE, I thought it would not be necessary to place it in neuroprose. However, since the posting, I have received a large number of messages requesting a copy of the paper, and requests are still coming in. Because I had no preparation for this, I was unable to answer all of the messages in time. Please excuse any possible delay and errors in replying to your requests. Thanks to many colleagues' suggestions, I am going to place the paper in the neuroprose archive. I will provide the procedures for retrieving it from cheops at Ohio State when it is ready. I will be happy to send hardcopy to those having no access to FTP. Bo Xu Indiana University itgt500 at indycms.iupui.edu From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 02:10:29 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 02:10:29 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0500. <9110171446.AA19788@lamoon> Message-ID: Yann LeCun writes: Now, the larger and more redundant the dataset is, the larger the difference in convergence speed between on-line and batch. For small (and/or random) datasets, batch might be OK, but who cares. I think that it may be misleading to lump together "large" and "redundant" as if they were the same thing, or as if they were inseparable. I agree that for highly redundant datasets, continuous updating has an advantage. I also agree that for small datasets, we don't care much about speed. But it seems to me that it is possible to have a large, not-very-redundant data set, and that accelerated batch methods should have an advantage for these. I guess you could measure redundancy by seeing if some subset of the training data set produces essentially the same gradient vector as the full set. Probably statisticians have good ways of talking about this redundancy business -- unfortunately, I don't know the right vocabulary. In a data set with noise, you need a big enough training set to raise relatively rare but real features above the level of the random background noise.
If you have roughly that much data, I bet fast batch techniques would win; if you have a training set that is several times this minimal size, then continuous updating would win. That's my suspicion, anyway. I would love to see a clear demonstration that any of these methods beats a carefully tuned on-line gradient on a large pattern classification problem. I tried many of these methods several years ago, and failed. Well, if my hypothesis above is right, we could demonstrate this by finding a dataset that is large enough to make you happy, but not highly redundant. I guess that we could create this by taking any large dataset, measuring its redundancy, and trimming it down to minimal size (assuming that the result can still be classified as large). Do you know of any big sets that would qualify? It should preferably be a relatively "pure" N-input data-classification problem, without all the additional issues (e.g. translation invariance) that are present in image-processing and speech-processing tasks. I think there are two interesting challenges here: 1 - Explain theoretically why on-line is so much faster than batch (something that goes beyond the "multiple copies" argument). 2 - Find more speedup methods that work with on-line training. I have a hunch that if we work hard enough on speeding up online training, we'll end up with something whose NET EFFECT is equivalent to the following: 1. Accumulate gradient data for a length of time that is adaptively chosen: large enough for the gradients to be stable and accurate, but not large enough to be redundant. 2. Use something equivalent to one of the batch-processing acceleration techniques on this smoothed gradient. That's not to say that the technique will necessarily do this in an obvious way -- it may be twiddling the weights each time a sample goes by -- but I suspect this kind of accumulation, smoothing, and acceleration will be present at some level. As I said, for now this is just a hunch. -- Scott Fahlman P.S. I avoid using the term "on-line" for what I call "per-sample" or "continuous" updating of weights. For me, "online" means something else. At this moment, I am sitting at my workstation watching one of my batch-updating algorithms running "on-line" in front of me. From smagt at fwi.uva.nl Fri Oct 18 09:23:07 1991 From: smagt at fwi.uva.nl (Patrick van der Smagt) Date: Fri, 18 Oct 91 14:23:07 +0100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110181323.AA28643@fwi.uva.nl> > I have to model a problem with 28 discrete inputs (1's and 0's) and > 26 discrete outputs. In fact, these 26 discrete outputs can also be represented by > 5 normalized continuous outputs. If one wants to model any kind of function, why go for the least obvious solution, a neural network, first? Since your problem is binary, too, I would first try a much simpler method such as k-nearest-neighbour or any bin approach, which would enable one to gain an understanding of the data and the overlap. Ten years ago this would have been a more standard approach, instead of using a black box (aka neural network). The reason that I would _not_ immediately reach for a network to do some function approximation is that I have seen too many people choke on the fact that they do not understand their data, the complexity of the data, a reasonable ratio of #degrees of freedom to #learning samples, etc.
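For concreteness, a minimal nearest-neighbour baseline of the kind suggested above fits in a few lines (this is only an illustrative sketch; the array names and shapes are assumptions, not taken from the original posting):

    import numpy as np

    def knn_predict(X_train, y_train, x, k=3):
        # Hamming distance is a natural choice for binary (0/1) input vectors.
        dists = np.sum(X_train != x, axis=1)
        nearest = np.argsort(dists)[:k]
        # Majority vote among the k nearest training patterns.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]

    # Hypothetical usage: X_train is an (N, 28) array of 0/1 inputs,
    # y_train an array of N class labels in {0, ..., 25}.

Such a baseline gives a quick feel for how separable the 26 classes are before any network is trained.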
Patrick van der Smagt From xiru at Think.COM Fri Oct 18 10:42:04 1991 From: xiru at Think.COM (xiru Zhang) Date: Fri, 18 Oct 91 10:42:04 EDT Subject: batch & on-line training In-Reply-To: "Kamil A. Grajski"'s message of Thu, 17 Oct 91 09:59:21 -0700 <9110171659.AA23721@apple.com> Message-ID: <9110181442.AA03133@yangtze.think.com> Date: Thu, 17 Oct 91 09:59:21 -0700 From: "Kamil A. Grajski" The consensus opinion seems to be that on-line learning is preferred for situations consisting of a classification problem with a large (possibly redundant) dataset. What appears to have been a common experience is that batch-mode training generates impressive MCUP statistics, but convergence is enough slower that the net gain is 0. It is still difficult to make a scientific judgement, mostly because the evidence appears to be largely anecdotal, e.g., "I really tried hard to make one (batch, or on-line) work, and it beat the other." I have used per-epoch training on an auto-association network, to extract "features" of protein local structures, using as few hidden units as possible. I spent a lot of time fine-tuning the training process, such as using different learning rates at different stages of training, different momentum terms, different ranges of random weights at the beginning, how large each "batch" is, etc. At the end I got a pretty good convergence rate. (Maybe I did not spend enough effort fine-tuning the per-sample training.) My feeling is that training a large network with lots of examples is still an art. You can almost always improve it if you spend time on it. Per-epoch training may have somewhat different behavior than per-sample training, so a different training schedule is often needed, and it takes time to figure out what a good one is. It also critically depends on the particular problem you want to solve. Besides the issue of convergence rate, I wonder if people have compared networks trained on a per-epoch schedule and on a per-sample schedule, to see if they have the same level of generalization. One thing I noticed in my work is that per-sample training tends to make certain weights much larger than in per-epoch training. But I am not sure if this is true in general. - Xiru Zhang From neural!lamoon.neural!yann at att.att.com Fri Oct 18 11:08:03 1991 From: neural!lamoon.neural!yann at att.att.com (neural!lamoon.neural!yann@att.att.com) Date: Fri, 18 Oct 91 11:08:03 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 02:10:29 -0400. Message-ID: <9110181508.AA00547@lamoon> Scott Fahlman writes: >I avoid using the term "on-line" for what I call "per-sample" or >"continuous" updating of weights. I personally prefer the phrase "stochastic gradient" to all of these. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: the better the generalization, the bigger the speed difference between on-line (per-sample, stochastic, ...) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant.
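In code, the 30%/70% test described above amounts to something like the following sketch (illustrative only; train and accuracy stand in for whatever training and evaluation routines one already has):

    import numpy as np

    def generalization_redundancy(X, y, train, accuracy, frac=0.3, seed=0):
        # Train on a random 30% of the data; the better the resulting net
        # does on the held-out 70%, the more "redundant" (in this sense)
        # the full dataset is.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n = int(frac * len(X))
        net = train(X[idx[:n]], y[idx[:n]])
        return accuracy(net, X[idx[n:]], y[idx[n:]])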
There might be no such thing as a large non-redundant dataset that is worth learning. -- Yann From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Fri Oct 18 12:38:38 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Fri, 18 Oct 91 12:38:38 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0500. <9110181508.AA00547@lamoon> Message-ID: Original-From: Yann le Cun I personally prefer the phrase "stochastic gradient" to all of these. That's a fine term, but it seems to me that it refers to one of the effects of per-sample updating, and not to the mechanism itself. You might get a "stochastic gradient" because you are updating after every randomly chosen sample, but you might also get it from noise in the samples themselves. So if you want to refer to the choice of updating mechanism, and not to the quality of the gradient, I think it's better to use a term like "per-sample updating" that is nearly impossible for the reader to misunderstand. >I guess you could measure redundancy by seeing if some subset of the >training data set produces essentially the same gradient vector as the full >set. Hmmm, I think any dataset for which you expect good generalization is redundant. Train your net on 30% of the dataset, and measure how many of the remaining 70% you get right. If you get a significant portion of them right, then accumulating gradients on these examples (without updating the weights) would be little more than a waste of time. This suggests the following (unverified) postulate: The better the generalization, the bigger the speed difference between on-line (per-sample, stochastic....) and batch. In other words, any dataset interesting enough to be learned (as opposed to stored) has to be redundant. There might be no such thing as a large non-redundant dataset that is worth learning. I think we may be talking about two different things here. Let's assume that there is some underlying distribution that we are trying to model, and that we take some number of samples from this distribution to use as a training set. It is clearly true that there must be some "redundancy" in the underlying distribution if it is to be worth modelling. In this case, I'm using the term "redundancy" to mean that there's some sort of regular statistical structure that is stable enough to be of predictive value. Put another way, the distribution must not be totally random-looking; it has less than the maximum possible information per sample. However, given one of these redundant underlying distributions, we want to choose a training set that is large enough to be representative of the distribution (and to separate signal from noise), but not so large as to be redundant itself. This training set is what I was referring to in my earlier message. I think it is quite possible for the training set to be large, not internally redundant, and interesting in the sense that it models an predictable (redundant) underlying distribution. And this is the kind of case where I think that batch-updating has an advantage. -- Scott Fahlman From english at sun1.cs.ttu.edu Fri Oct 18 14:20:19 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Fri, 18 Oct 91 13:20:19 CDT Subject: batch-mode parallel implementations Message-ID: <9110181820.AA00593@sun1.cs.ttu.edu> Scott Fahlman remarked, > As for speed of convergence, continuous updating clearly beats per-epoch > updating if the training set is highly redundant. 
Another important factor is the autocorrelation of the training sequence. Consider a (highly redundant) training sequence that starts with 1000 examples of A and ends with 1000 examples of B. With continuous updating, there is a good chance that learning the B examples will cause the learned response to A examples to be lost. The obvious answer, in this contrived case, is to alternate presentations of A and B examples. Now for an uncontrived case: Suppose we are training a recurrent net for speaker-independent speech recognition, and that inputs to the net are power spectra extracted from the speech signal at fixed intervals. There are relatively long intervals in which the speech sound (spectrum) does not change much. There are even longer intervals in which the speaker does not change. Reordering the spectra for an utterance is clearly not an option, and continuous updating seems imprudent even though the redundancy of the training set is high. I'm sure there are plenty of nonstationary time series, other than speech, which present the same problems. In response to Scott's remark on the batch size used with an accelerated convergence procedure, > It must be sufficiently large to give a reasonably stable picture of > the overall gradient, but not so large that the gradient is computed > many times over before a weight-update cycle occurs. I would like to mention a case where, surprisingly, even large batches gave instability. The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English english at sun1.cs.ttu.edu From nowlan at helmholtz.sdsc.edu Fri Oct 18 14:30:16 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Fri, 18 Oct 91 11:30:16 MST Subject: batch-mode parallel implementations In-Reply-To: Your message of Thu, 17 Oct 91 10:46:39 -0400. Message-ID: <9110181830.AA14145@bose> A couple of clarifications with regards to Yann's post: i) The dataset used in the comparison had a high degree of redundancy. ii) The "batch-mode" back-prop was vanilla fixed-step gradient descent, not a second order method. The issue of "batch" versus "on-line" is still a very open one. For relatively small problems (for me < ~5000 cases) I prefer conjugate gradient because of accuracy and no need to tune parameters. These techniques are also very easy to parallelize over cases. I have also implemented on a Cray a BP simulator that vectorized over connections rather than cases, and could implement on-line or batch techniques with ease. My experience here suggested that speed-ups could be obtained when the network had as few as a few thousand connections. - Steve From yoshua at psyche.mit.edu Sat Oct 19 12:55:19 1991 From: yoshua at psyche.mit.edu (Yoshua Bengio) Date: Sat, 19 Oct 91 12:55:19 EDT Subject: online parallel implementation Message-ID: <9110191655.AA12225@psyche.mit.edu> This message concerns an attempt to apply some parallelism to online back-propagation. 
I recently had access to N = 20 to 40 NeXT workstations on which I could perform learning experiments with back-propagation. My training database was huge (TIMIT, more than half a million patterns, but organized in sequences - sentences - of about 100 'frames' each), so I did not want to use a batch-based method. The idea I attempted to implement was the following: Split the database into N copies. Run N versions of the network on each of the N copies (on the N machines). Share weights _asynchronously_ among the networks, after one or more sequences. A 'server' program running on a separate machine received requests from any of the other machines to collect its contribution and return to it the current global moving average of the weights. Since I was running backpropagation through time, the weight update was performed only after each sequence even in the single-machine implementation; hence the update was not much less 'online' in the parallel implementation. Unfortunately, I no longer have access to these machines - because I have moved to a new institution - and I didn't have time to perform enough experiments and compare this approach with others. Yoshua Bengio MIT From honavar at iastate.edu Sat Oct 19 13:30:33 1991 From: honavar at iastate.edu (honavar@iastate.edu) Date: Sat, 19 Oct 91 12:30:33 CDT Subject: redundancy (was Re: batch-mode implementations) In-Reply-To: Your message of Fri, 18 Oct 91 11:08:03 -0400. <9110181508.AA00547@lamoon> Message-ID: <9110191730.AA07387@iastate.edu> Scott Fahlman wrote: >>I guess you could measure redundancy by seeing if some subset of the >>training data set produces essentially the same gradient vector as the full >>set. Yann Le Cun responded: > Hmmm, I think any dataset for which you expect good generalization is redundant. > Train your net on 30% of the dataset, and measure how many of the remaining > 70% you get right. If you get a significant portion of them right, then > accumulating gradients on these examples (without updating the weights) would > be little more than a waste of time. It is probably useful to distinguish between the redundancy WITHIN the training set and the redundancy BETWEEN the training and test sets (or, redundancy in the combined training and test sets). I suspect Scott Fahlman was referring to the redundancy (R1) within the training set while Le Cun was referring to the redundancy (R2) in the set formed by the union of the training set and the test set (please correct me if I am wrong). I would expect the relationship between generalization and R1 to be quite different from the relationship between generalization and R2. Whether the two measures of redundancy will be the same or not will almost certainly depend on the method(s) (e.g., sampling procedures, sample size reduction techniques) used to arrive at the data actually given to the network during training. In fact, if a training set T (obtained, say, by random sampling from some underlying distribution) were to be preprocessed in some fashion (e.g., using statistical techniques) and a reduced training set T' were obtained from T by eliminating the "redundant" samples, clearly the redundancy (R1') within the reduced training set T' would be much smaller than the redundancy (R1) in the original training set T, although the overall redundancy (R2) in the set formed by the union of T and the test data may be more or less equal to the redundancy (R2') in the set formed by the union of T' and the test data.
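One rough way to operationalize the within-training-set redundancy R1, along the lines of the subset-gradient suggestion quoted above, is sketched below (a sketch only; grad is assumed to return the error gradient, as a flat vector, for a given set of patterns):

    import numpy as np

    def subset_gradient_agreement(grad, X, y, frac=0.1, seed=0):
        # Cosine between the gradient on a random subset and the gradient on
        # the full training set; a value near 1 suggests the subset already
        # carries most of the gradient information (high R1).
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))[:int(frac * len(X))]
        g_sub, g_full = grad(X[idx], y[idx]), grad(X, y)
        return float(np.dot(g_sub, g_full) /
                     (np.linalg.norm(g_sub) * np.linalg.norm(g_full)))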
My guess is that the generalization on the test data will be more or less the same irrespective of whether T or T' is used for training the network. Vasant Honavar honavar at iastate.edu From nowlan at helmholtz.sdsc.edu Sat Oct 19 15:05:24 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Sat, 19 Oct 91 12:05:24 MST Subject: Paper Announcement (Neuroprose) Message-ID: <9110191905.AA15742@bose> ** Paper available via Neuroprose *************************************** ** Please do not forward to other mailing lists or boards. Thank you. ** The following paper has been placed in the Neuroprose archives at Ohio State. The file is nowlan.soft-share.ps.Z Ftp instructions follow the abstract. ----------------------------------------------------- Simplifying Neural Networks by Soft Weight-Sharing Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 5800 San Diego, CA 92186-5800 Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Canada M5S 1A4 ABSTRACT: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of non-zero weights. We propose a more complicated penalty term in which the distribution of weight values is modelled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability densities under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms. ----------------------------------------------------- FTP INSTRUCTIONS Either use "Getps nowlan.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get nowlan.soft-share.ps.Z ftp> quit unix> uncompress nowlan.soft-share.ps.Z unix> lpr -s nowlan.soft-share.ps (or however you print postscript) Steven J. Nowlan Computational Neuroscience Laboratory The Salk Institute P.O. Box 85800 San Diego, CA 92186-5800 Work Phone: 619-453-4100 X463 e-mail: nowlan at helmholtz.sdsc.edu From tgd at guard.berkeley.edu Sat Oct 19 17:09:06 1991 From: tgd at guard.berkeley.edu (Tom Dietterich) Date: Sat, 19 Oct 91 14:09:06 -0700 Subject: batch-mode parallel implementations In-Reply-To: Tom English's message of Fri, 18 Oct 91 13:20:19 CDT <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: <9110192109.AA04626@guard.berkeley.edu> There has been a fair amount of work in decision-tree learning on the issue of breaking large training sets into smaller batches. In 1980, Quinlan introduced a method called "windowing" in which a small sample (or window) of the training data is initially drawn at random. The algorithm is trained on this window and then tested on the remainder of the data (that was excluded from the window). Then, some fraction of the misclassified examples (possibly all of them) are added to the window. Generally speaking, in noise-free domains, windowing works quite well. A very high-performing decision tree can be learned with a relatively small window. 
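In outline, the windowing loop looks roughly like this (a sketch under the obvious assumptions; fit_tree and classify stand for any decision-tree learner and its prediction routine):

    import random

    def windowing(X, y, fit_tree, classify, init_size=200, seed=0):
        rng = random.Random(seed)
        window = set(rng.sample(range(len(X)), init_size))
        while True:
            tree = fit_tree([X[i] for i in window], [y[i] for i in window])
            # Test on the examples outside the window.
            missed = [i for i in range(len(X))
                      if i not in window and classify(tree, X[i]) != y[i]]
            if not missed:
                return tree
            # Add the misclassified examples (here all of them; some variants
            # add only a fraction) to the window and retrain.
            window.update(missed)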
However, for noisy data, the general experience has been that the window eventually grows to include the entire training set. Jason Catlett (Sydney U) recently completed his dissertation on testing windowing and various other related tricks on datasets of roughly 100K examples (straight classification problems). I recommend his papers and thesis. His main conclusion is that if you want high performance, you need to look at all of the data. --Tom From ross at psych.psy.uq.oz.au Sat Oct 19 19:50:16 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 20 Oct 1991 09:50:16 +1000 Subject: batch & on-line training Message-ID: <9110192350.AA02282@psych.psy.uq.oz.au> On the topic of batch versus on-line training, Kamil at apple.com writes: > ... there doesn't seem to be a > compelling argument for preferring one or the other IN PRINCIPLE. I would like to turn the dichotomy into a trichotomy and argue that there is an 'in principle' reason for a preference. I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g. it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). One-shot learning becomes critical if you are trying to perform 'cognitive' tasks - when you learn the route to a new office you don't need hundreds or thousands of exposures to get it right. Obviously, one-shot learning can't be expected to happen in all circumstances: you have to be working in a constrained problem domain that can support it and the learner has to have the background knowledge that will support what is to be learned. Most of the work that is done with backprop and its relatives starts with near to a tabula rasa and all the time and effort goes into creating the universe from only the input data. Obviously, techniques do exist for one-shot learning: e.g. simple delta rule with a learning rate of 1. The problem is that they fail on the problems that people regard as interesting - inputs non-orthogonal and hidden units required. The challenge is to find a one-shot learning algorithm that can work on interesting problems. I believe that this will require strong architectural and problem data constraints. I see the current heavy use of gradient-descent techniques as analogous to the period in the history of AI when researchers looked for general problem solving techniques that were universally applicable. General techniques worked on toy problems but rapidly bogged down on real problems. In BP, we have a technique for learning arbitrary mappings, and we pay for it with excruciatingly slow learning. To summarise: IF you want to perform cognitive tasks THEN 'in principle' one shot learning is the only training regime that is acceptable (although slower learning may be required to get the net to the point where it can learn in one shot). All you have to do is invent a good one-shot learning scheme :-). Ross Gayler ross at psych.psy.uq.oz.au From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Sun Oct 20 11:08:11 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Sun, 20 Oct 91 11:08:11 -0400 Subject: batch-mode parallel implementations In-Reply-To: Your message of Fri, 18 Oct 91 13:20:19 -0600. <9110181820.AA00593@sun1.cs.ttu.edu> Message-ID: I would like to mention a case where, surprisingly, even large batches gave instability. 
The application was recognition of handwritten lower-case letters, and the network was of the LeCun variety. The training set comprised three batches of 1950 letter images (a total of 5850 images). This partition was chosen randomly. Fahlman's quickprop behaved poorly, and with some close inspection I found a number of weights for which the partial derivative was changing sign from one batch to the next. Further, the magnitudes of those partials were not always small. In short, the performance surfaces for the three batches differed considerably. The moral: You may have to make a single batch of the entire training set, even when working with fairly large training sets. -- Tom English Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. This prevents the quadratic part of the algorithm from trying to draw a parabola between two slopes that are not closely related. If you don't do this, that one step can badly mess up the weights you've laboriously accumulated so far. Of course, if you do this after every sample, the quadratic acceleration never kicks in and you end up with nothing more than plain old backprop without momentum. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message) and still experienced instability, then I would say that the batches, even though they are fairly large, are not large enough to provide a fair representation of the underlying distribution. -- Scott From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Sun Oct 20 19:55:51 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (Bo Xu) Date: Sun, 20 Oct 91 18:55:51 EST Subject: One-shot learning Message-ID: Ross Gayler wrote: >To summarise: IF you want to perform cognitive tasks THEN 'in principle' one >shot learning is the only training regime that is acceptable (although slower >learning may be required to get the net to the point where it can learn in >one shot). All you have to do is invent a good one-shot learning scheme :-). Although one-shot (one-trial) learning may not be the only mode of learning in our cognitive processes, it is true that learning in our cognitive processes does not take as many repetitions (epochs) as current BPNNs take. One-shot learning can serve as a goal and a criterion for learning schemes both in cognitive learning processes and in learning systems for practical applications. Our work on PPNN (I posted the abstract several days ago) was originally driven by one-trial learning. Although PPNN has not reached one-trial learning, it has stepped closer to it. In order to isolate the topological effect, we constrained PPNN to be the same as BPNN in all aspects except the topology. It was shown that the stereotopology alone can reduce the training time (epochs) by several orders of magnitude (due to the characteristics of PPNN's stereotopology, we used the average training time instead of epochs to measure the rate of convergence). It was found that the more difficult the problem is, the larger the improvement is. This topological speedup lies in the fact that there is a cause of slowness in the original planar topology of BPNN that cannot be accounted for by the learning algorithm or unit characteristics (no matter what learning algorithm is used or what unit response characteristics are employed, this cause of slow learning always exists.
It is inherent to the planar topology of BPNN). Bo Xu Indiana University itgt500 at indycms.iupui.edu From mmoller at daimi.aau.dk Mon Oct 21 08:13:06 1991 From: mmoller at daimi.aau.dk (mmoller@daimi.aau.dk) Date: Mon, 21 Oct 91 13:13:06 +0100 Subject: Batch methods versus stochastic methods... Message-ID: <9110211213.AA13826@sinope.daimi.aau.dk> --- Concerning the discussion about batch update versus stochastic update: for about the last 6 months we have been working on online versus batch problems. A preprint of a paper, which tries to describe why the stochastic methods are in some instances better than the deterministic batch methods, will soon be available via the neuroprose archive. The paper also introduces a new algorithm which combines the good properties of the stochastic methods as well as the batch methods. Our results so far can be summarized as follows: The redundancy of the training set plays, as has been mentioned before, a very important role. It is not clear, however, how to define this redundancy in a proper way. The usual definition of redundancy taken from information theory can give a hint about the redundancy but cannot in any obvious way provide a precise definition, because this would involve the information content of the training set as well as the internal dynamics (the structure) of the network. So when we discuss the concept of redundancy we should be aware that redundancy in the context of learning in feedforward networks is not very well defined. Another issue, which I think is even more important than the concept of redundancy, is the structure of the error surface. The "true" error surface, which is given by the whole training set, is, as we know, often characterized by a large number of flat regions and very steep, narrow ravines. Batch methods operate in the true but very complex error surface, while stochastic methods operate in partial error surfaces which are only approximations to the true error surface. So stochastic methods make a noisy, stochastic search in the true error surface, which can help them through the flat regions. One can think of the stochastic search as a kind of "simulated annealing" approach in which an increase of the error is also allowed. The algorithm we propose is based on a combination of the good properties of stochastic and batch algorithms. The main idea is to use a conjugate gradient algorithm on blocks of data (block update or semi-batch update). Because the conjugate gradient algorithm updates weights with variable (and sometimes large) step sizes, a validation scheme is used to control the updates. Through a simple sampling technique we estimate the probability that an update will decrease the total error. This probability is then used to decide whether to update or not. The number of patterns needed in each block update is variable and controlled by an adaptive optimization scheme during training. We have done some experiments with this approach on the nettalk problem. Our results so far show that the approach decreases the error faster per epoch than stochastic backpropagation. More computation is, however, needed per epoch. An interesting observation is that the number of patterns needed to make an update grows during learning, so that after a certain number of epochs the block size is equal to the number of patterns. When this happens the algorithm is equal to a traditional batch-mode algorithm and no validation is needed anymore.
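The flavour of such a validation-gated block update can be sketched as follows (a schematic reading of the idea only, not the algorithm from the forthcoming paper; propose_step and total_error are assumed to exist):

    import numpy as np

    def validated_block_update(w, propose_step, total_error, X, y,
                               n_val=200, p_accept=0.75, seed=0):
        # propose_step: e.g. a conjugate-gradient step computed on one block.
        w_new = propose_step(w)
        # Estimate, from a random sample of patterns, the probability that
        # the step decreases the error, and apply it only if that estimate
        # is high enough.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))[:n_val]
        better = np.mean([total_error(w_new, X[i:i+1], y[i:i+1]) <
                          total_error(w, X[i:i+1], y[i:i+1]) for i in idx])
        return (w_new, True) if better >= p_accept else (w, False)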
In order to be able to draw some definite conclusions we need a few more experiments on different training sets. Unfortunately, we do not have any datasets of the proper size. So I would appreciate it if anyone could inform me about where to find big datasets that are publicly available. -- Martin M ----------------------------------------------------------------------- Martin F. Moller email: mmoller at daimi.aau.dk Computer Science Department phone: +45 86202711 5223 Aarhus University fax: +45 86135725 Ny Munkegade, Building 540 8000 Aarhus C Denmark ---------------------------------------------------------------------- From giles at research.nec.com Mon Oct 21 09:15:03 1991 From: giles at research.nec.com (Lee Giles) Date: Mon, 21 Oct 91 09:15:03 EDT Subject: Announcement of NIPS Workshop Message-ID: <9110211315.AA19197@fuzzy.nec.com> Announcement of NIPS Workshop: ************************************************************************** RECURRENT NETWORKS: THEORY AND APPLICATIONS Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains one of the important open issues in the neural network area. Training algorithms are very inefficient in terms of memory demands, computational needs or both. Little is known about convenient architectures for recurrent networks. The number of known successful applications is very limited. Even for static applications (operation in the "fixed point mode"), recurrent networks are more general, and therefore more powerful, in principle, than feedforward ones. However, once again, little is known about their actual (dis)advantages, convenient architectures, successful applications, etc. We welcome proposals for presentations (no more than one page in length) related to the theme of theory or applications of recurrent networks. Subject to the number of received proposals, we envisage a two-day workshop, one day theory, the next day applications, with 15-20 minute presentations, each followed by about 10 minutes of discussion. Please send proposals to Lee Giles. Organizers: Professor Luis Borges de Almeida INESC Rua Alves Redol, 9 Apartado 10105 1017 LISBOA CODEX PORTUGAL 351-1-544607 inesc!lba at relay.EU.net (or) lba at sara.inesc.pt C. Lee Giles NEC Research Institute 4 Independence Way Princeton, N.J. 08540 609-951-2642 FAX: 609-951-2482 giles at research.nj.nec.com Richard Rohwer Centre for Speech Technology Research Edinburgh University 80, South Bridge Edinburgh EH1 1HN, Scotland (44 or 0) (31) 650-2764 FAX: (44 or 0) (31) 226-2730 rr%ed.cstr at nsfnet-relay.ac.uk (or) rr at uk.ac.ed.cstr ************************************************************************** C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From DOW_ERNST at LILLY.COM Mon Oct 21 10:16:00 1991 From: DOW_ERNST at LILLY.COM (Ernst Dow, 276-9916) Date: Mon, 21 Oct 1991 09:16 EST Subject: one-shot learning Message-ID: <01GC03SM0RHC0000EE@GATEWAY.LILLY.COM> Ross Gayler writes: I want to add one-shot learning, which I define (on the spur of the moment) to be successful learning from one occasion of exposure to the input. This phenomenon is known to happen in animals (e.g.
it can happen in taste aversion conditioning) and can happen in humans (e.g. recognition of an abstract painting seen only once before). etc. If it was a big enough event in your life, you will have memorized the event. If it was not so monumental, you can help your memory by replaying the event in your mind. But in this case, we are talking memorization, not generalization. You may be able to identify the painting you saw before, but could you make the leap to recognizing all other abstract paintings? Ernst Dow ernst at lilly.com From: DOW ERNST (MCVAX0::TC64566) From mike at psych.ualberta.ca Mon Oct 21 12:15:37 1991 From: mike at psych.ualberta.ca (Mike R. W. Dawson) Date: Mon, 21 Oct 1991 10:15:37 -0600 Subject: Open position in cognitive psychology Message-ID: <9110211613.AA01542@psych.ualberta.ca> I'd like to bring the following open position in cognitive psychology to the attention of anyone who might be modeling cognitive processes with their networks: ======================================================================= Cognitive or Developmental Psychologists The Department of Psychology, University of Alberta, invites applications for one and, subject to budgetary considerations, possibly two tenure track positions at the level of beginning Assistant Professor, salary range: $38,955-$55,755. Candidates with research expertise in either COGNITIVE PSYCHOLOGY or DEVELOPMENTAL PSYCHOLOGY will be considered. The position in Cognitive is open with respect to area of specialization. The position in Developmental is also open with respect to area, but there is some preference for individuals with interests in language development, conceptual development, mathematical cognition, reading, scientific reasoning, spelling, or writing. Current Developmental faculty conduct research on emergent literacy, reading, and arithmetic skill. Decisions will be made on the basis of demonstrated research excellence, interactions with colleagues, and teaching ability. Applications should include a curriculum vita, three letters of recommendation, and reprints or recent publications. These materials should be sent, as appropriate, to Cognitive Search Chair, Dr. Peter Dixon, or Developmental Search Chair, Dr. Jeffrey Bisanz, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9. To receive full consideration, all materials must be received by January 1, 1992. The University of Alberta is committed to the principle of equity in employment. The University encourages applications from aboriginal persons, disabled persons, members of visible minorities and women. ======================================================================== Michael R. W. Dawson email: mike at psych.ualberta.ca Department of Psychology University of Alberta Edmonton, Alberta Tel: +1 403 492 5175 T6G 2E9, Canada Fax: +1 403 492 1768 From bap at james.psych.yale.edu Mon Oct 21 13:41:35 1991 From: bap at james.psych.yale.edu (Barak Pearlmutter) Date: Mon, 21 Oct 91 13:41:35 -0400 Subject: Paper Announcement (Neuroprose) In-Reply-To: "Steven J. Nowlan"'s message of Sat, 19 Oct 91 12:05:24 MST <9110191905.AA15742@bose> Message-ID: <9110211741.AA03347@james.psych.yale.edu> The following paper has not been placed in the Neuroprose archives at Ohio State. The file is not pearlmutter.soft-share.soft-share.ps.Z. Ftp instructions follow the abstract. 
----------------------------------------------------- Simplifying Neural Network Soft Weight-Sharing Measures by Soft Weight-Measure Soft Weight Sharing Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 ABSTRACT: It has been shown by Nowlan and Hinton (1991) that it is advantageous to construct weight complexity measures for use in weight regularization through the use of EM, instead of relying on some a-priori complexity measure, or even worse, neglecting regularization by assuming a uniform distribution. Their work can be regarded as a generalization of the "Optimal Brain Damage" of Le Cun et al. (1990), in which the distribution of weights is estimated with a histogram, a peculiar functional form for a distribution. Nowlan and Hinton assume a much simpler functional form for the distribution, avoiding overfitting and therefore overregularization. However, they disregard the issue of regularization of the regularizer itself. Just as certain weights might be considered a-priori quite unlikely, certain distributions of weights may be considered a-priori quite unlikely. To solve this problem, we introduce a regularization term on the parameters of the weight distribution being estimated. This regularization term is itself determined by a distribution over these distributional parameters. In this light, Nowlan and Hinton (1991) make the uniform distributional parameter distribution assumption. Here, we estimate the distribution of distributions by running an ensemble of networks, with EM used to estimate the weight distribution of each network (following Nowlan and Hinton), but we then use EM to estimate the distribution of distributions across networks. Of course, each estimated distribution is used to regularize the parameters over which that distribution is defined, leading to regularization of the individual network regularizers. We do not consider how to estimate the a-priori distribution which might be used to regularize the distribution being used to regularize the distribution being used to regularize the weights being estimated from the data, which will be explored in a future paper. ----------------------------------------------------- FTP INSTRUCTIONS Either use "getps pearlmutter.soft-share.soft-share.ps.Z", or do the following: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get pearlmutter.soft-share.soft-share.ps.Z ftp> quit unix> uncompress pearlmutter.soft-share.soft-share.ps.Z unix> lpr -s pearlmutter.soft-share.soft-share.ps Barak Pearlmutter Department of Psychology P.O. Box 11A Yale Station New Haven, CT 06520-7447 Work Phone: 203 432-7011 From ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu Mon Oct 21 15:46:00 1991 From: ANDERSON%BROWNCOG.BITNET at mitvma.mit.edu (ANDERSON%BROWNCOG.BITNET@mitvma.mit.edu) Date: Mon, 21 Oct 91 14:46 EST Subject: Technical Report Announcement Message-ID: Technical Report 91-3 available from: Department of Cognitive and Linguistic Sciences Box 1978, Brown University, Providence, RI 02912 A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network James A. Anderson, Kathryn T. Spoehr, and David J. Bennett Department of Cognitive and Linguistic Sciences Box 1978 Brown University Providence, RI 02912 Abstract There are only a few hundred well-defined facts in elementary arithmetic, but humans find them hard to learn and hard to use.
One reason for this difficulty is that the structure of elementary arithmetic lends itself to severe associative interference. If a neural network corresponds in any sense to brain-style computation, then we should expect similar difficulties teaching elementary arithmetic to a neural network. We find this observation is correct for a simple network that was taught the multiplication tables. We can enhance learning of arithmetic by forming a hybrid coding for the representation of number that contains a powerful analog or "sensory" component as well as a more abstract component. When the simple network uses a hybrid representation, many of the effects seen in human arithmetic learning are reproduced, including overall error patterns and response time patterns for false products. An extension of the arithmetic network is capable of being flexibly programmed to correctly answer questions involving terms such as "bigger" or "smaller." Problems can be answered correctly, even if the particular comparisons involved had not been learned previously. Such a system is genuinely creative and flexible, though only in a limited domain. It remains to be seen if the computational limitations of this approach are coincident with the limitations of human cognition. A version of this report will appear as a chapter in: "Neural Networks for Knowledge Representation and Inference" Edited by Daniel S. Levine and Manuel Aparicio, IV To be published by Lawrence Erlbaum Associates, Hillsdale, New Jersey Copies can be obtained by sending an email message to: LI700008 at brownvm.BITNET or to: anderson at browncog.BITNET From english at sun1.cs.ttu.edu Mon Oct 21 17:12:09 1991 From: english at sun1.cs.ttu.edu (Tom English) Date: Mon, 21 Oct 91 16:12:09 CDT Subject: batch-mode parallel implementations Message-ID: <9110212112.AA01265@sun1.cs.ttu.edu> With regard to my earlier posting on problems I encountered in applying Quickprop, Scott Fahlman has replied: Note that it is OK to switch from one training set to another when using Quickprop, but that every time you change the training set you *must* zero out the prev-slopes and delta vectors. If you want to get any benefit from quickprop, you have to run each distinct training set for at least a few cycles. If you were aware of all that (it's unclear from your message).... Well, I was not aware of what others were doing in practice. Scott's original tech report on Quickprop gave results only for the case of once-per-epoch weight updates. I apologize for referring to my implementation with once-per-batch weight updates and no zeroing between batches as "Fahlman's Quickprop." What I *did* understand was that Quickprop's attempt to approximate the error surface with a paraboloid was going to be fouled-up if the "pictures" of the error surface gleaned from different batches were substantially different. Training for multiple iterations with one batch, and then resetting the variables used in estimating the shape of the error surface before going on to the next batch would certainly eliminate the problem I described. The prospect of choosing the number of iterations per batch does not thrill me, however. In general, I hate parameter tweaking. From my perspective, the worst thing about parameter tweaking is that we don't really know how it affects the quality of the final network obtained. Also, exploring the effects of different parameter settings takes too much of *my* time. 
I want a procedure that does not require tweaking and that runs at a reasonable fraction of the speed of a "well-tuned" stochastic gradient descent procedure for a wide range of problems. (I haven't experimented with conjugate gradient descent yet, but it seems to fit my bill.) --Tom english at sun1.cs.ttu.edu From giles at research.nec.com Tue Oct 22 15:51:28 1991 From: giles at research.nec.com (Lee Giles) Date: Tue, 22 Oct 91 15:51:28 EDT Subject: Announcement of NIPS (Neural Information Processing Systems) Workshop Message-ID: <9110221951.AA21064@fuzzy.nec.com> Announcement of NIPS (Neural Information Processing Systems) Workshop: RECURRENT NETWORKS: THEORY AND APPLICATIONS, Dec 6-7, Vail, Colorado. C. Lee Giles NEC Research Institute 4 Independence Way Princeton, NJ 08540 USA Internet: giles at research.nj.nec.com UUCP: princeton!nec!giles PHONE: (609) 951-2642 FAX: (609) 951-2482 From thsspxw at iitmax.iit.edu Tue Oct 22 19:10:57 1991 From: thsspxw at iitmax.iit.edu (Peter Wohl) Date: Tue, 22 Oct 91 18:10:57 CDT Subject: batch-mode parallel implementations In-Reply-To: <8431.688104706@B.GP.CS.CMU.EDU>; from "Connectionist_Research_Group@B.GP.CS.CMU.EDU" at Oct 22, 91 12:11 (midnight) Message-ID: <9110222311.AA09935@iitmax.iit.edu> Dear connectionists, I have some comments on several of these, so I decided not to include all the history of this discussion in my reply (you read it anyway). So here I go: 1.
Given per-sample training, one still faces the problem of how to deal with really large networks (thousands of neurons and hundreds of thousands connections) on a parallel machine that has far fewer processors. What has been proposed: a) SIMD (don't cry for unused processors, as long as you can communicate fast enough); b) MIMD with clustering neurons somehow together, to increase granularity (SIMD also needs some), problem here being dependence on VERY particular nets (usually layers with powers of 2 neurons); c) re-writing the communication of the algorithm (see for example my paper this coming Nov at ICTAI'91). 2. I agree that epoch-training is probably desirable. How large is a "typical" epoch for a "large" net (thousands of neurons, fraction of million connections at least) ? Tens of vectors, hundreds ? I would say, no more than few hundreds. 3. "Recall" (forward propagation with no weight update) is far easier to parallelize, since there is no end-of-epoch bottleneck (barrier synch). In some results (to be published next year), we achieved (on 32 BBN Butterfly processors) almost 2 million connec-presen/sec with backprop., but over 5 million at recall. (2.5 million if you "adjust" forward-only by dividing by two, to match the backprop figure more closely). To summarize, I think the real problem of parallelizing ANNs applies when at least one of net-size or training-epoch-size is large (and thus slow when run sequentially). And don't forget: net architecture could change during training (e.g. cascade corr), and still keep it parallel. Thanks for your patience, Peter Wohl thsspxw at iitmax.iit.edu From spotter at darwin.bio.uci.edu Tue Oct 22 19:17:52 1991 From: spotter at darwin.bio.uci.edu (Steve Potter) Date: Tue, 22 Oct 91 16:17:52 PDT Subject: Continuous vs. Batch learning Message-ID: <9110222317.AA22627@sanger.bio.uci.edu> It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique, as we can verify for humans by remembering something that we only saw (or heard, etc.) once. One-trial learning paradigms abound in the behavioral literature. I cant think of any biological examples of batch learning, in which sensory data are saved until a certain number of them can be somehow averaged together and conclusions made and remembered. Any ideas? Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Steve Potter UC Irvine Psychobiology dept. Irvine, CA 92717 spotter at darwin.bio.uci.edu From jbower at cns.caltech.edu Wed Oct 23 00:47:51 1991 From: jbower at cns.caltech.edu (Jim Bower) Date: Tue, 22 Oct 91 21:47:51 PDT Subject: CNS*92 Message-ID: <9110230447.AA01301@cns.caltech.edu> CALL FOR PAPERS First Annual Computation and Neural Systems Meeting CNS*92 Tuesday, July 26 through Sunday, July 31 1992 San Francisco, California This is the first annual meeting of an inter-disciplinary conference intended to address the broad range of research approaches and issues involved in the general field of computational neuroscience. The meeting itself has grown out of a workshop on "The Analysis and Modeling of Neural Systems" which has been held each of the last two years at the same site. The strong response to these previous meetings has suggested that it is now time for an annual open meeting on computational approaches to understanding neurobiological systems. 
CNS*92 is intended to bring together experimental and theoretical neurobiologists along with engineers, computer scientists, cognitive scientists, physicists, and mathematicians interested in understanding how neural systems compute. The meeting will equally emphasize experimental, model-based, and more abstract theoretical approaches to understanding neurobiological computation. The first day of the meeting (July 26) will be devoted to tutorial presentations and workshops focused on particular technical issues confronting computational neurobiology. The next three days will include the main technical program consisting of plenary, contributed and poster sessions. There will be no parallel sessions and the full text of presented papers will be published. Following the regular session, there will be two days of focused workshops at a site on the California coast (July 30-31). Participation in the workshops is restricted to 75 attendees. Technical Program: Plenary, contributed and poster sessions will be held. There will be no parallel sessions. The full text of presented papers will be published. Presentation categories: A. Theory and Analysis B. Modeling and Simulation C. Experimental D. Tools and Techniques Themes: A. Development B. Cell Biology C. Excitable Membranes and Synaptic Mechanisms D. Neurotransmitters, Modulators, Receptors E. Sensory Systems 1. Somatosensory 2. Visual 3. Auditory 4. Olfactory 5. Other F. Motor Systems and Sensory Motor Integration G. Behavior H. Cognitive I. Disease Submission Procedures: Original research contributions are solicited, and will be carefully refereed. Authors must submit six copies of both a 1000-word (or less) summary and six copies of a separate singlepage 50-100 word abstract clearly stating their results postmarked by January 7, 1992. Accepted abstracts will be published in the conference program. Summaries are for program committee use only. At the bottom of each abstract page and on the first summary page indicate preference for oral or poster presentation and specify at least one appropriate category and and theme. Also indicate preparation if applicable. Include addresses of all authors on the front of the summary and the abstract and indicate to which author correspondence should be addressed. Submissions will not be considered that lack category information, separate abstract sheets, the required six copies, author addresses, or are late. Mail Submissions To: Chris Ploegaert CNS*92 Submissions Division of Biology 216-76 Caltech Pasadena, CA. 91125 Mail For Registration Material To: Chris Ghinazzi Lawrence Livermore National Laboratories P.O. Box 808 Livermore CA. 94550 All submitting authors will be sent registration material automatically. Program committee decisions will be sent to the correspondence author only. CNS*92 Organizing Committee: Program Chair, James M. Bower, Caltech. Publicity Chair, Frank Eeckman, Lawrence Livermore Labs. Finances, John Miller, UC Berkeley and Nora Smiriga, Institute of Scientific Computing Res. Local Arrangements, Ted Lewis, UC Berkeley and Muriel Ross, NASA Ames. Program Committee: William Bialek, NEC Research Institute. James M. Bower, Caltech. Frank Eeckman, Lawrence Livermore Labs. Scott Fraser, Caltech. Christof Koch, Caltech. Ted Lewis, UC Berkeley. Eve Marder, Brandeis. Bruce McNaughton, University of Arizona. John Miller, UC Berkeley. Idan Segev, Hebrew University, Jerusalem Shihab Shamma, University of Maryland. Josef Skrzypek, UCLA. 
DEADLINE FOR SUMMARIES & ABSTRACTS IS January 7, 1992 please post From palmer at world.std.com Wed Oct 23 02:25:10 1991 From: palmer at world.std.com (Kent D Palmer) Date: Wed, 23 Oct 91 02:25:10 -0400 Subject: THINKNET NEWSLETTER ANNOUNCEMENT Message-ID: <9110230625.AA18459@world.std.com> ===========================START=OF=THINKNET=FILE============================ ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| PLEASE POST ----- NEWSLETTER ANNOUNCEMENT |||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| /| ....... .. .. . . . . .==|........ ... .. .... . .... .. ._____. . * . . / ===|_ _. ..______________________________...... | | | | |\ | / ======== |\ ...| .... |.THINKNET:An Electronic.... | |---| | | \ |< ========== |. \ .|---- . |.Journal Of Philosophy,... | | | | | \| \ ======== |... \| ..... |.Meta-Theory, And Other.. | | | | | | \ ====== |.... |____.. |.Thoughtful Discussions.... .==| ........ .. .... .. ... .. . \| .... ... .. .. . . .. . . ----------------------------------------------------------------------------- OCTOBER 1991 ISSUE 001 VOLUME 1 NUMBER 1 ----------------------------------------------------------------------------- This is an announcement for Thinknet, an on-line magazine forum dedicated to thoughtfulness in the cybertime environment. Thinknet covers philosophy, systems theory, and meta-theoretical discussions within disciplines. It is your interdisciplinary window on to what significant information sources are available to foster thought provoking discussion. *CONTENTS* Publication Data Scope of newsletter. Rationale for newsletter. Subscriptions and Submittals address. Bulletin Boards where it may be found. Services offered by newsletter. Staff of this edition. Coda: call for participation. About Thinknet Discussion of goals of Thinknet Newsletter. Prospect for Philosophy and Systems Theory in Cybertime Is there a possibility for a renaissance for philosophy? The Philosophy Category on GEnie Review by Gordon Swobe with list of topics. Philosophy on the WELL Review by Jeff Dooley with list of topics. Origin Conference on the WELL Review by Bruce Schuman with list of topics Internet Philosophy Mailing Lists A review of all know philosophy oriented mailing lists by Stephen Clark. Books Of Note THE MATRIX !%@:: A DIRECTORY OF ELECTRONIC MAIL ADDRESSING & NETWORKS Other Publications BOARDWATCH MAGAZINE SOFTWARE ENGINEERING FOUNDATIONS [a work in progress] Books, Electronic Newsletters, and Cyber-Artifacts Received ARTCOM NEWSLETTER FACTSHEET FIVE Protocols for Meaningful Discussions: ARTICLE by Kent Palmer A consideration of how philosophy discussions might be made more useful and their history accessible by using a voluntary protocol. Thoughtful Communications: EDITORIAL Closing remarks. <<<<<<<<<<<>>>>>>>>>>>> ----------------------------------------------------------------------------- HOW TO GET YOUR COPY kdp ----------------------------------------------------------------------------- *Price* The electronic form is FREE. Hardcopies cost money for reproduction, postage, and handling. *Subscriptions* Send an e-mail message to the following address: thinknet at world.std.com Your message should be of the following form: SEND THINKNET TO YourFullName AT YourEmailAddress Some mailing lists do not include your return mailing address if you use the reply function of your mail reader so you must make sure your return e-mail address is in the body of your message. 
Thinknet file is long, about 1113 lines; 7136 words; 51795 bytes. You will be added to the thinknet subscription list. You will get all further issues unless you unsubscribe. *Bulletin Boards* Thinknet will be posted in the WELL philosophy conference in a topic. The WELL 27 Gate Five Road, Sausalito, CA 94965 modem 415-332-6106 voice 415-332-4335 Also on GEnie in the Philosophy category under the Religion and Ethics Bulletin Board. GEnie Client Services 1-800-638-9636 *PHILOS-L Listserver* You will eventually be able to get the thinknet newsletter from a listserver. Send the message 'GET THINKNET DOC' to 'LISTSERV at LIVERPOOL.AC.UK'. If you get an error message try the regular thinknet address. *Or if all else fails* THINKNET PO BOX 8383 ORANGE CA 92664-8383 UNITED STATES ==============================END=THINKNET=FILE============================= From ross at psych.psy.uq.oz.au Wed Oct 23 04:23:43 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Wed, 23 Oct 1991 18:23:43 +1000 Subject: one-shot learning Message-ID: <9110230823.AA28466@psych.psy.uq.oz.au> Ernst Dow (ernst at lilly.com) writes (in the context of one-shot or one-trial learning): >But in this case, we are talking memorization, not generalization. You may >be able to identify the painting you saw before, but could you make the >leap to recognizing all other abstract paintings? My interest is in analogical retrieval and not one-trial learning (except to the extent that it is necessary for 'truly cognitive' capabilities). The literature on analogy stresses the role that goals play in determining the apparent similarity (and hence generalisation) of entities. That is, in analogy the generalisation pattern emerges at recall time rather than being completely determined at storage time. For such a (post-hoc) generaliser it makes sense to attempt to memorise everything. This contrasts with the approach of most BP work where the system learns an internal representation (read that as set of hidden units and weights) that supports a particular pre-specified pattern of generalisation. I realise that there is more to life than analogical recall and some generalisation is based on literal similarity etc, but I am just stating the extreme position for simplicity. Ross Gayler ross at psych.psy.uq.oz.au From pluto at cs.UCSD.EDU Mon Oct 21 19:29:59 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 16:29:59 PDT Subject: Redundancy Message-ID: <9110212329.AA12326@tournesol.ucsd.edu> Scott Fahlman writes: :: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) 
Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Tom Dietterich writes: :: :: There has been a fair amount of work in decision-tree learning on the :: issue of breaking large training sets into smaller batches. In 1980, :: Quinlan introduced a method called "windowing" in which a small sample :: (or window) of the training data is initially drawn at random. The :: algorithm is trained on this window and then tested on the remainder of :: the data (that was excluded from the window). Then, some fraction of :: the misclassified examples (possibly all of them) are added to the :: window. :: :: Generally speaking, in noise-free domains, windowing works quite well. :: A very high-performing decision tree can be learned with a relatively :: small window. However, for noisy data, the general experience has :: been that the window eventually grows to include the entire training set. :: Jason Catlett (Sydney U) recently completed his dissertation on :: testing windowing and various other related tricks on datasets of :: roughly 100K examples (straight classification problems). I recommend :: his papers and thesis. :: :: His main conclusion is that if you want high performance, you need to :: look at all of the data. Could you provide a reference to the work demonstrating the performance of windowing on clean data? And could you provide an e-mail address for Jason Catlett? I am in the process of setting up benchmarking experiments for the technique I mentioned above. Although I consider the more general task of fitting arbitrary functional mappings, these works seem relevant. Thanks, ================= == Mark Plutowski Computer Science and Engineering 0114 University of California, San Diego La Jolla, CA ----------- REFERENCES: ----------- Box,G., and N.Draper. 1987. {\bf Empirical Model-Building and Response Surfaces.} Wiley, New York. Khuri, A.I., and J.A.Cornell. 1987. {\bf Response Surfaces (Designs and Analyses)}. Marcel Dekker, Inc., New York. Myers, Raymond H., and A.I. Khuri, W.H. Carter, Jr. 1989. ``Response Surface Methodology: 1966-1988.'' {\em Technometrics}. vol.31, no.2. Plutowski, Mark E., and Halbert White. 1991. ``Active selection of training examples for network learning in noiseless environments.'' Technical Report No. CS91-180, Department of Computer Science and Engineering, The University of California, San Diego. 92093-0114. Accepted pending revision by IEEE Transactions on Neural Networks. ---- Here are some other related works: -------- Cohn, David, Les Atlas, and Richard Ladner. 1990. ``Training connectionist networks with queries and selective sampling.'' {\em Advances in Neural Information Processing Systems 2,} Proc. of the Neural Information Processing Systems Conference. Morgan Kaufmann, San Mateo, California. Hwang, Jenq-Neng, J.J. Choi, Seho Oh, and Robert J. Marks III. 1990. ``Query learning based on boundary search and gradient computation of trained multilayer perceptrons. '' {\em Proc. IJCNN 1990, San Diego. The International Joint Conference on Neural Networks.} IEEE press. 
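To make the selection criterion concrete, here is a rough sketch (not the procedure of the tech report, which interleaves selection with network fitting; the function name and the cosine-similarity score are purely illustrative), assuming per-example gradients at the current weights are available:

    import numpy as np

    def select_subset(per_example_grads, k):
        """Greedy sketch: pick k examples whose summed gradient stays most
        correlated (cosine similarity) with the gradient over all examples.
        per_example_grads: array of shape (n_examples, n_weights)."""
        full = per_example_grads.sum(axis=0)
        chosen, running = [], np.zeros_like(full)
        candidates = set(range(len(per_example_grads)))
        for _ in range(k):
            def score(i):
                g = running + per_example_grads[i]
                return np.dot(g, full) / (np.linalg.norm(g)
                                          * np.linalg.norm(full) + 1e-12)
            best = max(candidates, key=score)
            chosen.append(best)
            running += per_example_grads[best]
            candidates.remove(best)
        return chosen

The intent is only to show the shape of the idea: a small subset whose pooled gradient points in nearly the same direction as the full-set gradient is, in this sense, a concise stand-in for the whole set.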
From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 21 21:27:08 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 21 Oct 91 21:27:08 -0400 Subject: Redundancy In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. <9110212329.AA12326@tournesol.ucsd.edu> Message-ID: :: I guess you could measure redundancy by seeing if some subset of the :: training data set produces essentially the same gradient vector as the full :: set. Probably statisticians have good ways of talking about this :: redundancy business -- unfortunately, I don't know the right vocabulary. Indeed they do; however, they begin from a more general perspective: for a particular "n", where "n" is the number of exemplars we are going to train on, call a set of "n" exemplars optimal if better generalization can not be obtained by training on any other set of "n" exemplars. This criterion is called "Integrated Mean Squared Error." See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. Using appropriate approximations, we can use this to obtain what you suggest. Results for the case of clean data are currently available in Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD CSE department (see [Plutowski & White, 1991].) Basically, given a set of candidate training examples, we select a subset which if trained upon give a gradient highly correlated with the gradient obtained by training upon the entire set. This results in a concise set of exemplars representative (in a precise sense) of the entire set. Preliminary empirical results indicate that the end result is what we originally desired: training upon this well chosen subset results in generalization close to that obtained by training upon the entire set. Thanks for the references. This is a useful beginning, but doesn't seem to address the problem we were discussing. In many real-world problems, the following constraints hold: 1. We do not have direct access to "the entire set". In fact, this set may well be infinite. All we can do is collect some number of samples, and there is usually a cost for obtaining each sample. 2. Rather than hand-crafting a training set by choosing all its elements, we want to choose an appropriate "n" and then pick "n" samples at random from the set we are trying to model. Of course, if collecting samples is cheap and network training is expensive, you might throw some samples away and not use them in the training set. I don't *think* that this would ever improve generalization, but it might lead to faster training without hurting generalization. 3. The data may not be "clean". The structure we are trying to model may be masked by a lot of random noise. Do you know of any work on how to pick an optimal "n" under these conditions? I would guess that this sort of problem is already well-studied in statistics; if not, it seems like a good research topic for someone with the proper background. -- Scott Fahlman From pluto at cs.UCSD.EDU Mon Oct 21 21:54:29 1991 From: pluto at cs.UCSD.EDU (Mark Plutowksi) Date: Mon, 21 Oct 91 18:54:29 PDT Subject: Redundancy Message-ID: <9110220154.AA12390@tournesol.ucsd.edu> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ..in response to your message, included here: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= :: To: Mark Plutowksi :: Cc: connectionists at CS.CMU.EDU :: Subject: Re: Redundancy :: In-Reply-To: Your message of Mon, 21 Oct 91 16:29:59 -0800. 
:: <9110212329.AA12326 at tournesol.ucsd.edu> :: Date: Mon, 21 Oct 91 21:27:08 -0400 :: From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU :: :: :: I guess you could measure redundancy by seeing if some subset of the :: :: training data set produces essentially the same gradient vector as the full :: :: set. Probably statisticians have good ways of talking about this :: :: redundancy business -- unfortunately, I don't know the right vocabulary. :: :: Indeed they do; however, they begin from a more general perspective: :: for a particular "n", where "n" is the number of exemplars we are going to :: train on, call a set of "n" exemplars optimal if better generalization can :: not be obtained by training on any other set of "n" exemplars. :: This criterion is called "Integrated Mean Squared Error." :: See [Khuri & Cornell, 1987], [Box and Draper, 1987], or [Myers et.al., 1989]. :: :: Using appropriate approximations, we can use this to obtain what you suggest. :: Results for the case of clean data are currently available in :: Neuroprose in the report "plutowski.active.ps.Z", or from the UCSD :: CSE department (see [Plutowski & White, 1991].) Basically, given a set of :: candidate training examples, we select a subset which if trained upon :: give a gradient highly correlated with the gradient obtained by :: training upon the entire set. This results in a concise set of exemplars :: representative (in a precise sense) of the entire set. :: Preliminary empirical results indicate that the end result is what we :: originally desired: training upon this well chosen subset results in :: generalization close to that obtained by training upon the entire set. :: :: Thanks for the references. This is a useful beginning, but doesn't seem to :: address the problem we were discussing. In many real-world problems, the :: following constraints hold: :: :: 1. We do not have direct access to "the entire set". In fact, this set may :: well be infinite. All we can do is collect some number of samples, and :: there is usually a cost for obtaining each sample. :: :: 2. Rather than hand-crafting a training set by choosing all its elements, :: we want to choose an appropriate "n" and then pick "n" samples at random :: from the set we are trying to model. Of course, if collecting samples is :: cheap and network training is expensive, you might throw some samples away :: and not use them in the training set. I don't *think* that this would ever :: improve generalization, but it might lead to faster training without :: hurting generalization. :: :: 3. The data may not be "clean". The structure we are trying to model may :: be masked by a lot of random noise. :: :: Do you know of any work on how to pick an optimal "n" under these :: conditions? I would guess that this sort of problem is already :: well-studied in statistics; if not, it seems like a good research topic for :: someone with the proper background. :: :: -- Scott Fahlman :: I don't know of a feasible way of choosing such an "n". Instead, I obtain a greedy approximation to it. 
What we do (as reported in the tech report by Plutowski & White) is sequentially grow the training set, first finding an "optimal" training set of size 1, then fitting the network to this training set, appending the training set with a new exemplar selected from the set of available candidates, obtaining a training set of size 2 which is "approximately optimal", fitting this set, appending a third exemplar, etc, continuing the process until the network fit obtained by training over the exemplars fits the rest of the available examples within the desired tolerance. I have no idea as to how close the resulting training sets are to being truly IMSE-optimal. But, they are much more concise than the original set - and so far, at least on the toy problems I have tried so far, it has resulted in a computational benefit, apparently because training on the smaller set of exemplars provides an informative gradient at much lower cost than is required to obtain a gradient over all of the available examples. The more the redundancy in the data, the more the computational benefit. Of course, more extensive testing is required (and in progress.) = Mark Plutowski From 72247.2225 at CompuServe.COM Mon Oct 21 23:05:00 1991 From: 72247.2225 at CompuServe.COM (Larry Fast) Date: 21 Oct 91 23:05:00 EDT Subject: Backprop Feedback Gain Message-ID: <911022030500_72247.2225_EHL25-1@CompuServe.COM> I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to compensate for the following problem: As Backprop passes the error back thru multiple layers, the gradient has a built in tendency to decay. At the output the maximum slope of the 1/( 1 + e(-sum)) activation function is 0.5. Each successive layer multiplies this slope by a maximum of 0.5. The maximum gains at various layers (where n is the output layer) is: max slope at layer n = 0.5 max slope at layer n-2 = 0.125 max slope at layer n-3 = 0.0625 max slope at layer n-4 = 0.03125 .... It has been suggested (by a couple of sources) that an attempt should be made to have each layer learn at the same rate. To this end, I'm installing a gain factor on error being backpropagated. The new error function is: errorPropGain * act * (1 - act) The nominal value that makes sense is 2 (or more). This would allow at least the maximum learning rate to propagate unattenuated. Has anyone else tried this, or any other method of flattening out the learning rate in deep layers. Any info regarding more recent releases of PDP or a users' group would also be helpful. Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From max.coltheart at mrc-apu.cam.ac.uk Mon Oct 21 23:04:38 1991 From: max.coltheart at mrc-apu.cam.ac.uk (max.coltheart@mrc-apu.cam.ac.uk) Date: Tue, 22 Oct 1991 11:04:38 +0800 Subject: redundancy and generalization Message-ID: <18650.9110221006@sirius.mrc-apu.cam.ac.uk> Consider the eight words PAT PAD CAT CAD POT POD COT COD. Give a net the task of translating these from letters to phonemes. Choose any subset of, say, four items as the training set and after training to asymptote test performance on the other four. Even with a training set that contains all the information needed for the test set (e.g. PAT POD CAT COD exemplifies every letter-phoneme pairing twice), the various architectures we have been trying score 0% on the generalization set (in this example, the net learns nothing about the third letter so in the generalisation test translates PAD as "pat", POT as "pod", COT as "cod" and CAD as "cat". 
Is this problem, trivial for rule-learning algorithms, insoluble for any system that learns by error-correction? Tom Dietterich writes: >Generally speaking, in noise-free domains, windowing works quite well. >A very high-performing decision tree can be learned with a relatively >small window. However, for noisy data, the general experience has >been that the window eventually grows to include the entire training set. >Jason Catlett (Sydney U) recently completed his dissertation on >testing windowing and various other related tricks on datasets of >roughly 100K examples (straight classification problems). I recommend >his papers and thesis. > >His main conclusion is that if you want high performance, you need to >look at all of the data. "The window eventually grows to include the entire training set" = "the system is incapable of generalizing accurately ". Note that noise isn't the problem. In my example, there's no noise, and no generalization Max Coltheart max.coltheart at mrc-apu.cam.ac.uk From ahg at eng.cam.ac.uk Tue Oct 22 05:20:21 1991 From: ahg at eng.cam.ac.uk (A.H. Gee) Date: Tue, 22 Oct 91 10:20:21 +0100 Subject: No subject Message-ID: <22398.9110220920@tw700.eng.cam.ac.uk> ************** PLEASE DO NOT FORWARD TO OTHER NEWSGOUPS **************** The following technical report has been placed in the neuroprose archives at Ohio State University: NEURAL NETWORKS AND COMBINATORIAL OPTIMIZATION PROBLEMS - THE KEY TO A SUCCESSFUL MAPPING Andrew Gee, Sreeram Aiyer and Richard Prager Technical Report CUED/F-INFENG/TR 77 Cambridge University Engineering Department Trumpington Street Cambridge CB2 1PZ England Abstract For several years now there has been much research interest in the use of Hopfield networks to solve combinatorial optimization problems. Although initial results were disappointing, it has since been demonstrated how modified network dynamics and better problem mapping can greatly improve the solution quality. The aim of this paper is to build on this progress by presenting a new analytical framework in which problem mappings can be evaluated without recourse to purely experimental means. A linearized analysis of the Hopfield network's dynamics forms the main theory of the paper, followed by a series of experiments in which some problem mappings are investigated in the context of these dynamics. In all cases the experimental results are compatible with the linearized theory, and observed weaknesses in the mappings are fully explained within the framework. What emerges is a largely analytical technique for evaluating candidate problem mappings, without having to resort to the more usual trial and error. ************************ How to obtain a copy ************************ a) Via FTP: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get gee.opt_map.ps.Z ftp> quit unix> uncompress gee.opt_map.ps.Z unix> lpr gee.opt_map.ps (or however you print PostScript) Please note that a couple of the figures in the paper were produced on an Apple Mac, and the resulting PostScript is not quite standard. People using an Apple LaserWriter should have no problems though. b) Via postal mail: Request a hardcopy from Andrew Gee, Speech Laboratory, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. 
or email me: ahg at eng.cam.ac.uk From dlb at ukc.ac.uk Wed Oct 23 08:10:16 1991 From: dlb at ukc.ac.uk (dlb@ukc.ac.uk) Date: Wed, 23 Oct 91 13:10:16 +0100 Subject: Research Fellowship (UK) Message-ID: Research Fellowship in Neural Networks: Investigation of Digitally Implemented Neural Networks Based on Novel Goal-Seeking Principles UNIVERSITY OF KENT AT CANTERBURY Electronic Engineering Laboratories Applications are invited for a Research Fellowship in the Electronic Engineering Laboratories at the University of Kent to work on an SERC-funded project on digitally implemented neural networks. The project, part of an on-going programme of work in neural networks, will investigate the properties and applications of novel artificial neural networks based on Boolean processing nodes and embodying local low-level goal-seeking principles. Applicants should have a good Honours degree in electronic engineering or computer science/engineering and should preferably hold a Ph.D. degree in an appropriate area. Applicants with previous experience in the field of neural networks or image analysis would be especially welcome. The Digital Systems Research Group in the Electronic Engineering Laboratories have a very strong research programme in computational architectures for pattern processing, with a particular emphasis on neural network architectures. Extensive facilities to support this work are available, including both central and in-house computing systems, and a dedicated workstation will be available for this project. Technician support will also be provided. The appointment is for a three year period and is available from 1st January 1992. The salary is on the scale 11969 - 14170 pounds. informal enquiries may be made to Dr. Michael Fairhurst or Dr. David Bisset on +44 227-764000, or by e-mail to dlb at ukc.ac.uk Further particulars and application forms are available from The Personnel Office, The University of Kent at Canterbury, Canterbury, Kent, CT2 7NZ, England, quoting reference A92/13. Telephone +44 227 475482 or 764000 x3915. The closing date is 1st November 1991. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Wed Oct 23 11:23:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Wed, 23 Oct 91 11:23:19 -0400 Subject: Continuous vs. Batch learning In-Reply-To: Your message of Tue, 22 Oct 91 16:17:52 -0800. <9110222317.AA22627@sanger.bio.uci.edu> Message-ID: It is pretty clear to me that biological neural networks have all adapted to prefer the continuous learning technique... Anyway, perhaps we should take an example from nature, which has been optimizing things far longer than we have! Sure, but with a totally different technology. Give me 10^9 processors, 10^13 active, complex connections, and 3-D packing, and make short-term memory scarce, slow, and unreliable, and I'd pick continuous learning as well. And it wouldn't even take me a billion years to make the decision. -- Scott Fahlman From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 14:13:31 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 13:13:31 EST Subject: Paper Message-ID: Because this is the first time I place paper into neuroprose, I have brought lots of troubles to Jordan Pollack of Ohio State. 
We don't know whether it's due to my postscript file's problem (I generated the ps file on MacWrite II by pressing and holding the command key and the "F" or "K" key together before clicking the "OK" button in the print dialogue menu) or not, the ps file cannot be printed at Jordan's place. We retried it several times, and he still cannot see it after processing it. However, the ps file inside the Inbox can be traced from UNIX. So we decide to leave the paper inside the Inbox subdirectory and announce it with a caveat that it may not work. I am sorry for this delay and inconvenience, and I will be very glad to know more methods to generate ps files from MacWrite II which will have a good behavior at neuroprose archive. Thanks in advance. The procedure to get the ps file from the Inbox is as follows: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose/Inbox ftp> binary ftp> get ppnn.ps6 ftp> quit unix> lpr ppnn.ps6 I want to thank Jordan for his great help since last week. I appreciate very much his instructions and patience in retrying different versions of ps files I sent to him. Bo Xu Indiana University itgt500 at indycms.iupui.edu From steck at spock.wsu.ukans.edu Wed Oct 23 15:11:40 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Wed, 23 Oct 91 14:11:40 -0500 Subject: Batch Mode Parallel Implementations Message-ID: <9110231911.AA01043@spock.wsu.UKans.EDU> S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is essentially learning somewhere in between "batch learning" and "on line learning". As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck From wray at ptolemy.arc.nasa.gov Wed Oct 23 18:33:42 1991 From: wray at ptolemy.arc.nasa.gov (Wray Buntine) Date: Wed, 23 Oct 91 15:33:42 PDT Subject: Paper Announcement (Neuroprose) In-Reply-To: Barak Pearlmutter's message of Mon, 21 Oct 91 13:41:35 -0400 <9110211741.AA03347@james.psych.yale.edu> Message-ID: <9110232233.AA17716@ptolemy.arc.nasa.gov> > Simplifying Neural Network > Soft Weight-Sharing Measures > by > Soft Weight-Measure > Soft Weight Sharing > > Barak Pearlmutter > Department of Psychology > P.O. Box 11A Yale Station I enjoyed this take-off immensely. Determining good regularisers (or priors) is a major problem facing feed-forward network research (and related representations), so I also enjoyed the original Nowlan-Hinton paper. Dramatic performance improvements can be got by careful choice of regulariser/prior (I know this from my tree research), and its a bit of a black art right now, though I have some good directions. Nowlan & Hinton suggest a strong theoretical basis exists for their approach (see their section 8), so perhaps we'll see more of this style, and "cleaner" versions to keep the theoreticians happy. By the way, at CLNL in Berkeley in August I expressed the view that this problem: i.e. 
Regularizers ------------ for a given network/activation-function configuration, what are suitable parameterised families of regularizes, and how might the parameters be set from the knowledge of the particular application being addressed NB. the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's equation (1) has several fairly elegant and practical solutions along with: Training -------- decision-theoretic/bounded-rationality approaches to batch vs. block (sub-batch) vs. pattern updates during gradient descent (i.e. of back-prop.) (i.e. the Fahlman-LeCunn-English-Grajski-et-al. discussion, or the batch update vs. stochastic update problem) and subsequent addition of second-order gradient methods as two of the most pressing problems to make feed-forward networks a "mature" technology that will then supercede many earlier non-neural methods. Wray Buntine NASA Ames Research Center phone: (415) 604 3389 Mail Stop 244-17 fax: (415) 604 6997 Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov PS.thanks also to Martin Moller for adding some meat to the Training problem: > An interesting observation is that the number of blocks needed > to make an update is growing during learning so that after a certain > number of epochs the blocksize is equal to the number of patterns. > When this happens the algorithm is equal to a traditional batch-mode > algorithm and no validation is needed anymore. When explaining batch update vs. stochastic update to people, I always use this behaviour as an example of what a decision-theoretic training scheme **should** do, so I'm glad you've confirmed it experimentally. From ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu Wed Oct 23 20:46:29 1991 From: ITGT500%INDYCMS.BITNET at vma.cc.cmu.edu (BO XU) Date: Wed, 23 Oct 91 19:46:29 EST Subject: Paper Message-ID: A moment ago I received a message from Jordan telling me that he can see the ppnn.ps6 file now and he has put it into neuroprose subdirectory named xu.ppnn.ps.Z. I am very glad to hear this news and also sorry for possible inconvenience to you. Please don't follow the procedure for ppnn.ps6 in Inbox (ppnn.ps6 may not be there anymore). Instead, following is the procedure to get the paper "PPNN: A Faster Learning and Better Generalizing Neural Net": unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get xu.ppnn.ps.Z ftp> quit unix> uncompress xu.ppnn.ps.Z unix> lpr xu.ppnn.ps (or however you print postscript) Thanks to Jordan again for his continuing efforts. Bo Xu Indiana University itgt500 at indycms.iupui.edu From karit at spine.hut.fi Thu Oct 24 05:53:53 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Thu, 24 Oct 91 11:53:53 +0200 Subject: Speech recognition research job in Switzerland (REPOST) Message-ID: <9110240953.AA01337@spine.hut.fi.hut.fi> ---------------------------------------------------------------------------- RESEARCH POSITIONS AVAILABLE IN SPEECH PROCESSING (repost) The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the area of automatic speech recognition. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Researchers are expected to begin activity in the beginning of 1992. 
IDIAP is supported by the Dalle Molle Foundation along with public-sector partners at the local and federal levels (in Switzerland). IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute maintains close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. Applications for a research position at IDIAP should include the following elements: - a curriculum vitae - sample publications or technical reports - a brief description of the research programme that the candidate wishes to pursue - a list of personal references. Applications are due by December 1, 1991 and may be sent to the address below: Daniel Osherson IDIAP Case Postale 609 CH-1920 Martigny SWITZERLAND For further information by e-mail, contact: osherson at idiap.ch (Daniel Osherson, director) or karit at idiap.ch (Kari Torkkola, researcher) Please use the latter email address only for inquiries concerning speech recognition research. From prechelt at ira.uka.de Thu Oct 24 11:16:36 1991 From: prechelt at ira.uka.de (prechelt@ira.uka.de) Date: Thu, 24 Oct 91 16:16:36 +0100 Subject: Terminology (was: batch-mode parallel implementations) Message-ID: I noticed a lot of inconsistent use of terminology concerning the frequency of weight update in Backprop learning. I would like to make a suggestion for the meaning of certain terms, that is not based on the democratic aspect of what is used most often, but on investigations in a dictionary: There are three cases: (a) update after only ONE single example has been seen (b) update after ALL of the examples have been seen (c) something in between The terms used are epoch, block, batch, sample, continuous, on-line. An EPOCH is (thus saith my dictionary) not only a section of time or history (an "era"), but also a turning point. This should make EPOCH the preferred term for case (b), because the end of the training set clearly is such a turning point. A BATCH is a set of some size, a pile of things or so; with some inherent need for the information about its size. Thus it is a good candidate for case (c) and there should always be some indication of the size either as an absolute number, as a fraction of training set size or by some qualitative criterion. BLOCK could be a perhaps even better word for the same, for computer scientists, because blocks are always groups of a certain number of similar objects and the word does not have the danger of misunderstanding that stems from the term "batch-processing" from the early days of data processing, where everything was being executed completely, before you received the results. Unfortunately, for reasons of other connotations, confusion of Block with Epoch is nevertheless very likely. A SAMPLE is a part picked from a whole, usually for test purposes. Although it is not absolutely clear, that a sample is just a single object, in my ears the word tends to sound so. Thus it should be indicating case (a). CONTINOUS is a bad term to use, because the individual examples are not cut into parts, so BP is always discrete. ON-LINE usually means something like "available without physical action, merely by execution of software" and is of course completely inappropriate to learning, except perhaps where there is an infinite training set constantly floating through the machine. 
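For concreteness, the three cases can be sketched as follows (the grad and update arguments are placeholders for whatever gradient and weight-update rule is in use):

    def train_one_pass(examples, weights, grad, update, batch_size=None):
        """One pass through the training data under the three schedules:
        batch_size=1             -> case (a), update per Sample
        batch_size=len(examples) -> case (b), update once per Epoch
        anything in between      -> case (c), update per Batch of that size"""
        n = len(examples)
        step = batch_size or n
        for start in range(0, n, step):
            g = grad(weights, examples[start:start + step])
            weights = update(weights, g)
        return weights
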
SUMMARY: -------- Let us use 'Epoch' for (b), 'Batch' for (c) and 'Sample' for (a). Let us avoid 'continous', 'on-line' and 'block' as much as possible. I think as scientists we should exercise some discipline in the use of language, especially when confusion is as close as in the area of learning systems... :-> Please direct all comments and flames to me. Lutz Lutz Prechelt (email: prechelt at ira.uka.de) | Whenever you Institut fuer Programmstrukturen und Datenorganisation | complicate things, Universitaet Karlsruhe; D-7500 Karlsruhe 1; Germany | they get (Voice: ++49/721/608-4317, FAX: ++49/721/697760) | less simple. From oden at herky.cs.uiowa.edu Thu Oct 24 12:11:12 1991 From: oden at herky.cs.uiowa.edu (Gregg Oden) Date: Thu, 24 Oct 91 11:11:12 -0500 Subject: Batch mode in nature? Message-ID: <9110241611.AA26933@herky.cs.uiowa.edu> Steve Potter asks > I cant think of any biological examples of batch learning, in which > sensory data are saved until a certain number of them can be somehow > averaged together and conclusions made and remembered. Any ideas? If by 'sensory data' you mean the most peripheral, unanalyzed input representations, then probably not. Otherwise, yes: it has been a long-term recurring theme in the psychological literature on the development of concepts that exemplars are remembered with a great deal of specific detail until a sufficient corpus of them have been acquired to support the abstraction of a general concept. (Subsequently, idiosyncratic details may be lost/suppressed through assimilation to the encompassing category.) This notion is supported by the intuitive experience of reflective recognition of regularities; i. e., insight. In recent years, it has also gained empirical support from experimental work, most notably by Lee Brooks and his colleagues. Some of this was briefly discussed in my chapter in the Annual Review of Psychology, 1987. (See also Oden & Lopes, "On the internal structure of fuzzy subjective categories" in Recent Developments in Fuzzy Set and Possibility Theory, R. Yager, ed., 1982.) Gregg Oden Psychology & Computer Science U. of Iowa From huyser at mithril.stanford.edu Thu Oct 24 18:27:42 1991 From: huyser at mithril.stanford.edu (Karen Huyser) Date: Thu, 24 Oct 91 15:27:42 PDT Subject: learning and memory Message-ID: <9110242227.AA27923@mithril.stanford.edu> It seems to me people are confusing very different things in the recent discussion of learning (one-shot, generalization, etc). A posting from Ross Gayler quotes Ernst Dow as saying (in the context of one-shot learning): > You may be able to identify the painting you saw before, but could you > make the leap to recognizing all other abstract paintings? To have the experience of seeing a painting and to be able to recall the memory of the experience is one kind of learning and memory. To be told by someone that the painting is of a type called "abstract" is to add a category label, another kind of learning and memory. However, to recognize another painting as abstract or imitate the painting style one must form a sufficiently rich concept to be able to make a category with the label "abstract" and the original painting as one member of the class. For most humans, this involves questions, insightful answers, and many more examples of paintings. As a completely separate conceptual skill, consider the learning and concept-formation task that goes on while doing research. How does it come about that one day we look at a set of phenomena in a new way, with new concepts and categories? 
There are many different skills that appear under the labels "learning" and "memory". Karen Huyser huyser at mojave.stanford.edu From bill at nsma.arizona.edu Thu Oct 24 23:04:21 1991 From: bill at nsma.arizona.edu (Bill Skaggs) Date: Thu, 24 Oct 91 20:04:21 MST Subject: Continuous vs. Batch learning Message-ID: <9110250304.AA07667@nsma.arizona.edu> >It is pretty clear to me that biological neural networks have all adapted >to prefer the continuous learning technique, as we can verify for humans >by remembering something that we only saw (or heard, etc.) once. One-trial >learning paradigms abound in the behavioral literature. I cant think of >any biological examples of batch learning, in which sensory data are >saved until a certain number of them can be somehow averaged together >and conclusions made and remembered. Any ideas? David Marr's theory of the hippocampus proposed that it (the hippocampus) is an intermediate-term memory storage device, performing one-shot learning of experiences and then holding them for a period of days or weeks until they can be evaluated for significance and then gradually moved into the neocortex for permanent storage. In my humble opinion this is still the best available theory of what the hippocampus does. Some of the details have changed, but the basic idea still makes sense. Patrick Lynn has recently been exploring a more abstract version of Marr's idea, using a "buffer" of example patterns to train a recurrent back-prop net, with new patterns going into the buffer, hanging around for a while, then dropping out. He has found that under certain conditions buffering gives better performance than learning each pattern only when it is presented. (Reference: "Simple memory: a theory for archicortex." D. Marr, 1971, Phil Trans Roy Soc B 262: 23-81.) -- Bill Skaggs From gary at cs.UCSD.EDU Fri Oct 25 21:59:28 1991 From: gary at cs.UCSD.EDU (Gary Cottrell) Date: Fri, 25 Oct 91 18:59:28 PDT Subject: Seminar abstract: The Sanguine Algorithm Message-ID: <9110260159.AA09259@desi.ucsd.edu> SEMINAR New approaches to learning in Connectionist Networks Garrison W. Cottrell Richard K. Belew Institute for Neural Declamation Condominium Community College of Southern California Previous approaches to learning in recurrent networks often involve batch learning: A large amount of effort is expended in deciding which way to move in weight space, then a little step is taken. We propose a new algorithm for learning in large networks which is orders of magnitude more efficient than batch learning. Based on the realization that many nearby points in weight space are worse than where we are now, we propose the sanguine algorithm. The basic idea is to become more happy with where we are, rather than going to all the work of moving. Hence the approach is quite simple: Randomly sample a nearby point in weight space. Compute the error functional based on that point. If it is better than the current point, repeat until we find a nearby point that is worse. Now, here's the real trick: Once we find a point worse off than where we are now, we stay where we are and increment a "happiness function". That is, we search until we find a place that we can "look down on" in weight space[1]. Now, in order to remain happy with where we are may involve a certain amount of minor work to keep this point in weight space looking good. For example, we could change the error functional until this point looks better than most other points we find. 
Towards this end, we can apply recent techniques (Nowlan & Hinton, 1991) to make the error functional soft and flabby. Then we can stretch the error any way we like. This approach can also be extended to replace computationally expensive "weight-sharing" techniques. If we make the weights soft and flabby, then lifting them becomes much easier since part of the weight always remains on the ground, and sharing the burden of large weights becomes unnecessary. Note that this can be done completely locally. We have applied this novel learning procedure to the problem of time series prediction. Using the Mackey-Glass equations with dimension 3.5, we give the network values at 0, 6, 12, and 18 time units back in time to predict the value of the time series 6 time units into the future. Using the Sanguine Algorithm, a network with only two hidden units rapidly converges to a soft error functional. Of course, the network has no idea of what value will come next; however, the happiness function shows it is quite blissful in its ignorance. We propose that this technique will have wide application in Republican approaches to government. ____________________ [1]Thus the pet name for our algorithm is the "Nyah Nyah Algo- rithm". From steck at spock.wsu.ukans.edu Sat Oct 26 13:49:10 1991 From: steck at spock.wsu.ukans.edu (jim steck (ME) Date: Sat, 26 Oct 91 12:49:10 -0500 Subject: Batch Learning and Parallel Implementation Message-ID: <9110261749.AA04481@spock.wsu.UKans.EDU> Regarding Parallel implementations of Batch and online learning.... S. Kollias and D. Anastassiou presented an interesting approximate second order training algorithm using a Least Squares Estimation Technique at IJCNN 1988 (IEEE Transactions on Circuits and Systems vol 36 no. 8 ). This algorithm is interesting because it updates the weights with each training pair, but performs the update using information saved from all previous training pairs. The algorithm includes a parameter called a forgetting factor which causes information from the previous training pairs to slowly be discounted (or forgotten). This is basically a type of learning somewhere inbetween "batch" learning and "on line" learning. As an appoximate second order method, it is somewhat computationally intensive; however, the method is easily and productively vectorized on parallel architectures. Jim Steck Wichita State University From todd at galadriel.stanford.edu Fri Oct 25 17:50:47 1991 From: todd at galadriel.stanford.edu (todd@galadriel.stanford.edu) Date: Fri, 25 Oct 91 14:50:47 PDT Subject: MUSIC AND CONNECTIONISM Book Announcement Message-ID: <9110252150.AA02708@galadriel.stanford.edu> BOOK ANNOUNCEMENT: MUSIC AND CONNECTIONISM edited by Peter M. Todd and D. Gareth Loy MUSIC AND CONNECTIONISM is now available from MIT Press. This 280-pp. book contains a wide variety of recent research in the applications of neural networks and other connectionist methods to the problems of musical listening and understanding, performance, composition, and aesthetics. It consists of a core of articles that originally appeared in the Computer Music Journal, along with several new articles by Kohonen, Mozer, Bharucha, and others, and new addenda to the original articles describing the authors' most recent work. Topics covered range from models of psychological processing of pitches, chords, and melodies, to algorithmic composition and performance factors. 
A wide variety of connectionist models are employed as well, including back-propagation in time, Kohonen feature maps, ART networks, and Jordan- and Elman-style networks. We've also included a discussion generated by the Computer Music Journal articles on the use and place of connectionist systems in artistic endeavors. A more detailed description of the book is provided below (from the jacket text), along with the complete table of contents. We hope this book will be of use to a wide variety of readers, including neural network researchers interested in a broad, challenging, and fun new area of application, cognitive scientists and music psychologists looking for robust new models of musical behavior, and artists seeking to learn more about a potentially very useful technology. MUSIC AND CONNECTIONISM can be found in bookstores that carry MIT Press publications, or can be purchased directly from MIT Press by calling their toll-free order number, 1-800-356-0343, and giving the operator this catalog number: 1CSAT 503, and this book code: TODMH. By phone and mail-order, the price is $39.95; in stores, it will probably be $45 (there is some confusion with the publisher on this point, so I wanted to give out the detailed information for phone orders to save people some money). Please drop me a line if you have any questions, and especially if you take up the gauntlet and pursue research or applications in this area! cheers, peter todd ***************************************************************************** Music and Connectionism edited by Peter M. Todd and D. Gareth Loy As one of our highest expressions of thought and creativity, music has always been a difficult realm to capture, model, and understand. The connectionist paradigm, now beginning to provide insights into many realms of human behavior, offers a new and unified viewpoint from which to investigate the subtleties of musical experience. \fIMusic and Connectionism\fP provides a fresh approach to both fields, using techniques of connectionism and parallel distributed processing to look at a wide range of topics in music research, from pitch perception to chord fingering to composition. The contributors, leading researchers in both music psychology and neural networks, address the challenges and opportunities of musical applications of network models. The result is a current and thorough survey that advances our understanding of musical perception, cognition, composition, and performance and of the design and analysis of networks. Music and Connectionism is based on a core of articles originally appearing as two special issues of the Computer Music Journal. These have been augmented with addenda covering more recent research by the authors. The book opens with tutorial chapters introducing neural networks in a musical context and relevant aspects of previous computer music research, making this a self-contained text. There are many new chapters, along with new section introductions, summaries of related work, and a final debate on the artistic implications of connectionist methods. Peter M. Todd is a doctoral candidate in the PDP Research Group of the Psychology Department at Stanford University. Gareth Loy DMA is an award-winning composer, member of the Board of Directors of the Computer Music Association, lecturer in the Music Department of UC San Diego, and member of the technical staff of Frox Inc. Contents: Preface and Introduction Peter M. Todd and D. 
Gareth Loy Part 1: Background Machine Tongues XII: Neural Networks Mark Dolson Connectionism and Musiconomy D. Gareth Loy Part 2: Perception and Cognition A Neural Net Model for Pitch Perception Hajime Sano and B. Keith Jenkins Connectionist Models for Tonal Analysis Don L. Scarborough, Ben O. Miller, and Jacqueline A. Jones The Representation of Pitch in a Neural Net Model of Chord Classification Bernice Laden and Douglas H. Keefe Pitch, Harmony, and Neural Nets: A Psychological Perspective Jamshed J. Bharucha The Ontogenesis of Tonal Semantics: Results of a Computer Study Marc Leman Modeling the Perception of Tonal Structure with Neural Nets Jamshed J. Bharucha and Peter M. Todd Using Connectionist Models to Explore Complex Musical Patterns Robert O. Gjerdingen The Quantization of Musical Time: A Connectionist Approach Peter Desain and Henkjan Honing Part 3: Applications A Connectionist Approach to Algorithmic Composition Peter M. Todd Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints Michael C. Mozer Creation By Refinement and the Problem of Algorithmic Music Composition J.P. Lewis A Nonheuristic Automatic Composing Method Teuvo Kohonen, Pauli Laine, Kalev Tiits, and Kari Torkkola Fingering for String Instruments with the Optimum Path Paradigm Samir I. Sayegh Part 4: Conclusions Letter from Otto Laske Responses to Laske by Todd and Loy Further Research and Directions Peter M. Todd List of Author Addresses From white at teetot.acusd.edu Fri Oct 25 19:49:14 1991 From: white at teetot.acusd.edu (Ray White) Date: Fri, 25 Oct 91 16:49:14 -0700 Subject: No subject Message-ID: <9110252349.AA27577@teetot.acusd.edu> Larry Fast writes: > I'm expanding the PDP Backprop program (McClelland&Rumlhart version 1.1) to > compensate for the following problem: > As Backprop passes the error back thru multiple layers, the gradient has > a built in tendency to decay. At the output the maximum slope of > the 1/( 1 + e(-sum)) activation function is 0.5. > Each successive layer multiplies this slope by a maximum of 0.5. ..... > It has been suggested (by a couple of sources) that an attempt should be > made to have each layer learn at the same rate. ... > The new error function is: errorPropGain * act * (1 - act) This suggests to me that we are too strongly wedded to precisely f(sum) = 1/( 1 + e(-sum)) as the squashing function. That function certainly does have a maximum slope of 0.25. A nice way to increase that maximum slope is to choose a slightly different squashing function. For example f(sum) = 1/( 1 + e(-4*sum)) would fill the bill, or if you'd rather have your output run from -1 to +1, then tanh(sum) would work. I think that such changes in the squashing function should automatically improve the maximum-slope situation, essentially by doing the "errorPropGain" bookkeeping for you. Such solutions are static fixes. I suggested a dynamic adjustment of the learning parameter for recurrent backprop at IJCNN - 90 in San Diego (The Learning Rate in Back-Propagation Systems: an Application of Newton's Method, IJCNN 90, vol I, p 679). The method amounts to dividing the learning rate parameter by the square of the gradient of the output function (subject to an empirical minimum divisor). One should be able to do something similar with feedforward systems, perhaps on a layer by layer basis. 
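As a quick check of the slopes mentioned above: the derivative of 1/(1 + e^(-k*sum)) is k*f*(1-f), whose maximum is k/4, so k = 4 gives a maximum slope of 1, as does tanh. The short Python fragment below verifies this numerically; it is an illustrative sketch only (the grid and step size are arbitrary choices, and it is not part of the original posting).

   import numpy as np

   x = np.linspace(-10.0, 10.0, 200001)          # grid spacing 1e-4

   def max_slope(f):
       y = f(x)                                  # finite-difference slope estimate
       return np.max(np.abs(np.diff(y) / np.diff(x)))

   logistic  = lambda s: 1.0 / (1.0 + np.exp(-s))         # max slope ~0.25
   logistic4 = lambda s: 1.0 / (1.0 + np.exp(-4.0 * s))   # max slope ~1.0
   print(max_slope(logistic), max_slope(logistic4), max_slope(np.tanh))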
- Ray White (white at teetot.acusd.edu) Please respond directly to 72247.2225 at compuserve.com Thanks, Larry Fast From BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU Sun Oct 27 14:17:00 1991 From: BUTUROVIC%BUEF78%yubgef51.bitnet at BITNET.CC.CMU.EDU (BUTUROVIC%BUEF78%yubgef51.bitnet@BITNET.CC.CMU.EDU) Date: Sun, 27 Oct 1991 21:17 +0200 Subject: forward propagation Message-ID: <2B147310A0000F63@yubgef51.bitnet> I am interested in training a multi-layer perceptron (MLP) without using back-propagation (BP) of the error. MLP training by means of the BP algorithm is in fact minimization of the criterion function using the ordinary gradient-descent minimization algorithm. For this, the computation of derivatives is necessary. Now, it is of course possible to optimize a multi-variable function without computing derivatives. One effective algorithm of this type is the simplex algorithm [1], so it seems logical to use it for MLP training. There are two advantages in avoiding derivatives: first, the transfer functions of the individual neurons may be non-differentiable. Second, BP uses a criterion function that must be written in the form of the average squared difference between target and actual outputs (there are variants of this, but, for the purpose of this discussion, they vary insignificantly), and the derivative of this function with respect to the weights must be computable. Using simplex, i.e., not using derivatives, this limitation can be avoided, as long as the function to be minimized can be measured. This can be important for applications in control where we are sometimes not able to express the criterion function as a function of the network parameters. There is one serious limitation of this algorithm, and that is spatial complexity. It requires roughly N*N memory locations, where N is the number of variables (network weights). In practice, this limits the size of the network to a couple of thousand weights. In order to verify the behavior of the algorithm, I performed extensive experiments with Ljubomir Citkusev of Boston University. We trained MLPs to perform classification tasks on three data sets. In short, the results obtained indicate that training the network using simplex can be done successfully. However, BP is more effective, regarding both classification accuracy (i.e., function approximation accuracy) and computational complexity (number of iterations). We have not yet verified the ability of the algorithm to train networks with non-differentiable transfer functions or criterion functions that cannot be computed analytically. It is puzzling that in [2] Minsky and Papert claimed the training of perceptrons with hidden layers to be impossible, while at that time (1969) an effective algorithm for precisely that task was already available. Although BP proved superior in our experiments, they could have done quite satisfactory training of multi-layer networks when they wrote the book. I tried to talk to Minsky about this, but without success. I would like to hear people's opinions on this idea. Also, it would be beneficial to know if anyone is aware of similar work. Thanks, Ljubomir Buturovic, University of Belgrade References [1] Nelder, J. A., and Mead, R. 1965, Computer Journal, vol. 7, p. 308. [2] M. Minsky, and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969.
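To make the proposal concrete, here is a minimal sketch of derivative-free MLP training in present-day Python, using scipy's Nelder-Mead simplex routine (the library, the tiny 2-2-1 XOR network, and all parameter settings are illustrative assumptions, not the setup used in the experiments described above). Only values of the cost function are used, so neither the squashing functions nor the criterion ever need to be differentiated; a few random restarts may be needed before the simplex finds a solution.

   import numpy as np
   from scipy.optimize import minimize

   # 2-2-1 network for XOR; all weights and biases packed into one vector w.
   X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
   T = np.array([0., 1., 1., 0.])

   def forward(w, x):
       W1 = w[0:4].reshape(2, 2); b1 = w[4:6]        # input  -> hidden
       W2 = w[6:8];               b2 = w[8]          # hidden -> output
       h = np.tanh(x @ W1 + b1)
       return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

   def cost(w):                                      # only function *values*;
       return np.mean((forward(w, X) - T) ** 2)      # no derivatives anywhere

   rng = np.random.default_rng(0)
   res = minimize(cost, rng.normal(scale=0.5, size=9), method="Nelder-Mead",
                  options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-9})
   print(res.fun, np.round(forward(res.x, X), 2))

The simplex itself stores N+1 vertices of dimension N, which is the roughly N*N memory cost mentioned above.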
From kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET Mon Oct 28 09:49:58 1991 From: kddlab!crl.hitachi.co.jp!nitin at uunet.UU.NET (Nitin Indurkhya) Date: Mon, 28 Oct 91 09:49:58 JST Subject: Robinson's vowel dataset Message-ID: <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Does anyone have any NEW results on Robinson's vowel dataset. I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. Please send me mail, thanks Nitin Indurkhya (nitin at crl.hitachi.co.jp) From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Mon Oct 28 00:10:20 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Mon, 28 Oct 91 00:10:20 EST Subject: Announcement of NIPS Workshop Message-ID: The Neural Information Processing Systems Conference will be followed by a program of workshops in Vail, Colorado on December 6 and 7, 1991. The following one-day workshop will be offered on December 6: Constructive and Destructive Learning Algorithms Workshop Leader: Scott E. Fahlman School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Internet: fahlman at cs.cmu.edu Most existing neural network learning algorithms work by adjusting connection weights in a fixed network. Recently we have seen the emergence of new learning algorithms that alter the network's topology as they learn. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. These algorithms eliminate the need to guess in advance what network topology will best fit a given problem. In addition, some of these algorithms claim significant improvements in learning speed and generalization. A successful two-day workshop on this topic was presented at the NIPS-90 conference. A number of algorithms were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area, so a second workshop on this topic seems appropriate. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. I would like to hear from people who are interested in presenting new algorithms or results at this workshop. I would particularly like to hear from people with application results or comparative studies using algorithms of this kind. The tentative plan, depending on the response we get, is allow 15-20 minutes for each presentation, with ample time for discussion. If you would like to present something, please send a short description to Scott Fahlman, at the internet address listed above. For Cascade-Correlation fans, I will be presenting a new variation called "Cascade 2" that performs better than the original in a number of situations, especially in problems with continuous analog outputs. From tesauro at watson.ibm.com Mon Oct 28 11:41:58 1991 From: tesauro at watson.ibm.com (Gerald Tesauro) Date: Mon, 28 Oct 91 11:41:58 EST Subject: Program information: NIPS91 Workshops Message-ID: The NIPS91 post-conference workshops will take place Dec. 5-7, 1991, at the Marriott Mark Resort Hotel in Vail, Colorado. 
The following message gives information on the program schedule and local arrangements, and is organized as follows: I. Summary schedule II. Workshop schedule III. Arrangements information IV. Workshop abstracts I. Summary Schedule: Thursday, Dec. 5th 5:00 pm Registration Open 7:00 pm Orientation Meeting 8:00 pm Reception Friday, Dec. 6th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 7:00 pm Banquet Saturday, Dec. 7th 7:00 am Breakfast 7:30 - 9:30 am Workshop Sessions 4:30 - 6:30 pm Workshop Sessions 6:30 - 7:00 pm Wrap-up 7:30 pm Barbecue Dinner (optional) II. Workshop schedule: Friday, Dec. 6th: Character recognition Projection pursuit and neural networks Constructive and destructive learning algorithms II Modularity in connectionist models of cognition VLSI neural networks and neurocomputers (1st day) Recurrent networks: theory and applications (1st day) Active learning and control (1st day) Self-organization and unsupervised learning in vision (1st day) Developments in Bayesian methods for neural networks (1st day) Saturday, Dec. 7th: Oscillations and correlations in neural information processing Optimization of neural network architectures for speech recognition Genetic algorithms and neural networks Complexity issues in neural computation and learning Computer vision vs. network vision VLSI neural networks and neurocomputers (2nd day) Recurrent networks: theory and applications (2nd day) Active learning and control (2nd day) Self-organization and unsupervised learning in vision (2nd day) Developments in Bayesian methods for neural networks (2nd day) III. Arrangements information: Accomodations: The conference sessions will be held in the banquet area at Marriott Mark Resort, at Vail CO, 90 miles west of Denver. For accomodations, call the Mariott at (303)-476-4444. Our room rate is $74 (single or double). Condos for larger groups can be arranged through Destination Resorts, at (303)-476-1350. Registration: Registration fee for the workshops is $100 ($50 for students). Transportation: CME (Colorado Mountain Express) will be running special shuttles from the Sheraton in Denver up to the Marriott in Vail Thursday afternoon at a price of $31.00 per person. Call them at 1-800- 525-6363, at least 24 hours in advance, to reserve and give a credit card number for prepayment. CME also runs shuttles down from Vail to the Denver airport, same price, on Sunday at many convenient times. The earlier you call CME, the more vans will be made available for our use. Be sure to mention our special group code "NIPS". Hertz has a desk in the Sheraton, and will rent cars at a weekend rate for the trip up to Vail and back to the airport in Denver. This is an unlimited mileage rate; prices start at $60 (three days, plus tax). To make reservations call the Sheraton at 1-800-552-7030 and ask for Kevin Kline at the Hertz desk. Skiing: Skiing at Vail can be expensive. The lift tickets this year were slated to rise to $40 per day. The conference has negotiated very attractive group rates for tickets bought in advance: $56 for a 2-day ticket $84 for a 3-day ticket $108 for a 4-day ticket You can purchase these by sending a check to the conference registration office: NIPS*91 Registration, Siemens Research Center, 755 College Road East, Princeton, NJ 08540. The tickets will be printed for us, and available when we get to Vail on Thursday evening. There are several sources for rental boots and skis in Vail. 
The rental shop at the lifts and Banner Sports (located in the Marriott) are offering the following packages to those who identify themselves as NIPS attendees:

                           skis, boots, poles     skis, poles
    standard package            $8 / day            $6 / day
    performance package        $11 / day            $9 / day

Banner will, as extra incentives, stay open for us after the Thursday orientation meeting, and give a 10% discount on anything else in the store. Optional Gourmet barbecue dinner(!): Finally, besides the conference banquet, included in the registration fee, there will be an optional dinner on Saturday night at Booco's Station, a few miles outside of Vail and world famous for its barbecued meats and special sauces. Dinner will include transportation (if you need it), appetizers, all-you-can-eat barbecue, cornbread, vegetables, dessert, and more than 40 kinds of beer at the cash bar. Tickets will be on sale at the Sheraton and at the Marriott. Price: $27. IV. Workshop Abstracts: ========================================================================= Modularity in Connectionist Models of Cognition Organizer: Jordan Pollack, Ohio State Univ. Speakers: Michael Mozer, Univ of Colorado Robert Jacobs, MIT John Barnden, New Mexico State University Rik Belew, UCSD Abstract: Classical modular theories of mind presume mental "organs" - function specific, put in place by evolution - which communicate in a symbolic language of thought. In the 1980's, Connectionists radically rejected this view in favor of more integrated architectures, uniform learning systems which would be very tightly coupled and communicate through many feedforward and feedback connections. However, as connectionist attempts at cognitive modeling have gotten more ambitious, ad-hoc modular structuring has become more prevalent. But there are concerns regarding how much architectural bias is allowable. There has been a flurry of work on resolving these concerns by seeking the principles by which modularity could arise in connectionist architectures. This will involve solving several major problems - data decomposition, structural credit assignment, and shared adaptive representations. This workshop will bring together proponents of modular connectionist architectures to discuss research directions, recent progress, and long-term challenges. ========================================================================= Character Recognition Organizers: C. L. Wilson and M. D. Garris, National Institute of Standards and Technology Speakers: Jon Hull, SUNY Buffalo Tom Vogl, ERIM Jim Keeler, MCC Chris Schofield, Nestor C. L. Wilson, NIST R. G. Casey, IBM Abstract: This workshop will consider issues related to present and future testing needs for character recognition including: 1) What is user experience in using the NIST and other publicly available databases? 2) What types of databases will be required in the future? 3) What are future testing needs, such as x-y coordinate stream or gray level data? 4) How can the evaluation of current research problems, such as segmentation, be enhanced through carefully designed databases, standard testing procedures, and automated evaluation methodologies? 5) Is the incorporation of context important in testing? 6) What other issues face the research and development of large scale recognition systems? The target audience includes those interested in and/or working on hand print recognition and developers who wish to include character recognition as part of systems to recognize documents.
========================================================================= Genetic Algorithms and Neural Networks Organizer: Rik Belew, Univ. of Calif. at San Diego Speakers: Rik Belew and Dave Rogers Abstract: This workshop will examine theoretical and algorithmic interactions between GA and NNet techniques, as well as models of the evolutionary constraints on nervous systems. Specific topics include: 1) Comparison and composition of global GA sampling techniques with the local (gradient) search of NNet methods. 2) Use of the GA to evolve additional higher-order function approximation terms (``hidden units''). 3) The dis/advantages of GA recombination and its impact on appropriate representations for NNets. 4) Trade-offs between NNet training time and GA generational time. 5) Parallel implementations of GAs that facilitate NNet simulation. 6) A role for ontogenesis between GA evolution and NNet learning. 7) The role optimality (doesn't!) play in evolution ========================================================================= Projection Pursuit and Neural Networks Organizers: Ying Zhao, Chris Atkeson and Peter Huber, MIT Speakers: R.Douglas Martin, University of Washington John Moody, Yale University Ying Zhao, MIT Andrew R. Barron, University of Illinois Nathan Intrator, Brown University Trevor Hastie, Bell Labs Abstract: Projection Pursuit is a nonparametric statistical technique to find "interesting" low dimensional projections of high dimensional data sets. We hope to improve our understanding of neural networks and projection pursuit by discussing issues such as fast training algorithms based on PP, duality with kernel approximation, possible avoidance of the "curse of dimensionality", and the sample complexity for PP. ========================================================================= Constructive and Destructive Learning Algorithms II Organizer: Scott E. Fahlman, Carnegie Mellon University Speakers: TBA Abstract: Recently we have seen the emergence of new learning algorithms that alter the network's topology. Some of these algorithms start with excess connections and remove any that are not needed; others start with a sparse network and add hidden units as needed, sometimes in multiple layers; some algorithms do both. In a two-day workshop on this topic at NIPS-90, a number of learning algorithms that modify network topology were presented by their authors and were critically evaluated. The past year has seen a great deal of additional work in this area. We will briefly review the major algorithms presented last year. Then we will turn to more recent developments, including both new algorithms and experience gained in using the older ones. Finally, we will consider current trends and will try to identify open problems for future research. ========================================================================= Oscillations and Correlations in Neural Information Processing Organizer: Ernst Niebur, Caltech Speakers: Bard Ermentrout, U. of Pittsburgh Hennric Jokeit, U. of Munich Marius Usher, Weizmann Institute Ernst Niebur, Caltech Abstract: This workshop will address models proposed for tasks like tieing together the different parts of one object in the visual field or for binding the different representations of an object in different cortical areas. Both oscillation-based models as well as alternative models based on phase coherence (correlations) will be considered in the light of the latest experimental findings. 
========================================================================= Optimization of Neural Network Architectures for Speech Recognition Organizers: Uli Bodenhausen, Universitaet Karlsruhe Alex Waibel, Carnegie Mellon University Speakers: Kenichi Iso, NEC Corporation, Japan Patrich Haffner, CNET, France Mike Franzini, Telefonica I + D, Spain Abstract: A variety of neural network algorithms have recently been applied to speech recognition tasks. Besides having learning algorithms for weights, optimization of the network architectures is required to achieve good performance. Also of critical importance is the optimization of neural network architectures within hybrid systems for best performance of the system as a whole. Parameters that have to be optimized within these constraints include the number of hidden units, number of hidden layers, time-delays, connectivity within the network, input windows, the number of network modules, number of states and others. The proposed workshop intends to discuss and evaluate the importance of these architectural parameters and different integration strategies for speech recognition systems. Participating researchers interested in speech recognition are welcome to present short case studies on the optimization of neural networks, preferably with an evaluation of the optimization steps. The workshop could also be of interest to researchers working on constructive/destructive learning algorithms because the relevance of different architectural parameters should be considered for the design of these algorithms. ========================================================================= SELF-ORGANIZATION AND UNSUPERVISED LEARNING IN VISION Organizer: Jonathan A. Marshall, Univ. of North Carolina Speakers: Suzanna Becker, University of Toronto Irving Biederman, University of Southern California Thomas H. Brown, Yale University Joachim M. Buhmann, Lawrence Livermore National Laboratory Heinrich Bulthoff, Brown University Edward Callaway, Duke University Allan Dobbins, McGill University Gillian Einstein, Duke University Charles Gilbert, The Rockefeller Universty John E. Hummel, UCLA Daniel Kersten, University of Minnesota David Knill, University of Minnesota Laurence T. Maloney, New York University Jonathan A. Marshall, University of North Carolina at Chapel Hill Paul Munro, University of Pittsburgh Albert L. Nigrin, American University Alice O'Toole, The University of Texas at Dallas Jurgen Schmidhuber, University of Colorado Nicol Schraudolph, University of California at San Diego Michael P. Stryker, University of California at San Francisco Patrick Thomas, Technische Universitat Muenchen Rich Zemel, University of Toronto Abstract: This workshop considers the role that unsupervised learning procedures (e.g. Hebb-type rules) may play in the self-organization of cortical structures involved in the processing of visual information. Researchers in visual neuroscience, visual psychophysics and neural network modeling will be brought together to address head-on the key issue of how animal visual systems got the way they are. We hope that this will lead to a better understanding of the factors that shape the structure of animal visual systems, as well as better models of the neurophysiological processes underlying vision. 
========================================================================= Developments in Bayesian methods for neural networks Organizers: David MacKay, Caltech Steve Nowlan, Salk Institute Abstract: The first day of this workshop will be 50% tutorial in content, reviewing some new ways Bayesian methods may be applied to neural networks. The rest of the workshop will be devoted to discussions of the frontiers and challenges facing Bayesian work in neural networks, including issues such as Monte Carlo clustering, data selection, active query learning, prediction of generalisation, missing inputs, unlabelled data and discriminative training. Discussion will be moderated by John Bridle. Speakers: Radford Neal Jurgen Schmidhuber John Moody David Haussler + Michael Kearns Sara Solla + Esther Levin Steve Renals Reading up before the workshop ------------------------------ People intending to attend this workshop are encouraged to obtain preprints of relevant material before NIPS. A selection of preprints is available by anonymous ftp, as follows: unix> ftp hope.caltech.edu (or ftp 131.215.4.231) login: anonymous password: ftp> cd pub/mackay ftp> get README.NIPS ftp> quit Then read the file README.NIPS for further information. Problems? Contact David MacKay, mackay at hope.caltech.edu ========================================================================= Active Learning and Control Organizers: David Cohn, Univ. of Washington Don Sofge, MIT Speakers: C. Atkeson, MIT A. Barto, Univ. of Massachusetts, Amherst J. Hwang, Univ. of Washington M. Jordan, MIT A. Moore, MIT J. Schmidhuber, University of Colorado, Boulder R. Sutton, GTE S. Thrun, Carnegie-Mellon University Abstract: An "active" learning system is one that is not merely a passive observer of its environment, but instead plays an active role in determining its inputs. This definition includes classification networks that query for values in "interesting" parts of their domain, learning systems that actively "explore" their environment, and adaptive controllers that learn how to produce control outputs to achieve a goal. Common facets of these problems include building world models in complex domains, exploring a domain safely and efficiently, and planning future actions based on one's model. In this workshop, our main focus will be to address key unsolved problems which may be holding up progress on these problems, rather than presenting polished, finished results. Our hope is that unsolved problems in one field may be able to draw on insights from research in other fields. ========================================================================= Computer Vision vs Network Vision Organizers: John Mayhew and Terry Sejnowski Speakers: TBA Abstract: Computer vision has developed a methodology based on sound engineering practice: 1. Break the problem down into well-defined subproblems and mathematically analyze each part; 2. Develop efficient algorithms for each module; 3. Implement each algorithm with the best available technology. These are Marr's three levels: computational, algorithmic, and implementational. In contrast, proponents of neural networks have developed a different methodology: 1. Find a good representation for the input data that makes explicit the features needed to solve the problem; 2. Use learning algorithms to cluster and categorize the data; 3. Glue together networks that solve different parts of the problem with more learning.
Networks are memory intensive and constraints from the hardware level are as important as constraints from the computational level. This workshop is intended to provoke a lively and free-wheeling discussion of the central issues in vision. ========================================================================= Complexity Issues in Neural Computation and Learning Organizers: Kai-Yeung Sui and Vwani Roychowdhury, Stanford Univ. Speakers: TBA Abstract: The goal of this workshop is to address recent developments in understanding the capabilities and limitations of various models for neural computation and learning. Topics will include: 1) circuit complexity of neural networks, 2) capacity of neural networks, and 3) complexity issues in learning algorithms. ========================================================================= RECURRENT NETWORKS: THEORY AND APPLICATIONS Organizers: Luis Borges de Almeida, INESC C. Lee Giles, NEC Research Institute Richard Rohwer, Edinburgh University Speakers: TBA Abstract: Recurrent neural networks have a very large potential for handling dynamical / sequential problems, e.g. recognition and classification of time-dependent signals like speech, modelling and control of dynamical systems, learning of grammars and symbolic processing, etc. However, the fulfillment of this potential remains an important open issue. Training algorithms are very inefficient in terms of memory and computational demands. Little is known about convenient architectures. The number of known successful applications is very limited. This is true even for static applications (operation in the "fixed point mode"). The first day of this two-day workshop will focus on the outstanding theoretical issues in recurrent neural networks, and the second day will examine existing and potential real-world applications. ========================================================================= VLSI Neural Networks and Neurocomputers Organizers: Clifford Lau, Office of Naval Research Jim Burr, Stanford University Speakers: TBA Abstract: This two-day workshop will address the latest advances in VLSI implementations of neural nets, and the design of high performance neurocomputers. We will present an updated list of currently available neurochips, and discuss a wide range of issues, including: 1) Design issues: Advantage and disadvantage of analog and digital approaches; how much arithmetic precision is necessary; which algorithms have been implemented; importantance of on-chip learning; neurochip design in existing CAD environment. 2) Performance issues: Critical factors to achieve robust performance; Tradeoffs between capacity and performance; scaling limits to constructing large neural networks. 3) Use of neurochips: What input/output devices are necessary; what programming support environment is necessary. 4) Application areas for supercomputing neurocomputers From zeiden at cs.wisc.edu Mon Oct 28 10:30:14 1991 From: zeiden at cs.wisc.edu (zeiden@cs.wisc.edu) Date: Mon, 28 Oct 91 09:30:14 CST Subject: tech report available in NEUROPROSE Message-ID: <9110281530.AA29229@ai.cs.wisc.edu> I have placed the following tech report in the NEUROPROSE ftp archive at Ohio State, under the name zeidenberg.containment.ps.Z Implementing Spatial Relations in Neural Nets: The Case of Figure/Ground and Containment Matthew Zeidenberg zeiden at cs.wisc.edu A neural network system that computes the relation of containment between objects in a retina-like input array is described. 
This system is multi-layer, and operates by recognizing and segmenting the objects in the input to place them in separated arrays. The figure of each object, that is, the set of all pixels on the perimeter of or contained in the object, is computed for each object, using a method that involves a connectionist implementation of a standard algorithm using parity networks. These figures are then used to compute containment relations between the objects in the input. ftp Instructions: unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get zeidenberg.containment.ps.Z ftp> quit unix> uncompress zeidenberg.containment.ps.Z unix> lpr zeidenberg.containment.ps (or other command to print postscript) From black at seismo.CSS.GOV Mon Oct 28 12:01:00 1991 From: black at seismo.CSS.GOV (Mike Black) Date: Mon, 28 Oct 91 12:01:00 EST Subject: What is current technology in Analog Neural Nets? Message-ID: <9110281701.AA21092@beno.CSS.GOV> I've seen little discussion and have found no references to work in analog neural networks. If you can provide some references or indicate what your current work is I'll summarize. These are the goals for my current research: Given an analog data source (e.g. pulse generator): 1. Recognize pulses (for example a single shot square wave) and reject "noise" (i.e. triangular wave) at rates of at least 10MHz (that is, it should be able to deal with a minimum 100ns pulse width). 2. Provide the trigger for an external digitizer to grab the resultant "good" pulses. 3. Be software controllable (hardware should be able to be updated by remote control). Please forward any current work or capability in this area to: black at beno.css.gov >> ------------------------------------------------------------------------------- >> : usenet: black at beno.CSS.GOV : land line: 407-494-5853 : I want a computer: >> : real home: Melbourne, FL : home line: 407-242-8619 : that does it all!: >> ------------------------------------------------------------------------------- From lissie!botsec7!botsec1!dcl at uunet.UU.NET Mon Oct 28 13:54:54 1991 From: lissie!botsec7!botsec1!dcl at uunet.UU.NET (David Lambert) Date: Mon, 28 Oct 91 13:54:54 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110281854.AA20399@botsec1.bot.COM> Dear Connectionists: Has anyone tried to implement the Resource Allocation Network of John Platt (NIPS 3 and Neural Computation V3 #2)? I have a first cut at an implementation, and so far I have not been able to approach his published results. I'd be very interested in corresponding with anyone who has tried this algorithm. Also, if anyone has a means of reaching John Platt, I'd love to hear about it. I've been calling Synaptics in San Jose for over a week now, and there don't seem to be any humans that work there...only voice mail. Thanks David Lambert dcl at object.com or dcl at panix.com From khosla at latcs1.lat.oz.au Mon Oct 28 22:32:44 1991 From: khosla at latcs1.lat.oz.au (Rajiv Khosla) Date: Tue, 29 Oct 91 14:32:44 +1100 Subject: Spatial crosstalk and modular NN architecture Message-ID: <9110290332.AA18704@latcs1.lat.oz.au> Dear Connectionists, This is regarding my problem of making a 28-11-26, binary input/output neural network work. Thanks to everyone who sent me the replies. Its working nice and kicking. Best results are achieved by connecting the input layer to the output layer. 
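Concretely, "connecting the input layer to the output layer" means adding direct input-to-output weights alongside the usual 28-11-26 path, so the output units see both the hidden activations and the raw inputs. A minimal illustrative sketch of such a forward pass in Python (the initialisation and squashing functions are arbitrary choices, not the poster's actual code):

   import numpy as np

   n_in, n_hid, n_out = 28, 11, 26
   rng = np.random.default_rng(1)
   W_ih = rng.normal(scale=0.1, size=(n_in, n_hid))    # input  -> hidden
   W_ho = rng.normal(scale=0.1, size=(n_hid, n_out))   # hidden -> output
   W_io = rng.normal(scale=0.1, size=(n_in, n_out))    # input  -> output (skip)
   b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

   def forward(x):
       h = np.tanh(x @ W_ih + b_h)
       # output layer sees both the hidden layer and the raw input
       return 1.0 / (1.0 + np.exp(-(h @ W_ho + x @ W_io + b_o)))

   x = rng.integers(0, 2, size=n_in).astype(float)     # one binary input pattern
   print(forward(x).shape)                             # -> (26,)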
Thanks once again Rajiv From terry at jeeves.UCSD.EDU Tue Oct 29 02:35:07 1991 From: terry at jeeves.UCSD.EDU (Terry Sejnowski) Date: Mon, 28 Oct 91 23:35:07 PST Subject: Continuous vs. Batch learning Message-ID: <9110290735.AA01748@jeeves.UCSD.EDU> There is evidence that the hippocampus is doing something like batch mode teaching for neocortex. The hippocampus is needed for one-shot learning, also called declarative or episodic learning. It seems to be storing up a lot of examples and over a period of months transfers this informaiton to cortex, where it is stored in a more categorical representation. Terry ----- From smieja at jargon.gmd.de Tue Oct 29 05:14:40 1991 From: smieja at jargon.gmd.de (Frank Smieja) Date: Tue, 29 Oct 91 11:14:40 +0100 Subject: Batch methods versus stochastic methods... In-Reply-To: mmoller@daimi.aau.dk's message of Mon, 21 Oct 91 13:13:06 +0100 Message-ID: <9110291014.AA24169@jargon.gmd.de> -) Unfortunately, we do not have any datasets of the proper size. -) So I would appreciate if anyone could inform me about where to find big -) datasets that are public available. -) -) -- Martin M -) -) ----------------------------------------------------------------------- -) Martin F. Moller email: mmoller at daimi.aau.dk -) Computer Science Department phone: +45 86202711 5223 -) Aarhus University fax: +45 86135725 -) Ny Munkegade, Building 540 -) 8000 Aarhus C -) Denmark -) ---------------------------------------------------------------------- I demonstrated in my paper "MLP Solutions, Generalization and Hidden Unit Representations" in the DANIP (Distributed And Neural Information Processing) conference in Bonn, Germany, April 1989 (ed: Kindermann & Linden, pub: Oldenbourg Verlag), how one might "synthetically" construct a training set of any size of inputs/outputs, that may be generalized, insofar that the "regularities" beloved by our networks are guaranteed to exist, since they are used to generate the training set pairs, but not visible to the network until the examples are seen, and the learning results in "emergent generalization". I used this method in the paper to study a small diagnosis problem, but scaling up is no problem. If you cannot get hold of this book, and would like to see the paper, I can make it available in the neuroprose archive (unfortunately without figures, but they are not needed to explain the method). If this is also difficult, I will send hard copies to interested parties. Please send such requests directly to me (smieja at gmdzi.uucp) and I will either reply directly or to the bboard. -Frank Smieja From joachim at gmdzi.gmd.de Tue Oct 29 12:57:47 1991 From: joachim at gmdzi.gmd.de (Joachim Diederich) Date: Tue, 29 Oct 91 16:57:47 -0100 Subject: New Paper Message-ID: <9110291557.AA14221@gmdzi.gmd.de> The following paper has been placed in the Neuroprose archives at Ohio State. The file is "diederich.hybrid.ps.Z." See ftp in- structions below. Efficient Question Answering in a Hybrid System Joachim Diederich (1,2) & Debra L. Long (2) (1) German National Research Center for Computer Science (GMD) Schloss Birlinghoven, P.O. Box 1240 D-5205 St.Augustin 1, Germany (2) Department of Psychology University of California, Davis Davis, CA 95616, U.S.A. ABSTRACT: A connectionist model for answering open-class questions in the context of text processing is presented. The system answers ques- tions from different question categories, such as "How," Why," and "Consequence" questions. 
These question categories have been identified in several empirical studies (Graesser & Clark, 1985; Graesser, 1990). The system responds to a question by generating a set of possible answers that are weighted according to their plausibility. Search is performed by means of a massively paral- lel, directed spreading activation process. The search process operates on several knowledge sources (i.e., connectionist net- works) that are learned or explicitly built-in. Spreading activa- tion involves the use of signature messages (Lange & Dyer, 1989). Signature messages are numeric values that are propagated throughout the networks and identify a particular question category (this makes the system hybrid). Binder units that gate the flow of activation between textual units receive these signa- tures and change their states. That is, the binder units either block the spread of activation or allow the flow of activation in a certain direction. The process results in a pattern of activa- tion that represents a set of candidate answers based on avail- able knowledge sources. This paper will appear in the IJCNN-91 Singapore Proceedings. unix> ftp archive.cis.ohio-state.edu Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get diederich.hybrid.ps.Z ftp> quit unix> uncompress diederich.hybrid.ps.Z unix> lpr diederich.hybrid.ps Joachim Diederich German National Research Center for Computer Science (GMD) P.O. Box 1240 D-5205 St. Augustin 1 Germany From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 15:23:46 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 15:23:46 EST Subject: Robinson's vowel dataset In-Reply-To: Your message of Mon, 28 Oct 91 09:49:58 +0200. <9110280049.AA00241@hcrlgw.crl.hitachi.co.jp> Message-ID: Does anyone have any NEW results on Robinson's vowel dataset. I am aware of the original results given in his thesis: A. Robinson. "Dynamic Error Propagation Networks", PhD Thesis, Cambridge Univ 1989. I don't know of any more recent publications on this problem. I got some rather good results using Cascade-Correlation: (train 300 300 25)) SigOff 0.10, WtRng 1.00, WtMul 1.00 OMu 2.00, OEps 1.00, ODcy 0.0300, OPat 12, OChange 0.010 IMu 2.00, IEps 10.00, IDcy 0.0300, IPat 8, IChange 0.030 Utype :SIGMOID, Otype :SIGMOID, RawErr NIL, Pool 32 Trial 0: 181 of 462 cases wrong, 281 right, 60.82% @ 23 hidden Trial 1: 174 of 462 cases wrong, 288 right, 62.34% @ 11 hidden Trial 2: 193 of 462 cases wrong, 269 right, 58.23% @ 24 hidden Trial 3: 174 of 462 cases wrong, 279 right, 60.39% @ 15 hidden Trial 4: 180 of 462 cases wrong, 282 right, 61.04% @ 24 hidden Trial 5: 186 of 462 cases wrong, 276 right, 59.74% @ 17 hidden Trial 6: 188 of 462 cases wrong, 274 right, 59.31% @ 11 hidden Trial 7: 174 of 462 cases wrong, 288 right, 62.34% @ 15 hidden Trial 8: 173 of 462 cases wrong, 289 right, 62.55% @ 13 hidden Trial 9: 170 of 462 cases wrong, 292 right, 63.20% @ 18 hidden Avg: 180 of 462 cases wrong, 282 right, 61.03% @ 17 hidden The test set was run after each output training phase and the best value obtained is the one reported. The best results obtained by Robinson were 260 right (56%) for nearest neighbor, and 253 right (55%) for 528 Gaussian nodes or 88 square nodes. Backprop with 88 sigmoids never got better than 234 (51%). I've never published these results, because I think they are a bit of a cheat. 
The problem is that I played around with the decay factor and other parameters until I got good results on the test set. It's not clear that the same setting would give equally good performance on a new test set that I had never seen. Also, in all cases the algorithm obtained a solid level of 59% or so, but then wandered up and down, in no particular pattern, as new units were added. I can get a good number -- up to 63% -- by grabbing the best point on this random walk, but I don't honestly believe that the network at that point would give equally good results on new test data drawn from the same distribution. What we really need is a much larger data set for this problem. Then we could split the set into training data (a larger set, offering much better generalization), cross-validation data (used to determine when training should stop), and final test data, never used in training. The current set is so small that it's not possible to split things up this way. -- Scott Fahlman From kak at max.ee.lsu.edu Tue Oct 29 16:36:52 1991 From: kak at max.ee.lsu.edu (Dr. S. Kak) Date: Tue, 29 Oct 91 15:36:52 CST Subject: No subject Message-ID: <9110292136.AA01849@max.ee.lsu.edu> CALL FOR PAPERS Special Issue On NETWORKS FOR NEURAL PROCESSING Circuits, Systems, and Signal Processing Guest Editors: W.A. Porter, University of Alabama, Huntsville S.C. Kak, Louisiana State University, Baton Rouge Papers are solicited on the theoretical foundations, challenging applications and efficient parallel architectures for neural computing. Suggested topics include: training for generalization, use of higher order moments, rapid training algorithms, nonbinary design, optimization networks, and mapping networks. Papers which critique and/or compare recent developments in neural computation are also of interest. Papers should be prepared according to the Information for Contributors on the inside back cover of Circuits, Systems, and Signal Processing. Papers should be submitted in triplicate by January 20, 1992 in care of: Professor William A. Porter Department of Electrical and Computer Engineering The University of Alabama in Huntsville Huntsville, AL 35899 [Tel. (205) 895-6858] For further information contact Professor S.C. Kak at kak at max.ee.lsu.edu or contact Professor W.A. Porter. From dlukas at PARK.BU.EDU Tue Oct 29 13:34:49 1991 From: dlukas at PARK.BU.EDU (David Lukas) Date: Tue, 29 Oct 91 13:34:49 -0500 Subject: Faculty position in Cognitive & Neural Systems at Boston University Message-ID: <9110291834.AA29864@cns.bu.edu> Assistant Professor Cognitive and Neural Systems Boston University Boston University seeks to hire a tenure-track assistant professor starting in Fall 1992 for its graduate Department of Cognitive and Neural Systems. The Department offers an integrated curriculum covering the full range of psychological, neurobiological, and computational concepts, models, and methods in the fields of neural networks, computational neuroscience, parallel distributed processing, and biological information processing, in which Boston University is a leader. Candidates should have extensive analytic or computational research experience in modelling nonlinear neural networks, especially in one or more of the areas: learning, speech and language processing, adaptive pattern recognition, cognitive information processing, and adaptive sensory-motor control.
Send a complete curriculum vitae and three letters of recommendation to Stephen Grossberg, Chairman, Search Committee, Department of Cognitive and Neural Systems, Room 240, 111 Cummington Street, Boston University, Boston, MA 02215, no later than January 1, 1992. Boston University is an Equal Opportunity/Affirmative Action employer. If you have questions or require further information, please reply to Carol Jefferson---caroly at cns.bu.edu. From demers at cs.UCSD.EDU Tue Oct 29 16:48:36 1991 From: demers at cs.UCSD.EDU (David DeMers) Date: Tue, 29 Oct 91 13:48:36 PST Subject: Generalization Message-ID: <9110292148.AA15810@beowulf.ucsd.edu> A short while back there was a discussion of generalization; I recall contributions by Wolpert and Goldfarb, among others. I didn't save the exchanges; however, I'd like to look at them now. Unfortunately, I can't seem to connect up to the archive to retrieve the mailings. If anyone has most of the discussion still lying around, I'd appreciate it if you could mail it to me; also, I'd appreciate anyone's opinion on "what is generalization" in 250 words or less :-) I do have most of David Wolpert's papers, so don't need another copy of them... Thanks for any help, Dave From hcard at ee.UManitoba.CA Wed Oct 30 15:14:07 1991 From: hcard at ee.UManitoba.CA (hcard@ee.UManitoba.CA) Date: Wed, 30 Oct 91 14:14:07 CST Subject: batch learning Message-ID: <9110302014.AA00760@card.ee.umanitoba.ca> In the PDP books batch learning accumulates error derivatives from each pattern, rather than simply their contributions to the total error, before making weight changes. It seems that gradient descent ought to add all the errors before taking any derivatives. Any comments? Howard Card From petsche at learning.siemens.com Wed Oct 30 15:33:38 1991 From: petsche at learning.siemens.com (Thomas Petsche) Date: Wed, 30 Oct 91 15:33:38 EST Subject: NIPS travel (limited cheap airfare) Message-ID: <9110302033.AA12077@learning.siemens.com> FYI: United has a special fare program available until tomorrow. We just booked a round trip from Newark to Denver (leave Monday morning and return Sunday morning) for $250. From Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU Tue Oct 29 22:59:19 1991 From: Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU (Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU) Date: Tue, 29 Oct 91 22:59:19 EST Subject: Resource Allocation Network (RAN) In-Reply-To: Your message of Mon, 28 Oct 91 13:54:54 -0500. <9110281854.AA20399@botsec1.bot.COM> Message-ID: Have you tried E-mail? I exchanged some mail with him a month or so ago: John Platt -- Scott From ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU Wed Oct 30 04:55:26 1991 From: ruizdeangulo%ispgro.cern.ch at BITNET.CC.CMU.EDU (ruizdeangulo%ispgro.cern.ch@BITNET.CC.CMU.EDU) Date: Wed, 30 Oct 91 10:55:26 +0100 Subject: batch-continuous-one shot Message-ID: <9110300955.AA03462@dxmint.cern.ch> Referring to the batch-continuous-one-shot learning discussion, in the reference below we describe an algorithm that can be labeled as one-shot learning. I think it fits well with the Plutowski and White method described recently.
>What we do (as reported in the tech report by Plutowski & White) >is sequentially grow the training set, first finding >an "optimal" training set of size 1, then fitting the network to this >training set, appending the training set with a new exemplar selected from >the set of available candidates, obtaining a training set of size 2 which >is "approximately optimal", fitting this set, appending a third exemplar, etc, >continuing the process until the network fit obtained by training over the >exemplars fits the rest of the available examples within the desired tolerance. The MDL (Minimal Disturbance Learning) algorithm introduces a new exemplar by minimizing an estimate of the loss function (error increment) over the old patterns. It makes a small search for this optimization, but whatever the stopping point of this search, perfect recall of the new exemplar is obtained. The network is not forced to assume any special kind of local representation. Ruiz de Angulo, V., and Torras, C. (1991). Minimally Disturbing Learning. In Proceedings of IWANN 91. Springer-Verlag. From edelman at wisdom.weizmann.ac.il Thu Oct 31 04:08:00 1991 From: edelman at wisdom.weizmann.ac.il (Shimon Edelman) Date: Thu, 31 Oct 91 11:08+0200 Subject: Resource Allocation Network (RAN) In-Reply-To: <9110281854.AA20399@botsec1.bot.COM> Message-ID: <19911031090807.2.EDELMAN@YAD.weizmann.ac.il> A similar technique of RBF center allocation, in conjunction with other modifications of RBF learning, was successful in replicating human performance in the difficult visual task of hyperacuity vernier discrimination. See AI Memo 1271, "Synthesis of visual modules from examples: learning hyperacuity", by T. Poggio, M. Fahle and S. Edelman (January 1991). Center allocation is discussed there on p.7. -Shimon Edelman edelman at wisdom.weizmann.ac.il From dfausett at zach.fit.edu Thu Oct 31 09:48:43 1991 From: dfausett at zach.fit.edu ( Donald W. Fausett) Date: Thu, 31 Oct 91 09:48:43 -0500 Subject: What is current technology in Analog Neural Nets? Message-ID: <9110311448.AA02454@zach.fit.edu> Prof. Bernard Widrow at Stanford University (EE Dept) would be a likely source to steer you in the right direction. Locally, you might try Prof. Hal Brown at FIT (EE Dept). Good luck. -- Don Fausett From lissie!botsec7!botsec1!dcl at UUNET.uu.net Thu Oct 31 10:13:26 1991 From: lissie!botsec7!botsec1!dcl at UUNET.uu.net (David Lambert) Date: Thu, 31 Oct 91 10:13:26 EST Subject: Resource Allocation Network (RAN) Message-ID: <9110311513.AA24956@botsec1.bot.COM> Hi. Thanks to all respondents concerning my RAN question. I managed to get in touch with John Platt, and he was most helpful. John Platt writes: > Someone forwarded me your posting on the connectionist mailing list.. > Could you please follow up, and say that you have successfully used > RAN? It would be nice to leave an impression of a working algorithm... My sincere apologies for being lax in my courtesies, John. You're right, of course. I got RAN working just fine, and it works as well as (if not better than) advertised. To those who asked for a copy of the resulting code, I'll probably release it sometime soon, through one mechanism or another. Thanks again. David Lambert dcl at object.com or dcl at panix.com From B344DSL at UTARLG.UTA.edu Wed Oct 9 23:55:00 1991 From: B344DSL at UTARLG.UTA.edu (B344DSL@UTARLG.UTA.edu) Date: Wed, 9 Oct 1991 22:55 CDT Subject: Announcement and call for abstracts for Feb.
conference Message-ID: <01GBK4XORVOW000MGU@utarlg.uta.edu> ANNOUNCEMENT AND CALL FOR ABSTRACTS WORKSHOP ON OPTIMALITY IN BIOLOGICAL AND ARTIFICIAL NETWORKS? Sponsored by the Metroplex Institute for Neural Dynamics (MIND) and the Texas SIG of the International Neural Network Society (INNS). To be held at a loca- tion to be announced in the Dallas-Fort Worth area, Thursday through Saturday, February 6-8, 1992. Confirmed speakers include: Stephen Grossberg (Boston University) Stephen Hampson (University of California, Irvine) Karl Pribram (Radford University) Harold Szu (Naval Surface Warfare Center) Graham Tattersall (University of East Anglia) The focus of this conference will be twofold: (1) how to optimize different aspects of neural and cognitive function and (2) whether particular natural or artificial solutions to specific neural or cognitive problems are in fact opti- mal. Specific problems to which these optimality considerations are applied will be taken from many areas including goal direction and planning, adaptive cat- egorization, sensory perception, and motor control. The talks will be an hour each for invited speakers and 45 minutes each for contributed speakers, with time afterwards for questions. Speakers will not be re- quired to write a paper, but will be invited to contribute chapters to a book several months after the conference. Books based on two previous MIND conferen- ces -- on Motivation, Emotion, and Goal Direction in Neural Networks and NeuralNetworks for Knowledge Representation and Inference -- are now being published by Lawrence Erlbaum Associates. Registration for the conference will be $80 for non-students, $20 for students, with a $10 rebate for MIND or Texas SIG membership. We will try to arrange for discounted air fares from American Airlines as we have done in the past. Those interested in presenting should send me a short (1-3 paragraph) abstract by December 1, 1991, using either e-mail, FAX, or snail mail. Notification of ac- ceptance will be given December 15, 1991. We will not be holding parallel ses- sions, so there are limitations on the number of speakers. However, individu- als who send high-quality abstracts that cannot be accommodated in actual talks will have space to present their work in posters at the conference, and will also be invited to contribute to the book. Prof. Daniel S. Levine Department of Mathematics University of Texas at Arlington Arlington, TX 76019-0408 e-mail: b344dsl at utarlg.uta.edu FAX: 817-794-5802 Telephone: 817-273-3598