From karit at spine.hut.fi Mon Jun 3 14:07:02 1991 From: karit at spine.hut.fi (Kari Torkkola) Date: Mon, 3 Jun 91 14:07:02 DST Subject: Research positions in speech and image processing Message-ID: <9106031107.AA08981@spine.hut.fi.hut.fi> RESEARCH POSITIONS AVAILABLE The newly created "Institut Dalle Molle d'Intelligence Artificielle Perceptive" (IDIAP) in Martigny, Switzerland seeks to hire qualified researchers in the areas of speech recognition and image manipulation. Candidates should be able to conduct independent research in a UNIX environment on the basis of solid theoretical and applied knowledge. Salaries will be aligned with those offered by the Swiss government for equivalent positions. Laboratories are now being established in the newly renovated building that houses the Institute, and international network connections will soon be in place. Researchers are expected to begin activity during the academic year 1991-1992. IDIAP is the third institute of artificial intelligence supported by the Dalle Molle Foundation, the others being ISSCO (attached to the University of Geneva) and IDSIA (situated in Lugano). The new institute will maintain close contact with these latter centers as well as with the Polytechnical School of Lausanne and the University of Geneva. To apply for a research position at IDIAP, please send a curriculum vitae and technical reports to: Daniel Osherson, Directeur IDIAP Case Postale 609 CH-1920 Martigny Switzerland For further information by e-mail, contact: osherson at disuns2.epfl.ch From issnnet at park.bu.edu Mon Jun 3 11:27:36 1991 From: issnnet at park.bu.edu (issnnet@park.bu.edu) Date: Mon, 3 Jun 91 11:27:36 -0400 Subject: RFD: comp.org.issnnet Message-ID: <9106031527.AA06005@copley.bu.edu> REQUEST FOR DISCUSSION ---------------------- GROUP NAME: comp.org.issnnet STATUS: unmoderated CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. See details below. TARGET VOTING DATE: JUNE 20 - JULY 20, 1991 ****************************************************************************** PLEASE NOTE In agreement with USENET newsgroup guidelines for the creation of new newsgroups, this discussion period will continue until June 21, at which time voting will begin if deemed appropriate. ALL DISCUSSION SHOULD TAKE PLACE ON THE NEWSGROUP "news.groups" If you do not have access to USENET newsgroups but wish to contribute to the discussion, send your comments to: issnnet at park.bu.edu specifying whether you would like your message relayed to news.groups. A call for votes will be made to the same newsgroups and mailing lists that originally received this message. PLEASE DO NOT SEND REPLIES TO THIS MAILING LIST OR NEWSGROUP DIRECTLY! A call for votes will be broadcast in a timely fashion. Please do not send votes until then.
****************************************************************************** BACKGROUND AND INFORMATION: The purpose of the International Student Society for Neural Networks (ISSNNet) is to (1) provide a means of exchanging information among students and young professionals within the area of Neural Networks; (2) create an opportunity for interaction between students and professionals from academia and industry; (3) encourage support from academia and industry for the advancement of students in the area of Neural Networks; (4) ensure that the interest of all students in the area of Neural Networks is taken into consideration by other societies and institutions involved with Neural Networks; and (5) foster a spirit of international and interdisciplinary kinship among students as the study of Neural Networks develops into a self-contained discipline. Since its creation one year ago, ISSNNet has grown to over 300 members in more than 20 countries around the world. One of the biggest problems we have faced thus far is to efficiently communicate with all the members. To this end, a network of "governors" has been created. Each governor is in charge of distributing information (such as our newsletter) to all local members, collecting dues, notifying local members of relevant activities, etc. However, even this system has problems. Communication to a possibly very large number of members relies entirely on one individual, and given the typically erratic schedule of a student, it is often difficult to ensure prompt and timely distribution to all members. More to the point, up until this time all governors have been contacting a single person (yours truly), and that has been a problem. Regular discussions on the society and related matters become very difficult when routed through individuals in this fashion. The newsgroup would be primarily dedicated to discussion of items pertaining to the society. We are about to launch a massive call for nominations, in the hope that more students will step forward and take a leading role in the continued success of the society. In addition, ISSNNet is involved with a number of projects, many of which require extensive electronic mail discussions. For example, we are developing a sponsorship program for students presenting papers at NNet conferences. This alone has generated at least 100 mail messages to the ISSNNet account, most of which could have been answered by two or three "generic" postings. We have refrained from using some of the existing mailing lists and USENET newsgroups that deal with NNets because of the non-technical nature of our issues. In addition to messages that are strictly society-related, we feel that there are many messages posted to these existing bulletin boards for which our newsgroup would be a better forum. Here is a list of topics that frequently come up, which would be handled in comp.org.issnnet as part of our "sponsored" programs: "What graduate school should I go to?" Last year, ISSNNet compiled a list of graduate programs around the world. The list will be updated later this year to include a large number of new degree programs around the world. "What jobs are available?" We asked companies that attended last year's IJCNN-San Diego and INNC-Paris conferences to fill out a questionnaire on employment opportunities for NNet students. "Does anyone have such-and-such NNet simulator?" Many students have put together computer simulations of NNet paradigms and these could be shared by people on this group.
"When is the next IJCNN conference?" We have had a booth at past NNet conferences, and hope to continue doing this for more and more international and local meetings. We often have informal get-togethers at these conferences, where students and others have the opportunity to meet. ----------------------------------------------------------------------- For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts. ISSNNet, Inc. P.O. Box 557, New Town Branch Boston, MA 02258 USA From dcp+ at cs.cmu.edu Mon Jun 3 15:51:50 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Mon, 03 Jun 91 15:51:50 EDT Subject: Preprint: Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia Message-ID: <1831.675978710@DWEEB.BOLTZ.CS.CMU.EDU> The following paper is available in the neuroprose archive as plaut.cogsci91.ps.Z. It will appear in this year's Cognitive Science Conference proceedings. A much longer paper presenting a wide range of related work is in preparation and will be announced shortly. Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia David C. Plaut Tim Shallice School of Computer Science Department of Psychology Carnegie Mellon University University College, London dcp at cs.cmu.edu ucjtsts at ucl.ac.uk Deep dyslexics are patients with neurological damage who exhibit a variety of symptoms in oral reading, including semantic, visual and morphological effects in their errors, a part-of-speech effect, and better performance on concrete than abstract words. Extending work by Hinton & Shallice (1991), we develop a recurrent connectionist network that pronounces both concrete and abstract words via their semantics, defined so that abstract words have fewer semantic features. The behavior of this network under a variety of ``lesions'' reproduces the main effects of abstractness on deep dyslexic reading: better correct performance for concrete words, a tendency for error responses to be more concrete than stimuli, and a higher proportion of visual errors in response to abstract words. Surprisingly, severe damage within the semantic system yields better performance on *abstract* words, reminiscent of CAV, the single, enigmatic patient with ``concrete word dyslexia.'' To retrieve this from the neuroprose archive type the following: unix> ftp 128.146.8.62 Name: anonymous Password: neuron ftp> binary ftp> cd pub/neuroprose ftp> get plaut.cogsci91.ps.Z ftp> quit unix> zcat plaut.cogsci91.ps.Z | lpr ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From mjolsness-eric at CS.YALE.EDU Wed Jun 5 15:50:55 1991 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Wed, 5 Jun 91 15:50:55 EDT Subject: TR: Bayesian Inference on Visual Grammars by NNs that Optimize Message-ID: <9106051951.AA25379@NEBULA.SYSTEMSZ.CS.YALE.EDU> The following paper is available in the neuroprose archive as mjolsness.grammar.ps.Z: Bayesian Inference on Visual Grammars by Neural Nets that Optimize Eric Mjolsness Department of Computer Science Yale University New Haven, CT 06520-2158 YALEU/DCS/TR854 May 1991 Abstract: We exhibit a systematic way to derive neural nets for vision problems. 
It involves formulating a vision problem as Bayesian inference or decision on a comprehensive model of the visual domain given by a probabilistic {\it grammar}. A key feature of this grammar is the way in which it eliminates model information, such as object labels, as it produces an image; correspondence problems and other noise removal tasks result. The neural nets that arise most directly are generalized assignment networks. Also there are transformations which naturally yield improved algorithms such as correlation matching in scale space and the Frameville neural nets for high-level vision. Deterministic annealing provides an effective optimization dynamics. The grammatical method of neural net design allows domain knowledge to enter from all levels of the grammar, including ``abstract'' levels remote from the final image data, and may permit new kinds of learning as well. The paper is 56 pages long. To get the file from neuroprose: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get mjolsness.grammar.ps.Z ftp> quit unix> uncompress mjolsness.grammar.ps.Z unix> lpr mjolsness.grammar.ps (or however you print postscript) -Eric ------- From jm2z+ at andrew.cmu.edu Thu Jun 6 13:47:19 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 13:47:19 -0400 (EDT) Subject: Are they really worth the effort ? Message-ID: <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> I'd like to have a debate about the advantages of distributed over local representations. I mean sure, distributed representations are great for they work in 2^n instead of n space, they degrade gracefully and all these PDP Bible type of things. But ... are they really that good? For one thing they make our life awfully difficult in terms of understanding and manipulating them ... Are they really worth the effort? Do you have concrete examples in your work where they did a better job than local representations? Javier From ogs0%dixie.dnet at gte.com Thu Jun 6 17:21:22 1991 From: ogs0%dixie.dnet at gte.com (Oliver G. Selfridge) Date: Thu, 6 Jun 91 17:21:22 -0400 Subject: Warren McCulloch's widow Message-ID: <9106062121.AA05259@bunny.gte.com> I sadly announce that Rook McCulloch, widow to Warren McCulloch, died last night at the age of 92. Warren himself, with Walter Pitts, wrote the revolutionary introduction to neural nets in the mid-1940s in two well-known papers. Rook maintained a bright and contributory life up to the end and we will all miss her. Oliver Selfridge From jm2z+ at andrew.cmu.edu Thu Jun 6 18:19:45 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 18:19:45 -0400 (EDT) Subject: are they worth the effort II Message-ID: <4cHfI1q00WBK03HW0G@andrew.cmu.edu> Please send your thoughts to connectionists so that we all can be instructed about the advantages of distributed representations. By the way, I already got two responses that I will summarize below. Response number one provided the following arguments: 1- The brain uses distributed representations. He cites Lashley's (1929) experiments where rats showed graceful performance degradation when they were partially deprived of their cortex. 2- Distributed representations are more resistant to degradation. He claims this may have military implications (systems resistant to enemy fire type of thing). [ OK, does anybody out there have data showing that distributed representations are more noise resistant than local representations ?
I mean one can always clone the local representations and get noise resistance that way -Javier ] 3- He claims distributed representations performed very well in his research projects. [ Unfortunately he confuses distributed representations with backpropagation (BP). It is BP that worked well. It is always possible to force BP to develop local representations and perhaps it would work better that way. -Javier ] Response number two claims that *very* distributed representations are probably the wrong way to go. He said "slightly" distributed representations (like the ones used in Kruschke's ALCOVE model) are better. Unfortunately he does not provide any data supporting this point. I just got response # 3, which claims that distributed representations performed consistently better than local in the NETtalk domain and in isolated letter speech. [ Tom, could you send me some references? Thanks - Javier ] -- Javier From tgd at turing.CS.ORST.EDU Thu Jun 6 18:11:44 1991 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Thu, 6 Jun 91 15:11:44 PDT Subject: Are they really worth the effort ? In-Reply-To: Javier Movellan's message of Thu, 6 Jun 91 13:47:19 -0400 (EDT) <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> Message-ID: <9106062211.AA13213@turing.CS.ORST.EDU> In my studies of error-correcting output codes, I found that these codes---which are particularly neat distributed representations---performed consistently better than local representations in the NETtalk domain and in isolated letter speech recognition. --Tom Thomas G. Dietterich Department of Computer Science Dearborn Hall, 303 Oregon State University Corvallis, OR 97331-3102 503-737-5559 From Nigel.Goddard at B.GP.CS.CMU.EDU Thu Jun 6 19:06:13 1991 From: Nigel.Goddard at B.GP.CS.CMU.EDU (Nigel.Goddard@B.GP.CS.CMU.EDU) Date: Thu, 6 Jun 91 19:06:13 EDT Subject: distributed/local Message-ID: Both extremes are wrong for representing conceptual knowledge (i.e., one unit per concept versus all units participate in all concepts). Disadvantages of extreme local include no tolerance to failure (neurons die all the time), difficult to express nuance without impossibly large numbers of units. The big advantage is it is easy to see what is going on, to design structures. Disadvantages of extreme distributed include crosstalk when more than one item is active and difficulty communicating an active item from one part of the architecture to another (too many links required). The big advantages are fault-tolerance (graceful degradation) and generalization. The answer is something in between the extremes (not that this is news to anyone), depending on what the task is. Order log n units per concept for an n-unit net might be a good place to start. Feldman has a TR discussing these issues in much more depth (TR 189, "Neural Representation of Conceptual Knowledge", Computer Science Dpt, Univ. Rochester, NY 14627). Also published as a book chapter, I believe. Nigel Goddard From soller%asylum at cs.utah.edu Thu Jun 6 22:54:43 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Thu, 6 Jun 91 20:54:43 -0600 Subject: Request for Information on Cognitive Science Curriculum Message-ID: <9106070254.AA24372@asylum.utah.edu> At the University of Utah, we are in the process of putting together a curriculum for Cognitive Science degrees at the undergraduate and graduate level. This faculty/student initiative is being led by Dr. Dick Burgess of Physiology.
We were wondering what classes and sequences are considered to form the core of established Cognitive Science degree-granting programs at graduate and undergraduate levels? Jerome Soller Department of C.S. U. of Utah soller at asylum.utah.edu From slehar at park.bu.edu Fri Jun 7 08:56:58 1991 From: slehar at park.bu.edu (Steve Lehar) Date: Fri, 7 Jun 91 08:56:58 -0400 Subject: Distributed Representations In-Reply-To: connectionists@c.cs.cmu.edu's message of 7 Jun 91 09:39:59 GM Message-ID: <9106071256.AA15832@park.bu.edu> I think the essence of this debate is in the nature of the input data. If your input is boolean in nature and reliably correct, then the processing performed on it can be similarly boolean and sequential with a great saving in time and space. It is when the input is fuzzy, ambiguous and distributed that the sequential logical boolean type of processing runs into problems. A perfect example is image understanding. No single local region of the image is sufficient for reliable identification. Try this yourself: punch a little hole in a big piece of paper and lay it on a randomly selected photograph and see how much you can recognize through that one local aperture. You have no way of knowing what the local feature is without the global context, but how do you know the global context without building it up out of the local pieces? Studies of the visual system suggest that in nature this problem is solved by a parallel optimization of all the local pieces together with many levels of global representations, such that the final interpretation is a kind of relaxation due to all of the constraints felt at all of the different representations all at the same time. This is the basic idea of Grossberg's BCS/FCS algorithm, and is in contrast to a more sequential "AI" approach where the local pieces are each evaluated independently, and the results passed on to the next stage. I would claim that such an approach can never work reliably with natural images. I would be happy to provide more information on the BCS/FCS and my implementations of it to interested parties. From hendler at cs.UMD.EDU Fri Jun 7 10:40:58 1991 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Fri, 7 Jun 91 10:40:58 -0400 Subject: distributed/local In-Reply-To: Nigel.Goddard@B.GP.CS.CMU.EDU's message of Thu, 6 Jun 91 19:06:13 EDT <9106071428.AA09615@mimsy.UMD.EDU> Message-ID: <9106071440.AA23704@dormouse.cs.UMD.EDU> For what it's worth, some preliminary results showing a well-behaved relationship between local and distributed reps are in a paper I had at the NIPS conf (Advances in Neur. Info. Proc. Sys I - Touretzky (ed), 1989, p.553). I have followed up on this work a little, with a better analysis of the relationship described in last year's Cog. Sci. Conference, but the work is pretty preliminary. I've pretty much stopped pursuing this actively, but anyone wanting to pick up on it is welcome... -J. Hendler From hu at eceserv0.ece.wisc.edu Fri Jun 7 11:22:28 1991 From: hu at eceserv0.ece.wisc.edu (Yu Hu) Date: Fri, 7 Jun 91 10:22:28 -0500 Subject: What is distributed/local representation Message-ID: <9106071522.AA18585@eceserv0.ece.wisc.edu> While lots of buzzzzz words, such as graceful degradation, appear in the discussion, may I ask a rather naive question: Could someone give a mathematically (or .....ly) sound definition of distributed and local representation (of what?) before we proceed to discuss them? Suppose the representations are for a data vector in an N-dimensional space.
Does distributed representation refer to data with many non-zero elements, and local representation to the opposite? If not, what are they? Regards, Yu Hen Hu Department of Electrical and Computer Engr. (608)262-6724(phone) Univ. of Wisconsin - Madison (608)262-1267(fax) 1415 Johnson Drive hu at engr.wisc.edu Madison, WI 53706-1691 U.S.A. From indurkhy at paul.rutgers.edu Fri Jun 7 12:10:42 1991 From: indurkhy at paul.rutgers.edu (Nitin Indurkhya) Date: Fri, 7 Jun 91 12:10:42 EDT Subject: Are they really worth the effort ? Message-ID: <9106071610.AA17674@paul.rutgers.edu> >In my studies of error-correcting output codes, I found that these >codes---which are particularly neat distributed >representations---performed consistently better than local >representations in the NETtalk domain and in isolated letter speech >recognition. in our own studies with the NETtalk dataset that you gave us, we found that local representations were competitive. the results are reported in "reduced complexity rule induction" by weiss and indurkhya (to be presented at ijcai-91). --nitin From lina at mimosa.physio.nwu.edu Fri Jun 7 12:47:29 1991 From: lina at mimosa.physio.nwu.edu (Lina Massone) Date: Fri, 7 Jun 91 11:47:29 CDT Subject: No subject Message-ID: <9106071647.AA05357@mimosa.physio.nwu.edu> About distributed representations The concept of distributed representation is intimately related to the concept of redundancy. The central nervous system makes great use of redundant representations in the way receptive/projective fields are organized. I do not agree that distributed/redundant representations are primarily a protection against possible injuries or failures of the components; I'd rather consider that as a useful side-effect. To me the main values of redundancy are: greater sensitivity, higher resolution, improvement of signal-to-noise ratio, reduction of demand for stability of performance and for precision in ontogenesis. In general a comparison between the activity of a population of neurons and the activity of a single neuron will show that the population is sensitive to lower stimulus intensities, smaller increments, briefer events, higher frequencies, wider dynamic ranges than a single neuron and is less disturbed by independent drift and instability. As for the amount of redundancy, there is some physiological evidence that the coding of information in the CNS is a compromise between fully distributed and fully localized. Given that the available number of neurons is limited, an entity (a piece of information) cannot be represented over a very large population of neurons that overlaps almost completely with the population activated by a different entity; this would cause a high degree of interference and would correspond to a very inefficient memory storage system. To maintain some degree of orthogonality within a limited number of neurons, the CNS makes the number of neurons - active for each stimulus - low. In other words each entity is represented across an ensemble of neurons but the ensemble is of limited size. As for coarse coding, Ken Laws raised the issue of matching the structure of the data with the code. I agree with that. The CNS does that by having neighboring receptors stimulated by neighboring fractions of the impinging world, i.e. by means of a topological principle. An example of the computational advantages of this idea for control problems is given in L. Massone, E. Bizzi (1990) On the role of input representations in sensorimotor mapping, Proc.
IJCNN, Washington D.C. Lina Massone From tgd at turing.cs.orst.edu Fri Jun 7 12:45:26 1991 From: tgd at turing.cs.orst.edu (Tom Dietterich) Date: Fri, 7 Jun 91 09:45:26 PDT Subject: Distributed Representations In-Reply-To: Ken Laws's message of Thu 6 Jun 91 22:02:27-PDT <676270947.0.LAWS@AI.SRI.COM> Message-ID: <9106071645.AA16085@turing.CS.ORST.EDU> Date: Thu 6 Jun 91 22:02:27-PDT From: Ken Laws Mail-System-Version: I'm not sure this is the same concept, but there were several papers at the last IJCAI showing that neural networks worked better than decision trees. The reason seemed to be that neural decisions depend on all the data all the time, whereas local decisions use only part of the data at one time. This is not the same concept at all. You are worrying about locality in the input space, whereas distributed representations usually concern (lack of) locality in the output space or in some intermediate representation. I have applied decision trees to learn distributed representations of output classes, and in all of my experiments, the distributed representation performs better than learning either one large tree (to make a k-way discrimination) or learning k separate trees. I believe this is because a distributed representation is able to correct for errors made in learning any individual output unit. The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive presents experimental support for this claim. I've never put much stock in the military reliability claims. A bullet through the chip or its power supply will be a real challenge. Noise tolerance is important, though, and I suspect that neural systems really are more tolerant. It isn't a neural vs. non-neural issue: distributed representations are more redundant, and hence, more resistant to (local) damage. Noise tolerance is also not a neural vs. non-neural issue. To achieve noise tolerance, you must control over-fitting. There are many ways to do this: low-dimensional representations, smoothness assumptions, minimum description length methods, cross-validation, etc. Terry Sejnowski's original NETtalk work has always bothered me. He used a neural network to set up a mapping from an input bit string to 27 output bits, if I recall. I have never seen a "control" experiment showing similar results for 27 separate discriminant analyses, or for a single multivariate discriminant. I suspect that the results would be far better. The wonder of the net was not that it worked so well, but that it worked at all. I think you should perform these studies before you make such claims. I myself doubt them very much, because the NETtalk task violates the assumptions of discriminant analysis. In my experience, backpropagation works quite well on the NETtalk task. We have found that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor method) and generalized radial basis functions do better than backpropagation, but everything else we have tried does worse (decision trees, perceptrons, Fringe). I have come to believe strongly in "coarse-coded" representations, which are somewhat distributed. (I have no insight as to whether fully distributed representations might be even better. I suspect that their power is similar to adding quadratic and higher-order terms to a standard statistical model.) The real win in coarse coding occurs if the structure of the code models structure in the data source (or perhaps in the problem to be solved). -- Ken Laws The real win in any problem comes from good modelling, of course. 
But since we can't guarantee a priori that our representations are good models, it is important to develop ways for recovering from inappropriate models. I believe distributed representations provide one such way. --Tom Dietterich From dhw at t13.Lanl.GOV Fri Jun 7 14:31:42 1991 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Fri, 7 Jun 91 12:31:42 MDT Subject: No subject Message-ID: <9106071831.AA11289@t13.lanl.gov> Javier Movellan wonders about the relative "advantages of distributed over local representations". He asks of members of the net, "Do you have concrete examples in your work where they did a better job than local representations?" I have concrete examples in which they do worse - sometimes far worse. See references below. David Wolpert (dhw at tweety.lanl.gov) D. H. Wolpert, "A benchmark for how well neural nets generalize", Biological Cybernetics, 61 (1989), 303-315. D. H. Wolpert, "Constructing a generalizer superior to NETtalk via a mathematical theory of generalization", Neural Networks, 3 (1990), 445-452. D. H. Wolpert, "Improving the performance of generalizers via time-series-like pre-processing of the learning set", Los Alamos Report LA-UR-91-350, submitted to IEEE PAMI. From kukich at flash.bellcore.com Fri Jun 7 17:26:05 1991 From: kukich at flash.bellcore.com (Karen Kukich) Date: Fri, 7 Jun 91 17:26:05 -0400 Subject: distributed vs. local encoding schemes Message-ID: <9106072126.AA06750@flash.bellcore.com> I ran some back-prop spelling correction experiments a few years ago in which one of the control variables was the use of distributed vs. local encoding schemes for both input and output representations. Local encoding schemes were clear winners in both speed of learning and performance (correction accuracy for novel misspellings). To clarify, a local output scheme was simply a 1-of-n vector (n=200) where each node represented one word in the lexicon; a "semi-local" input scheme was a 15*30=450-unit vector where each 30-unit block locally encoded one letter in a word of up to 15 characters. This positionally-encoded input scheme was thus local w.r.t. individual letters in a word but distributed w.r.t. the whole word. (Incidentally, the nets took slightly longer to learn to correct the shift-variant insertion and deletion errors, but they eventually learned them as well as the shift-invariant substitution and transposition errors.) The distributed encoding schemes were m-distance lexicodes, where m is the Hamming distance between codes. Thus lexicode-1 is just a binary number code. I tried lexicodes of m=1,2,3 and 4 for both output words and input letters. Both speed of learning and correction accuracy improved linearly with increasing m. These results were published in a paper that appeared in the U.S. Post Office Advanced Technology Conference in May of 1988. My only interpretation of the results is that local encoding schemes simplify the learning task for nets; I'm convinced that distributed schemes are essential for cognitive processes such as semantic representation at least, due to the need for multi-dimensional semantic access and association. As an epilog, I ran a few more experiments afterward that left me with a small puzzle. In the above experiments I had also found that performance improved as the number of hidden nodes increased up to about n(=200) and then leveled off.
Afterwards, I tested the local net with the 450-unit positionally-encoded input scheme and NO hidden nodes and found performance equal to or better than any net with a hidden layer and much faster learning. But when I tried a shift-invariant input encoding scheme, in which misspellings were encoded by a 420-unit vector representing letter bigrams and unigrams, I found similarly good performance for nets with hidden layers but miserable performance for a net with no hidden layer. Apparently, the positionally-encoded input scheme yields a set of linearly-separable input classes but the shift-invariant scheme does not. It's still not clear to me why this is. Karen Kukich kukich at bellcore.com From ps_coltheart at vaxa.mqcc.mq.oz.au Sat Jun 8 10:51:22 1991 From: ps_coltheart at vaxa.mqcc.mq.oz.au (Max Coltheart) Date: Sat, 8 Jun 91 09:51:22 est Subject: distributed representations Message-ID: <9106072351.AA01618@macuni.mqcc.mq.oz.au> The original posting about this mentioned the property of graceful degradation as one of the virtues of systems that use distributed representations. In what way is this a virtue? For nets that are doing some engineering job such as character recognition, it would obviously be good if some damage or malfunction didn't much affect the net's performance. But for nets that are meant to be models of cognition, the hidden assumption seems to be that after brain damage there is graceful degradation of cognitive processing, so the fact that nets show graceful degradation too means they have promise for modelling cognition. But where's the evidence that brain damage degrades cognition gracefully? That is, the person just gets a little bit worse at a lot of things? Very commonly, exactly the opposite happens - the person remains normal at almost all kinds of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I could give very many examples: I'll just give one (Semenza & Zettin, Cognitive Neuropsychology, 1988, 5, 711). This patient, after his stroke, had impaired language, but this impairment was confined to language production (comprehension was fine) and to the production of just one type of word: proper nouns. He could understand proper nouns normally, but could produce almost none whilst his production of other kinds of nouns was normal. What's graceful about this degradation of cognition? If cognition does *not* degrade gracefully, and neural nets do, what does this say about neural nets as models of cognition? Max Coltheart From dave at cogsci.indiana.edu Fri Jun 7 22:03:50 1991 From: dave at cogsci.indiana.edu (David Chalmers) Date: Fri, 7 Jun 91 21:03:50 EST Subject: distributed reps Message-ID: Properties like damage resistance, graceful degradation, etc, are all nice, useful, cognitively plausible possibilities, but I would have thought that by far the most important property of distributed representation is the potential for systematic processing. Obviously ultra-local systems (every possible concept represented by an arbitrary symbol) don't allow much systematic processing, as each symbol has to be handled by its own special rule: e.g. , (though things can be improved somewhat by connecting the symbols up, as e.g. in a semantic network). Things are much improved by using compositional representations, as e.g. found in standard AI. If you represent many concepts by compounding the basic tokens, then certain semantic properties can be reflected in internal structure -- e.g.
"LOVES(CAT, DOG)" and "LOVES(JOHN,BILL)" have relevantly similar internal structures -- opening the door to processing these structures in systematic ways. Distributed representations just take this idea a step further. One sees the systematicity made possible by giving representations internal structure as above, and says "why stop there?" e.g. why not give every representation internal structure (why should CATs and DOGs miss out?). Compositional representations as above only represent a limited range of semantic properties systematically in internal structure -- namely, compositional properties. All kinds of other semantic properties might be fair game. By moving to e.g. vectorial representation for every concept, then e.g. the similarity structure of the semantic space can be reflected in the similarity structure of the representational space, and so on. And it turns out that you can process compositional properties systematically too (though not quite as easily). The combination of a multi-dimensional space with a large supply of possible non-linear operations seems to open up a lot of possible kinds of systematic processing, essentially because these operations can chop up the space in ways that standard operations on compositional structures can't. The proof is in the pudding, i.e. the kinds of systematic processing that connectionist networks exhibit all the time. Most obviously, automatic generalization: new inputs are slotted into some representational form, hopefully leading to reasonable behaviour from the network. Similarly for dealing with old inputs in new contexts. By comparison, with ultra-local representations, generalization is right out (except by assimilating new inputs into an old category, e.g. by nearest neighbour methods). Using compositional representations, certain kinds of generalization are obviously possible, as with decision trees. These suffer a bit from having to deal directly with the original input space, rather than developing a new representational space as with dist reps: so you (a) don't get the very useful capacity to take a representation that's developed and use it for other purposes (e.g. as context for a recurrent network, or as input for some new network), and (b) are likely to have problems on very large input spaces (anyone using decision trees for vision?). Both (a) and (b) suggest that decision trees may be unlikely candidates for the backbone of a cognitive architecture (conversely, the ability of connectionist networks to transform one representational space into another is likely to be key to their success as a cognitive architecture). As for generalization performance, that's an empirical matter, but the results of Dietterich etc seem to indicate that decision trees don't do quite as well, presumably because of the limited ways in which they can chop up a representational space (nasty rectangular blocks vs groovy flexible curves). There's far too much else that could be said, so I'll stop here. Dave Chalmers. From tsejnowski at UCSD.EDU Fri Jun 7 22:48:11 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Fri, 7 Jun 91 19:48:11 PDT Subject: distributed/local Message-ID: <9106080248.AA27620@sdbio2.UCSD.EDU> A nice paper that compares ID3 decision trees with backprop on NETtalk and other data sets: Shavlik, J. W., Mooney, R. J., and Towell, G. G. Symbolic and neural learning algorithms: An experimental comparison (revised). Univ. Wisconsin Dept Comp. Sci Tech Report #955 (to appear Machine Learning #6). 
Overall, backprop performed slightly better than ID3 but took longer to train. Backprop was also more effective in using distributed coding schemes for the inputs and outputs. An error-correcting code, or even a random code, works better than a local code or hand-crafted features. (Ghulum Bakiri and Tom Dietterich reached the same conclusion.) The code developed by the hidden units is also an interesting issue. In NETtalk, the intermediate code was semidistributed -- around 15% of the hidden units were used to represent each letter-to-sound correspondence. The vowels and the consonants were fairly well segregated, arguing for local coding at a gross population level (something seen in the brain) but distributed coding at the level of single units (also observed in the brain). The degree of coarseness clearly depends on the grain of the problem. In the original study, Charlie Rosenberg and I showed that backprop with hidden units outperformed perceptrons, and hence 26 independent linear discriminants. The NETtalk database is available to anyone who wants to benchmark their learning algorithm. For ftp access contact Scott.Fahlman at b.gp.cs.cmu.edu Terry From french at cogsci.indiana.edu Sat Jun 8 00:39:11 1991 From: french at cogsci.indiana.edu (Bob French) Date: Fri, 7 Jun 91 23:39:11 EST Subject: semi-distributed representations Message-ID: One simultaneous advantage and disadvantage of fully distributed representations is that one representation will affect many others. This phenomenon of interference is what allows networks to generalize but it is also what leads to the problem of catastrophic forgetting. It is reasonable to suppose that the amount of interference in backpropagation networks is directly proportional to the amount of overlap of representations in the hidden layer (the "overlap" of two representations can be defined as the dot product of their activation vectors). The greater the overlap (i.e., the more distributed the representations), the more the network will be affected by catastrophic forgetting, but the better it will be at generalizing. The less the overlap (i.e., the more local the representations), the less the network will be affected by catastrophic forgetting, but the worse it will be at generalizing. If we want nets that do not need to be retrained completely when new data is presented to them but still retain their ability to generalize, we must therefore use representations that are neither too local, nor too distributed, what I have called "semi-distributed" representations. I have a paper to appear in CogSci Proceedings 1991 that proposes this relationship between the amount of overlap of representations in the hidden layer and catastrophic forgetting and generalization. The paper outlines one simple method that allows a BP network to evolve its own semi-distributed representations as it learns. - Bob French Center for Research on Concepts and Cognition Indiana University From dcp+ at cs.cmu.edu Sun Jun 9 09:30:32 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Sun, 09 Jun 91 09:30:32 EDT Subject: distributed representations In-Reply-To: Your message of Sat, 08 Jun 91 10:51:22 -0400. <9106072351.AA01618@macuni.mqcc.mq.oz.au> Message-ID: <2428.676474232@DWEEB.BOLTZ.CS.CMU.EDU> >But where's the evidence that brain damage degrades cognition gracefully? That >is, the person just gets a little bit worse at a lot of things?
Very commonly, >exactly the opposite happens - the person remains normal at almost all kinds >of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I think the issue here is a matter of scale. "Graceful degradation" refers to the gradual loss of function with increasing severity of damage - it says nothing about how specific or general that function is. Connectionist models can be modular at a global scale, but use distributed representations and show graceful degradation *within* modules. I think you would agree that, within a particular domain, this is a reasonable characterization of the behavior of many types of patient (to the degree that we understand the modular organization of certain aspects of cognition and the nature of individual patients' damage). Of course, severe damage to a module might still produce catastrophic loss of its function, perhaps leaving the remaining functions relatively intact. On the other hand, the *degree* of specificity of impairment certainly places constraints on the modular organization and the nature of the representations within each module (although I think connectionist modeling illustrates the danger of the "specific impairment implies separate module" logic). Only specific modeling work can demonstrate whether connectionist architectures and representations can account for the behavior of specific patients in an informative way. -Dave ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From gasser at bend.UCSD.EDU Sun Jun 9 00:58:26 1991 From: gasser at bend.UCSD.EDU (Michael Gasser) Date: Sat, 8 Jun 91 21:58:26 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106090458.AA04907@bend.UCSD.EDU> Max Coltheart discusses how damage to real neural networks often results in more of a clumsy than a graceful sort of degradation. But isn't degradation under conditions of increasing task complexity a different matter? I'm thinking of the processing of increased levels of embedding or (possibly also) numbers of arguments in natural language. Fixed-length distributed representations of syntactic or semantic structure (e.g., RAAM, Elman nets) seem to model this behavior quite well, in comparison to the usual symbolic approach (you're no more likely to fail at 28 levels of embedding than at 2) and to localist connectionist approaches (you can handle sentences with 3 arguments, but 4 are out because you run out of units). Mike Gasser From siegelma at yoko.rutgers.edu Sun Jun 9 10:56:40 1991 From: siegelma at yoko.rutgers.edu (siegelma@yoko.rutgers.edu) Date: Sun, 9 Jun 91 10:56:40 EDT Subject: TR available from neuroprose; Turing equivalence Message-ID: <9106091456.AA12844@yoko.rutgers.edu> The following report is now available from the neuroprose archive: NEURAL NETS ARE UNIVERSAL COMPUTING DEVICES H. T. Siegelmann and E.D. Sontag. (13pp.) Abstract: It is folk knowledge that neural nets should be capable of simulating arbitrary computing devices. Past formalizations of this fact have been proved under the hypotheses that there are potentially infinitely many neurons available during a computation and/or that interconnections are multiplicative. In this work, we show the existence of a finite network, made up of sigmoidal neurons, which simulates a universal Turing machine.
It is composed of less than 100,000 synchronously evolving processors, interconnected linearly. -Hava ----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) siegelman.turing.ps.Z (local-file) siegelman.turing.ps.Z ftp> quit unix> uncompress siegelman.turing.ps.Z unix> lpr -P(your_local_postscript_printer) siegelman.turing.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to siegelma at paul.rutgers.edu. DO NOT "reply" to this message, please. From jagota at cs.Buffalo.EDU Sun Jun 9 16:52:33 1991 From: jagota at cs.Buffalo.EDU (Arun Jagota) Date: Sun, 9 Jun 91 16:52:33 EDT Subject: Information Capacity and Local vs Distributed Message-ID: <9106092052.AA04177@sybil.cs.Buffalo.EDU> Dear Connectionists, I think Information Capacity* (IC) (Abu-Mostafa, Jacques 85) is a useful quantitative criterion for L vs D, illustrated by the following trivial example. You are given k pebbles, to be placed in k-of-n locations. location has pebble => `1', otherwise `0'. IC == # distinct vectors that can be stored = C(n,k) (n choose k) For this e.g., it's nice that the Binomial distribution quantifies IC for L vs D. The IC of k ~ n/2 (distributed) is by far superior. k = 1 ==> Local, IC = n k is n/2 ==> distributed, IC = C(n,n/2) is maximum k = n-1 ==> over-distributed, IC = n With (threshold-element) connectionist nets, the analogy holds, but the (hidden or output layer) units [locations] are not independent. I would think there is scope for theory and empirical work along these lines. I have seen IC work on symmetric nets but even here I am unaware of work on IC as a function of k. I am unaware (haven't looked) of any work on FF nets. * - IC is actually defined as the log of what I have shown Sincerely, Arun Jagota jagota at cs.buffalo.edu From peterc at chaos.cs.brandeis.edu Mon Jun 10 00:06:49 1991 From: peterc at chaos.cs.brandeis.edu (Peter Cariani) Date: Mon, 10 Jun 91 00:06:49 edt Subject: (the late) Rook McCulloch Message-ID: <9106100406.AA29926@chaos.cs.brandeis.edu> Rook McCulloch also edited a 4-volume set of Warren McCulloch's works, "The Collected Works of Warren S. McCulloch", published by Intersystems Press in 1989 (401 Victor Way #3, Salinas, CA 93907 USA; $84 for 4 volumes, paper). In addition to her foreword and Warren McCulloch's papers, the set also contains some very nice essays by Jerry Lettvin, Michael Arbib, F.S.C. Northrop, Heinz von Foerster, D.M. MacKay (and others). For those of us who never knew the McCullochs, this seems to be the best available source of information about what they thought and felt. Also of relevance is the book by Steve Heims on the Macy conferences and the origins of cybernetics ("The Cybernetics Group", MIT Press, 1991) in which Warren McCulloch's role is amply discussed.
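A minimal numerical sketch of the k-of-n information-capacity argument in Arun Jagota's posting above, assuming an arbitrary n = 16 chosen only for illustration:

from math import comb, log2   # requires Python 3.8+ for math.comb

n = 16
for k in (1, n // 2, n - 1):          # local, distributed, over-distributed
    patterns = comb(n, k)             # C(n, k) distinct k-of-n activity vectors
    print(f"k = {k:2d}: C({n},{k}) = {patterns:5d} patterns, IC = {log2(patterns):4.1f} bits")

For n = 16 this prints 16 patterns (4 bits) at k = 1, 12870 patterns (about 13.7 bits) at k = 8, and 16 patterns (4 bits) at k = 15, which is the Binomial-shaped capacity curve the posting describes, peaking at k ~ n/2.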
From bates at crl.ucsd.edu Mon Jun 10 12:25:10 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:25:10 PDT Subject: response to max coltheart Message-ID: <9106101625.AA25405@crl.ucsd.edu> I respectfully disagree with Max Coltheart that brain damage usually or even often yields discrete and domain-specific performance decrements. to be sure, such cases have been reported -- and indeed, their "news value" often lies in the surprisingly discrete nature of the patient's profile. but such case studies typically fail to recognize issues like the peaks and valleys that might have been there premorbidly, i.e. in the "man that used to be". also, we often fail to recognize that by choosing those patients with "interesting" profiles against an unspecified number of background patients with "uninteresting" profiles, we are capitalizing on chance distributions across a number of noisy domains. given 1000 patients who are normally distributed across 100 tasks, I have a pretty solid chance of finding a good number of striking "double dissociations" and even more "single dissociations" entirely by chance. For a simulation that makes EXACTLY that point (coupled with a detailed critique of a "real" study of 20 patients that makes this very error), see Bates, Appelbaum and Allard, "Statistical constraints on the use of single case studies in neuropsychological research", in the last issue of Brain and Language. -liz bates From bates at crl.ucsd.edu Mon Jun 10 12:29:33 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:29:33 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106101629.AA25488@crl.ucsd.edu> Marcel Just and Patricia Carpenter have a paper coming out in Psychological Review that shows (reviewing quite a range of studies) how the ability of normal adults to handle (read, comprehend) various levels of grammatical complexity and ambiguity interacts with (1) that adult's working memory span, and (2) the effects of a cognitive load imposed by a secondary task. The notion of graceful degradation seems to apply to their work very well. You can obtain a preprint of their paper by contacting them at CMU (Psychology Department). -liz bates From cabestan at eel.upc.es Mon Jun 10 10:05:46 1991 From: cabestan at eel.upc.es (JOAN CABESTANY) Date: Mon, 10 Jun 1991 14:05:46 +0000 Subject: Call for Papers IWANN'91 Message-ID: <"155*/S=cabestan/OU=eel/O=upc/PRMD=iris/ADMD= /C=es/"@MHS> Dear Colleagues, Please find here the second Call for Papers for IWANN'91. Remember that the absolute limit date for work presentation is June 20th. IWANN'91 will be held in GRANADA next September.
****************************************************************** ****************************************************************** INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS IWANN'91 Second Announcement Granada, Spain September 17-19, 1991 ORGANISED AND SPONSORED BY Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SCOPE Artificial Neural Networks (ANN) were first developed as structural or functional modelling systems of natural ones, featuring the ability to perform problem-solving tasks. They can be thought of as computing arrays consisting of a series of repetitive uniform processors (neuron-like elements) placed on a grid. Learning is achieved by changing the interconnections between these processing elements. Hence, these systems are also called connectionist models. ANNs have become a subject of widespread interest: they offer an odd scheme-based programming standpoint and exhibit higher computing speeds than conventional von-Neumann architectures, thus easing or even enabling the handling of complex tasks such as artificial vision, speech recognition, information recovery in noisy environments or general pattern recognition. In ANN systems, collective information management is achieved by means of parallel operation of neuron-like elements, into which information processing is distributed. It is intended to exploit this highly parallel processing capability as far as possible in complex problem-solving tasks. Cross-fertilization between the domains of artificial and real neural nets is desirable. The more genuine problems of biological computation and information processing in the nervous system still remain open and contributions in this line are more than welcome. Methodology, theoretical frames, structural and organizational principles in neuroscience, self-organizing and co-operative processes and knowledge-based descriptions of neural tissue are relevant topics to bridge the gap between the artificial and natural perspectives. The workshop intends to serve as a meeting place for engineers and scientists working in this area, so that present contacts and relationships can be further increased. The workshop will comprise two complementary activities: . scientific and technical conferences, and . scientific communications sessions. TOPICS The workshop is open to all aspects of artificial neural networks, including: 1. Neural network theories. Neural models. 2. Biological perspectives 3. Neural network architectures and algorithms. 4. Software developments and tools. 5. Hardware implementations 6. Applications. LOCATION Facultad de Ciencias Campus Universitario de Fuentenueva Universidad de Granada 18071 GRANADA. (SPAIN) LANGUAGES English and Spanish will be the official working languages. English is preferable as the working language. Simultaneous translation will be available. CALL FOR PAPERS The Programme Committee seeks original papers on the six above-mentioned areas. Survey papers on the various available approaches or particular application domains are also sought. In their submitted papers, authors should pay particular attention to explaining the theoretical and technical choices involved, to make clear the limitations encountered and to describe the current state of development of their work.
INSTRUCTIONS TO AUTHORS Three copies of submitted papers (not exceeding 8 pages in 21x29.7 cms (DIN-A4), with 1,6 cm. left, right, top and bottom margins) should be received by the Programme Chairman at the address below before June 20, 1991. The headlines should be centred and include: . the title of the paper in capitals . the name(s) of author(s) . the address(es) of author(s), and . a 10 line abstract. Three blank lines should be left between each of the above items, and four between the headlines and the body of the paper, written in English, single-spaced and not exceeding the 8-page limit. All papers received will be refereed by the Programme Committee. The Committee will communicate their decision to the authors on July 10. Accepted papers will be published in the proceedings to be distributed to workshop participants. In addition to the paper, one sheet should be attached including the following information: . the title of the paper, . the name(s) of author(s), . a list of five keywords, . a reference to which of the six topics the paper concerns, and . postal address of one of the authors, with phone and fax numbers, and E-mail (if available). . presentation language We intend to get in touch with various international publishers (such as Springer-Verlag and Prentice-Hall) for the final version of the proceedings. PROGRAM AND ORGANIZATION COMMITTEE Organization Chairman: Alberto Prieto (Unv. Granada. Spain) Programme Chairman: José Mira (UNED. Madrid. Spain) Senen Barro Unv. de Santiago (E) Francois Blayo Ecole Polytechnique Federale de Lausanne (S) Joan Cabestany Unv. Pltca. de Catalunya (E) Marie Cottrell Unv. Paris I (F) Jose Antonio Corrales Unv. Oviedo. (E) Gerard Dreyfus ESPCI Paris (F) Gregorio Fernandez Unv. Pltca. de Madrid (E) J. Simoes da Fonseca Unv. de Lisboa (P) Karl Goser Unv. Dortmund (G) Jeanny Herault INPG Grenoble (F) Jose Luis Huertas CNM- Universidad de Sevilla (E) Simon Jones Unv. Nottingham (UK) Christian Jutten INPG Grenoble (F) Antonio Lloris Unv. Granada (E) Panos A. Ligomenides Unv. of Maryland (USA) Javier Lopez Aligue Unv. de Extremadura. (E) Federico Moran Unv. Complutense. Madrid (E) Roberto Moreno Unv. Las Palmas Gran Canaria (E) Franz Pichler Johannes Kepler Univ. (Aus) Ulrich Rueckert Unv. Dortmund (G) Francisco Sandoval Unv. de Malaga (E) Carmen Torras Instituto de Cibernética. CSIC. Barcelona (E) V. Tryba Unv. Dortmund (G) Elena Valderrama CNM- Unv. Autonoma de Barcelona (E) Michel Weinfeld Ecole Polytechnique Paris (F) LOCAL ORGANIZING COMMITTEE (Universidad de Granada) Juan Julian Merelo Julio Ortega Francisco J. Pelayo Begona del Pino Alberto Prieto ORGANIZING ENTITIES: Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SPONSORING ENTITIES: Ayuntamiento de Granada (Dto. de Congresos) Caja General Universidad de Granada SOME USEFUL INFORMATION Granada is a beautiful city that lies to the south of Spain, in which the mixture between Christian and Muslim culture reaches its architectural peak. The Alhambra is the most magnificent European Muslim fortress and palace conserved to date, and Granada nights are known in all Spain for their liveliness, due to the high proportion of students. The river Genil gives rise to the Vega or Valley of Granada, where the soil is fertile and bears the most varied crops.
It has small farms and beautiful villages, some as interesting as Santa Fe, where the voyage for the discovery of America was negotiated. From Granada it takes only one hour to get to the southernmost ski resort in Europe, Sierra Nevada, where Winter sports can be enjoyed. A wide road leads right up to the Veleta Peak, so that in Summer it can be reached by car. This road, at 3,428 m. above sea level, is the highest in Europe. 65 Km. from the city of Granada is Granada's Costa del Sol (so called Costa Tropical or Tropical Coast). The University of Granada is the third most important in Spain. It has 40,000 students, which makes up one sixth of the whole population. This is what gives the city a youthful and dynamic atmosphere, stimulating a "living culture". The weather during mid-September in Granada is warm, and temperatures of 30 degrees Centigrade are not unusual. Temperatures can lower during the night, so a pullover is advised. During the day, t-shirts or light shirts and trousers are the most suitable clothes. PRE AND POST WORKSHOP TOURS: A-EXCURSION: September 16: Trip to Alpujarra, typical mountain villages. Time: 9.00-20h. Price: 3500 ptas./per person (Includes Bus and lunch). B-EXCURSION: September 20: Trip to Costa del Sol, including Nerja with its wonderful caves and the seaside resorts of Almunecar and Salobre$a. Time: 9.00-20h. Price: 2000 ptas. (Includes Bus) SOCIAL ACTIVITIES: September 16: Pre Workshop tour (A-Excursion) September 17: 20:00 Reception at the Hospital Real (16th Century University Central Services Building). 22:00 Night visit to the Alhambra. September 18: 20:00 Reception at the "Palacio de los Cordova" (Albaic!n), given by the Granada City Hall (Congress Dept.). September 19: 21:00 Official dinner September 20: Post Workshop tour (B-Excursion) PROVISIONAL SCHEDULE September 17: 9:15 Opening session. 10:00-11:30 Lecture 1:Natural and Artificial Neural Nets; Prof. Dr. Roberto MORENO (Universidad de las Palmas de Gran Canaria) 11:30-12:00 Coffee-break. 12:00-13:30 Session 1. 16:00-17:30 Session 2. 17:30-18:00 Coffee-break. 18:00-19:30 Session 3 September 18: 09:30-11:00 Lecture 2: Application and Implementation of Neural Networks in Microelectronics; Prof. Dr. Ing. Karl GOSER (Universitt Dortmund) 11:00-11:30 Coffee-break. 11:30-13:30 Session 4. 16:00-17:30 Session 5. 17:30-18:00 Coffee-break. 18:00-19:30 Session 6. September 19: 09:00-11:00 Lecture 3: Cooperative Computing and Neural Networks; Prof. Panos A. LIGOMENIDES (University of Maryland) 11:00-11:30 Coffee-break. 11:30-13:30 Session 7. 16:00-17:30 Session 8. 17:30-18:00 Coffee-break. 18:00-19:30 Session 9. This form should be sent before July 25 to: Viajes Internacional Expreso (V.I.E.); Galerias Preciados; Carrera del Genil, s/n. 18005 GRANADA (Spain) Tnos. (34) 58-22.44.95, (34) 58-22.75.86, (34) 58-224944; Telex: 78525 The following hotels are available with special fees for the Workshop participants. The prices are per night and they include V.A.T. and continental breakfast: Hotel Cat. Single room Double room ______________________________________________________________ Condor *** 7700 10070 pts. Eurobecquer ** 4630 5820 Tour A ........... 3.500 pts. Tour B ........... 2.000 pts Please tick the appropriate box. Reservations can be guaranteed before July 25th. A list of other hotels is enclosed (Please address directly to them). Payment should be made in Spanish currency. I enclose a bank cheque payable to: V.I.E. 
INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 HOTEL BOOKING FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION _______________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY______________________ TELEPHONE __________________ FAX _________________________ E-MAIL:_______________________ Accompanying person(s) ________________________________________________________ I want to reserve: _______ double room(s); ___________ single room(s) Arrival date:__________ Time: __________ Departure date:_________ Time:_________ INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 REGISTRATION FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION ________________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY ______________________ TELEPHONE __________________ FAX _________________________ E-MAIL: _______________________ Fill in the appropriate box: Fee Before June 25th After June 25th ___________________________________________________________________ Regular 33.000 35.000 IEEE,AEIA,ATI members 28.000 30.000 Scholarship 4.000 5.000 This form should be sent as soon as possible to: Departamento de Electronica y Tecnologia de Computadores Facultad de Ciencias Universidad de Granada 18071 GRANADA (SPAIN) In order to avoid delays, please fax the registration form, together with a copy of the cheque or the bank transfer to: FAX: 34-58-24.32.30 or 34-58-27.42.58 INSCRIPTION PAYMENTS: Cheque payable to: IWANN'91 (16.142.512) or alternatively transfer to: IWANN'91 IWANN'91 account number: 16.142.512 account number: 007.01-450888 Caja Postal (Code: 2088-2037.1) or to Caja General Camino de Ronda, 138 Camino de Ronda, 156 18003 GRANADA (SPAIN) 18003 GRANADA (SPAIN) ************************************************************************ From ashley at spectrum.cs.unsw.oz.au Tue Jun 11 00:11:13 1991 From: ashley at spectrum.cs.unsw.oz.au (Ashley Aitken) Date: Tue, 11 Jun 91 0:11:13 AES Subject: distributed representations In-Reply-To: <9106072351.AA01618@macuni.mqcc.mq.oz.au>; from "Max Coltheart" at Jun 8, 91 9:51 am Message-ID: <9106101413.10651@munnari.oz.au> G'day, In the discussion of "Distributed Representations", Max Coltheart writes: > > But for nets that are meant to be > models of cognition, the hidden assumption seems to be that after brain damage > there is graceful degradation of cognitive processing, so the fact that nets > show graceful degradation too means they have promise for modelling cognition. > > But where's the evidence that brain damage degrades cognition gracefully? That > is, the person just gets a little bit worse at a lot of things? Very commonly, > exactly the opposite happens - the person remains normal at almost all kinds > of cognitive processing, but some specific cognitive task suffers catastroph- > ically. No graceful degradation here. I would suggest that Max is possibly confusing diffuse brain damage with catastrophic brain damage. Diffuse brain damage is the elimination of a small percentage of neurons diffusely from throughout the brain. 
Examples are the natural death of neurons throughout the brain and, perhaps, micro-lesions. The continual death of an immense number of neurons in the brain thankfully amounts to the death of only a very small percentage of the neurons in the brain. In any of the partitioned networks of the brain (say an area of the cortex) we would expect only a small number of neurons to die. If one considers that a neuron may receive in the order of thousands of synapses on its dendritic tree, it can be understood, I believe, how the network (thought of as a connectionist network) could continue to function if one or two of these were to be eliminated. I would suggest that this continual death of neurons in the brain, with its subtle and often unnoticed degradation of cognitive performance, is an example of (diffuse) brain damage degrading cognition gracefully. Hence, I believe this type of degradation does show neural networks have promise for modelling cognition. Of course, this does depend on the degradation seen in cognition being shown to be qualitatively the same as degradation seen in artificial neural networks. Catastrophic brain damage, on the other hand, is the gross elimination of neurons (usually relatively localized) from the brain. Examples are lesions resulting from head injuries or strokes, and ablation. It would seem that in this case one is most likely seeing the complete (or nearly complete) elimination of an entire network (or a critical part of it) and hence the elimination of its associated and dependent function(s). I don't believe anyone would suggest that the brain's function would degrade gracefully under such terrorist action. Max continues: > > I could give very many examples: I'll just give one (Semanza & Zettin, > Cognitive Neuropsychology, 1988 5 711). This patient, after his stroke, had > impaired language, but this impairment was confined to language production > (comprehension was fine) and to the production of just one type of word: proper > nouns. He could understand proper nouns normally, but could produce almost none > whilst his production of other kinds of nouns was normal. What's graceful about > this degradation of cognition? I am definitely no expert neuroscientist, but I would suggest that this is an example of catastrophic brain damage, not diffuse brain damage. Hence, I would not expect graceful degradation of cognitive performance. It seems to me that this would be too much to ask of all but the most completely holographic-like systems. The interesting point to be made from this example would then be that it appears to be evidence for a cortical region involved (directly or in-line) with the production of proper nouns only. Amazing! It would also be interesting to test if there is any subtle difference in our *understanding* of a noun depending upon whether we are receiving it (i.e. hearing or seeing it) or producing it (i.e. speaking or imagining it). If this diagnosis of catastrophic brain damage is correct, then I believe this example is moot with respect to whether or not the brain is functionally a Connectionist System. Still, the Connectionist System, in my opinion, gets the points for the diffuse brain damage. Hence Max's concluding suggestion, > If cognition does *not* degrade gracefully, and neural nets do, what does this > say about neural nets as models of cognition? becomes rather misplaced, because cognition does appear to degrade gracefully under diffuse brain damage and catastrophically under catastrophic brain damage. The former provides possible evidence for neural networks as models of cognition. Ashley ashley at spectrum.cs.unsw.oz.au 
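(A minimal illustrative sketch, not from the posting above, of the contrast just drawn, using Python/NumPy on a toy linear associator with arbitrarily chosen sizes: diffuse damage is simulated by deleting a small random fraction of connections, focal damage by silencing a contiguous block of output units.)

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy memory: a linear associator storing 20 key/value pattern pairs over 100 units,
    # with orthonormal keys so that undamaged recall is exact.
    Q, _ = np.linalg.qr(rng.standard_normal((100, 20)))
    keys, values = Q.T, rng.standard_normal((20, 100))
    W = values.T @ keys                          # superimposed storage of all pairs

    def recall_error(Wd):
        return np.mean((keys @ Wd.T - values) ** 2)

    diffuse = W * (rng.random(W.shape) > 0.05)   # delete 5% of connections at random
    focal = W.copy()
    focal[:20, :] = 0.0                          # silence a localized block of output units

    print(recall_error(W), recall_error(diffuse), recall_error(focal))

In this toy setting the diffuse deletion raises the recall error only slightly and spreads it across every stored pattern, while the focal lesion removes part of every recalled pattern outright - the same qualitative contrast being argued for above.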
From kbj at jupiter.risc.rockwell.com Mon Jun 10 13:57:41 1991 From: kbj at jupiter.risc.rockwell.com (Ken Johnson) Date: Mon, 10 Jun 91 10:57:41 PDT Subject: No subject Message-ID: <9106101757.AA10673@jupiter.risc.rockwell.com> In response to the debate on Distributed vs. Local Representations..... Everyone in this field has a viewpoint colored by their academic background. So here is mine. The fundamental issues associated with information representations were in many ways dealt with by Shannon. If we consider the neural activity spread across a vector of neurons a resource, then one can conjure up images of 'neural representation bandwidth'. The usage of this bandwidth is determined by noise power, integration time, and a bunch of other signal/system properties. In general, given a finite amount of noise, and a given number of neurons, a distributed representation is more 'efficient' than a local representation. In this case efficiency would be the ability to pack more codes into a given coding scheme or 'code space'. An equally important issue is that of neural code processing. Representation of the information is more or less useless without a processing system capable of transforming neural codes from one form to another in a hierarchical system such as the brain. In this case we have Ashby's Law of Requisite Variety. I can't find my copy of the reference, but it's by John Porter circa 1983-1987. In this work he goes into a discussion and analysis wherein he shows that a neural system's capacity for control and information processing cannot exceed its capabilities as a communication channel. Hence, he throws the ultimate capabilities of a neural processor back to Shannon's description. In addition to these philosophical and theoretical reasons for my preference of distributed codes I've got reams of data from Neocognitron simulations which clearly show that proper operation of the machine REQUIRES distributed codes that use the representation space wisely. References to this work can be found in the Proceedings of the IJCNN 1988 in San Diego, 1988 in Boston, and 1990 in Washington. What we found was an important dichotomy. Neural codes for similar features had to be close together in code space to be grouped into new features by higher level processes. Without this characteristic, pattern classification would not group very similar patterns together. On the other hand, differences between patterns had to be far apart in representation space in order to be discriminated accurately. Hence, we see proper code organization required similar codes to be close while different codes needed to be far apart. One should expect this property if the goal of the system is representational richness. The above arguments lead me to believe that neural coding is one of the fundamental issues that needs to be investigated more thoroughly. Correct utilization of neural representation bandwidth is something we don't do very well. In fact, I'll state that we don't use it at all. The notion of bandwidth immediately suggests time as a representational dimension we don't use. Feedforward systems don't use time to increase the bandwidth of a representation - they are static. Almost all feedback and recurrent systems we see are allowed to 'reach equilibrium' before the 'neural code' is interpreted. 
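(On the code-capacity point made earlier in this posting, a minimal sketch in Python - not part of the original posting - assuming binary units and ignoring noise: with n units, a strictly local one-unit-per-item code can distinguish at most n items, while a distributed binary code can in principle distinguish 2^n.)

    import math

    n = 16
    print(n, 2 ** n)                      # items distinguishable: local (one-hot) vs. distributed binary code
    m = 10000
    print(m, math.ceil(math.log2(m)))     # units needed for m items: m locally, ~log2(m) distributed

With noise, the usable capacity of the distributed code shrinks, since codewords must be kept far enough apart to remain discriminable - which is where the Shannon-style bandwidth argument above comes in.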
Why not use state trajectories, and temporal modulation as means of enhancing neural representation bandwidth and increasing the processing capabilities of neural systems? Ken Johnson kbj at risc.rockwell.com From ahmad at ICSI.Berkeley.EDU Mon Jun 10 18:58:12 1991 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Mon, 10 Jun 91 15:58:12 PDT Subject: Preprint Message-ID: <9106102258.AA16050@icsib18.Berkeley.EDU> The following paper (to appear in this Cognitive Science proceedings) is available from the neuroprose archives as ahmad.cogsci91.ps.Z (ftp instructions below). Efficient Visual Search: A Connectionist Solution by Subutai Ahmad & Stephen Omohundro International Computer Science Institute Abstract Searching for objects in scenes is a natural task for people and has been extensively studied by psychologists. In this paper we examine this task from a connectionist perspective. Computational complexity arguments suggest that parallel feed-forward networks cannot perform this task efficiently. One difficulty is that, in order to distinguish the target from distractors, a combination of features must be associated with a single object. Often called the binding problem, this requirement presents a serious hurdle for connectionist models of visual processing when multiple objects are present. Psychophysical experiments suggest that people use covert visual attention to get around this problem. In this paper we describe a psychologically plausible system which uses a focus of attention mechanism to locate target objects. A strategy that combines top-down and bottom-up information is used to minimize search time. The behavior of the resulting system matches the reaction time behavior of people in several interesting tasks. A postscript version of the paper can be obtained by ftp from cheops.cis.ohio-state.edu. The file is ahmad.cogsci91.ps.Z in the pub/neuroprose directory. You can either use the Getps script or follow these steps: unix:2> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. Name (cheops.cis.ohio-state.edu:): anonymous 331 Guest login ok, send ident as password. Password: neuron 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose ftp> binary ftp> get ahmad.cogsci91.ps.Z ftp> quit unix:4> uncompress ahmad.cogsci91.ps.Z unix:5> lpr ahmad.cogsci91.ps --Subutai ahmad at icsi.berkeley.edu From crr at shum.huji.ac.il Mon Jun 10 15:12:11 1991 From: crr at shum.huji.ac.il (crr@shum.huji.ac.il) Date: Mon, 10 Jun 91 22:12:11 +0300 Subject: distributed vs. local encoding schemes Message-ID: <9106101912.AA28249@shum.huji.ac.il> Terry Sejnowski mentioned the kinds of hidden units that we found in NETtalk. As for the input/output representations, we ran a number of experiments using both local (one unit per letter/phoneme, but more than one unit on per window) and distributed representations (more than one unit on per letter/phoneme). Learning times are generally faster with distributed representations simply because the net inputs and resulting error gradients are larger. (However it might be possible to boost the learning rate for the local representation to match the distributed one. I don't know if this would affect generalization or not since I didn't try it.) Using a representation that "makes sense" for the particular domain (such as using an articulatory feature code for the phonemes -- or is this local because the units represent features?) 
also leads to faster learning, and is more resistant to damage than a "random" encoding of the phonemes. Charlie Rosenberg 
From CADEPS at BBRNSF11.BITNET Tue Jun 11 08:56:05 1991 From: CADEPS at BBRNSF11.BITNET (JANSSEN Jacques) Date: Tue, 11 Jun 91 14:56:05 +0200 Subject: No subject Message-ID: <5901C8A706400066@BITNET.CC.CMU.EDU> STEERABLE GenNets - A Query. Abstract : One can evolve a GenNet (a neural net evolved with the genetic algorithm) to display two separate behaviors depending upon the setting of a clamped input control variable. By using an intermediate control value one obtains an intermediate behavior. For example, let the behaviors be sinusoidal oscillations of periods T1 and T2, where the control settings are 0.5 and -0.5. By using a control value of 0.3, one will get a sinusoid with a period between T1 and T2. Why? Has anyone out there had any similar experiences (i.e. of this sort of generalised behavioral learning), and has anybody any idea why GenNets are capable of such a phenomenon? If I receive some interesting replies, I'll prepare a summary and report back. Further details. One of the great advantages of GenNets (= using the GA to teach your neural nets their behaviors) over traditional NN paradigms such as backprop, Hopfield, etc. is that the GA treats your NN as a black box, and it doesn't matter how complex the internal dynamics of the NN are. All that counts is the result. How well did the NN perform? If it did well, the bitstring which codes for the NN's weights will survive. This allows the creation of GenNets which can cope with both inputs and outputs which vary constantly. One does not need stationary output values a la Hopfield etc. Hence NNs become much more "dynamic", compared to the more "static" nature of traditional paradigms. One can thus evolve dynamics (behaviors) on NNs (GenNets). This opens up a new world of NN possibilities. If one can evolve a GenNet to express one behavior, why not two? If two, can one evolve a continuum of behaviors depending upon the setting of a controlled input value? The variable frequency generator GenNet mentioned above shows that this is possible. But I'm damned if I know why. What's going on? Have any of you had similar experiences? Any clues for a theoretical explanation for this extraordinary phenomenon? P.S. To evolve this GenNet, use a fully connected net, with all external inputs set at zero, except for two inputs. Clamp one at 0.5, and the other at 0.5 (and then -0.5 in the second "experiment"). The fitness is the inverse of the sum of the two sums (for the two expts) of the squares of the difference between the desired output at each clock cycle and the actual output. Assign one neuron to be the output neuron. Cheers, Hugo de Garis, University of Brussels, Belgium, George Mason University, VA, USA. 
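(A small sketch, not from the posting above, of the fitness computation described in the P.S., assuming one recorded output trace per experiment; the example targets and noise level are made up for illustration.)

    import numpy as np

    def gennet_fitness(desired, actual):
        # inverse of the summed squared per-cycle error, totalled over both experiments
        sse = sum(np.sum((np.asarray(d) - np.asarray(a)) ** 2) for d, a in zip(desired, actual))
        return np.inf if sse == 0 else 1.0 / sse

    t = np.arange(100)
    desired = [np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 40)]   # targets with periods T1=20, T2=40
    rng = np.random.default_rng(1)
    actual = [d + 0.1 * rng.standard_normal(t.size) for d in desired]    # a candidate GenNet's output traces
    print(gennet_fitness(desired, actual))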
From thomasp at gshalle1.informatik.tu-muenchen.de Tue Jun 11 11:50:25 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Tue, 11 Jun 1991 17:50:25 +0200 Subject: Research Position in SPAIN ? Message-ID: <9106111550.AA08800@gshalle1.informatik.tu-muenchen.de> I'm a graduate student in computer science at Munich Technical University and plan to work in a research position related to neural networks in SPAIN. I would greatly appreciate it if you could provide me with some information on university/private/company research institutes active or interested in the field of neural network research and located in the Madrid or Seville area. Preferably, I would like to start working in Spain in November 91 or, alternatively, in January/February 1992. Sincerely, Patrick Thomas Institute for Medical Psychology Goethestr. 31 8000 Munich 2 
From moeller at kiti.informatik.uni-bonn.de Thu Jun 13 03:50:34 1991 From: moeller at kiti.informatik.uni-bonn.de (Knut Moeller) Date: Thu, 13 Jun 91 09:50:34 +0200 Subject: TR available from neuroprose; learning algorithms Message-ID: <9106130750.AA01054@kiti.> The following report is now available from the neuroprose archive: LEARNING BY ERROR-DRIVEN DECOMPOSITION D.Fox V.Heinze K.Moeller S.Thrun G.Veenker (6pp.) Abstract: In this paper we describe a new selforganizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh84]) with a nonlinear approximator (Backpropagation [RHW86]). The resulting learning system is more stable and effective in changing environments than plain backpropagation and much more powerful than extended feature maps as proposed by [RMW89]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of this technique is demonstrated with an example from mathematical function approximation. ----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) fox.decomp.ps.Z (local-file) fox.decomp.ps.Z ftp> quit unix> uncompress fox.decomp.ps.Z unix> lpr -P(your_local_postscript_printer) fox.decomp.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to moeller at kiti.informatik.uni-bonn.de DO NOT "reply" to this message!! 
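(A loose illustrative sketch, not from the report, of the error-driven decomposition idea in the abstract, assuming a 1-D function-approximation task and using simple per-patch constant models in place of the feature-map/backpropagation combination described above.)

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(-3.0, 3.0, 400)
    Y = np.sin(X) + 0.05 * rng.standard_normal(400)

    prototypes = [0.0]                      # patch centres (the "feature map" nodes)
    for _ in range(8):                      # grow the decomposition in an error-driven way
        centres = np.array(prototypes)
        assign = np.argmin(np.abs(X[:, None] - centres[None, :]), axis=1)
        patch_error = []
        for k in range(len(prototypes)):
            m = assign == k
            pred = Y[m].mean() if m.any() else 0.0            # stand-in local model for the patch
            patch_error.append(float(((Y[m] - pred) ** 2).sum()) if m.any() else 0.0)
        worst = int(np.argmax(patch_error))
        m = assign == worst
        resid = np.abs(Y[m] - Y[m].mean())
        prototypes.append(float(X[m][np.argmax(resid)]))      # split the worst patch at its worst-fit point
    print([round(p, 2) for p in sorted(prototypes)])

The point of the sketch is only the control strategy: new patches are allocated where the current approximation error is largest, so the subtasks end up roughly equally well approximated.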
From thomasp at gshalle1.informatik.tu-muenchen.de Thu Jun 13 13:33:19 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Thu, 13 Jun 1991 19:33:19 +0200 Subject: Gracias & Sorry Message-ID: <9106131733.AA19732@gshalle1.informatik.tu-muenchen.de> Sorry for the "garbage" and muchas gracias to all those helping out with addresses and conference announcements. 
Patrick From utans-joachim at CS.YALE.EDU Sat Jun 15 12:48:45 1991 From: utans-joachim at CS.YALE.EDU (Joachim Utans) Date: Sat, 15 Jun 91 12:48:45 EDT Subject: preprint available Message-ID: <9106151648.AA01689@SUNNY.SYSTEMSX.CS.YALE.EDU> The following preprint has been placed in the neuroprose archive at Ohio State University: Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction Joachim Utans John Moody Department of Electrical Engineering Department of Computer Science Yale University Yale University New Haven, CT 06520 New Haven, CT 06520 Abstract: Intuitively, the notion of generalization is closely related to the ability of an estimator to perform well with new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk needs to be estimated from the available data; here we approximate the prediction risk by v-fold cross-validation and asymtotic estimates of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure to the network architecture. To retrieve it by anonymous ftp: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> binary ftp> get utans.bondrating.ps.Z ftp> quit unix> uncompress utans.bondrating.ps unix> lpr -P(your_local_postscript_printer) utans.bondrating.ps Joachim Utans From h1201kam at ella.hu Sun Jun 16 13:05:00 1991 From: h1201kam at ella.hu (Kampis Gyorgy) Date: Sun, 16 Jun 91 13:05:00 Subject: a new book; special issue on emergence; preprint availab Message-ID: <9106161115.AA13832@sztaki.hu> ANNOUNCEMENTS **************************************************************** 1. a new book 2. a Special Issue on emergence 3. preprint available **************************************************************** 1. the book George Kampis SELF-MODIFYING SYSTEMS IN BIOLOGY AND COGNITIVE SCIENCE: a New Framework for Dynamics, Information and Complexity Pergamon, Oxford-New York, March 1991, 546pp with 96 Figures About the book: The main theme of the book is the possibility of generating information by a recursive self-modification and self- redefinition in systems. The book offers technical discussions of a variety of systems (Turing machines, input-output systems, synergetic systems, connectionist networks, nonlinear dynamic systems, etc.) to contrast them with the systems capable of self-modification. What in the book are characterized as 'simple systems' involve a fixed definition of their internal modes of operations, with variables, parts, categories, etc. invariant. Such systems can be represented by single schemes, like computational models of the above kind. A relevant observation concerning model schemes is that any scheme grasps but one facet of material structure, and hence to every model there belongs a complexity excluded by it. In other words, to every simple system there belongs a complex one that is implicit. 
Self-modifying systems are 'complex' in the sense that they are characterized by the author as ones capable of accessing an implicate material complexity and turning it into the information-carrying variables of a process. An example of such a system would be a tape recorder which spontaneously accesses new modes of information processing (e.g. bits represented as knots on the tape). A thesis discussed in the book is that unlike current technical systems, many natural systems know how to do that trick, and make it their principle of functioning. The book develops the mathematics, philosophy and methodology for dealing with such systems, and explains how they work. A constructive theory of models is offered, with which the modeling of systems can be examined in terms of algorithmic information theory. This makes possible a novel treatment of various old issues like causation and determinism, symbolic and nonsymbolic systems, the origin of system complexity, and, finally, the notion of information. The book introduces technical concepts such as information sets, encoding languages, material implications, supports, and reading frames, to develop these topics, and a class of systems called 'component-systems', to give examples of self-modifying systems. As an application, it is discussed how the latter can be applied to understand aspects of evolution and cognition. 
From tgelder at phil.indiana.edu Mon Jun 17 11:45:58 1991 From: tgelder at phil.indiana.edu (Timothy van Gelder) Date: Mon, 17 Jun 91 10:45:58 EST Subject: distribution and its advantages Message-ID: Javier Movellan's question -- what are distributed representations good for, anyway? -- is, I think, an important one for connectionism and cognitive science generally. Trouble is, the way it was put, it presupposes that there is some one kind of representation that everyone is referring to when they talk about distribution. In fact, though most people have a reasonable idea what they themselves intend when they use the term "distributed", they usually don't realize that it's not the way many other people use it. This is immediately apparent if one takes an overview of the responses that actually came in. Various people took it that a representation is distributed if it utilizes many units rather than just one, with the "strength" of distribution increasing as the total number of units (or perhaps, the proportion of available units) used increases. Massone by contrast thought the key concept is that of redundancy, which I take roughly to mean that a given piece of input information is represented multiple times. This presumably requires that many units are used (i.e., that there is distribution in the previous sense) but is a significantly stronger requirement. Massone's position was echoed in some other responses. Chalmers claims that a distributed representation is one in which every representation, whether of a basic concept or a more complex one, has a kind of semantically significant internal structure. This definition also seems to presuppose the first kind of definition, but is different from redundancy. Proposing a somewhat different definition again, French suggested that distribution is a matter of the degree of "overlap" between representations of different entities. And so on. This lack of agreement over what distribution actually is, is at least partly responsible for the fact that no really clear and useful consensus on the advantages of distributed representation really emerged in the responses to the initial question. 
It manifests a wider lack of agreement over the concept of distribution in connectionism and cognitive science more generally. I once surveyed as many of the definitions and occurrences of "distribution", "distributed representation", etc., as I could find in the cognitive science literature, and found that there were at least 5 very different basic properties that people often refer to as distribution. These ranged from a very simple notion of "spread-out- ness" - each entity being represented by activity in many units rather than just one - at one extreme, to complete functional equipotentiality at the other. (A representation is functionally equipotential when any part of it can stand in for the whole thing. Holograms are famous for exhibiting a form of equipotentiality.) Authors often picked up multiple strands and ran them together in one characterization, or defined distribution differently on different occasions, sometimes even in the same work. Probably the two most common definitions are (1) the notion of simple extendedness just mentioned (i.e., using "many" units to represent a given item) and (2) superimposition of representations. We have superimposition when there are multiple items being represented at the same time, but no way of pointing to the discrete part of the representation which is responsible for item A, the discrete part which is responsible for item B, and so forth. Think of the weights in a standard feed-forward network. Here multiple input-output associations are represented at the same time, but there is (in general) no separate set of weights for each association. To see how these two senses simultaneously dominate connectionist discussions of distribution, think again of the answers to Movellan's question. Many of the answers took the form, roughly, that "when I used representations involving activity in many units rather than just one in such and such a network, I found better (or worse!) performance". Other responses, particularly those that made reference to the brain or neuropsychological results, were more concerned with the extent to which there is separate or discrete storage of the various components of our knowledge in a given circumscribed domain. (In these contexts, "graceful degradation" in performance is often thought to be a consequence of knowledge being stored in an inextricably superimposed fashion.) In one sense, it is not surprising that these are the two most common notions of distribution. Perhaps the only thing that is really clear about distribution is the opposition between distribution and localization: whatever distributed representations are, they are non-local. Trouble is, "local" turns out to be ambiguous. Sometimes "local" means restricted in extent (e.g., using only one unit rather than many), and sometimes it means not overlapping with the representation of anything else. The two most common senses of "distribution" mentioned a moment ago simply result from denying locality in these two distinct senses. It seems to me that a necessary condition for any significant progress on the question "what are distributed representations good for?" is that this general state of confusion over what "distributed" means be resolved. This means clearly laying out the different senses that are floating around, picking out the one that is the most central and most theoretically significant, and giving it a reasonably precise definition. 
I attempted this in Ch.1 of my PhD dissertation (Distributed Representation, University of Pittsburgh 1989); a shorter overview of some of the material from that chapter has recently appeared as "What is the D in PDP? An overview of the concept of distribution" in Stich, Ramsey & Rumelhart (eds) Philosophy and Connectionist Theory. In my opinion, the most important concept in the vicinity of distribution is that of superimposition of representations, and it is for this that the term "distributed" should really be reserved. One advantage of this strategy is that superimposition admits of a surprisingly clear and satisfying mathematical definition: Suppose R is a representation of multiple items. If the representings of the different items are fully superimposed, every part of the representation R must be implicated in representing each item. If this is achieved in a non-trivial way there must be some encoding process that generates R given the various items to be stored, and which makes R vary, at every point, as a function of each item. This process will be implementing a certain kind of transformation from items to representations. This suggests thinking of distribution more generally in terms of mathematical transformations exhibiting a certain abstract structure of dependency of the output on the input. More precisely, define any transformation from a function F to another function G as strongly distributing just in case the value of G at any point varies with the value of F at every point; the Fourier transform is a classic example. Similarly, a transformation from F to G is weakly distributing, relative to a division of the domain of F into a number of sub-domains, just in case the value of G at every point varies as a function of the value of F at at least one point in each sub-domain. The classic example here is the linear associator, in which a series of vector pairs are stored in a weight matrix by first forming, and then adding together, their respective outer products. Each element of the matrix varies with every stored vector, but only with one element of each of those vectors. (The "functions" F and G in this case describe the input vectors and the association matrix respectively; e.g., given an argument specifying a place in an input vector, F returns the value of the vector at that place.) Clearly, a given distributing transformation yields a whole space of functions resulting from applying that transformation to different inputs (i.e., different functions F). If we think of these output functions as descriptions of representations, and the input functions as descriptions of items to be represented, the distributing transformation is defining a whole space or scheme of distributed representations. To be a distributed representation, then, is to be a member of such a scheme; it is to be a representation R of a series of items C such that the encoding process which generates R on the basis of C implements a given distributing transformation. Basically, then, distributed representations are what you get from distributing transformations, which are transformations which make each part of the output (the representation) depend on every part of the input (what you're representing). Now, mathematically speaking, there is a vast number of different kinds of distributing transformations, and so there is a vast number of possible instantiations of distributed representation. 
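(A small numerical sketch, mine rather than the poster's, of the linear-associator example just mentioned, assuming orthonormal cue vectors so that recall is exact: the weight matrix is a sum of outer products, so every matrix element depends on every stored pair, yet each pair remains recoverable.)

    import numpy as np

    rng = np.random.default_rng(3)
    Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
    keys = Q.T                              # three orthonormal 8-dimensional cue vectors
    values = rng.standard_normal((3, 8))    # the three items to be associated with them

    # Superimposed storage: W is the sum of the outer products of all the pairs,
    # so no discrete part of W is responsible for any single association.
    W = sum(np.outer(v, k) for k, v in zip(keys, values))

    print(np.allclose(W @ keys[0], values[0]))   # True: the first pair is still recoverable

Each element of W varies with every stored pair but with only one component of each cue, which is exactly the "weakly distributing" case defined above.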
Connectionists can be seen as exploring that portion of the space of possible transformations that you can handle with n-dimensional vector operations, learning algorithms, etc. In other domains such as optics it is possible to implement other forms of distributing transformations and hence to get distributed representations with different properties. There are a number of reasons for wanting to define distributed representation in terms of superimposition generally, and distributed transformations in particular: (a) superimposition is certainly one of the most common of the standard senses of "distribution" in current usage, and so we remain as close as possible to that usage; (b) superimposition admits of a precise mathematical definition, so those who think clarity only comes from formalization should be kept happy; (c) various popular properties of distributed representation such as automatic generalization and graceful degradation are a natural consequence of distribution defined this way; (d) in practice, in a connectionist context, distribution in the sense of requiring many units rather than just one is a necessary precondition of this more full-blooded notion; hence any advantages that accrue to representations in virtue of utilizing many units also accrue to superimposed representations; (e) a number of other interesting theoretical results follow from defining distribution this way: in particular, it can be shown that distributed representations cannot be symbolic in nature, on a reasonably precise definition of "symbolic" (see e.g. my "Why distributed representation is inherently non- symbolic", in G. Dorffner (ed.) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Berlin: Springer- Verlag, 1990; 58-66). On the basis of this kind of definition of what distributed representation is, what kind of answer can be given to the "what are distributed representations good for?" question? Well, the kind of answer you will find satisfying will depend very much on what your theoretical interests are. A connectionist whose concerns have more of an applied, engineering focus will want to know what specific processing benefits arise from using representations generated by distributing transformations. As mentioned in (c) above, I think that some of the favorite virtues of distribution are best seen as an immediate consequence of superimposition. The technical issues here still need much clarification, however. As a cognitive scientist, on the other hand, I'm interested in more general questions such as - what are the advantages of distribution for human knowledge representation? Here I don't have any actual answers ready to hand; the most I can do the moment is point to the kind of question that seems the most interesting. Speaking at the broadest possible level: various difficulties encountered in mainstream AI, combined with some philosophical reflections, suggest that everyday commonsense knowledge cannot be fully and effectively captured in any kind of purely symbolic format; that, in other words, symbolic representation is fundamentally the wrong medium for capturing at least certain kinds of human knowledge. Just above I mentioned that distributed representation (defined in terms of superimposition) can be shown to be intrinsically non-symbolic. The obvious suggestion then is: perhaps the most important advantage of distributed representation is that it (and it alone?) is capable of representing the kind of knowledge that underlies everyday human competence? 
Tim van Gelder From tsejnowski at UCSD.EDU Mon Jun 17 13:14:00 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Mon, 17 Jun 91 10:14:00 PDT Subject: Santa Fe Time Series Competition Message-ID: <9106171714.AA23031@sdbio2.UCSD.EDU> A Time Series Prediction and Analysis Competition The Santa Fe Institute August 1, 1991 - December 31, 1991 A wide range of new techniques are now being applied to the time series analysis problems of predicting the future behavior of a system and deducing properties of the system that produced the time series. Such problems arise in most observational disciplines, including physics, biology, and economics; new tools, such as the use of connectionist models for forecasting, or the extraction of parameters of nonlinear systems with time-delay embedding, promise to provide results that are unobtainable with more traditional time series techniques. Unfortunately, the realization and evaluation of this promise has been hampered by the difficulty of making rigorous comparisons between competing techniques, particularly ones that come from different disciplines. In order to facilitate such comparisons and to foster contact among the relevant disciplines, the Santa Fe Institute is organizing a time series analysis and prediction competition. A few carefully chosen experimental time series will be made available through a computer at the Santa Fe Institute, and quantitative analyses of these data will be collected in the areas of forecasting, characterization (evaluating dynamical measures of the system such as the number of degrees of freedom and the information production rate), and system identification (inferring a model of the system's governing equations). At the close of the competition the performance of the techniques submitted will be compared and published, and the server will continue to operate as an archive of data, programs, and comparisons among algorithms. There will be no monetary prizes. A workshop is planned for the Spring of 1992 to explore the results of the competition. The competition does not require advance registration; to enter, simply retrieve the data and submit your analysis. The detailed description of the competition categories and instructions for retrieving the data and entering the competition will be available after August 1 through four routes: ACCESSING THE DATA --------- --- ---- ftp: Ftp to sfi.santafe.edu (192.12.12.1) as user "tsguest" and use "tsguest" for the password. Get the file "instructions". dial-up: There are two dial-up lines: 505-988-1705 (2400 baud), and 505-986-0252 (any speed to 9600 baud). The settings for both lines are no parity, 8 bit words, 1 stop bit. At the connect press return; at the prompt type "login tsguest" and use "tsguest" for the password. At the next prompt type "telnet sfi" and login as user "tsguest" (password "tsguest"). Using either "kermit" or "xmodem", retrieve the file instructions". When you are finished, logout from sfi and from the prompt. mail server: Send email to tserver at sfi.santafe.edu with the phrase "send time series instructions" in either the subject or the body of the message. The mailer will return a file with more detailed instructions for requesting the data and submitting analyses. pc disks: The data is available on disks in either IBM-PC or Mac formats. 
To cover the cost of distributing the data, send $25 to Time Series Competition Disks, The Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501, and specify the machine type, disk size, and disk density required. Instructions will be included with the disks on submitting a return disk with the analysis of the data. FOR MORE INFORMATION --- ---- ----------- Further questions about the competition, or inquiries about contributing data to be used in the competition, should be directed to: Time Series Competition Santa Fe Institute 1660 Old Pecos Trail, Suite A Santa Fe, NM 87501 (505) 984--8800 tserver at sfi.santafe.edu or to one of the organizers: Neil Gershenfeld Andreas Weigend Department of Physics Xerox Palo Alto Research Center Harvard University 3333 Coyote Hill Road 15 Oxford Street Palo Alto, CA 94304 Cambridge, MA 02138 (415) 322-4066 (617) 495-5641 andreas at sfi.santafe.edu neilg at sfi.santafe.edu ADVISORY BOARD -------- ----- Prof. Leon Glass Department of Physiology McGill University Prof. Clive W. J. Granger Center for Econometric Analysis Department of Economics University of California, San Diego Prof. William H. Press Department of Physics and Center for Astrophysics Harvard University Prof. Maurice B. Priestley Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Itamar Procaccia Department of Chemical Physics The Weizmann Institute of Science Prof. T. Subba Rao Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Harry L. Swinney Department of Physics University of Texas at Austin 
From pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU Tue Jun 18 14:10:12 1991 From: pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU (Michael Pazzani) Date: Tue, 18 Jun 91 11:10:12 -0700 Subject: Special Issue of Machine Learning Journal Message-ID: <9106181110.aa28419@PARIS.ICS.UCI.EDU> MACHINE LEARNING will be publishing a special issue on Computer Models of Human Learning. The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues. 
The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues. From pollack at cis.ohio-state.edu Tue Jun 18 11:28:35 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Tue, 18 Jun 91 11:28:35 -0400 Subject: Neuroprose Turbulence Expected Message-ID: <9106181528.AA01029@dendrite.cis.ohio-state.edu> Cheops, the pyramid machine upon which NEUROPROSE resides, will be decommissioned. The Neuroprose archive will move, with luck, to a new Sparcserver at the same IP address, also called Cheops. But between today and July 1, all cis.ohio-state.edu systems (including email) will be pretty wobbly, so expect delays. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:31:23 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:31:23+0200 Subject: large SIMD nn machines, ASI Message-ID: <9106182231.AA12381@sunmanager.lrz-muenchen.de> Hello, I wonder whether there are any other beta testers of the ASI Cnaps machine out there who might want to share some experiences. Specifically, has anyone - implemented a non-local algorithm (CG, PCG), - implemented a good random number generator memory-efficient enough to be put into node memory / what do you think about tables or host communication for an alternative implementation? - thought about interfacing some hardware as a preprocessor, piping data in via DMA? - found a job for idle processors (small net sizes) - liked the 1-bit weight mode - ported the debugger to Irix - (other)? Some of these questions should be familiar to other SIMD programmers too (I have the Witbrock GF11 paper). Thank you for any hints. Cheers, Henrik (Rick at vee.lrz-muenchen.de) H. Klagges, Laser Institute Prof Haensch, PhysDep U of Munich, FRG + IBM Research Division, Binnig group From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:44:50 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:44:50+0200 Subject: Backpercolation Message-ID: <9106182244.AA12446@sunmanager.lrz-muenchen.de> Hello, I wonder whether the backpercolation algorithm (see back articles in comp.ai.neural-nets) is important or not. I got some very preliminary results on very simple problems (an n-n-n linear channel with few (3-10) patterns) which look not bad, but complicated ones don't seem particularly zooming yet (yes, there are some bugs in my code left).
If anyone would like a C++ backperc server object (guaranteed to be broken) to avoid reinventing the wheel and to get some basic data structures, let me know. The only problem: Mark Jurik (mgj at cup.portal.com) wants you to sign a nondisclosure thing first before I can send it out to you. Anyway, if someone else has some first results, I would really like to see them. Cheers, Henrik (Rick at vee.lrz-muenchen.de) From ITGT500 at INDYCMS.BITNET Tue Jun 18 16:32:57 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Tue, 18 Jun 91 15:32:57 EST Subject: Distributed vs. local representation Message-ID: <25A077F07E800064@BITNET.CC.CMU.EDU> In the following I would like to state my views on distributed and local representations. All comments are more than welcome. I think that if we define a strict local representation as: "one object (or item, entity, etc.) is represented by one node (or unit, neuron, etc.) only, and one node represents only one object", then all the other situations probably can be classified as distributed representation (either semi- or fully distributed). In other words, only the one-to-one representation belongs to local representation. The others, multiple-to-one, one-to-multiple, and multiple-to-multiple representations, all belong to distributed representation. Therefore, distributed representation covers more senses than local representation. This may help reduce the confusion regarding these definitions. Because distributed representation covers a wider range than local representation, it can take many different forms. One point unnoticed up to now is the difference between the "binary representation" (the node takes binary values only) and the "analog representation" (the node takes analog values). In NETtalk and many other examples, the distributed representation used seems to be the binary one. However, the world seems to favor and to be taking the analog form. Therefore, analog distributed representation probably is the one that is working in and dominating our cognitive processes. I met one such problem in our work on a parabolic problem. We found that it would be very difficult, if not impossible, to use a (strict) local or binary distributed representation to solve the parabolic problem. It was only the analog distributed representation that worked well. We concluded that from the practical application viewpoint, both local and distributed representations would work if the training and test patterns were discrete and finite. However, if the training and/or test patterns were continuous and infinite, only distributed representation worked. -Bo From aam9n at hagar2.acc.Virginia.EDU Wed Jun 19 04:39:39 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Wed, 19 Jun 91 04:39:39 EDT Subject: Distributed Representations Message-ID: <9106190839.AA00322@hagar2.acc.Virginia.EDU> We connectionists never tire of talking about "distributed representations", and with good reason. However, I have never come across a rigorous definition of the concept. Now, I realize that this notion, like most powerful ones, will necessarily be diminished in any process of definition, however inclusive that might be. That has not fazed us in trying to define entropy, information, complexity, learnability --- and probability! My question is: has anyone rigorously, or even empirically, tried to come up with a definition for distributed representations --- especially a way to quantify distributed-ness?
I suppose high-order statistics represent a way to look at this, but have there been any attempts to develop a definition specifically in the context of connectionist networks? And would that be such a bad thing? Ali Minai Dept of EE University of Virginia aam9n at Virginia.EDU From maureen at ai.toronto.edu Wed Jun 19 11:38:49 1991 From: maureen at ai.toronto.edu (Maureen Smith) Date: Wed, 19 Jun 1991 11:38:49 -0400 Subject: Announce new CRG Technical Report Message-ID: <91Jun19.113852edt.780@neuron.ai.toronto.edu> The following technical report is available for ftp from the neuroprose archive. A hardcopy may also be requested. (See below for details.) Though written for a statistics audience, this report should be of interest to connectionists and others interested in machine learning, as it reports a Bayesian solution for one type of "unsupervised concept learning". The technique employed is also related to that used in Boltzmann Machines. Bayesian Mixture Modeling by Monte Carlo Simulation Radford M. Neal Technical Report CRG-TR-91-2 Department of Computer Science University of Toronto It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a ``correct'' number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task. To obtain a compressed PostScript version of this report from neuroprose, ftp to "cheops.cis.ohio-state.edu" (128.146.8.62), log in as "anonymous" with password "neuron", set the transfer mode to "binary", change to the directory "pub/neuroprose", and get the file "neal.bayes.ps.Z". Then use the command "uncompress neal.bayes.ps.Z" to convert the file to PostScript. To obtain a hardcopy version of the paper by physical mail, send mail to : Maureen Smith Department of Computer Science University of Toronto 6 King's College Road Toronto, Ontario M5A 1A4 From schraudo at cs.UCSD.EDU Wed Jun 19 21:39:56 1991 From: schraudo at cs.UCSD.EDU (Nici Schraudolph) Date: Wed, 19 Jun 91 18:39:56 PDT Subject: hertz.refs.bib patch Message-ID: <9106200139.AA29142@beowulf.ucsd.edu> In adding the "HKP:" prefix to the citation keys in the BibTeX version of the Hertz/Krogh/Palmer bibliography I forgot to modify the internal cross-citations accordingly. I've appended the necessary patch below; it only involves three lines, but those who don't feel up to the task can ftp the patched file (still called hertz.refs.bib.Z) from neuroprose. My apologies for the invonvenience, - Nici Schraudolph. Here's the patch: *** hertz.refs.bib Wed Jun 19 18:23:36 1991 *************** *** 73,80 **** @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! @string{inAR = "Reprinted in \cite{Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- --- 73,80 ---- @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! 
@string{inAR = "Reprinted in \cite{HKP:Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{HKP:Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- *************** *** 3500,3506 **** pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{Rumelhart86a}", year = 1985 } --- 3500,3506 ---- pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{HKP:Rumelhart86a}", year = 1985 } From rich at gte.com Thu Jun 20 10:24:53 1991 From: rich at gte.com (Rich Sutton) Date: Thu, 20 Jun 91 10:24:53 -0400 Subject: Job Announcement - GTE Message-ID: <9106201424.AA29945@bunny> The connectionist machine learning project at GTE Laboratories is looking for a researcher in computational models of learning and adaptive control. Applications from highly-qualified candidates are solicited. A demonstrated ability to perform and publish world-class research is required. The ideal candidate would also be interested in pursuing applications of their research within GTE businesses. GTE is a large company with major businesses in local telphone operations, mobile communications, lighting, precision materials, and government systems. GTE Labs has had one of the largest machine learning research groups in industry for about seven years. A doctorate in Computer Science, Computer Engineering or Mathematics is required. A demonstrated ability to communicate effectively in writing and in technical and business presentations is also required. Please send resumes and correspondence to: June Pierce GTE Labs MS-44 40 Sylvan Road Waltham, MA 02254 USA From ga1043 at sdcc6.UCSD.EDU Thu Jun 20 12:48:06 1991 From: ga1043 at sdcc6.UCSD.EDU (ga1043) Date: Thu, 20 Jun 91 09:48:06 PDT Subject: Super-Turing discussion Message-ID: <9106201648.AA15438@sdcc6.UCSD.EDU> A couple of months ago, there was a discussion on the network about neural nets, their capabilities, super-Turing machines, etc. About five or six references were mentioned. Does anyone have a list of those refereces, or a copy of that discussion? If you could forward the information to me at ga1043 at sdcc6.ucsd.edu, I would appreciate it. Valerie Hardcastle From rstark at aipna.edinburgh.ac.uk Thu Jun 20 12:29:54 1991 From: rstark at aipna.edinburgh.ac.uk (rstark@aipna.edinburgh.ac.uk) Date: Thu, 20 Jun 91 12:29:54 BST Subject: Distributed vs. Localist Representations Message-ID: <4210.9106201129@fal.aipna.ed.ac.uk> One aspect of this issue which seems implicit in much of this discussion is the notion that distributed representation can be considered a *relative* property. Thus the "room schema" network is "distributed" relative to rooms, but "localist" relative to ovens. Likewise, the Jets and Sharks model, which is considered to be strictly localist in the sense that each unit explictly represents a single concept (eg. "is-in-thirties"), does produce representations that are distributed relative to individual gang members. Andy Clark notes this in Microcognition. Does this seem correct? Is anyone uncomfortable with calling the Jets and Sharks a "distributed" model since each individual is represented by a pattern over the units (one unit active in each competition network), even though each unit can be clearly labelled in a localist fashion? 
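The relative sense of "distributed" raised here can be made concrete with a small sketch. The sketch below (in Python) is purely hypothetical -- the unit labels, names, and feature assignments are invented for illustration and are not taken from the actual Jets and Sharks model:

# Each unit is labelled with exactly one feature value (localist units),
# yet an individual is identified only by a pattern across units
# (distributed relative to individuals). Names and features are invented.

units = ["Jets", "Sharks", "in-20s", "in-30s", "burglar", "bookie"]

# One active unit per competitive pool (gang, age, occupation).
individuals = {
    "Art":   {"Jets": 1,   "in-30s": 1, "bookie": 1},
    "Rick":  {"Sharks": 1, "in-30s": 1, "burglar": 1},
    "Lance": {"Jets": 1,   "in-20s": 1, "burglar": 1},
}

def activation_vector(person):
    """Binary activation pattern over the labelled units for one individual."""
    return [individuals[person].get(unit, 0) for unit in units]

for person in individuals:
    print(person, activation_vector(person))

Removing any single unit does not delete one individual outright; it merely blurs the distinctions among several of them, which is the sense in which the individuals are represented in a distributed fashion even though every unit carries a localist label.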
Note that this notion of relativity in distributed representation is (I believe) distinct from its continuous aspects (seen in references to "partially-" or "semi-" distributed representations), which may be quantifiable using e.g. Tim van Gelder's proposal of degree of superimposition. -Randall Stark --------------------------------------------------------------------------- Randall Stark TEL: (+44)-31-650-2725 | Dept of Artificial Intelligence JANET: rstark at uk.ac.ed.aipna | 80, South Bridge ARPA: rstark%uk.ac.ed.aipna at nsfnet-relay | University of Edinburgh UUCP: ...!uunet!mcsun!ukc!aipna!rstark | Edinburgh, EH1 1HN, UK --------------------------------------------------------------------------- From haffner at lannion.cnet.fr Fri Jun 21 11:36:19 1991 From: haffner at lannion.cnet.fr (Haffner Patrick) Date: 21 Jun 91 17:36:19+0200 Subject: POST-DOCTORAL VACANCY : Connectionism and Oral Dialogue Message-ID: <9106211536.AA02620@lsun26> Applications are invited for research assistantship(s) for post-doctoral or sabbatical candidates. Funding at the French National Telecommunications Research Centre (Centre National d'Etudes des Telecommunications, CNET) will commence in September '91 for a two-year period; the work location will be Lannion, Brittany, France. Experience is required in Natural Language Processing, especially Oral Dialogue Processing, by Connectionist methods. Applicants should specify the period between Sept '91 and Sept '93 which interests them. Applications, including CV/Resume, should be sent to: Mme Christel Sorin CNET LAA/TSS/RCP BP 40 22301 LANNION CEDEX FRANCE TEL : +33 96-05-31-40 FAX : +33 96-05-35-30 E-MAIL : sorin at lannion.cnet.fr From ITGT500 at INDYCMS.BITNET Thu Jun 20 11:55:32 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Thu, 20 Jun 91 10:55:32 EST Subject: Distributed Representations In-Reply-To: Your message of Wed, 19 Jun 91 04:39:39 EDT Message-ID: ----------------------------Original message---------------------------- Two days ago I mentioned (strict) local representation, binary distributed representation, and analog distributed representation. As an attempt to answer Ali Minai's question, I will try to give my understanding of representations as follows: (1). In my opinion, the key point underlying the definitions of representations is the correspondence between the objects (or items, entities, etc.) to be represented and the units (or nodes, neurons, etc.) of the network. The objects can be classified according to the properties they have. More than one object can possess the same property; in this case, these objects should be classified into the same group with that property. The units can represent different properties of the objects, or different objects within the same property group. As mentioned in my mail two days ago, there are four kinds of correspondences for the relationships between objects and units: one-to-one, multiple-to-one, one-to-multiple, and multiple-to-multiple. If we define the (strict) local representation as the one that represents the one-to-one correspondence only, then all the other three correspondences can be called distributed representations. However, since there are three different correspondences within distributed representation, the term "distributed representation" will probably be too broad and too general a concept if we try to use it to refer to all three correspondences.
It is perhaps this overly general word or concept that has brought about the confusion about the advantages and disadvantages of local representation vs. distributed representation. (2). In an attempt to clarify these confusions, I think it is necessary to give more specific definitions to all four of these correspondences. The following are my attempts to define these representations:

Local Representation ---- The one-to-one correspondence in which each object is represented by one unit, and each unit represents only one object. Units in local representation always take binary values.

Binary Distributed Representation ---- The one-to-multiple correspondence in which each object is represented by multiple units and each unit is employed to represent only one object. The unit takes only binary values here because it represents only one object, there is no need for it to take analog values.

Analog Distributed Representation ---- The multiple-to-one correspondence in which multiple objects with the same property are represented by one unit and each unit represents multiple objects with the same property only. Here the unit takes different analog values for different objects within this property group. Different analog values are used to differentiate these different objects within the same property group.

Mixed Distributed Representation ---- The multiple-to-multiple correspondence in which multiple objects of multiple properties are represented by one unit and each unit represents multiple objects with multiple properties. Here, the units take either binary or analog values depending on the properties and the object they represent.

I am not sure whether the above definitions clarify these concepts and reduce the confusion around these problems. Your comments on the above statements are welcome. Bo Xu Dept. of Physiology and Biophysics School of Medicine Indiana University ITGT500 at INDYCMS.BITNET From hwang at pierce.ee.washington.edu Fri Jun 21 14:54:47 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Fri, 21 Jun 91 11:54:47 PDT Subject: IJCNN'91 Presidents' Forum (new announcement from Prof. Marks) Message-ID: <9106211854.AA13350@pierce.ee.washington.edu.> News release IEEE NEURAL NETWORKS COUNCIL IS SPONSORING A PRESIDENTS' FORUM AT IJCNN `91 IN SEATTLE, WASHINGTON Robert J. Marks II, Professor at the University of Washington and President of the IEEE Neural Networks Council (NNC), has announced that for the first time the IEEE/NNC will be sponsoring a Presidents' Forum during IJCNN `91 in Seattle, Washington, July 8-12, 1991. The participants of the Presidents' Forum will be the Presidents of the major artificial neural network societies of the world, including the China Neural Networks Committee, the Joint European Neural Network Initiative, the Japanese Neural Networks Society and the Russian Neural Networks Society. The Forum will be open to conference attendees and the press on Wednesday evening, 6:30-8:30 pm, July 10, 1991, at the Washington State Convention Center in Seattle. Each President will give a short (15-20 minute) presentation of the activities of their society, followed by a short question/answer period. Robert J. Marks II will be this year's moderator.
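To make the four correspondences defined in Bo Xu's message above concrete, here is a small sketch (in Python); the objects, property groups, and code values are invented for illustration and are not taken from any of the postings:

# Hypothetical encodings of four objects under the four correspondences
# defined above. All names and values are invented.

# Local: one-to-one -- one binary unit per object.
local = {
    "apple":  [1, 0, 0, 0],
    "pear":   [0, 1, 0, 0],
    "squash": [0, 0, 1, 0],
    "carrot": [0, 0, 0, 1],
}

# Binary distributed: one-to-multiple -- each object owns several binary
# units (three here), giving redundancy, but no unit is shared across objects.
binary_distributed = {
    "apple":  [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "pear":   [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "squash": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "carrot": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}

# Analog distributed: multiple-to-one -- one unit per property group
# ("fruit", "vegetable"); an analog value separates objects within a group.
analog_distributed = {
    "apple":  [0.25, 0.0],
    "pear":   [0.75, 0.0],
    "squash": [0.0, 0.25],
    "carrot": [0.0, 0.75],
}

# Mixed distributed: multiple-to-multiple -- shared property units
# ("grows on a tree", "is sweet", "is orange"), each used by several objects.
mixed_distributed = {
    "apple":  [1.0, 0.8, 0.1],
    "pear":   [1.0, 0.9, 0.0],
    "squash": [0.0, 0.2, 0.3],
    "carrot": [0.0, 0.4, 1.0],
}

for name, code in [("local", local),
                   ("binary distributed", binary_distributed),
                   ("analog distributed", analog_distributed),
                   ("mixed distributed", mixed_distributed)]:
    n_units = len(next(iter(code.values())))
    print("%-20s %d objects over %d units" % (name, len(code), n_units))

Note how the same four objects need four, twelve, two, or three units depending on the correspondence chosen; the discussion that follows turns on what, beyond this bookkeeping, makes one of these encodings genuinely "distributed".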
From aam9n at honi4.acc.virginia.edu Thu Jun 20 17:39:38 1991 From: aam9n at honi4.acc.virginia.edu (aam9n) Date: Thu, 20 Jun 91 17:39:38 EDT Subject: Distributed Representations Message-ID: <9106202139.AA00551@honi4.acc.Virginia.EDU> Bo Xu presents a very interesting classification of representations in terms of their distribution over representational units. The definitions of each class are internally clear enough, but I have some comments about how "distributivity" is defined, and where it leads. Let's take the definitions that Bo Xu gives: >Local Representation ---- The one-to-one correspondence in which each object > is represented by one unit, and each unit represents only one object. > Units in local representation always take binary values. No quarrel about this one being a local representation. >Binary Distributed Representation ---- The one-to-multiple correspondence > in which each object is represented by multiple units and each unit > is employed to represent only one object. The unit takes only binary > values here because it represents only one object, there is no need > for it to take analog values. Suppose I have two objects --- an apple and a pear --- and six representational units r1.....r6. Then, if I read this definition correctly, a distributed representation might be 000111 <-> apple and 111000 <-> pear. Since the units are binary, they are presumably "on" if the object is present and "off" if it is not. No reference is made to "properties" defining the object, and so there is no semantic content in any unit beyond that of mere signification: each unit is, ideally, identical. The question is: why have three units signifying one object when they work as one? One reason might be to achieve redundancy, and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Is this a distributed representation, though? To decide that, I must have an *external* definition of what it means for a representation to be distributed. Tentatively, I say that "a representation is distributed over a group of units if no single unit's correct operation is critical to the representation". This certainly holds in the above example. It holds, indeed, in all error-correcting codes. In a binary distributed representation, then, I can define the "degree of distributivity" as the minimum Hamming distance of the code. This is quite consistent, if rather disappointingly mundane. >Analog Distributed Representation ---- The multiple-to-one correspondence > in which multiple objects with the same property are represented by > one unit and each unit represents multiple objects with the same > property only. Here the unit takes different analog values for > different objects within this property group. Different analog > values are used to differentiate these different objects within the > same property group. Here, under the obvious reading of this definition, I have two categories (units) called "fruits" and "vegetables". Each represents many objects with different values, but mutually exclusively. Thus, I might have apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code 0.1,0.1. This is obviously equivalent to a binary representation with each unit replaced by, say, n binary units. The question is: does this code embody the principle of dispensibility? Not necessarily. One wrong bit could change an apple into a lemon, or even lose all information about the category of the object. 
Thus, in the general case, such a representation is "distributed" only in the physical sense of activating (or not activating) units in a group. Each unit is still functionally critical. >Mixed Distributed Representation ---- The multiple-to-multiple correspondence > in which multiple objects of multiple properties are represented by > one unit and each unit represents multiple objects with multiple > properties. Here, the units take either binary or analog values > depending on the properties and the object they represent. Now here we have what most people mean by "distributed representations". We have many properties, each represented by a unit, and many objects. Each object can be encoded in terms of its properties. If the set of properties does not have enough discrimination, multiple objects could have the same code. Even if the property set is sufficient for unique representation, it is possible that the malfunction of one unit may change one object to another. The question then is: is this dependency small or large? Does a small malfunction in a unit cause catastrophic change in the semantic content of the whole group of units? I can "distribute" my representation over all the atoms in the universe, but if that doesn't give me some protection from point failures, I have not truly "distributed" things at all --- merely multiplied the local representation. Now, of course, in the "real" world where things are uniformly or normally distributed and errors are uncorrelated, increasing the size of a representation over a set of independent units will almost always confer some degree of protection from catastrophic point failures. An important issue is how to *maximize* this. And to do that, we must be able to measure it. One way would be to minimize the average information each representational unit conveys about the represented objects, which is a simple maximum entropy formulation. This requirement must, of course, be balanced by an adequate representation imperative. Other formulations are certainly possible, and probably much better. In any case, many of the more interesting issues in distributed representation arise when the "object" being represented is only implicitly available, or when the representation is distributed over a hierarchy of units, not all of which are directly observable, and not all of which count in the final encoding. Comments? Ali Minai aam9n at Virginia.EDU
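Both measures floated in the message above can be computed in a few lines. The sketch below (in Python) is only an illustration under simplifying assumptions -- the codes are invented, objects are taken to be equiprobable, and each unit's 0/1 activation is assumed to be a deterministic function of the object, so that the information a unit conveys about object identity reduces to the unit's own entropy:

from itertools import combinations
from math import log2

def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def min_hamming_distance(code):
    """Minimum pairwise Hamming distance -- the 'degree of distributivity'
    suggested above for binary distributed codes."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

def per_unit_information(code):
    """Average information (bits) a single unit conveys about object identity,
    assuming equiprobable objects and deterministic binary activations."""
    n_objects, n_units = len(code), len(code[0])
    total = 0.0
    for j in range(n_units):
        p_on = sum(word[j] for word in code) / n_objects
        total += -sum(p * log2(p) for p in (p_on, 1.0 - p_on) if p > 0.0)
    return total / n_units

codes = {
    "one-hot (local)":      [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
    "triplicated (voting)": [(1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1)],
    "dense distributed":    [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1)],
    "constant (useless)":   [(0, 0, 0, 0)] * 4,
}

for name, code in codes.items():
    print("%-22s min Hamming distance = %d, per-unit information = %.3f bits"
          % (name, min_hamming_distance(code), per_unit_information(code)))

As the last row shows, the per-unit information criterion taken by itself is minimized by a code that represents nothing at all, which is exactly why the posting pairs it with an "adequate representation" (recoverability) requirement.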
From ITGT500 at INDYCMS.BITNET Sat Jun 22 11:38:17 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Sat, 22 Jun 91 10:38:17 EST Subject: Distributed Representations Message-ID: <29E19BB296800064@BITNET.CC.CMU.EDU> Ali Minai presented a good example of apple and pear. I am going to answer some questions he raised. Let's look at his statements first. >is not. No reference is made to "properties" defining the object, and so there >is no semantic content in any unit beyond that of mere signification: each This is a very good question. Generally speaking, there are many properties existed at the same time for each object. Let's take the apple as an example. An apple can be classified according to its taste, color, size, shape, or whether it is a fruit or not (as Ali Minai chose) etc. Different people will choose different criteria to meet the purpose of their applications. >unit is, ideally, identical. The question is: why have three units signifying >one object when they work as one? One reason might be to achieve redundancy, >and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Redundancy and fault-tolerance may be reasons for binary distributed representation. Another reason probably comes from the faster convergence rate consideration. Karen Kukich has done some interesting work and concludes that the advantage of local representation is the faster convergence rate (see K. Kukich, "Variations on a Back-Propagation Name Recognition Net" in the Proceedings of the United States Postal Service Advanced Technology Conference, Vol. 2, 722-735). The binary distributed representation is similar to local representation in that they all take binary values. However, as to why "three" instead of "five" or any other numbers, I also don't know. This question is probably similar to the question of "how many hidden units are needed for a specific task?". It may depend on to what degree the redundancy is needed. >Here, under the obvious reading of this definition, I have two categories >(units) called "fruits" and "vegetables". Each represents many objects >with different values, but mutually exclusively.
Thus, I might have >apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code >0.1,0.1. This is obviously equivalent to a binary representation with >each unit replaced by, say, n binary units. The question is: does this >code embody the principle of dispensibility? Not necessarily. One wrong bit >could change an apple into a lemon, or even lose all information about the >category of the object. Thus, in the general case, such a representation >is "distributed" only in the physical sense of activating (or not activating) >units in a group. Each unit is still functionally critical. It is true that if there is a bit error, the apple will change into a lemon, etc. However, the key point here is that the neural net's fault-tolerance characteristic exists only after it is trained and has reached an accuracy criterion. If we are dealing with many objects and use 0.1 as a value to differentiate different objects, we will train the net to reach a criterion at least smaller than 0.1 (otherwise, the net will be of no use). Thus, for seen patterns, the error will not be so big that an apple will turn into a lemon. For unseen patterns, bigger errors probably will occur, and apples probably will turn into lemons or something else. However, in this case we cannot attribute the problem to the representation alone. This is related to the generalizability of the net, and the learning algorithm, the units' response characteristics, and even the topology of the net all probably play a role in the generalizability of the net. >Now here we have what most people mean by "distributed representations". We >nother. The question then is: is this dependency small or large? Does >small malfunction in a unit cause catastrophic change in the semantic >content of the whole group of units? I can "distribute" my representation When talking about representations, the graceful degradation of the brain is introduced as a criterion. However, since the neural net is still far away from a real brain model, some caution should be taken when relating the neural net to the brain. The first thing to be made clear is which layer of the neural net we are referring to. Most people refer to the interface layers (the input and output layers) of the neural net when they talk about local/distributed representations. However, they refer to all layers (both the interface layers and hidden layers) when they talk about graceful degradation. However, what is the justification for the interface layers to possess graceful degradation? If we say that the neural net resembles the brain in some aspects, then the resemblance most likely lies in the hidden layers instead of the interface layers. The criterion of graceful degradation should be applied to the hidden layers instead of the interface layers. In most current nets, the hidden layers are using mixed distributed representation, and thus possess the graceful degradation characteristics. As to the interface layers (input/output layers), we can demand that they possess the graceful degradation characteristics too. However, in my opinion, this will lead to many additional problems and confusions. The mixed distributed representation is good for hidden layers, not for interface layers. I think that for the interface layers, the analog distributed representation works best because: (1) Considerations at the interface layers should be practicality instead of graceful degradation. There is no justification and no need for the interface layers to possess graceful degradation. (2).
The analog distributed representation has classified the objects to be represented. The objects with the same property are classified into the same group. The differences between the objects in the same group are represented by different analog values of the unit representing this property group (eg, assume that there are four apples and three pears, then in analog distributed representation, two units should be used: unit A for apple and unit P for pear. The four apples can be represented by letting unit A take four different analog values. The three pears can be represented by letting unit P take three different analog values.). This is the most natural way when we deal with many objects. Why should we sacrifice the natural way (analog distributed representation) for the graceful degradation (which may not belong to the interface layers. The hidden layers are using mixed distributed representation and possess graceful degradation) when we are considering the interface layers? We used the analog distributed representation in a parabolic problem (a task mapping the parabola curve we used to compare the performances of BPNN and PPNN) and found that the analog distributed representation was the best and most natural representation for problems (such as the parabolic problem) which has continuous and infinite training/test patterns (objects). In sum, I think that we should be more specific when we talk about the representations and brain-like characteristics of neural nets: (1) For the interface layers (input/output layers), the analog distributed representation is the best choice because at the interface layers, the priority of consideration is practicality, and the analog distributed representation is the most natural one and most easily to be used in dealing with many objects. (2) For the hidden layers, the mixed distributed representation is the best choice because the graceful degradation requirement now is the priority to be taken into account of for hidden layers. Fortunately, most of the current network architechures have ensured such requirement for hidden layers. Bo Xu ITGT500 at INDYCMS.BITNET From aam9n at hagar3.acc.Virginia.EDU Sat Jun 22 21:49:33 1991 From: aam9n at hagar3.acc.Virginia.EDU (Ali Ahmad Minai) Date: Sat, 22 Jun 91 21:49:33 EDT Subject: Distributed Representations Message-ID: <9106230149.AA00465@hagar3.acc.Virginia.EDU> Bo Xu raises some questions about distributed representations in the context of feed-forward neural networks, particularly with regard to graceful degradation. I do not agree that to require graceful degradation is to imply "brain-like" networks. In my opinion, the very notion of distribution is fundamentally linked to the requirement that each representational unit be minimally loaded, and that each representation be as homogeneously distributed over all representational units as possible. That this produces graceful degradation is partly true (only to the first order, given the non-linearity of the system), but that is incidental. Speaking of which layers to apply the definition to, I think that in a feed-forward associative network (analog or binary), the hidden neurons (or all the weights) are the representational units. The input neurons merely distribute the prior part of the association, and the output neurons merely produce the posterior part. The latter are thus a "recovery mechanism" designed to "decode" the distributed representation of the hidden units and recover the "original" item. 
Of course, in a heteroassociative system, the "recovered original" is not the same as the "stored original". I realize that this is stretching the definition of "representation", but it seems quite natural to me. The issue of a "recovery mechanism" is quite fundamental to the question of representational distribution. Without a requirement for adequate recoverability, any finite medium could be "distributedly" loaded with a potentially infinite number of representations, without being able to reproduce any of them. To ensure adequate recoverability, however, representations must be "distinct", or mutually non-interacting, in some sense. Given the countervailing requirement of distributedness, the obvious route of separation by localization is not available, and we must arrive at some compromise principle of minimum mutual disturbance, such as a requirement for orthogonality or linear independence (rather artificial, if you ask me). My point is that defining distributed representations only in terms of unconstrained characteristics is a partial solution. Internal and external constraining factors must be included in the formulation to adequately ground the definition. These are provided by the requirements of maximum dispensibility and adequate recoverability. Zillions of issues remain unaddressed by this formulation too, especially those of consistent measurement. I feel that each domain and situation will have to supply its own specifics. I am not sure I understand Bo Xu's assertion that analog representations are "more natural". Certainly, to approximate a parabola (which I have done hundreds of times with different neural nets) would imply using an analog representation, but it is not clear if that is so natural for classifying apples and pears. Using different analog values to indicate intra-class variations is reasonable and, under specific circumstances, might even be provably better than a binary representation. But I would be very hesitant to generalize over all possible circumstances. In any case, a global characterization of distributed representation should depend of specifics only for details, and should apply to both discrete and analog representations. Ali Minai University of Virginia aam9n at Virginia.EDU From ross at psych.psy.uq.oz.au Sun Jun 23 01:52:51 1991 From: ross at psych.psy.uq.oz.au (Ross Gayler) Date: Sun, 23 Jun 1991 15:52:51 +1000 Subject: Distributed vs Localist Representations Message-ID: <9106230552.AA02343@psych.psy.uq.oz.au> Randall Stark (rstark at aipna.edinburgh.ac.uk) writes: >One aspect of this issue which seems implicit in much of this discussion >is the notion that distributed representation can be considered >a *relative* property. Thus the "room schema" network is "distributed" >relative to rooms, but "localist" relative to ovens. A related point was raised by Paul Smolensky in his work on variable binding using tensor representations. By his definition a representation is distributed if enitities of external interest (objects, attributes, values or whatever) are represented as patterns across multiple units. The point Paul makes is that in much connectionist work the variables are localised while the values are distributed. That is, the set of units is typically divided into disjoint groups that function as registers or variables. Each variable is able to hold a pattern of activations that is a distributed value. 
He proposed a mechanism in which the variables are not disjoint sets of units but instead are patterns that are bound to the patterns representing values. Using this scheme a binding of a variable with a value is itself represented as a pattern distributed over units and multiple bindings can be simultaneously represented on the same units. The nice point about this is that it puts variables and values on an equal footing, they are both patterns. In fact the system does not need to distinguish between them from a processing perspective. Whether something is a variable or a value is a question of how it is used, not how it is represented or implemented. Ross Gayler ross at psych.psy.uq.oz.au From aarons at cogs.sussex.ac.uk Sun Jun 23 16:13:31 1991 From: aarons at cogs.sussex.ac.uk (Aaron Sloman) Date: Sun, 23 Jun 91 21:13:31 +0100 Subject: Varieties of intelligence (long) Message-ID: <1666.9106232013@csrn.cogs.susx.ac.uk> A friend, Gerry Martin, is interested in "achievers", how they differ and the conditions that create them or enable them to achieve. I offered to try to find out if anyone knew of relevant work on different kinds of (human) intelligence, how they develop, what they are, and what (social) mechanisms if any enable them to be matched with opportunities for development or fulfilment. There's a collection of related questions. 1. To what extent does evolution produce variation in intellectual capabilities, motivations, etc.? How far is the observable variation due to environmental factors? This is an old question, of course, and very ill-defined (e.g. there is probably no meaningful metric for the contributions of genetic and environmental factors to individual development). It is clear that physical variability is inherent in evolutionary mechanisms: without this there could not be (Darwinian) evolution. The same must presumably be true for "mental" variability. Do genetic factors produce different kinds of differences: in intellectual capabilities, motivational patterns, perceptual abilities, memory abilities, problem solving abilities, etc. I think it was Waddington who offered the metaphor of the "epigenetic landscape" genetically determining the opportunities for development of an individual. The route actually taken through the landscape would depend on the individual's environment. So our question is how different are the landscapes (the sets of possible developmental routes) with which each human child is born, and to what extent do they determine different opportunities for mental, as well as physical development? (Obviously the two are linked: a blind child won't as easily become a great painter.) (Piaget suggested that all the human landscapes have a common structure, with well defined stages. I suspect this view will not survive close analysis.) For intelligent social animals, mental variability is more important than physical variability: a social system has more diversity of intellectual and motivational requirements in its "jobs" than diversity of physical requirements. (Perhaps not if you include the "jobs" done for us by other animals, plants, microorganisms, machines, etc., without which our society could not survive.) Anyhow, without variation in mental properties (whether produced genetically or not) it could be hard to achieve the division of labour that enables a complex social system to work. Aldous Huxley's book "Brave New World" takes this idea towards an unpalatable conclusion. 
The need for mental variability goes beyond infrastructure: without such variability all artists would be painters, or all would be composers, or all would be poets, and all scientists would be physicists, or biologists... Division of labour is required not only for the enabling mechanisms of society, but also for cultural richness. 2. What is the form of this variability? Folk psychology has it that there are different kinds of genius - musical geniuses, mathematical geniuses, geniuses in biology, great actors and actresses, etc. Could any of these have excelled in any other field? Would the right education have turned Mozart into a great mathematicion, or would his particular "gifts" never have engaged with advanced mathematics? Could a suitable background have made Newton a great composer? Does anyone have any insight into the genetic requirements for different kinds of creative excellence? We can distinguish two broad questions: (a) is there wide variability in DEGREE in innate capabilities (b) is there also wide variability in KIND (domain, field of application, or whatever)? In either case it would be interesting to know what kinds of mechanisms account for the differences? Could they be quantitative (as many naive scientists have supposed -- e.g. number of brain cells, number of connections, speed of transmission of signals, etc.) or are the relevant differences more likely to be structural -- i.e. differences in hardware or software organisation? It looks as if many ordinary human learning capabilities need specific pre-determined structures, providing the basis for learning abilities: e.g. learning languages with complex syntax, learning music, learning to control limbs, learning to see structured objects, learning to play games, learning mathematics, and so on. (Some of the structures creating these capabilities might be shared between different kinds of potential.) If these enabling structures are not "all-or-nothing" systems there could sometimes be partial structures at birth, giving some individuals subsets of "normal" capabilities. Are these all a result of pre-natal damage, or might the gene pool INHERENTLY generate such variety? (An unpalatable truth?) Does the gene pool also produce some individuals with powerful supersets of what is relatively common? Are there importantly different supersets, corresponding to distinct "gifts"? (E.g. Mozart, Newton, Shakespeare.) What are the additional mechanisms these individuals have? Can those born without be given them artificially? (E.g. through special training, hormone treatment, etc..) 3. To what extent do different approaches to AI (I include connectionism as a sub-field of AI) provide tools to model different sorts of mentalities? As far as I know, although there has been much empirical research (e.g. on twins) to find out what is and what is not determined genetically, there there has been very little discussion of mechanisms that might be related to such variability. >From an AI standpoint it is easy to speculate about ways in which learning systems could be designed that are initially highly sensitive to minor and subtle environmental differences and which, through various kinds of positive feedback, amplify differences so that even individuals that start off very similar could, in a rich and varied environment, end up very different. 
This sort of thing could be a consequence of multi-layered self-modifying architectures with thresholds of various kinds that get modified by "experience" and which thereby change the behaviour of systems which cause other thresholds to be modified. Even without thresholds, hierarchies of condition-action rules, where some of the actions create or alter other rules, would also provide for enormous variability. (As could hierarchies of pdp networks, some of which change the topology of others.) Cascades of such changes could produce huge qualitative variation in various kinds of intellectual capabilities as well as variation in motivational, emotional and personality traits, aesthetic tastes, etc. Such architectures might allow relatively small genetic differences as well as small environmental differences to produce vast differences in adult capabilities. Variation in tastes in food, or preferences for mates, despite common biological needs, seem to be partly a result of cultural feedback through such developmental mechanisms. But is it all environmental? I gather there are genetic factors that stop some people liking the tastes of certain foods. What about a taste for mathematics, or a general taste for intellectual achievement? 4. Does anyone have any notion of the kinds of differences in implementation that could account for differences in tastes, capabilities, etc. Would it require: (a) differences in underlying physical architectures (e.g. different divisions of brains into cooperative sub-nets, or different connection topologies among neurones?), (b) differences in the contents of "knowledge bases", "plan databases", skill databases, etc. (By "database" I include what can be stored in a trainable network.) (c) differences in numerical parameters. or something quite different? I suspect there's a huge variety of distinct ways in which qualitative differences in capability can emerge: some closer to hardware differences, some closer to software differences. The latter might in principle be easier to change, but not in practice, if for example, it requires de-compiling a huge and messy system. The only AI-related work that I know of that explicitly deals not only with the design or development of a single agent, but with variable populations, is work on genetic algorithms, which can produce a family of slightly different design solutions. Of course, it is premature for anyone to consider modelling evolutionary processes that would produce collections of "complete" intelligent agents (as opposed to collections of solutions to simple problems like planning problems, recognition problems, or whatever). But has anyone investigated general principles involved in mechanisms that could produce populations of agents with important MENTAL differences? Are there any general principles? (Are the mental epigenetic landscapes for a species importantly different in structure from the physical ones? Perhaps for some organisms, e.g. ants, there's a lot less difference than for others, e.g. chimpanzees?) 5. There are related questions about the need for or possibility of social engineering. (The questions are fraught with political and ethical problems.) In particular, if truly gifted individuals have narrowly targetted potential, are there mechanims that enable such potential to be matched with appropriate opportunities for development and application? Do rare needs have a way of "attracting" those with the rare ability to tackle them? 
What mechanisms can help to match individuals with unusual combinations of motives and capabilities, with tasks or roles that require those combinations? In a crude and only partly successful way the educational system and career advisory services attempt to do this. Special schools or special lessons for gifted children attempt to enhance the match-making. However, these formal institutions work only insofar as there are fairly broad and widely-recognized categories of individuals and of tasks. They don't address the problem of matching the potentially very high achievers to very specific opportunities and tasks that need them. Some job advertisements and recruitment services attempt to do this but there's no guarantee that they make contact with really suitable candidates, and we all know how difficult selection is. Also these mechanisms assume that the need has been identified. There was no institution that identified the need for a theory of gravity and recruited Newton, provided him with opportunties, etc. Was it pure chance then that he was "found"? Or were there many others who might have achieved what he did? Or were there unrecognized social mechanisms that "arranged" the match? If so, how far afield could he have been born without defeating the match-making? If the potentially very high acheivers only have very small areas in which their potential can be realized, and if each type is very rare, there may be no general way to set up conditions that bring them into the appropriate circumstances. An important example might turn out to be the problem of matching the particular collection of talents, knowledge, and opportunity that would enable a cure for AIDS to be found. In a homogenous global culture with richly integrated (electronic?) information systems it might be possible to reduce the risks of such lost opportunities, but only if there are ways of recognizing in advance that a particular individual is likely to be well suited to a particular task. The more narrowly defined and rare the task and the capabilities, the less likely it is that the match can be recognized in advance. Is the idea that there are important but extremely difficult tasks and challenges that only a very few individuals have the potential to cope with just a romantic myth? Or is every solvable problem, every achievable goal, solvable by a large subset of humanity, given the right training and opportunity? (Will we ever know whether nobody but Fermat had what it takes to prove his "last" theorem?) Even if the "romantic myth" is close to the truth, there may be no way of setting up social mechanisms with a good chance of bringing important opportunities and appropriately gifted individuals together: social systems are so complex that all attempts to control them, however well-meaning, invariably have a host of unintended, often undesirable, consequences, some of them long term and far less obvious than missiles that hit the wrong target. Could some variant of AI help here? It seems unlikely that connectionist pattern recognition techniques could work. (E.g. where would training sets come from?) Could some more abstract sort of expert system help? Neither could inform us that the person capable of solving a particular problem is an unknown child in a remote underdeveloped community. 
Perhaps there is nothing for it but to rely on chance, coincidence, or whatever combination of ill-understood biological and social processes has worked up to now in enabling humankind to achieve what distinguishes us from ants and apes (including our extremes of ecological vandalism). ----------------------------------------------------------------------- I don't know if I have captured Gerry's questions well: he hasn't seen this message. But if you have any relevant comments, including pointers to literature, information about work in progress, criticisms of the presuppositions of the questions, conjectures about the answers, etc., I'll be interested to receive them and to pass them on. I'll post this to connectionists and the comp.ai newsgroup. (Should it go to others?) Apologies for length. Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England EMAIL aarons at cogs.sussex.ac.uk After 18th July 1991: School of Computer Science. The University of Birmingham, UK. Email: A.Sloman at cs.bham.ac.uk From ITGT500 at INDYCMS.BITNET Mon Jun 24 10:45:52 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Mon, 24 Jun 91 09:45:52 EST Subject: Distributed Representations Message-ID: Ali Minai made a good point about where the representations are considered to reside. Let's look at his message first: >Speaking of which layers to apply the definition to, I think that in a >feed-forward associative network (analog or binary), the hidden neurons >(or all the weights) are the representational units. The input neurons >merely distribute the prior part of the association, and the output neurons >merely produce the posterior part. The latter are thus a "recovery mechanism" >designed to "decode" the distributed representation of the hidden units and >recover the "original" item. Of course, in a heteroassociative system, the >"recovered original" is not the same as the "stored original". I realize that >this is stretching the definition of "representation", but it seems quite >natural to me. I think that, according to the criterion of where representations exist, representations can be classified into two types: (1). External representations ---- the representations that exist at the interface layers (input and/or output layers). They are responsible for information transmission between the network and the outside world (coding the input information at the input layer and decoding the output information at the output layer). (2). Internal representations ---- the representations that exist at the hidden layers. These representations encode the mappings from the input field to the output field. The mappings are the core of the neural net. If I understand correctly, Ali Minai is referring to the internal representations only, and neglects the external representations. The internal representations are very important. However, these representations are determined by the topology of the network, and we cannot change them unless we change the network topology. The topology of most current networks ensures that the internal representations are mixed distributed representations (as I pointed out several days ago). Their working mechanisms are still a black box. Without changing the topology of the network, what we can choose and select are the external representations only. They should not be neglected. >Zillions of issues remain unaddressed by this formulation too, especially >those of consistent measurement.
I feel that each domain and situation >will have to supply its own specifics. >I am not sure I understand Bo Xu's assertion that analog representations >are "more natural". Certainly, to approximate a parabola (which I have >done hundreds of times with different neural nets) would imply using an >analog representation, but it is not clear if that is so natural for >classifying apples and pears. Using different analog values to indicate >intra-class variations is reasonable and, under specific circumstances, >might even be provably better than a binary representation. But I would >be very hesitant to generalize over all possible circumstances. In any >case, a global characterization of distributed representation should depend >of specifics only for details, and should apply to both discrete and analog >representations. It's true that there will be zillions of issues in practical applications. However, it is also because of this that it will be very difficult (if not impossible) to study all of these zillions of issues before drawing any conclusions. Some generalizations based on limited studies are probably necessary and helpful when facing such a situation. I want to thank Ali Minai for his comments. All of his comments are very valuable and thought-provoking. Bo Xu Indiana University ITGT500 at INDYCMS.BITNET From aam9n at hagar2.acc.Virginia.EDU Mon Jun 24 22:29:34 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Mon, 24 Jun 91 22:29:34 EDT Subject: Distributed Representations Message-ID: <9106250229.AA00528@hagar2.acc.Virginia.EDU> This is in response to Bo Xu's last posting regarding distributed representations. I think one of the problems is a basic incompatibility in our notions of "representations" and where they exist. I would like to clarify my earlier posting somewhat on this point. I wrote: >>Speaking of which layers to apply the definition to, I think that in a >>feed-forward associative network (analog or binary), the hidden neurons >>(or all the weights) are the representational units. The input neurons >>merely distribute the prior part of the association, and the output neurons >>merely produce the posterior part. The latter are thus a "recovery mechanism" >>designed to "decode" the distributed representation of the hidden units and >>recover the "original" item. Of course, in a heteroassociative system, the >>"recovered original" is not the same as the "stored original". I realize that >>this is stretching the definition of "representation", but it seems quite >>natural to me. To which Bo replied: >I think according to the criterion of where representations exist, the >representations can be classified into two different types: > >(1). External representations ---- The representations existed at the > interface layers (input and/or output layers). They are > responsible for the information transmission between the network > and the outside world (coding the input information at the input > layer and decoding the output information at the output layer). > >(2). Internal representations ---- The representations existed at the > hidden layers. These representations are used to encode the > mappings from the input field to the output field. The mappings > are the core of the neural net. > >If I understand correctly, Ali Minai is referring to the internal >representations only, and neglect the external representations. The internal >representations are very important representations.
However, these >representations are determined by the topology of the network, and we cannot >change them unless we change the network topology. Most of the current >networks' topology ensure that the internal representations are mixed >distributed representations (as I pointed out several days ago). Their >working mechanisms are still a black-box. > >Without changing the topology of the network, what we can choose and >select are the external representations only. They should not be neglected. First, let me state what I meant by the "stored" and "recovered" representations in the heteroassociative case. We can see the process of the heteroassociation of an input vector U and output vector V in a feed-forward network as a process of encoding a representation of the vector UV over the hidden units of the network. This is what I call "storage". There is a special requirement here that, given U, a mechanism should be able to produce V over the output units, thus "completing the pattern". The process of doing this is what I call "recovery" (or "recall"). The way I see it (and I believe most other connectionists too) is that the representational part of the network consists of its "internals" --- either the weights, or the hidden units. Far from being uncontrollable, as Bo Xu states, these are *precisely* the things that we *do* control --- not in a micro sense, but through complex global schemes such as training algorithms. The prior to be stored, which Bo takes to be the representation, is, to me, just a given that has been through some unspecified preprocessing. It is the "object" to be represented (though I agree that all objects are themselves representations). From rosauer at ira.uka.de Tue Jun 25 14:27:57 1991 From: rosauer at ira.uka.de (Bernd Rosauer) Date: Tue, 25 Jun 91 14:27:57 MET DST Subject: genetic algorithms + neural networks Message-ID: I am interested in any kind of combination of genetic algorithms and neural network training. I am aware of the papers presented at * Connectionist Models Summer School, 1990 * First International Workshop on Parallel Problem Solving from Nature, 1990 * Third International Conference on Genetic Algorithms, 1989 * Advances in Neural Information Processing Systems 2, 1989. Please let me know if there is any further work on that topic. Post to , so I will summarize here. Thanks a lot Bernd From stork at GUALALA.CRC.RICOH.COM Mon Jun 24 20:36:49 1991 From: stork at GUALALA.CRC.RICOH.COM (David Stork) Date: Mon, 24 Jun 91 17:36:49 -0700 Subject: Job offer Message-ID: <9106250036.AA11456@cache.CRC.Ricoh.Com> The Ricoh California Research Center has an opening for a staff programmer or researcher in neural networks and connectionism. This opening is for a B.S. or possibly M.S.-level graduate in Physics, Computer Science, Math, Electrical Engineering, Cognitive Science, Psychology, or related fields. A background in some hardware design is a plus. The Ricoh California Research Center is located in Menlo Park, about one mile from Stanford University. Contact: Dr. David G.
Stork Ricoh California Research Center 2882 Sand Hill Road #115 Menlo Park, CA 94025-7022 stork at crc.ricoh.com From issnnet at park.bu.edu Tue Jun 25 15:39:29 1991 From: issnnet at park.bu.edu (issnnet@park.bu.edu) Date: Tue, 25 Jun 91 15:39:29 -0400 Subject: Call For Votes: comp.org.issnnet Message-ID: <9106251939.AA04607@copley.bu.edu> CALL FOR VOTES ---------------- GROUP NAME: comp.org.issnnet STATUS: unmoderated CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. Details were posted in the REQUEST FOR DISCUSSION, and can be requested from . VOTING PERIOD: JUNE 25 - JULY 25, 1991 ****************************************************************************** VOTING PROCESS If you wish to vote for or against the creation of comp.org.issnnet, please send your vote to: issnnet at park.bu.edu To facilitate collection and sorting of votes, please include one of these lines in your "subject:" entry: If you favor creation of comp.org.issnnet, your subject should read: YES - comp.org.issnnet If you DO NOT favor creation of comp.org.issnnet, use the subject: NO - comp.org.issnnet YOUR VOTE ONLY COUNTS IF SENT DIRECTLY TO THE ABOVE ADDRESS. ----------------------------------------------------------------------- For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) or write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts. NOTE -- NEW SURFACE ADDRESS: ISSNNet, Inc. P.O. Box 15661 Boston, MA 02215 USA From koch at CitIago.Bitnet Thu Jun 27 06:12:08 1991 From: koch at CitIago.Bitnet (Christof Koch) Date: Thu, 27 Jun 91 03:12:08 PDT Subject: Phase-locking without oscillations Message-ID: <910627031202.20402f6a@Iago.Caltech.Edu> The following paper is available by anonymous FTP from Ohio State University, in pub/neuroprose. The file is called "koch.syncron.ps.Z". A SIMPLE NETWORK SHOWING BURST SYNCHRONIZATION WITHOUT FREQUENCY-LOCKING Christof Koch and Heinz Schuster ABSTRACT: The dynamic behavior of a network model consisting of all-to-all excitatory coupled binary neurons with global inhibition is studied analytically and numerically. It is shown that for random input signals, the output of the network consists of synchronized bursts with apparently random intermissions of noisy activity. We introduce the fraction of simultaneously firing neurons as a measure for synchrony and prove that its temporal correlation function displays, besides a delta peak at zero indicating random processes, strongly damped oscillations. Our results suggest that synchronous bursts can be generated by a simple neuronal architecture which amplifies incoming coincident signals. This synchronization process is accompanied by damped oscillations which, by themselves, however, do not play any constructive role in this process and can therefore be considered an epiphenomenon. Key words: neuronal networks / stochastic activity / burst synchronization / phase-locking / oscillations For comments, send e-mail to koch at iago.caltech.edu. Christof
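For readers who want to experiment with the synchrony measure described in the abstract, the following is a minimal sketch in Python/NumPy. It is purely illustrative and not from the paper: it uses an i.i.d. random spike raster rather than the coupled network itself, so its correlation function should show only the delta peak at zero, not the damped oscillations.

import numpy as np

rng = np.random.default_rng(0)

# Toy spike raster: T time steps x N binary neurons.  This is just noise, standing in
# for the output of the model (all-to-all excitatory coupling with global inhibition).
T, N = 2000, 100
spikes = (rng.random((T, N)) < 0.1).astype(float)

# Synchrony measure from the abstract: fraction of simultaneously firing neurons.
f = spikes.mean(axis=1)

# Temporal autocorrelation of f (mean removed, normalized at lag zero).
f0 = f - f.mean()
acf = np.correlate(f0, f0, mode="full")[T - 1:]
acf /= acf[0]

print("mean synchrony:", round(float(f.mean()), 3))
print("autocorrelation at lags 0-5:", np.round(acf[:6], 3))

Substituting the actual network dynamics for the random raster, the same two quantities -- f and its autocorrelation -- are the ones that exhibit the damped oscillations described above.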
P.S. And this is how you can FTP and print the file: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose (actually, cd neuroprose) ftp> binary ftp> get koch.syncron.ps.Z ftp> quit unix> uncompress koch.syncron.ps.Z unix> lpr koch.syncron.ps Read and be illuminated. From nowlan at helmholtz.sdsc.edu Thu Jun 27 14:38:58 1991 From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan) Date: Thu, 27 Jun 91 11:38:58 MST Subject: Thesis/TR available Message-ID: <9106271838.AA27191@bose> The following technical report version of my thesis is now available from the School of Computer Science, Carnegie Mellon University: ------------------------------------------------------------------------------- Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures CMU-CS-91-126 Steven J. Nowlan School of Computer Science Carnegie Mellon University ABSTRACT In this thesis, we consider learning algorithms for neural networks which are based on fitting a mixture probability density to a set of data. We begin with an unsupervised algorithm which is an alternative to the classical winner-take-all competitive algorithms. Rather than updating only the parameters of the ``winner'' on each case, the parameters of all competitors are updated in proportion to their relative responsibility for the case. Use of such a ``soft'' competitive algorithm is shown to give better performance than the more traditional algorithms, with little additional cost. We then consider a supervised modular architecture in which a number of simple ``expert'' networks compete to solve distinct pieces of a large task. A soft competitive mechanism is used to determine how much an expert learns on a case, based on how well the expert performs relative to the other expert networks. At the same time, a separate gating network learns to weight the output of each expert according to a prediction of its relative performance based on the input to the system. Experiments on a number of tasks illustrate that this architecture is capable of uncovering interesting task decompositions and of generalizing better than a single network with small training sets. Finally, we consider learning algorithms in which we assume that the actual output of the network should fall into one of a small number of classes or clusters. The objective of learning is to make the variance of these classes as small as possible. In the classical decision-directed algorithm, we decide that an output belongs to the class it is closest to and minimize the squared distance between the output and the center (mean) of this closest class. In the ``soft'' version of this algorithm, we minimize the squared distance between the actual output and a weighted average of the means of all of the classes. The weighting factors are the relative probabilities that the output belongs to each class. This idea may also be used to model the weights of a network, to produce networks which generalize better from small training sets. ------------------------------------------------------------------------------- Unfortunately there is NOT an electronic version of this TR. Copies may be ordered by sending a request for TR CMU-CS-91-126 to: Computer Science Documentation School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 USA There will be a charge of $10.00 U.S. for orders from the U.S., Canada or Mexico and $15.00 U.S.
for overseas orders to cover copying and mailing costs (the TR is 314 pages in length). Checks and money orders should be made payable to Carnegie Mellon University. Note that if your institution is part of the Carnegie Mellon Technical Report Exchange Program there will be NO charge for this TR. REQUESTS SENT DIRECTLY TO MY E-MAIL ADDRESS WILL BE FILED IN /dev/null. - Steve (P.S. Please note my new e-mail address is nowlan at helmholtz.sdsc.edu). ------- End of Forwarded Message From D.M.Shumsheruddin at computer-science.birmingham.ac.uk Thu Jun 27 06:06:51 1991 From: D.M.Shumsheruddin at computer-science.birmingham.ac.uk (Dean Shumsheruddin) Date: Thu, 27 Jun 91 11:06:51 +0100 Subject: Request for references on navigation Message-ID: <961.9106271006@christopher-robin.cs.bham.ac.uk> I am looking for references to work on neural nets for navigation in graph-structured environments. I've already found the papers by Pomerleau and Bachrach in NIPS 3. I would greatly appreciate information about related work. If there is sufficient interest I'll post a summary to the list. Dean Shumsheruddin University of Birmingham, UK dms at cs.bham.ac.uk From russ at oceanus.mitre.org Fri Jun 28 10:47:09 1991 From: russ at oceanus.mitre.org (Russell Leighton) Date: Fri, 28 Jun 91 10:47:09 EDT Subject: Aspirin/MIGRAINES v4.0 Users Message-ID: <9106281447.AA13459@oceanus.mitre.org> Aspirin/MIGRAINES v4.0 Users Could those groups presently using the Aspirin/MIGRAINES v4.0 neural network simulator from MITRE please reply to this message. A brief description of your motivation for using this software would be useful but not necessary. We are compiling a list of users so that we may more easily distribute the next release of software (Aspirin/MIGRAINES v5.0). Thank you. Russell Leighton INTERNET: russ at dash.mitre.org Russell Leighton MITRE Signal Processing Lab 7525 Colshire Dr. McLean, Va. 22102 USA
From dcp+ at cs.cmu.edu Mon Jun 3 15:51:50 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Mon, 03 Jun 91 15:51:50 EDT Subject: Preprint: Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia Message-ID: <1831.675978710@DWEEB.BOLTZ.CS.CMU.EDU> The following paper is available in the neuroprose archive as plaut.cogsci91.ps.Z. It will appear in this year's Cognitive Science Conference proceedings. A much longer paper presenting a wide range of related work is in preparation and will be announced shortly. Effects of Word Abstractness in a Connectionist Model of Deep Dyslexia David C. Plaut Tim Shallice School of Computer Science Department of Psychology Carnegie Mellon University University College, London dcp at cs.cmu.edu ucjtsts at ucl.ac.uk Deep dyslexics are patients with neurological damage who exhibit a variety of symptoms in oral reading, including semantic, visual and morphological effects in their errors, a part-of-speech effect, and better performance on concrete than abstract words.
Extending work by Hinton & Shallice (1991), we develop a recurrent connectionist network that pronounces both concrete and abstract words via their semantics, defined so that abstract words have fewer semantic features. The behavior of this network under a variety of ``lesions'' reproduces the main effects of abstractness on deep dyslexic reading: better correct performance for concrete words, a tendency for error responses to be more concrete than stimuli, and a higher proportion of visual errors in response to abstract words. Surprisingly, severe damage within the semantic system yields better performance on *abstract* words, reminiscent of CAV, the single, enigmatic patient with ``concrete word dyslexia.'' To retrieve this from the neuroprose archive, type the following: unix> ftp 128.146.8.62 Name: anonymous Password: neuron ftp> binary ftp> cd pub/neuroprose ftp> get plaut.cogsci91.ps.Z ftp> quit unix> zcat plaut.cogsci91.ps.Z | lpr ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890 From mjolsness-eric at CS.YALE.EDU Wed Jun 5 15:50:55 1991 From: mjolsness-eric at CS.YALE.EDU (Eric Mjolsness) Date: Wed, 5 Jun 91 15:50:55 EDT Subject: TR: Bayesian Inference on Visual Grammars by NNs that Optimize Message-ID: <9106051951.AA25379@NEBULA.SYSTEMSZ.CS.YALE.EDU> The following paper is available in the neuroprose archive as mjolsness.grammar.ps.Z: Bayesian Inference on Visual Grammars by Neural Nets that Optimize Eric Mjolsness Department of Computer Science Yale University New Haven, CT 06520-2158 YALEU/DCS/TR854 May 1991 Abstract: We exhibit a systematic way to derive neural nets for vision problems. It involves formulating a vision problem as Bayesian inference or decision on a comprehensive model of the visual domain given by a probabilistic {\it grammar}. A key feature of this grammar is the way in which it eliminates model information, such as object labels, as it produces an image; correspondence problems and other noise removal tasks result. The neural nets that arise most directly are generalized assignment networks. Also there are transformations which naturally yield improved algorithms such as correlation matching in scale space and the Frameville neural nets for high-level vision. Deterministic annealing provides an effective optimization dynamics. The grammatical method of neural net design allows domain knowledge to enter from all levels of the grammar, including ``abstract'' levels remote from the final image data, and may permit new kinds of learning as well. The paper is 56 pages long. To get the file from neuroprose: unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62) Name: anonymous Password: neuron ftp> cd pub/neuroprose ftp> binary ftp> get mjolsness.grammar.ps.Z ftp> quit unix> uncompress mjolsness.grammar.ps.Z unix> lpr mjolsness.grammar.ps (or however you print postscript) -Eric ------- From jm2z+ at andrew.cmu.edu Thu Jun 6 13:47:19 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 13:47:19 -0400 (EDT) Subject: Are they really worth the effort ? Message-ID: <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> I'd like to have a debate about the advantages of distributed over local representations. I mean sure, distributed representations are great: they work in 2^n instead of n space, they degrade gracefully, and all these PDP Bible type of things. But ... are they really that good?
For one thing, they make our life awfully difficult in terms of understanding and manipulating them... Are they really worth the effort? Do you have concrete examples in your work where they did a better job than local representations? Javier From ogs0%dixie.dnet at gte.com Thu Jun 6 17:21:22 1991 From: ogs0%dixie.dnet at gte.com (Oliver G. Selfridge) Date: Thu, 6 Jun 91 17:21:22 -0400 Subject: Warren McCulloch's widow Message-ID: <9106062121.AA05259@bunny.gte.com> I sadly announce that Rook McCulloch, widow to Warren McCulloch, died last night at the age of 92. Warren himself, with Walter Pitts, wrote the revolutionary introduction to neural nets in the mid-1940s in two well-known papers. Rook maintained a bright and contributory life up to the end and we will all miss her. Oliver Selfridge From jm2z+ at andrew.cmu.edu Thu Jun 6 18:19:45 1991 From: jm2z+ at andrew.cmu.edu (Javier Movellan) Date: Thu, 6 Jun 91 18:19:45 -0400 (EDT) Subject: are they worth the effort II Message-ID: <4cHfI1q00WBK03HW0G@andrew.cmu.edu> Please send your thoughts to connectionists so that we all can be instructed about the advantages of distributed representations. By the way, I already got two responses that I will summarize below. Response number one provided the following arguments: 1- The brain uses distributed representations. He cites Lashley's (1929) experiments where rats show graceful performance degradation when they were partially deprived of their cortex. 2- Distributed representations are more resistant to degradation. He claims this may have military implications (systems resistant to enemy fire type of thing). [ OK, does anybody out there have data showing that distributed representations are more noise-resistant than local representations? I mean, one can always clone the local representations and get noise resistance that way. -Javier ] 3- He claims distributed representations performed very well in his research projects. [ Unfortunately he confuses distributed representations with backpropagation (BP). It is BP that worked well. It is always possible to force BP to develop local representations and perhaps it would work better that way. -Javier ] Response number two claims that *very* distributed representations are probably the wrong way to go. He said "Slightly" distributed representations (like the ones used in Kruschke's ALCOVE model) are better. Unfortunately he does not provide any data supporting this point. I just got response # 3, which claims that distributed representations performed consistently better than local ones in the NETtalk domain and in isolated letter speech recognition. [ Tom, could you send me some references? Thanks - Javier ] -- Javier From tgd at turing.CS.ORST.EDU Thu Jun 6 18:11:44 1991 From: tgd at turing.CS.ORST.EDU (Tom Dietterich) Date: Thu, 6 Jun 91 15:11:44 PDT Subject: Are they really worth the effort ? In-Reply-To: Javier Movellan's message of Thu, 6 Jun 91 13:47:19 -0400 (EDT) <4cHbIbi00Uh_M2RVlH@andrew.cmu.edu> Message-ID: <9106062211.AA13213@turing.CS.ORST.EDU> In my studies of error-correcting output codes, I found that these codes---which are particularly neat distributed representations---performed consistently better than local representations in the NETtalk domain and in isolated letter speech recognition. --Tom Thomas G.
Dietterich Department of Computer Science Dearborn Hall, 303 Oregon State University Corvallis, OR 97331-3102 503-737-5559 From Nigel.Goddard at B.GP.CS.CMU.EDU Thu Jun 6 19:06:13 1991 From: Nigel.Goddard at B.GP.CS.CMU.EDU (Nigel.Goddard@B.GP.CS.CMU.EDU) Date: Thu, 6 Jun 91 19:06:13 EDT Subject: distributed/local Message-ID: Both extremes are wrong for representing conceptual knowledge (i.e., one unit per concept versus all units participate in all concepts). Disadvantages of the extreme local scheme include no tolerance of failure (neurons die all the time) and difficulty expressing nuance without impossibly large numbers of units. The big advantage is that it is easy to see what is going on and to design structures. Disadvantages of the extreme distributed scheme include crosstalk when more than one item is active and difficulty communicating an active item from one part of the architecture to another (too many links required). The big advantages are fault-tolerance (graceful degradation) and generalization. The answer is something in between the extremes (not that this is news to anyone), depending on what the task is. Order log n units per concept for an n-unit net might be a good place to start. Feldman has a TR discussing these issues in much more depth (TR 189, "Neural Representation of Conceptual Knowledge", Computer Science Dpt, Univ. Rochester, NY 14627). Also published as a book chapter, I believe. Nigel Goddard From soller%asylum at cs.utah.edu Thu Jun 6 22:54:43 1991 From: soller%asylum at cs.utah.edu (Jerome Soller) Date: Thu, 6 Jun 91 20:54:43 -0600 Subject: Request for Information on Cognitive Science Curriculum Message-ID: <9106070254.AA24372@asylum.utah.edu> At the University of Utah, we are in the process of putting together a curriculum for Cognitive Science degrees at the undergraduate and graduate level. This faculty/student initiative is being led by Dr. Dick Burgess of Physiology. We were wondering what classes and sequences are considered to form the core of established Cognitive Science degree-granting programs at graduate and undergraduate levels. Jerome Soller Department of C.S. U. of Utah soller at asylum.utah.edu From slehar at park.bu.edu Fri Jun 7 08:56:58 1991 From: slehar at park.bu.edu (Steve Lehar) Date: Fri, 7 Jun 91 08:56:58 -0400 Subject: Distributed Representations In-Reply-To: connectionists@c.cs.cmu.edu's message of 7 Jun 91 09:39:59 GM Message-ID: <9106071256.AA15832@park.bu.edu> I think the essence of this debate is in the nature of the input data. If your input is boolean in nature and reliably correct, then the processing performed on it can be similarly boolean and sequential with a great saving in time and space. It is when the input is fuzzy, ambiguous and distributed that the sequential logical boolean type of processing runs into problems. A perfect example is image understanding. No single local region of the image is sufficient for reliable identification. Try this yourself: punch a little hole in a big piece of paper and lay it on a randomly selected photograph and see how much you can recognize through that one local aperture. You have no way of knowing what the local feature is without the global context, but how do you know the global context without building it up out of the local pieces?
Studies of the visual system suggest that in nature this problem is solved by a parallel optimization of all the local pieces together with many levels of global representations, such that the final interpretation is a kind of relaxation due to all of the constraints felt at all of the different representations all at the same time. This is the basic idea of Grossberg's BCS/FCS algorithm, and is in contrast to a more sequential "AI" approach where the local pieces are each evaluated independently, and the results passed on to the next stage. I would claim that such an approach can never work reliably with natural images. I would be happy to provide more information on the BCS/FCS and my implementations of it to interested parties. From hendler at cs.UMD.EDU Fri Jun 7 10:40:58 1991 From: hendler at cs.UMD.EDU (Jim Hendler) Date: Fri, 7 Jun 91 10:40:58 -0400 Subject: distributed/local In-Reply-To: Nigel.Goddard@B.GP.CS.CMU.EDU's message of Thu, 6 Jun 91 19:06:13 EDT <9106071428.AA09615@mimsy.UMD.EDU> Message-ID: <9106071440.AA23704@dormouse.cs.UMD.EDU> For what it's worth, some preliminary results showing a well-behaved relationship between local and distributed reps are in a paper I had at the NIPS conf (Advances in Neur. Info. Proc. Sys I - Touretzky (ed), 1989, p.553). I have followed up on this work a little, with a better analysis of the relationship described in last year's Cog. Sci. Conference, but the work is pretty preliminary. I've pretty much stopped pursuing this actively, but anyone wanting to pick up on it is welcome... -J. Hendler From hu at eceserv0.ece.wisc.edu Fri Jun 7 11:22:28 1991 From: hu at eceserv0.ece.wisc.edu (Yu Hu) Date: Fri, 7 Jun 91 10:22:28 -0500 Subject: What is distributed/local representation Message-ID: <9106071522.AA18585@eceserv0.ece.wisc.edu> While lots of buzzzzz words, such as graceful degradation, appear in the discussion, may I ask a rather naive question: could someone give a mathematically (or .....ly) sound definition of distributed and local representation (of what?) before we proceed to discuss them? Suppose the representations are for data vectors in an N-dimensional space. Does distributed representation refer to data with many non-zero elements, and local representation to the opposite? If not, what are they? Regards, Yu Hen Hu Department of Electrical and Computer Engr. (608)262-6724(phone) Univ. of Wisconsin - Madison (608)262-1267(fax) 1415 Johnson Drive hu at engr.wisc.edu Madison, WI 53706-1691 U.S.A. From indurkhy at paul.rutgers.edu Fri Jun 7 12:10:42 1991 From: indurkhy at paul.rutgers.edu (Nitin Indurkhya) Date: Fri, 7 Jun 91 12:10:42 EDT Subject: Are they really worth the effort ? Message-ID: <9106071610.AA17674@paul.rutgers.edu> >In my studies of error-correcting output codes, I found that these >codes---which are particularly neat distributed >representations---performed consistently better than local >representations in the NETtalk domain and in isolated letter speech >recognition. in our own studies with the NETtalk dataset that you gave us, we found that local representations were competitive. the results are reported in "reduced complexity rule induction" by weiss and indurkhya (to be presented at ijcai-91).
--nitin From lina at mimosa.physio.nwu.edu Fri Jun 7 12:47:29 1991 From: lina at mimosa.physio.nwu.edu (Lina Massone) Date: Fri, 7 Jun 91 11:47:29 CDT Subject: No subject Message-ID: <9106071647.AA05357@mimosa.physio.nwu.edu> About distributed representations The concept of distributed representation is intimately related to the concept of redundancy. The central nervous system makes a great use of redundant representations in the way receptive/projective fields are organized. I do not agree on the fact that distributed/redundant representations are primarily a protection against possible injuries or failures of the components; I'd rather consider that as a useful side-effect. To me the main values of redundancy are: greater sensitivity, higher resolution, improvement of signal-to-noise ratio, reduction of demand for stability of performance and for precision in ontogenesis. In general a comparison between the activity of a population of neurons and the activity of a single neuron will show that the population is sensitive to lower stimulus intensities, smaller increments, briefer events, higher frequencies, wider dynamic ranges than a single neuron and is less disturbed by independent drift and instability. As far as the amount of redundancy, there is some physiological evidence that the coding of information in the CNS is a compromise between fully distributed and fully localized. Given that the available number of neurons is limited, an entity (a piece of information) cannot be represented over a very large population of neurons that overlaps almost completely with the population activated by a different entity; this would cause a high degree of interference and would correspond to a very inefficient memory storage system. To maintain some degree of orthogonality within a limited number of neurons, the CNS makes the number of neurons - active for each stimulus - low. In other words each entity is represented across an ensemble of neurons but the ensemble is of limited size. As far as coarse coding, Ken Laws raised the issue of matching the structure of data with the code. I agree on that. The CNS does that by having neighboring receptors stimulated by neighboring fractions of the impinging world, i.e. by means of a topological principle. An example of the computational advantages of this idea for control problems is given in L. Massone, E. Bizzi (1990) On the role of input representations in sensorimotor mapping, Proc. IJCNN, Washington D.C. Lina Massone From tgd at turing.cs.orst.edu Fri Jun 7 12:45:26 1991 From: tgd at turing.cs.orst.edu (Tom Dietterich) Date: Fri, 7 Jun 91 09:45:26 PDT Subject: Distributed Representations In-Reply-To: Ken Laws's message of Thu 6 Jun 91 22:02:27-PDT <676270947.0.LAWS@AI.SRI.COM> Message-ID: <9106071645.AA16085@turing.CS.ORST.EDU> Date: Thu 6 Jun 91 22:02:27-PDT From: Ken Laws Mail-System-Version: I'm not sure this is the same concept, but there were several papers at the last IJCAI showing that neural networks worked better than decision trees. The reason seemed to be that neural decisions depend on all the data all the time, whereas local decisions use only part of the data at one time. This is not the same concept at all. You are worrying about locality in the input space, whereas distributed representations usually concern (lack of) locality in the output space or in some intermediate representation. 
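To make the output-space distinction concrete, here is a minimal Python sketch contrasting a local (1-of-k) output code with a distributed one whose codewords are spread over several units. The codewords are invented purely for illustration; they are not taken from any of the studies mentioned in this discussion.

import numpy as np

# Local (1-of-k) output code for k = 4 classes: one unit per class.
local_code = np.eye(4, dtype=int)

# A distributed output code for the same 4 classes: longer codewords, any two of
# which differ in several bits (hypothetical codewords, for illustration only).
distributed_code = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
])

def decode(output_bits, codebook):
    # Assign the class whose codeword is nearest in Hamming distance.
    dists = np.abs(codebook - output_bits).sum(axis=1)
    return int(np.argmin(dists))

# Suppose the true class is 2 and one output unit is learned incorrectly.
noisy = distributed_code[2].copy()
noisy[0] ^= 1                              # flip one bit
print(decode(noisy, distributed_code))     # prints 2: the distributed code absorbs the error

noisy_local = local_code[2].copy()
noisy_local[0] ^= 1                        # the same single-bit error on the local code
print(decode(noisy_local, local_code))     # prints 0: the 1-of-k code cannot recover

Because the distributed codewords are far apart in Hamming distance, decoding to the nearest codeword recovers the correct class even when an individual output unit is wrong; a 1-of-k code has no such slack.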
I have applied decision trees to learn distributed representations of output classes, and in all of my experiments, the distributed representation performs better than learning either one large tree (to make a k-way discrimination) or learning k separate trees. I believe this is because a distributed representation is able to correct for errors made in learning any individual output unit. The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive presents experimental support for this claim. I've never put much stock in the military reliability claims. A bullet through the chip or its power supply will be a real challenge. Noise tolerance is important, though, and I suspect that neural systems really are more tolerant. It isn't a neural vs. non-neural issue: distributed representations are more redundant, and hence, more resistant to (local) damage. Noise tolerance is also not a neural vs. non-neural issue. To achieve noise tolerance, you must control over-fitting. There are many ways to do this: low-dimensional representations, smoothness assumptions, minimum description length methods, cross-validation, etc. Terry Sejnowski's original NETtalk work has always bothered me. He used a neural network to set up a mapping from an input bit string to 27 output bits, if I recall. I have never seen a "control" experiment showing similar results for 27 separate discriminant analyses, or for a single multivariate discriminant. I suspect that the results would be far better. The wonder of the net was not that it worked so well, but that it worked at all. I think you should perform these studies before you make such claims. I myself doubt them very much, because the NETtalk task violates the assumptions of discriminant analysis. In my experience, backpropagation works quite well on the NETtalk task. We have found that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor method) and generalized radial basis functions do better than backpropagation, but everything else we have tried does worse (decision trees, perceptrons, Fringe). I have come to believe strongly in "coarse-coded" representations, which are somewhat distributed. (I have no insight as to whether fully distributed representations might be even better. I suspect that their power is similar to adding quadratic and higher-order terms to a standard statistical model.) The real win in coarse coding occurs if the structure of the code models structure in the data source (or perhaps in the problem to be solved). -- Ken Laws The real win in any problem comes from good modelling, of course. But since we can't guarantee a priori that our representations are good models, it is important to develop ways for recovering from inappropriate models. I believe distributed representations provide one such way. --Tom Dietterich From dhw at t13.Lanl.GOV Fri Jun 7 14:31:42 1991 From: dhw at t13.Lanl.GOV (David Wolpert) Date: Fri, 7 Jun 91 12:31:42 MDT Subject: No subject Message-ID: <9106071831.AA11289@t13.lanl.gov> Javier Movellan wonders about the relative "advantages of distributed over local representations". He asks of members of the net, "Do you have concrete examples in your work where they did a better job than local representations? I have concrete examples in which they do worse - sometimes far worse. See references below. David Wolpert (dhw at tweety.lanl.gov) D. H. Wolpert, "A benchmark for how well neural nets generalize", Biological Cybernetics, 61 (1989), 303-315. D. H. 
Wolpert, "Constructing a generalizer superior to NETtalk via a mathematical theory of generalization", Neural Networks, 3 (1990), 445-452. D. H. Wolpert, "Improving the performance of generalizers via time-series-like pre-processing of the learning set", Los Alamos Report LA-UR-91-350, submitted to IEEE PAMI. From kukich at flash.bellcore.com Fri Jun 7 17:26:05 1991 From: kukich at flash.bellcore.com (Karen Kukich) Date: Fri, 7 Jun 91 17:26:05 -0400 Subject: distributed vs. local encoding schemes Message-ID: <9106072126.AA06750@flash.bellcore.com> I ran some back-prop spelling correction experiments a few years ago in which one of the control variables was the use of distributed vs. local encoding schemes for both input and output representations. Local encodings schemes were clear winners in both speed of learning and performance (correction accuracy for novel misspellings). To clarify, a local output scheme was simply a 1-of-n vector (n=200) where each node represented one word in the lexicon; a "semi-local" input scheme was a 15*30=450-unit vector where each 30-unit block locally encoded one letter in a word of up to 15 characters. This positionally-encoded input scheme was thus local w.r.t individual letters in a word but distributed w.r.t the whole word. (Incidentally, the nets took slightly longer to learn to correct the shift-variant insertion and deletion errors, but they eventually learned them as well as the shift-invariant substitution and transposition errors.) The distributed encoding schemes were m-distance lexicodes, where m is the Hamming distance btwn codes. Thus lexicode-1 is just a binary number code. I tried lexicodes of m=1,2,3 and 4 for both output words and input letters. Both speed of learning and correction accuracy improved linearly with increasing m. These results were published in a paper that appeared in the U.S. Post Office Avanced Technology Conference in May of 1988. My only interpretation of the results is that local encoding schemes simplify the learning task for nets; I'm convinced that distributed schemes are essential for cognitive processes such as semantic representation at least, due to the need for multi-dimensional semantic access and association. As an epilog, I ran a few more experiments afterword that left me with a small puzzle. In the above experiments I had also found that performance improved as the number of hidden nodes increased up to about n(=200) and then leveled off. Afterwords, I tested the local net with the 450-unit positionally-encoded input scheme and NO hidden nodes and found performance equal to or better than any net with a hidden layer and much faster learning. But when I tried a shift-invariant input encoding scheme, in which misspellings were encoded by a 420-unit vector representing letter bigrams and unigrams, I found similarly good performance for nets with hidden layers but miserable performance for a net with no hidden layer. Apparently, the positionally-encoded input scheme yields a set of linearly- separable input classes but the shift-invariant scheme does not. It's still not clear to me why this is? 
Karen Kukich kukich at bellcore.com From ps_coltheart at vaxa.mqcc.mq.oz.au Sat Jun 8 10:51:22 1991 From: ps_coltheart at vaxa.mqcc.mq.oz.au (Max Coltheart) Date: Sat, 8 Jun 91 09:51:22 est Subject: distributed representations Message-ID: <9106072351.AA01618@macuni.mqcc.mq.oz.au> The original posting about this mentioned the property of graceful degradation as one of the virtues of systems that use distributed representations. In what way is this a virtue? For nets that are doing some engineering job such as character recognition, it would obviously be good if some damage or malfunction didn't much affect the net's performance. But for nets that are meant to be models of cognition, the hidden assumption seems to be that after brain damage there is graceful degradation of cognitive processing, so the fact that nets show graceful degradation too means they have promise for modelling cognition. But where's the evidence that brain damage degrades cognition gracefully? That is, the person just gets a little bit worse at a lot of things? Very commonly, exactly the opposite happens - the person remains normal at almost all kinds of cognitive processing, but some specific cognitive task suffers catastrophically. No graceful degradation here. I could give very many examples: I'll just give one (Semanza & Zettin, Cognitive Neuropsychology, 1988, 5, 711). This patient, after his stroke, had impaired language, but this impairment was confined to language production (comprehension was fine) and to the production of just one type of word: proper nouns. He could understand proper nouns normally, but could produce almost none, whilst his production of other kinds of nouns was normal. What's graceful about this degradation of cognition? If cognition does *not* degrade gracefully, and neural nets do, what does this say about neural nets as models of cognition? Max Coltheart From dave at cogsci.indiana.edu Fri Jun 7 22:03:50 1991 From: dave at cogsci.indiana.edu (David Chalmers) Date: Fri, 7 Jun 91 21:03:50 EST Subject: distributed reps Message-ID: Properties like damage resistance, graceful degradation, etc., are all nice, useful, cognitively plausible possibilities, but I would have thought that by far the most important property of distributed representation is the potential for systematic processing. Obviously ultra-local systems (every possible concept represented by an arbitrary symbol) don't allow much systematic processing, as each symbol has to be handled by its own special rule (though things can be improved somewhat by connecting the symbols up, as e.g. in a semantic network). Things are much improved by using compositional representations, as e.g. found in standard AI. If you represent many concepts by compounding the basic tokens, then certain semantic properties can be reflected in internal structure -- e.g. "LOVES(CAT, DOG)" and "LOVES(JOHN,BILL)" have relevantly similar internal structures -- opening the door to processing these structures in systematic ways. Distributed representations just take this idea a step further. One sees the systematicity made possible by giving representations internal structure as above, and says "why stop there?" e.g. why not give every representation internal structure (why should CATs and DOGs miss out?). Compositional representations as above only represent a limited range of semantic properties systematically in internal structure -- namely, compositional properties. All kinds of other semantic properties might be fair game. By moving to e.g.
vectorial representation for every concept, then e.g. the similarity structure of the semantic space can be reflected in the similarity structure of the representational space, and so on. And it turns out that you can process compositional properties systematically too (though not quite as easily). The combination of a multi-dimensional space with a large supply of possible non-linear operations seems to open up a lot of possible kinds of systematic processing, essentially because these operations can chop up the space in ways that standard operations on compositional structures can't. The proof is in the pudding, i.e. the kinds of systematic processing that connectionist networks exhibit all the time. Most obviously, automatic generalization: new inputs are slotted into some representational form, hopefully leading to reasonable behaviour from the network. Similarly for dealing with old inputs in new contexts. By comparison, with ultra-local representations, generalization is right out (except by assimilating new inputs into an old category, e.g. by nearest neighbour methods). Using compositional representations, certain kinds of generalization are obviously possible, as with decision trees. These suffer a bit from having to deal directly with the original input space, rather than developing a new representational space as with dist reps: so you (a) don't get the very useful capacity to take a representation that's developed and use it for other purposes (e.g. as context for a recurrent network, or as input for some new network), and (b) are likely to have problems on very large input spaces (anyone using decision trees for vision?). Both (a) and (b) suggest that decision trees may be unlikely candidates for the backbone of a cognitive architecture (conversely, the ability of connectionist networks to transform one representational space into another is likely to be key to their success as a cognitive architecture). As for generalization performance, that's an empirical matter, but the results of Dietterich etc seem to indicate that decision trees don't do quite as well, presumably because of the limited ways in which they can chop up a representational space (nasty rectangular blocks vs groovy flexible curves). There's far too much else that could be said, so I'll stop here. Dave Chalmers. From tsejnowski at UCSD.EDU Fri Jun 7 22:48:11 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Fri, 7 Jun 91 19:48:11 PDT Subject: distributed/local Message-ID: <9106080248.AA27620@sdbio2.UCSD.EDU> A nice paper that compares ID3 decision trees with backprop on NETtalk and other data sets: Shavlik, J. W., Mooney, R. J., and Towell, G. G. Symbolic and neural learning algorithms: An experimental comparison (revised). Univ. Wisconsin Dept Comp. Sci Tech Report #955 (to appear Machine Learning #6). Overall, backprop performed slightly better than ID3 but took longer to train. Backprop was also more effective in using distributed coding schemes for the inputs and outputs. An error-correcting code, or even a random code, works better than a local code or hand-crafted features. (Ghulum Bakiri and Tom Dietterich reached the same conclusion). The issue of the code developed by the hidden units is also an interesting issue. In NETtalk, the intermediate code was semidistributed -- around 15% of the hidden units were used to represent each letter-to-sound correspondence. 
The vowels and the consonants were fairly well segregated, arguing for local coding at a gross population level (something seen in the brain) but distributed coding at the level of single units (also observed in the brain). The degree of coarseness clearly depends on the grain of the problem. In the original study Charlie Rosenberg and I showed that backprop with hidden units outperformed perceptrons, and hence 26 independent linear discriminants. The NETtalk database is available to anyone who wants to benchmark their learning algorithm. For ftp access contact Scott.Fahlman at b.gp.cs.cmu.edu Terry

From french at cogsci.indiana.edu Sat Jun 8 00:39:11 1991 From: french at cogsci.indiana.edu (Bob French) Date: Fri, 7 Jun 91 23:39:11 EST Subject: semi-distributed representations Message-ID:
One simultaneous advantage and disadvantage of fully distributed representations is that one representation will affect many others. This phenomenon of interference is what allows networks to generalize, but it is also what leads to the problem of catastrophic forgetting. It is reasonable to suppose that the amount of interference in backpropagation networks is directly proportional to the amount of overlap of representations in the hidden layer (the "overlap" of two representations can be defined as the dot product of their activation vectors). The greater the overlap (i.e., the more distributed the representations), the more the network will be affected by catastrophic forgetting, but the better it will be at generalizing. The less the overlap (i.e., the more local the representations), the less the network will be affected by catastrophic forgetting, but the worse it will be at generalizing. If we want nets that do not need to be retrained completely when new data is presented to them but still retain their ability to generalize, we must therefore use representations that are neither too local nor too distributed, what I have called "semi-distributed" representations. I have a paper to appear in CogSci Proceedings 1991 that proposes this relationship between the amount of overlap of representations in the hidden layer and catastrophic forgetting and generalization. The paper outlines one simple method that allows a BP network to evolve its own semi-distributed representations as it learns. - Bob French Center for Research on Concepts and Cognition Indiana University

From dcp+ at cs.cmu.edu Sun Jun 9 09:30:32 1991 From: dcp+ at cs.cmu.edu (David Plaut) Date: Sun, 09 Jun 91 09:30:32 EDT Subject: distributed representations In-Reply-To: Your message of Sat, 08 Jun 91 10:51:22 -0400. <9106072351.AA01618@macuni.mqcc.mq.oz.au> Message-ID: <2428.676474232@DWEEB.BOLTZ.CS.CMU.EDU>
>But where's the evidence that brain damage degrades cognition gracefully? That
>is, the person just gets a little bit worse at a lot of things? Very commonly,
>exactly the opposite happens - the person remains normal at almost all kinds
>of cognitive processing, but some specific cognitive task suffers catastroph-
>ically. No graceful degradation here.
I think the issue here is a matter of scale. "Graceful degradation" refers to the gradual loss of function with increasing severity of damage - it says nothing about how specific or general that function is. Connectionist models can be modular at a global scale, but use distributed representations and show graceful degradation *within* modules.
I think you would agree that, within a particular domain, this is a reasonable characterization of the behavior of many types of patient (to the degree that we understand the modular organization of certain aspects of cognition and the nature of individual patients' damage). Of course, severe damage to a module might still produce catastrophic loss of its function, perhaps leaving the remaining functions relatively intact. On the other hand, the *degree* of specificity of impairment certainly places constraints on the modular organization and the nature of the representations within each module (although I think connectionist modeling illustrates the danger of the "specific impairment implies separate module" logic). Only specific modeling work can demonstrate whether connectionist architectures and representations can account for the behavior of specific patients in an informative way. -Dave ------------------------------------------------------------------------------- David Plaut dcp+ at cs.cmu.edu School of Computer Science 412/268-8102 Carnegie Mellon University Pittsburgh, PA 15213-3890

From gasser at bend.UCSD.EDU Sun Jun 9 00:58:26 1991 From: gasser at bend.UCSD.EDU (Michael Gasser) Date: Sat, 8 Jun 91 21:58:26 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106090458.AA04907@bend.UCSD.EDU>
Max Coltheart discusses how damage to real neural networks often results in more of a clumsy than a graceful sort of degradation. But isn't degradation under conditions of increasing task complexity a different matter? I'm thinking of the processing of increased levels of embedding or (possibly also) numbers of arguments in natural language. Fixed-length distributed representations of syntactic or semantic structure (e.g., RAAM, Elman nets) seem to model this behavior quite well, in comparison to the usual symbolic approach (you're no more likely to fail at 28 levels of embedding than at 2) and to localist connectionist approaches (you can handle sentences with 3 arguments, but 4 are out because you run out of units). Mike Gasser

From siegelma at yoko.rutgers.edu Sun Jun 9 10:56:40 1991 From: siegelma at yoko.rutgers.edu (siegelma@yoko.rutgers.edu) Date: Sun, 9 Jun 91 10:56:40 EDT Subject: TR available from neuroprose; Turing equivalence Message-ID: <9106091456.AA12844@yoko.rutgers.edu>
The following report is now available from the neuroprose archive: NEURAL NETS ARE UNIVERSAL COMPUTING DEVICES H. T. Siegelmann and E.D. Sontag. (13pp.) Abstract: It is folk knowledge that neural nets should be capable of simulating arbitrary computing devices. Past formalizations of this fact have been proved under the hypotheses that there are potentially infinitely many neurons available during a computation and/or that interconnections are multiplicative. In this work, we show the existence of a finite network, made up of sigmoidal neurons, which simulates a universal Turing machine. It is composed of less than 100,000 synchronously evolving processors, interconnected linearly.
-Hava
----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) siegelman.turing.ps.Z (local-file) siegelman.turing.ps.Z ftp> quit unix> uncompress siegelman.turing.ps.Z unix> lpr -P(your_local_postscript_printer) siegelman.turing.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to siegelma at paul.rutgers.edu. DO NOT "reply" to this message, please.

From jagota at cs.Buffalo.EDU Sun Jun 9 16:52:33 1991 From: jagota at cs.Buffalo.EDU (Arun Jagota) Date: Sun, 9 Jun 91 16:52:33 EDT Subject: Information Capacity and Local vs Distributed Message-ID: <9106092052.AA04177@sybil.cs.Buffalo.EDU>
Dear Connectionists, I think Information Capacity* (IC) (Abu-Mostafa, Jacques 85) is a useful quantitative criterion for L vs D, illustrated by the following trivial example. You are given k pebbles, to be placed in k of n locations. A location with a pebble is a `1', otherwise a `0'. IC == # distinct vectors that can be stored = C(n,k) (n choose k). For this example, it's nice that the binomial distribution quantifies IC for L vs D. The IC of k ~ n/2 (distributed) is by far superior.
k = 1 ==> local, IC = n
k ~ n/2 ==> distributed, IC = C(n,n/2), the maximum
k = n-1 ==> over-distributed, IC = n
With (threshold-element) connectionist nets, the analogy holds, but the (hidden or output layer) units [locations] are not independent. I would think there is scope for theory and empirical work along these lines. I have seen IC work on symmetric nets, but even here I am unaware of work on IC as a function of k. I am unaware (haven't looked) of any work on FF nets.
* - IC is actually defined as the log of the count shown above.
Sincerely, Arun Jagota jagota at cs.buffalo.edu

From peterc at chaos.cs.brandeis.edu Mon Jun 10 00:06:49 1991 From: peterc at chaos.cs.brandeis.edu (Peter Cariani) Date: Mon, 10 Jun 91 00:06:49 edt Subject: (the late) Rook McCulloch Message-ID: <9106100406.AA29926@chaos.cs.brandeis.edu>
Rook McCulloch also edited a 4-volume set of Warren McCulloch's works, "The Collected Works of Warren S. McCulloch", published by Intersystems Press in 1989 (401 Victor Way #3, Salinas, CA 93907 USA; $84 for 4 volumes, paper). In addition to her foreword and Warren McCulloch's papers, the set also contains some very nice essays by Jerry Lettvin, Michael Arbib, F.S.C. Northrop, Heinz von Foerster, D.M. MacKay (and others). For those of us who never knew the McCullochs, this seems to be the best available source of information about what they thought and felt. Also of relevance is Steve Heims' book on the Macy conferences and the origins of cybernetics ("The Cybernetics Group", MIT Press, 1991), in which Warren McCulloch's role is amply discussed.
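As a quick numerical check of Arun Jagota's pebble example above -- a minimal sketch, assuming Python 3.8 or later for math.comb; the function name ic_bits is illustrative and not from any posting:

    # Information capacity of a k-of-n binary code, IC = log2 C(n,k),
    # following the pebble example: k active units among n locations.
    from math import comb, log2

    def ic_bits(n, k):
        """Bits needed to distinguish all C(n,k) distinct k-of-n patterns."""
        return log2(comb(n, k))

    n = 100
    for k in (1, n // 2, n - 1):        # local, distributed, over-distributed
        print(f"k={k:3d}  IC={ic_bits(n, k):6.1f} bits")

For n = 100, k = 1 and k = 99 both give log2(100), about 6.6 bits, while k = 50 gives about 96.3 bits -- the binomial peak Jagota points to.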
From bates at crl.ucsd.edu Mon Jun 10 12:25:10 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:25:10 PDT Subject: response to max coltheart Message-ID: <9106101625.AA25405@crl.ucsd.edu>
I respectfully disagree with Max Coltheart that brain damage usually or even often yields discrete and domain-specific performance decrements. to be sure, such cases have been reported -- and indeed, their "news value" often lies in the surprisingly discrete nature of the patient's profile. but such case studies typically fail to recognize issues like the peaks and valleys that might have been there premorbidly, i.e. in the "man that used to be". also, we often fail to recognize that by choosing those patients with "interesting" profiles against an unspecified number of background patients with "uninteresting" profiles, we are capitalizing on chance distributions across a number of noisy domains. given 1000 patients who are normally distributed across 100 tasks, I have a pretty solid chance of finding a good number of striking "double dissociations" and even more "single dissociations" entirely by chance. For a simulation that makes EXACTLY that point (coupled with a detailed critique of a "real" study of 20 patients that make this very error), see Bates, Appelbaum and Allard, "Statistical constraints on the use of single case studies in neuropsychological research", in the last issue of Brain and Language. -liz bates

From bates at crl.ucsd.edu Mon Jun 10 12:29:33 1991 From: bates at crl.ucsd.edu (Elizabeth Bates) Date: Mon, 10 Jun 91 09:29:33 PDT Subject: Distributed representations and graceful degradation Message-ID: <9106101629.AA25488@crl.ucsd.edu>
Marcel Just and Patricia Carpenter have a paper coming out in Psychological Review that shows (reviewing quite a range of studies) how the ability of normal adults to handle (read, comprehend) various levels of grammatical complexity and ambiguity interacts with (1) that adult's working memory span, and (2) the effects of a cognitive load imposed by a secondary task. The notion of graceful degradation seems to apply to their work very well. You can obtain a preprint of their paper by contacting them at CMU (Psychology Department). -liz bates

From cabestan at eel.upc.es Mon Jun 10 10:05:46 1991 From: cabestan at eel.upc.es (JOAN CABESTANY) Date: Mon, 10 Jun 1991 14:05:46 +0000 Subject: Call for Papers IWANN'91 Message-ID: <"155*/S=cabestan/OU=eel/O=upc/PRMD=iris/ADMD= /C=es/"@MHS>
Dear Colleagues, Please find here the second Call for Papers for IWANN'91. Remember that the absolute limit date for work presentation is June 20th. IWANN'91 will be held in GRANADA next September.
****************************************************************** ****************************************************************** INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS IWANN'91 Second Announcement Granada, Spain September 17-19, 1991 ORGANISED AND SPONSORED BY Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SCOPE Artificial Neural Networks (ANN) were first developed as structural or functional modelling systems of natural ones, featuring the ability to perform problem-solving tasks. They can be thought as computing arrays consisting of series of repetitive uniform processors (neuron-like elements) placed on a grid. Learning is achieved by changing the interconnections between these processing elements. Hence, these systems are also called connectionist models. ANN has become a subject of wide-spread interest: they offer an odd scheme-based programming standpoint and exhibit higher computing speeds than conventional von-Neumann architectures, thus easing or even enabling handling complex task such as artificial vision, speech recognition, information recovery in noisy environments or general pattern recognition. In ANN systems, collective information management is achieved by means of parallel operation of neuron-like elements, into which information processing is distributed. It is intended to exploit this highly parallel processing capability as far as possible in complex problem-solving tasks. Cross-fertilization between the domains of artificial and real neural nets is desirable. The more genuine problems of biological computation and information processing in the nervous system still remain open and contributions in this line are more than welcome. Methodology, theoretical frames, structural and organizational principles in neuroscience, self- organizing and co-operative processes and knowledge based descriptions of neural tissue are relevant topics to bridge the gap between the artificial and natural perspectives. The workshop intends to serve as a meeting place for engineers and scientists working in this area, so that present contacts and relationships can be further increased. The workshop will comprise two complementary activities: . scientific and technical conferences, and . scientific communications sessions. TOPICS The workshop is open to all aspects of artificial neural networks, including: 1. Neural network theories. Neural models. 2. Biological perspectives 3. Neural network architectures and algorithms. 4. Software developments and tools. 5. Hardware implementations 6. Applications. LOCATION Facultad de Ciencias Campus Universitario de Fuentenueva Universidad de Granada 18071 GRANADA. (SPAIN) LANGUAGES English and Spanish will be the official working languages. English is preferable as the working language. Simultaneous translation will be available. Simultaneous translation will be available. CALL FOR PAPERS The Programme Committee seeks original papers on the six above mentioned areas. Survey papers on the various available approaches or particular application domains are also sought. In their submitted papers, authors should pay particular attention to explaining the theoretical and technical choices involved, to make clear the limitations encountered and to describe the current state of development of their work. 
INSTRUCTIONS TO AUTHORS Three copies of submitted papers (not exceeding 8 pages in 21x29.7 cms (DIN-A4), with 1,6 cm. left, right, top and bottom margins) should be received by the Programme Chairman at the address below before June 20, 1991. The headlines should be centred and include: . the title of paper in capitals . the name(s) of author(s) . the address(es) of author(s), and . a 10 line abstract. Three blank lines should be left between each of the above items, and four between the headlines and the body of the paper, written in English, single-spaced and not exceeding the 8 pages limit. All papers received will be refereed by the Programme Committee. The Committee will communicate their decision to the authors on July 10. Accepted papers will be published in the proceedings to be distributed to workshop participants. In addition to the paper, one sheet should be attached including the following information: . the title of the paper, . the name(s) of author(s), . a list of five keywords, . a reference to which of the six topics the paper concerns, and . postal address of one of the authors, with phone and fax numbers, and E-mail (if available). . presentation language We intend to get in touch with various international publishers (such as Springer-Verlag and Prentice-Hall) for the final version of the proceedings. PROGRAM AND ORGANIZATION COMMITTEE Organization Chairman: Alberto Prieto (Unv. Granada. Spain) Programme Chairman: Jos Mira (UNED. Madrid. Spain) Senen Barro Unv. de Santiago (E) Francois Blayo Ecole Polytechnique Federale de Lausanne (S) Joan Cabestany Unv. Pltca. de Catalunya (E) Marie Cottrell Unv. Paris I (F) Jose Antonio Corrales Unv. Oviedo. (E) Gerard Dreyfus ESPCI Paris (F) Gregorio Fernandez Unv. Pltca. de Madrid (E) J. Simoes da Fonseca Unv. de Lisboa (P) Karl Goser Unv. Dortmund (G) Jeanny Herault INPG Grenoble (F) Jose Luis Huertas CNM- Universidad de Sevilla (E) Simon Jones Unv. Nottingham (UK) Chistian Jutten INPG Grenoble (F) Antonio Lloris Unv. Granada (E) Panos A. Ligomenides Unv. of Maryland (USA) Javier Lopez Aligue Unv. de Extremadura. (E) Federico Moran Unv. Complutense. Madrid (E) Roberto Moreno Unv. Las Palmas Gran Canaria (E) Franz Pichler Johannes Kepler Univ. (Aus) Ulrich Rueckert Unv. Dortmund (G) Francisco Sandoval Unv. de Malaga (E) Carmen Torras Instituto de Ciberntica. CSIC. Barcelona (E) V. Tryba Unv. Dortmund (G) Elena Valderrama CNM- Unv. Autonoma de Barcelona (E) Michel Weinfeld Ecole Polytechnique Paris (F) LOCAL ORGANIZING COMMITTEE (Universidad de Granada) Juan Julian Merelo Julio Ortega Francisco J. Pelayo Begona del Pino Alberto Prieto ORGANIZING ENTITIES: Spanish Chapter of the Computer Society of IEEE, AEIA (IEEE Computer Society Affiliate), and Department of Electronic and Computer Technology. University of Granada. Spain. SPONSORING ENTITIES: Ayuntamiento de Granada (Dto. de Congresos) Caja General Universidad de Granada SOME USEFUL INFORMATION Granada is a beautiful city that lies to the south of Spain, in which the mixture between Christian and Muslim culture reaches its architectural peak. The Alhambra is the most magnificent European Muslim fortress and palace conserved to-date, and Granada nights are known in all Spain for their liveliness, due to the high proportion of students. The river Genil gives rise to the Vega or Valley of Granada, where the soil is fertile and bears the most varied crops. 
It has small farms and beautiful villages, some as interesting as Santa Fe, where the voyage for the discovery of America was negotiated. From Granada it takes only one hour to get to the southernmost ski resort in Europe, Sierra Nevada, where Winter sports can be enjoyed. A wide road leads right up to the Veleta Peak, so that in Summer it can be reached by car. This road, at 3,428 m. above sea level, is the highest in Europe. 65 Km. from the city of Granada is Granada's Costa del Sol (so called Costa Tropical or Tropical Coast). The University of Granada is the third most important in Spain. It has 40,000 students, which makes up one sixth of the whole population. This is what gives the city a youthful and dynamic atmosphere, stimulating a "living culture". The weather during mid-September in Granada is warm, and temperatures of 30 degrees Centigrade are not unusual. Temperatures can lower during the night, so a pullover is advised. During the day, t-shirts or light shirts and trousers are the most suitable clothes. PRE AND POST WORKSHOP TOURS: A-EXCURSION: September 16: Trip to Alpujarra, typical mountain villages. Time: 9.00-20h. Price: 3500 ptas./per person (Includes Bus and lunch). B-EXCURSION: September 20: Trip to Costa del Sol, including Nerja with its wonderful caves and the seaside resorts of Almunecar and Salobre$a. Time: 9.00-20h. Price: 2000 ptas. (Includes Bus) SOCIAL ACTIVITIES: September 16: Pre Workshop tour (A-Excursion) September 17: 20:00 Reception at the Hospital Real (16th Century University Central Services Building). 22:00 Night visit to the Alhambra. September 18: 20:00 Reception at the "Palacio de los Cordova" (Albaic!n), given by the Granada City Hall (Congress Dept.). September 19: 21:00 Official dinner September 20: Post Workshop tour (B-Excursion) PROVISIONAL SCHEDULE September 17: 9:15 Opening session. 10:00-11:30 Lecture 1:Natural and Artificial Neural Nets; Prof. Dr. Roberto MORENO (Universidad de las Palmas de Gran Canaria) 11:30-12:00 Coffee-break. 12:00-13:30 Session 1. 16:00-17:30 Session 2. 17:30-18:00 Coffee-break. 18:00-19:30 Session 3 September 18: 09:30-11:00 Lecture 2: Application and Implementation of Neural Networks in Microelectronics; Prof. Dr. Ing. Karl GOSER (Universitt Dortmund) 11:00-11:30 Coffee-break. 11:30-13:30 Session 4. 16:00-17:30 Session 5. 17:30-18:00 Coffee-break. 18:00-19:30 Session 6. September 19: 09:00-11:00 Lecture 3: Cooperative Computing and Neural Networks; Prof. Panos A. LIGOMENIDES (University of Maryland) 11:00-11:30 Coffee-break. 11:30-13:30 Session 7. 16:00-17:30 Session 8. 17:30-18:00 Coffee-break. 18:00-19:30 Session 9. This form should be sent before July 25 to: Viajes Internacional Expreso (V.I.E.); Galerias Preciados; Carrera del Genil, s/n. 18005 GRANADA (Spain) Tnos. (34) 58-22.44.95, (34) 58-22.75.86, (34) 58-224944; Telex: 78525 The following hotels are available with special fees for the Workshop participants. The prices are per night and they include V.A.T. and continental breakfast: Hotel Cat. Single room Double room ______________________________________________________________ Condor *** 7700 10070 pts. Eurobecquer ** 4630 5820 Tour A ........... 3.500 pts. Tour B ........... 2.000 pts Please tick the appropriate box. Reservations can be guaranteed before July 25th. A list of other hotels is enclosed (Please address directly to them). Payment should be made in Spanish currency. I enclose a bank cheque payable to: V.I.E. 
INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 HOTEL BOOKING FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION _______________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY______________________ TELEPHONE __________________ FAX _________________________ E-MAIL:_______________________ Accompanying person(s) ________________________________________________________ I want to reserve: _______ double room(s); ___________ single room(s) Arrival date:__________ Time: __________ Departure date:_________ Time:_________ INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS (IWANN'91) Granada, Spain, September 17-19, 1991 REGISTRATION FORM SURNAME ______________________________________ FIRST NAME _______________________________ ORGANIZATION ________________________________________________________________ ADDRESS ____________________________________________________________________ CITY _____________________ POST CODE __________ COUNTRY ______________________ TELEPHONE __________________ FAX _________________________ E-MAIL: _______________________ Fill in the appropriate box: Fee Before June 25th After June 25th ___________________________________________________________________ Regular 33.000 35.000 IEEE,AEIA,ATI members 28.000 30.000 Scholarship 4.000 5.000 This form should be sent as soon as possible to: Departamento de Electronica y Tecnologia de Computadores Facultad de Ciencias Universidad de Granada 18071 GRANADA (SPAIN) In order to avoid delays, please fax the registration form, together with a copy of the cheque or the bank transfer to: FAX: 34-58-24.32.30 or 34-58-27.42.58 INSCRIPTION PAYMENTS: Cheque payable to: IWANN'91 (16.142.512) or alternatively transfer to: IWANN'91 IWANN'91 account number: 16.142.512 account number: 007.01-450888 Caja Postal (Code: 2088-2037.1) or to Caja General Camino de Ronda, 138 Camino de Ronda, 156 18003 GRANADA (SPAIN) 18003 GRANADA (SPAIN) ************************************************************************ From ashley at spectrum.cs.unsw.oz.au Tue Jun 11 00:11:13 1991 From: ashley at spectrum.cs.unsw.oz.au (Ashley Aitken) Date: Tue, 11 Jun 91 0:11:13 AES Subject: distributed representations In-Reply-To: <9106072351.AA01618@macuni.mqcc.mq.oz.au>; from "Max Coltheart" at Jun 8, 91 9:51 am Message-ID: <9106101413.10651@munnari.oz.au> G'day, In the discussion of "Distributed Representations", Max Coltheart writes: > > But for nets that are meant to be > models of cognition, the hidden assumption seems to be that after brain damage > there is graceful degradation of cognitive processing, so the fact that nets > show graceful degradation too means they have promise for modelling cognition. > > But where's the evidence that brain damage degrades cognition gracefully? That > is, the person just gets a little bit worse at a lot of things? Very commonly, > exactly the opposite happens - the person remains normal at almost all kinds > of cognitive processing, but some specific cognitive task suffers catastroph- > ically. No graceful degradation here. I would suggest that Max is possibly confusing diffuse brain damage with catastrophic brain damage. Diffuse brain damage is the elimination of a small percentage of neurons diffusely from throughout the brain. 
Examples are the natural death of neurons throughout the brain and, perhaps, micro-lesions. The continual death of an immense number of neurons in the brain thankfully only really amounts to the death of a very small percentage of the neurons in the brain. In any of the partitioned networks of the brain (say an area of the cortex) we would expect only a small number of neurons to die. If one considers that a neuron may receive on the order of thousands of synapses on its dendritic tree, it can be understood, I believe, how the network (thought of as a connectionist network) could continue to function if one or two of these were to be eliminated. I would suggest that this continual death of neurons in the brain, with the subtle and often unnoticed degradation in cognitive performance, is an example of (diffuse) brain damage degrading cognition gracefully. Hence, I believe this type of degradation does show neural networks have promise for modelling cognition. Of course, this does depend on the degradation seen in cognition being shown to be qualitatively the same as the degradation seen in artificial neural networks. Catastrophic brain damage, on the other hand, is the gross elimination of neurons (usually relatively localized) from the brain. Examples are lesions resulting from head injuries or strokes, and ablation. It would seem that in this case one is most likely seeing the complete (or nearly complete) elimination of an entire network (or a critical part of it) and hence the elimination of its associated and dependent function(s). I don't believe anyone would suggest that the brain's function would degrade gracefully under such terrorist action. Max continues:
> I could give very many examples: I'll just give one (Semenza & Zettin,
> Cognitive Neuropsychology, 1988 5 711). This patient, after his stroke, had
> impaired language, but this impairment was confined to language production
> (comprehension was fine) and to the production of just one type of word: proper
> nouns. He could understand proper nouns normally, but could produce almost none
> whilst his production of other kinds of nouns was normal. What's graceful about
> this degradation of cognition?
I am definitely no expert neuroscientist, but I would suggest that this is an example of catastrophic brain damage, not diffuse brain damage. Hence, I would not expect graceful degradation of cognitive performance. It seems to me that this would be too much to ask of all but the most completely holographic-like systems. The interesting point to be made from this example would then be that it appears to be evidence for a cortical region involved (directly or in-line) with the speech of only nouns. Amazing! It would also be interesting to test if there is any subtle difference in our *understanding* of a noun depending upon whether we are receiving it (i.e. hearing or seeing it) or producing it (i.e. speaking or imagining it). If this diagnosis of catastrophic brain damage is correct, then I believe this example is moot on whether or not the brain is functionally a Connectionist System. Still, the Connectionist System, in my opinion, gets the points for the diffuse brain damage. Hence Max's concluding suggestion,
> If cognition does *not* degrade gracefully, and neural nets do, what does this
> say about neural nets as models of cognition?
becomes rather misplaced, because cognition does appear to degrade gracefully under diffuse brain damage and catastrophically under catastrophic brain damage.
The former providing possible evidence for neural networks as models of cognition. Ashley ashley at spectrum.cs.unsw.oz.au From kbj at jupiter.risc.rockwell.com Mon Jun 10 13:57:41 1991 From: kbj at jupiter.risc.rockwell.com (Ken Johnson) Date: Mon, 10 Jun 91 10:57:41 PDT Subject: No subject Message-ID: <9106101757.AA10673@jupiter.risc.rockwell.com> In response to the debate on Distributed vs. Local Representations..... Everyone in this field has a view point colored by their academic background. So here is mine. The fundamental issues associated with information representations was in many ways dealt with by Shannon. If we consider neural activity value spread across a vector of neurons a resource then one can conjur up images of 'neural representation bandwidth'.. The usage of this bandwidth is determined by noise power, integration time, and a bunch of other signal/system properties. In general, given a finite amount of noise, and a given number of neurons, a distributed representation is more 'efficient' than a local representation. In this case efficiency would be the ability to pack more codes into a given coding scheme or 'code space'. An equally important issue is that of neural code processing. Representation of the information is more or less useless without a processing system capable of transforming neural codes from one form to another form in a hierarchical system such as the brain. In this case we have Ashby's Law of Requisite Variety. I can't find my copy of the reference, but its by John Porter circa 1983-1987. In this work he goes into a discussion and analysis wherein he shows that a neural system's capacity for control and information processing cannot exceed its capabilities as a communication channel. Hence, he throws that ultimate capabilities of a neural processor back to Shannon's description. In addition to these philosophical and theoretical reasons for my preference of distributed codes I've got reams of data from Neocognitron simulations which clearly show that proper operation of the machine REQUIRES distributed codes that use the representation space wisely. References to this work can be found in the Proceedings of the IJCNN 1988 in SanDiego, 1988 in Boston, and 1990 Washington. What we found was an important dichotomy. Neural codes for similar features had to be close together in codes space to be grouped into new features by higher level processes. Without this characteristic pattern classification would not group very similar patterns together. On the other hand, differences between patterns had to be far apart in representation space in order to be discriminated accurately. Hence, we see proper code organization required similar codes be close while different codes needed to be far apart. One should expect this property if the goal of the system is representationaly richness rat The above arguments lead me to believe that neural coding is one of the fundamental issues that needs to be invesgtigated more throughly. Correct utilization of neural representation bandwidth is something we don't use very well. In fact, I'll state that we don't use it at all. The notion of bandwidth immediately suggests time as a representational dimension we don't use. Feedforward systems don't use time to increase the bandwidth of a representation - they are static. Almost all feedback and recurrent systems we see are allowed to 'reach equilibrium' before the 'neural code' is interpreted. Thus, the code is again static. 
Why not use state trajectories, and temporal modulation as means of enhancing neural representation bandwidth and increasing the processing capabilities of neural systems? Ken Johnson kbj at risc.rockwell.com From ahmad at ICSI.Berkeley.EDU Mon Jun 10 18:58:12 1991 From: ahmad at ICSI.Berkeley.EDU (Subutai Ahmad) Date: Mon, 10 Jun 91 15:58:12 PDT Subject: Preprint Message-ID: <9106102258.AA16050@icsib18.Berkeley.EDU> The following paper (to appear in this Cognitive Science proceedings) is available from the neuroprose archives as ahmad.cogsci91.ps.Z (ftp instructions below). Efficient Visual Search: A Connectionist Solution by Subutai Ahmad & Stephen Omohundro International Computer Science Institute Abstract Searching for objects in scenes is a natural task for people and has been extensively studied by psychologists. In this paper we examine this task from a connectionist perspective. Computational complexity arguments suggest that parallel feed-forward networks cannot perform this task efficiently. One difficulty is that, in order to distinguish the target from distractors, a combination of features must be associated with a single object. Often called the binding problem, this requirement presents a serious hurdle for connectionist models of visual processing when multiple objects are present. Psychophysical experiments suggest that people use covert visual attention to get around this problem. In this paper we describe a psychologically plausible system which uses a focus of attention mechanism to locate target objects. A strategy that combines top-down and bottom-up information is used to minimize search time. The behavior of the resulting system matches the reaction time behavior of people in several interesting tasks. A postscript version of the paper can be obtained by ftp from cheops.cis.ohio-state.edu. The file is ahmad.cogsci91.ps.Z in the pub/neuroprose directory. You can either use the Getps script or follow these steps: unix:2> ftp cheops.cis.ohio-state.edu Connected to cheops.cis.ohio-state.edu. Name (cheops.cis.ohio-state.edu:): anonymous 331 Guest login ok, send ident as password. Password: neuron 230 Guest login ok, access restrictions apply. ftp> cd pub/neuroprose ftp> binary ftp> get ahmad.cogsci91.ps.Z ftp> quit unix:4> uncompress ahmad.cogsci91.ps.Z unix:5> lpr ahmad.cogsci91.ps --Subutai ahmad at icsi.berkeley.edu From crr at shum.huji.ac.il Mon Jun 10 15:12:11 1991 From: crr at shum.huji.ac.il (crr@shum.huji.ac.il) Date: Mon, 10 Jun 91 22:12:11 +0300 Subject: distributed vs. local encoding schemes Message-ID: <9106101912.AA28249@shum.huji.ac.il> Terry Sejnowski mentioned the kinds of hidden units that we found in NETtalk. As for the input/output representations, we ran a number of experiments using both local (one unit per letter/phoneme, but more than one unit on per window) and distributed representations (more than one unit on per letter/phoneme). Learning times are generally faster with distributed representations simply because the net inputs and resulting error gradients are larger. (However it might be possible to boost the learning rate for the local representation to match the distributed one. I don't know if this would affect generalization or not since I didn't try it.) Using a representation that "makes sense" for the particular domain (such as using an articulatory feature code for the phonemes -- or is this local because the units represent features?) 
also leds to faster learning, and is more resistant to damage than a "random" encoding of the phonemes. Charlie Rosenberg From CADEPS at BBRNSF11.BITNET Tue Jun 11 08:56:05 1991 From: CADEPS at BBRNSF11.BITNET (JANSSEN Jacques) Date: Tue, 11 Jun 91 14:56:05 +0200 Subject: No subject Message-ID: <5901C8A706400066@BITNET.CC.CMU.EDU> STEERABLE GenNets - A Query. Abstract : One can evolve a GenNet (a neural net evolved with the genetic algorithm) to display two separate behaviors depending upon the setting of a clamped input control variable. By using an intermediate control value one obtains an intermediate behavior. For example, let the behaviors be sinusoidal oscillations of periods T1 and T2, where the control settings are 0.5 and -0.5 By using a control value of 0.3, one will get a sinusoid with a period between T1 and T2. Why? Has anyone out there had any similar experiences (i.e. of this sort of generalised behavioral learning), and has anybody any idea why GenNets are capable of such a phenomenon? If I receive some interesting replies, I'll prepare a summary and report back. Further details. One of the great advantages of GenNets (= using the GA to teach your neural nets their behaviors) over traditional NN paradigms such as backprop, Hopfield, etc is that the GA treats your NN as a black box, and it doesnt matter how complex the internal dynamics of the NN are. All that counts is the result. How well did the NN perform? If it did well, the bitstring which codes for the NN's weights will survive. This allows the creation of GenNets which can cope with both inputs and outputs which vary constantly. One does not need stationary output values a la Hopfield etc. Hence NNs become much more "dynamic", compared to the more "static" nature of traditional paradigms. One can thus evolve dynamics (behaviors) on NNs (GenNets). This opens up a new world of NN possibilities. If one can evolve a GenNet to express one behavior, why not two? If two, can one evolve a continuum of behaviors depending upon the setting of a controlled input value? The variable frequency generator GenNet mentioned above shows that this is possible. But I'm damned if I know why? Whats going on? Have any of you had similar experiences? Any clues for a theoretical explanation for this extraordinary phenomenon? P.S. To evolve this GenNet, use a fully connected net, with all external inputs set at zero, except for two inputs. Clamp one at 0.5, and the other at 0.5 (and then -0.5 in the second "experiment"). The fitness is the inverse of the sum of the two sums (for the two expts) of the squares of the difference between the desired output at each clock cycle and the actual output. Assign one neuron to be the output neuron. Cheers, Hugo de Garis, University of Brussels, Belgium, George Mason University, VA, USA. From thomasp at gshalle1.informatik.tu-muenchen.de Tue Jun 11 11:50:25 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Tue, 11 Jun 1991 17:50:25 +0200 Subject: Research Position in SPAIN ? Message-ID: <9106111550.AA08800@gshalle1.informatik.tu-muenchen.de> I'm a graduate student in computer science at Munich Technical University and plan to work in a research position related to neural networks in SPAIN. I would extremely appreciate if you could provide me some information on university/private/company research institutes active or interested in the field of neural network research and located in the Madrid or Seville area. 
Preferably, I would like to start working in Spain in November 91 or, alternatively, in January/February 1992. Sincerely, Patrick Thomas Institute for Medical Psychology Goethestr. 31 8000 Munich 2
From moeller at kiti.informatik.uni-bonn.de Thu Jun 13 03:50:34 1991 From: moeller at kiti.informatik.uni-bonn.de (Knut Moeller) Date: Thu, 13 Jun 91 09:50:34 +0200 Subject: TR available from neuroprose; learning algorithms Message-ID: <9106130750.AA01054@kiti.>
The following report is now available from the neuroprose archive: LEARNING BY ERROR-DRIVEN DECOMPOSITION D.Fox V.Heinze K.Moeller S.Thrun G.Veenker (6pp.) Abstract: In this paper we describe a new selforganizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh84]) with a nonlinear approximator (Backpropagation [RHW86]). The resulting learning system is more stable and effective in changing environments than plain backpropagation and much more powerful than extended feature maps as proposed by [RMW89]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of this technique is demonstrated with an example from mathematical function approximation.
----------------------------------------------------------------------------- To obtain copies of the postscript file, please use Jordan Pollack's service: Example: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): ftp> cd pub/neuroprose ftp> binary ftp> get (remote-file) fox.decomp.ps.Z (local-file) fox.decomp.ps.Z ftp> quit unix> uncompress fox.decomp.ps.Z unix> lpr -P(your_local_postscript_printer) fox.decomp.ps ---------------------------------------------------------------------------- If you have any difficulties with the above, please send e-mail to moeller at kiti.informatik.uni-bonn.de DO NOT "reply" to this message!!
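The patch idea in the abstract above can be pictured with a minimal sketch -- this is not the Fox et al. algorithm, just a generic error-driven decomposition in Python/NumPy, with per-patch linear least-squares fits standing in for small backprop approximators and nearest-prototype assignment standing in for the feature map:

    # Generic error-driven decomposition on a toy 1-D function:
    # assign inputs to the nearest prototype ("patch"), fit each patch
    # separately, then recruit a new prototype inside the worst patch.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 1))
    y = np.sin(3.0 * X[:, 0])                     # toy target function

    prototypes = [np.array([0.0])]                # start with a single patch
    for _ in range(6):                            # grow to 7 patches
        P = np.stack(prototypes)                  # (num_patches, 1)
        assign = np.argmin(np.abs(X - P.T), axis=1)   # nearest prototype per input
        errors = np.full(len(prototypes), -1.0)
        for j in range(len(prototypes)):
            idx = assign == j
            if idx.sum() < 2:
                continue
            A = np.hstack([X[idx], np.ones((idx.sum(), 1))])   # per-patch linear fit
            w, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
            errors[j] = np.mean((A @ w - y[idx]) ** 2)
        worst = int(np.argmax(errors))            # split the worst-approximated patch
        members = X[assign == worst]
        prototypes.append(members[rng.integers(len(members))].copy())

    print("patch centres:", sorted(float(p[0]) for p in prototypes))

The only point of the loop is that new prototypes are recruited where the current approximation is worst, so the subtasks end up roughly equally well approximated -- the "error-driven" part of the title.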
From thomasp at gshalle1.informatik.tu-muenchen.de Thu Jun 13 13:33:19 1991 From: thomasp at gshalle1.informatik.tu-muenchen.de (Thomas) Date: Thu, 13 Jun 1991 19:33:19 +0200 Subject: Gracias & Sorry Message-ID: <9106131733.AA19732@gshalle1.informatik.tu-muenchen.de>
Sorry for the "garbage" and muchas gracias to all those helping out with addresses and conference announcements.
Patrick From utans-joachim at CS.YALE.EDU Sat Jun 15 12:48:45 1991 From: utans-joachim at CS.YALE.EDU (Joachim Utans) Date: Sat, 15 Jun 91 12:48:45 EDT Subject: preprint available Message-ID: <9106151648.AA01689@SUNNY.SYSTEMSX.CS.YALE.EDU> The following preprint has been placed in the neuroprose archive at Ohio State University: Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction Joachim Utans John Moody Department of Electrical Engineering Department of Computer Science Yale University Yale University New Haven, CT 06520 New Haven, CT 06520 Abstract: Intuitively, the notion of generalization is closely related to the ability of an estimator to perform well with new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select the optimal network architecture. The prediction risk needs to be estimated from the available data; here we approximate the prediction risk by v-fold cross-validation and asymtotic estimates of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure to the network architecture. To retrieve it by anonymous ftp: unix> ftp cheops.cis.ohio-state.edu # (or ftp 128.146.8.62) Name (cheops.cis.ohio-state.edu:): anonymous Password (cheops.cis.ohio-state.edu:anonymous): neuron ftp> cd pub/neuroprose ftp> binary ftp> get utans.bondrating.ps.Z ftp> quit unix> uncompress utans.bondrating.ps unix> lpr -P(your_local_postscript_printer) utans.bondrating.ps Joachim Utans From h1201kam at ella.hu Sun Jun 16 13:05:00 1991 From: h1201kam at ella.hu (Kampis Gyorgy) Date: Sun, 16 Jun 91 13:05:00 Subject: a new book; special issue on emergence; preprint availab Message-ID: <9106161115.AA13832@sztaki.hu> ANNOUNCEMENTS **************************************************************** 1. a new book 2. a Special Issue on emergence 3. preprint available **************************************************************** 1. the book George Kampis SELF-MODIFYING SYSTEMS IN BIOLOGY AND COGNITIVE SCIENCE: a New Framework for Dynamics, Information and Complexity Pergamon, Oxford-New York, March 1991, 546pp with 96 Figures About the book: The main theme of the book is the possibility of generating information by a recursive self-modification and self- redefinition in systems. The book offers technical discussions of a variety of systems (Turing machines, input-output systems, synergetic systems, connectionist networks, nonlinear dynamic systems, etc.) to contrast them with the systems capable of self-modification. What in the book are characterized as 'simple systems' involve a fixed definition of their internal modes of operations, with variables, parts, categories, etc. invariant. Such systems can be represented by single schemes, like computational models of the above kind. A relevant observation concerning model schemes is that any scheme grasps but one facet of material structure, and hence to every model there belongs a complexity excluded by it. In other words, to every simple system there belongs a complex one that is implicit. 
Self-modifying systems are 'complex' in the sense that they are characterized by the author as ones capable of accessing an implicate material complexity and turning it into the information-carrying variables of a process. An example of such a system would be a tape recorder which spontaneously accesses new modes of information processing (e.g. bits represented as knots on the tape). A thesis discussed in the book is that unlike current technical systems, many natural systems know how to do that trick, and make it their principle of functioning. The book develops the mathematics, philosophy and methodology for dealing with such systems, and explains how they work. A constructive theory of models is offered, with which the modeling of systems can be examined in terms of algorithmic information theory. This makes possible a novel treatment of various old issues like causation and determinism, symbolic and nonsymbolic systems, the origin of system complexity, and, finally, the notion of information. The book introduces technical concepts such as information sets, encoding languages, material implications, supports, and reading frames, to develop these topics, and a class of systems called 'component-systems', to give examples of self-modifying systems. As an application, the book discusses how the latter can be used to understand aspects of evolution and cognition. From tgelder at phil.indiana.edu Mon Jun 17 11:45:58 1991 From: tgelder at phil.indiana.edu (Timothy van Gelder) Date: Mon, 17 Jun 91 10:45:58 EST Subject: distribution and its advantages Message-ID: Javier Movellan's question -- what are distributed representations good for, anyway? -- is I think an important one for connectionism and cognitive science generally. Trouble is, the way it was put, it presupposes that there is some one kind of representation that everyone is referring to when they talk about distribution. In fact, though most people have a reasonable idea what they themselves intend when they use the term "distributed", they usually don't realize that it's not the way many other people use it. This is immediately apparent if one takes an overview of the responses that actually came in. Various people took it that a representation is distributed if it utilizes many units rather than just one, with the "strength" of distribution increasing as the total number of units (or perhaps, the proportion of available units) used increases. Massone by contrast thought the key concept is that of redundancy, which I take roughly to mean that a given piece of input information is represented multiple times. This presumably requires that many units are used (i.e., that there is distribution in the previous sense) but is a significantly stronger requirement. Massone's position was echoed in some other responses. Chalmers claims that a distributed representation is one in which every representation, whether of a basic concept or a more complex one, has a kind of semantically significant internal structure. This definition also seems to presuppose the first kind of definition, but is different from redundancy. Proposing a somewhat different definition again, French suggested that distribution is a matter of the degree of "overlap" between representations of different entities. And so on. This lack of agreement over what distribution actually is is at least partly responsible for the fact that no really clear and useful consensus on the advantages of distributed representation emerged in the responses to the initial question.
It manifests a wider lack of agreement over the concept of distribution in connectionism and cognitive science more generally. I once surveyed as many of the definitions and occurrences of "distribution", "distributed representation", etc., as I could find in the cognitive science literature, and found that there were at least 5 very different basic properties that people often refer to as distribution. These ranged from a very simple notion of "spread-out-ness" - each entity being represented by activity in many units rather than just one - at one extreme, to complete functional equipotentiality at the other. (A representation is functionally equipotential when any part of it can stand in for the whole thing. Holograms are famous for exhibiting a form of equipotentiality.) Authors often picked up multiple strands and ran them together in one characterization, or defined distribution differently on different occasions, sometimes even in the same work. Probably the two most common definitions are (1) the notion of simple extendedness just mentioned (i.e., using "many" units to represent a given item) and (2) superimposition of representations. We have superimposition when there are multiple items being represented at the same time, but no way of pointing to the discrete part of the representation which is responsible for item A, the discrete part which is responsible for item B, and so forth. Think of the weights in a standard feed-forward network. Here multiple input-output associations are represented at the same time, but there is (in general) no separate set of weights for each association. To see how these two senses simultaneously dominate connectionist discussions of distribution, think again of the answers to Movellan's question. Many of the answers took the form, roughly, that "when I used representations involving activity in many units rather than just one in such and such a network, I found better (or worse!) performance". Other responses, particularly those that made reference to the brain or neuropsychological results, were more concerned with the extent to which there is separate or discrete storage of the various components of our knowledge in a given circumscribed domain. (In these contexts, "graceful degradation" in performance is often thought to be a consequence of knowledge being stored in an inextricably superimposed fashion.) In one sense, it is not surprising that these are the two most common notions of distribution. Perhaps the only thing that is really clear about distribution is the opposition between distribution and localization: whatever distributed representations are, they are non-local. Trouble is, "local" turns out to be ambiguous. Sometimes "local" means restricted in extent (e.g., using only one unit rather than many), and sometimes it means not overlapping with the representation of anything else. The two most common senses of "distribution" mentioned a moment ago simply result from denying locality in these two distinct senses. It seems to me that a necessary condition for any significant progress on the question "what are distributed representations good for?" is that this general state of confusion over what "distributed" means be resolved. This means clearly laying out the different senses that are floating around, picking out the one that is the most central and most theoretically significant, and giving it a reasonably precise definition.
I attempted this in Ch.1 of my PhD dissertation (Distributed Representation, University of Pittsburgh 1989); a shorter overview of some of the material from that chapter has recently appeared as "What is the D in PDP? An overview of the concept of distribution" in Stich, Ramsey & Rumelhart (eds) Philosophy and Connectionist Theory. In my opinion, the most important concept in the vicinity of distribution is that of superimposition of representations, and it is for this that the term "distributed" should really be reserved. One advantage of this strategy is that superimposition admits of a surprisingly clear and satisfying mathematical definition: Suppose R is a representation of multiple items. If the representings of the different items are fully superimposed, every part of the representation R must be implicated in representing each item. If this is achieved in a non-trivial way there must be some encoding process that generates R given the various items to be stored, and which makes R vary, at every point, as a function of each item. This process will be implementing a certain kind of transformation from items to representations. This suggests thinking of distribution more generally in terms of mathematical transformations exhibiting a certain abstract structure of dependency of the output on the input. More precisely, define any transformation from a function F to another function G as strongly distributing just in case the value of G at any point varies with the value of F at every point; the Fourier transform is a classic example. Similarly, a transformation from F to G is weakly distributing, relative to a division of the domain of F into a number of sub-domains, just in case the value of G at every point varies as a function of the value of F at at least one point in each sub-domain. The classic example here is the linear associator, in which a series of vector pairs are stored in a weight matrix by first forming, and then adding together, their respective outer products. Each element of the matrix varies with every stored vector, but only with one element of each of those vectors. (The "functions" F and G in this case describe the input vectors and the association matrix respectively; e.g., given an argument specifying a place in an input vector, F returns the value of the vector at that place.) Clearly, a given distributing transformation yields a whole space of functions resulting from applying that transformation to different inputs (i.e., different functions F). If we think of these output functions as descriptions of representations, and the input functions as descriptions of items to be represented, the distributing transformation is defining a whole space or scheme of distributed representations. To be a distributed representation, then, is to be a member of such a scheme; it is to be a representation R of a series of items C such that the encoding process which generates R on the basis of C implements a given distributing transformation. Basically, then, distributed representations are what you get from distributing transformations, which are transformations which make each part of the output (the representation) depend on every part of the input (what you're representing). Now, mathematically speaking, there is a vast number of different kinds of distributing transformations, and so there is a vast number of possible instantiations of distributed representation. 
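To make the linear associator example above concrete, here is a minimal numerical sketch of a weakly distributing transformation in Python (assuming NumPy; the dimensions, variable names, and random seed are arbitrary illustrative choices, not anything from the original posting):

import numpy as np

rng = np.random.default_rng(0)

# Three key/value vector pairs (the "items") to be stored in one weight matrix.
keys = [rng.standard_normal(8) for _ in range(3)]
values = [rng.standard_normal(5) for _ in range(3)]

# Superimposed storage: the weight matrix is the sum of the outer products.
# Every element of W depends on every stored pair, but only on one element of
# each key and one element of each value -- a "weakly distributing"
# transformation in the sense described above.
W = sum(np.outer(v, k) for k, v in zip(keys, values))

# Recall by matrix-vector product: exact if the keys are orthonormal,
# approximate for random keys, with the error being crosstalk from the other
# superimposed pairs.
print(np.round(W @ keys[0], 2))
print(np.round(values[0], 2))

# Damaging one weight perturbs every stored association a little instead of
# destroying any single one.
W_damaged = W.copy()
W_damaged[0, 0] = 0.0
print(round(float(np.linalg.norm(W @ keys[0] - W_damaged @ keys[0])), 3))

Zeroing a single weight element degrades every stored association slightly rather than erasing any one of them, which is the superimposition property at issue.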
Connectionists can be seen as exploring that portion of the space of possible transformations that you can handle with n-dimensional vector operations, learning algorithms, etc. In other domains such as optics it is possible to implement other forms of distributing transformations and hence to get distributed representations with different properties. There are a number of reasons for wanting to define distributed representation in terms of superimposition generally, and distributing transformations in particular: (a) superimposition is certainly one of the most common of the standard senses of "distribution" in current usage, and so we remain as close as possible to that usage; (b) superimposition admits of a precise mathematical definition, so those who think clarity only comes from formalization should be kept happy; (c) various popular properties of distributed representation such as automatic generalization and graceful degradation are a natural consequence of distribution defined this way; (d) in practice, in a connectionist context, distribution in the sense of requiring many units rather than just one is a necessary precondition of this more full-blooded notion; hence any advantages that accrue to representations in virtue of utilizing many units also accrue to superimposed representations; (e) a number of other interesting theoretical results follow from defining distribution this way: in particular, it can be shown that distributed representations cannot be symbolic in nature, on a reasonably precise definition of "symbolic" (see e.g. my "Why distributed representation is inherently non-symbolic", in G. Dorffner (ed.) Konnektionismus in Artificial Intelligence und Kognitionsforschung. Berlin: Springer-Verlag, 1990; 58-66). On the basis of this kind of definition of what distributed representation is, what kind of answer can be given to the "what are distributed representations good for?" question? Well, the kind of answer you will find satisfying will depend very much on what your theoretical interests are. A connectionist whose concerns have more of an applied, engineering focus will want to know what specific processing benefits arise from using representations generated by distributing transformations. As mentioned in (c) above, I think that some of the favorite virtues of distribution are best seen as an immediate consequence of superimposition. The technical issues here still need much clarification, however. As a cognitive scientist, on the other hand, I'm interested in more general questions such as - what are the advantages of distribution for human knowledge representation? Here I don't have any actual answers ready to hand; the most I can do at the moment is point to the kind of question that seems the most interesting. Speaking at the broadest possible level: various difficulties encountered in mainstream AI, combined with some philosophical reflections, suggest that everyday commonsense knowledge cannot be fully and effectively captured in any kind of purely symbolic format; that, in other words, symbolic representation is fundamentally the wrong medium for capturing at least certain kinds of human knowledge. Just above I mentioned that distributed representation (defined in terms of superimposition) can be shown to be intrinsically non-symbolic. The obvious suggestion then is: perhaps the most important advantage of distributed representation is that it (and it alone?) is capable of representing the kind of knowledge that underlies everyday human competence?
Tim van Gelder From tsejnowski at UCSD.EDU Mon Jun 17 13:14:00 1991 From: tsejnowski at UCSD.EDU (Terry Sejnowski) Date: Mon, 17 Jun 91 10:14:00 PDT Subject: Santa Fe Time Series Competition Message-ID: <9106171714.AA23031@sdbio2.UCSD.EDU> A Time Series Prediction and Analysis Competition The Santa Fe Institute August 1, 1991 - December 31, 1991 A wide range of new techniques is now being applied to the time series analysis problems of predicting the future behavior of a system and deducing properties of the system that produced the time series. Such problems arise in most observational disciplines, including physics, biology, and economics; new tools, such as the use of connectionist models for forecasting, or the extraction of parameters of nonlinear systems with time-delay embedding, promise to provide results that are unobtainable with more traditional time series techniques. Unfortunately, the realization and evaluation of this promise has been hampered by the difficulty of making rigorous comparisons between competing techniques, particularly ones that come from different disciplines. In order to facilitate such comparisons and to foster contact among the relevant disciplines, the Santa Fe Institute is organizing a time series analysis and prediction competition. A few carefully chosen experimental time series will be made available through a computer at the Santa Fe Institute, and quantitative analyses of these data will be collected in the areas of forecasting, characterization (evaluating dynamical measures of the system such as the number of degrees of freedom and the information production rate), and system identification (inferring a model of the system's governing equations). At the close of the competition the performance of the techniques submitted will be compared and published, and the server will continue to operate as an archive of data, programs, and comparisons among algorithms. There will be no monetary prizes. A workshop is planned for the Spring of 1992 to explore the results of the competition. The competition does not require advance registration; to enter, simply retrieve the data and submit your analysis. The detailed description of the competition categories and instructions for retrieving the data and entering the competition will be available after August 1 through four routes: ACCESSING THE DATA --------- --- ---- ftp: Ftp to sfi.santafe.edu (192.12.12.1) as user "tsguest" and use "tsguest" for the password. Get the file "instructions". dial-up: There are two dial-up lines: 505-988-1705 (2400 baud), and 505-986-0252 (any speed to 9600 baud). The settings for both lines are no parity, 8 bit words, 1 stop bit. At connect, press return; at the prompt type "login tsguest" and use "tsguest" for the password. At the next prompt type "telnet sfi" and login as user "tsguest" (password "tsguest"). Using either "kermit" or "xmodem", retrieve the file "instructions". When you are finished, logout from sfi and from the prompt. mail server: Send email to tserver at sfi.santafe.edu with the phrase "send time series instructions" in either the subject or the body of the message. The mailer will return a file with more detailed instructions for requesting the data and submitting analyses. pc disks: The data is available on disks in either IBM-PC or Mac formats.
To cover the cost of distributing the data, send $25 to Time Series Competition Disks, The Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501, and specify the machine type, disk size, and disk density required. Instructions will be included with the disks on submitting a return disk with the analysis of the data. FOR MORE INFORMATION --- ---- ----------- Further questions about the competition, or inquiries about contributing data to be used in the competition, should be directed to: Time Series Competition, Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM 87501, (505) 984-8800, tserver at sfi.santafe.edu, or to one of the organizers: Neil Gershenfeld, Department of Physics, Harvard University, 15 Oxford Street, Cambridge, MA 02138, (617) 495-5641, neilg at sfi.santafe.edu; Andreas Weigend, Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, (415) 322-4066, andreas at sfi.santafe.edu. ADVISORY BOARD -------- ----- Prof. Leon Glass Department of Physiology McGill University Prof. Clive W. J. Granger Center for Econometric Analysis Department of Economics University of California, San Diego Prof. William H. Press Department of Physics and Center for Astrophysics Harvard University Prof. Maurice B. Priestley Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Itamar Procaccia Department of Chemical Physics The Weizmann Institute of Science Prof. T. Subba Rao Department of Mathematics The University of Manchester Institute of Science and Technology Prof. Harry L. Swinney Department of Physics University of Texas at Austin From pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU Tue Jun 18 14:10:12 1991 From: pazzani%pan.ICS.UCI.EDU at VM.TCS.Tulane.EDU (Michael Pazzani) Date: Tue, 18 Jun 91 11:10:12 -0700 Subject: Special Issue of Machine Learning Journal Message-ID: <9106181110.aa28419@PARIS.ICS.UCI.EDU> MACHINE LEARNING will be publishing a special issue on Computer Models of Human Learning. The ideal paper would describe an aspect of human learning, present a computational model of the learning behavior, evaluate how the performance of the model compares to the performance of human learners, and describe any additional predictions made by the computational model. Since it is hoped that the papers will be of interest to both cognitive psychologists and computer scientists, papers should be clearly written and provide the background information necessary to appreciate the contribution of the computational model. Manuscripts must be received by April 1, 1992, to assure full consideration. One copy should be mailed to the editor: Michael Pazzani Department of Information and Computer Science University of California, Irvine, CA 92717 USA In addition, four copies should be mailed to: Karen Cullen MACH Editorial Office Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, MA 02061 USA Papers will be subject to the standard review process. Please pass this announcement along to interested colleagues.
From pollack at cis.ohio-state.edu Tue Jun 18 11:28:35 1991 From: pollack at cis.ohio-state.edu (Jordan B Pollack) Date: Tue, 18 Jun 91 11:28:35 -0400 Subject: Neuroprose Turbulence Expected Message-ID: <9106181528.AA01029@dendrite.cis.ohio-state.edu> Cheops, the pyramid machine upon which NEUROPROSE resides, will be decommissioned. The Neuroprose archive will move, with luck, to a new Sparcserver at the same IP address also called Cheops. But between today and July 1, all cis.ohio-state.edu systems (including email) will be pretty wobbly, so expect delays. Jordan Pollack Assistant Professor CIS Dept/OSU Laboratory for AI Research 2036 Neil Ave Email: pollack at cis.ohio-state.edu Columbus, OH 43210 Phone: (614)292-4890 (then * to fax) From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:31:23 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:31:23+0200 Subject: large SIMD nn machines, ASI Message-ID: <9106182231.AA12381@sunmanager.lrz-muenchen.de> Hello, I wonder whether there are any other beta testers of the ASI Cnaps machine out there who might want to share some experiences. Specifically, has anyone - implemented a non-local algorithm (CG, PCG), - implemented a good random number generator memory efficient enough to be put into node memory / what do you think about tables or host communication for an alternative implementation ? - thought about interfacing some hardware as preprocessor, piping data in via DMA ? - found a job for idle processors (small net sizes) - liked the 1-bit weight mode - ported the debugger to Irix - (other) ? Some of these questions should be familiar to other SIMD programmers too (I have the Witbrock GF11 paper). Thank you for any hints. Cheers, Henrik (Rick at vee.lrz-muenchen.de) H. Klagges, Laser Institute Prof Haensch, PhysDep U of Munich, FRG + IBM Research Division, Binnig group From uh311ae at sunmanager.lrz-muenchen.de Tue Jun 18 18:44:50 1991 From: uh311ae at sunmanager.lrz-muenchen.de (Henrik Klagges) Date: 19 Jun 91 00:44:50+0200 Subject: Backpercolation Message-ID: <9106182244.AA12446@sunmanager.lrz-muenchen.de> Hello, I wonder whether the backpercolation algorithm (see back articles in comp.ai.neural-nets) is important or not. I got some very preliminary results on very simple problems (n-n-n linear channel with few (3-10) patterns) which don't look bad, but complicated ones don't seem to be particularly zooming yet (yes, there are some bugs in my code left).
If anyone would like a C++ backperc server object (guaranteed to be broken) to avoid reinventing the wheel and to get some basic data structures, let me know. The only problem: Mark Jurik (mgj at cup.portal.com) wants you to sign a nondisclosure thing first before I can send it out to you. Anyway, if someone else has some first results, I would really like to see them. Cheers, Henrik (Rick at vee.lrz-muenchen.de) From ITGT500 at INDYCMS.BITNET Tue Jun 18 16:32:57 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Tue, 18 Jun 91 15:32:57 EST Subject: Distributed vs. local representation Message-ID: <25A077F07E800064@BITNET.CC.CMU.EDU> In the following I would like to state my views on distributed and local representations. All comments are more than welcome. I think that if we define a strict local representation as: "one object (or item, entity, etc.) is represented by one node (or unit, neuron, etc.) only, and one node represents only one object", then all the other situations probably can be classified as distributed representation (either semi- or full-distributed). In other words, only the one-to-one representation belongs to local representation. The others, multiple-to-one, one-to-multiple, and multiple-to-multiple representations, all belong to distributed representation. Therefore, distributed representation has more senses than local representation. This may help reduce the confusion regarding these definitions. Because distributed representation covers a wider range than local representation, there are many different appearances of distributed representation. One point unnoticed up to now is the difference between the "binary representation" (the node takes binary values only) and the "analog representation" (the node takes analog values). In NETtalk and many other examples, the distributed representation used seems to be the binary one. However, the world seems to favor and take the analog form. Therefore, analog distributed representation probably is the one that is working and dominating our cognitive processes. I met one such problem in our work on a parabolic problem. We found that it would be very difficult, if not impossible, to use (strict) local or binary distributed representation to solve the parabolic problem. It was only the analog distributed representation that worked well. We concluded that from the practical application viewpoint, both local and distributed representations would work if the training and test patterns were discrete and finite. However, if the training and/or test patterns were continuous and infinite, only distributed representation worked. -Bo From aam9n at hagar2.acc.Virginia.EDU Wed Jun 19 04:39:39 1991 From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai) Date: Wed, 19 Jun 91 04:39:39 EDT Subject: Distributed Representations Message-ID: <9106190839.AA00322@hagar2.acc.Virginia.EDU> We connectionists never tire of talking about "distributed representations", and with good reason. However, I have never come across a rigorous definition of the concept. Now, I realize that this notion, like most powerful ones, will necessarily be diminished in any process of definition, however inclusive that might be. That has not fazed us in trying to define entropy, information, complexity, learnability --- and probability! My question is: has anyone rigorously, or even empirically, tried to come up with a definition for distributed representations --- especially a way to quantify distributed-ness?
I suppose high-order statistics represent a way to look at this, but have there been any attempts to develop a definition specifically in the context of connectionist networks? And would that be such a bad thing? Ali Minai Dept of EE University of Virginia aam9n at Virginia.EDU From maureen at ai.toronto.edu Wed Jun 19 11:38:49 1991 From: maureen at ai.toronto.edu (Maureen Smith) Date: Wed, 19 Jun 1991 11:38:49 -0400 Subject: Announce new CRG Technical Report Message-ID: <91Jun19.113852edt.780@neuron.ai.toronto.edu> The following technical report is available for ftp from the neuroprose archive. A hardcopy may also be requested. (See below for details.) Though written for a statistics audience, this report should be of interest to connectionists and others interested in machine learning, as it reports a Bayesian solution for one type of "unsupervised concept learning". The technique employed is also related to that used in Boltzmann Machines. Bayesian Mixture Modeling by Monte Carlo Simulation Radford M. Neal Technical Report CRG-TR-91-2 Department of Computer Science University of Toronto It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a ``correct'' number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task. To obtain a compressed PostScript version of this report from neuroprose, ftp to "cheops.cis.ohio-state.edu" (128.146.8.62), log in as "anonymous" with password "neuron", set the transfer mode to "binary", change to the directory "pub/neuroprose", and get the file "neal.bayes.ps.Z". Then use the command "uncompress neal.bayes.ps.Z" to convert the file to PostScript. To obtain a hardcopy version of the paper by physical mail, send mail to: Maureen Smith Department of Computer Science University of Toronto 6 King's College Road Toronto, Ontario M5A 1A4 From schraudo at cs.UCSD.EDU Wed Jun 19 21:39:56 1991 From: schraudo at cs.UCSD.EDU (Nici Schraudolph) Date: Wed, 19 Jun 91 18:39:56 PDT Subject: hertz.refs.bib patch Message-ID: <9106200139.AA29142@beowulf.ucsd.edu> In adding the "HKP:" prefix to the citation keys in the BibTeX version of the Hertz/Krogh/Palmer bibliography I forgot to modify the internal cross-citations accordingly. I've appended the necessary patch below; it only involves three lines, but those who don't feel up to the task can ftp the patched file (still called hertz.refs.bib.Z) from neuroprose. My apologies for the inconvenience, - Nici Schraudolph. Here's the patch: *** hertz.refs.bib Wed Jun 19 18:23:36 1991 *************** *** 73,80 **** @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- ! @string{inAR = "Reprinted in \cite{Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- --- 73,80 ---- @string{snowbird = "Neural Networks for Computing"} % -------------------------------- Books --------------------------------- !
@string{inAR = "Reprinted in \cite{HKP:Anderson88}"} ! @string{partinAR = "Partially reprinted in \cite{HKP:Anderson88}"} @string{pdp = "Parallel Distributed Processing"} % ------------------------------- Journals --------------------------------- *************** *** 3500,3506 **** pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{Rumelhart86a}", year = 1985 } --- 3500,3506 ---- pages = "75--112", journal = cogsci, volume = 9, ! note = "Reprinted in \cite[chapter 5]{HKP:Rumelhart86a}", year = 1985 } From rich at gte.com Thu Jun 20 10:24:53 1991 From: rich at gte.com (Rich Sutton) Date: Thu, 20 Jun 91 10:24:53 -0400 Subject: Job Announcement - GTE Message-ID: <9106201424.AA29945@bunny> The connectionist machine learning project at GTE Laboratories is looking for a researcher in computational models of learning and adaptive control. Applications from highly-qualified candidates are solicited. A demonstrated ability to perform and publish world-class research is required. The ideal candidate would also be interested in pursuing applications of their research within GTE businesses. GTE is a large company with major businesses in local telphone operations, mobile communications, lighting, precision materials, and government systems. GTE Labs has had one of the largest machine learning research groups in industry for about seven years. A doctorate in Computer Science, Computer Engineering or Mathematics is required. A demonstrated ability to communicate effectively in writing and in technical and business presentations is also required. Please send resumes and correspondence to: June Pierce GTE Labs MS-44 40 Sylvan Road Waltham, MA 02254 USA From ga1043 at sdcc6.UCSD.EDU Thu Jun 20 12:48:06 1991 From: ga1043 at sdcc6.UCSD.EDU (ga1043) Date: Thu, 20 Jun 91 09:48:06 PDT Subject: Super-Turing discussion Message-ID: <9106201648.AA15438@sdcc6.UCSD.EDU> A couple of months ago, there was a discussion on the network about neural nets, their capabilities, super-Turing machines, etc. About five or six references were mentioned. Does anyone have a list of those refereces, or a copy of that discussion? If you could forward the information to me at ga1043 at sdcc6.ucsd.edu, I would appreciate it. Valerie Hardcastle From rstark at aipna.edinburgh.ac.uk Thu Jun 20 12:29:54 1991 From: rstark at aipna.edinburgh.ac.uk (rstark@aipna.edinburgh.ac.uk) Date: Thu, 20 Jun 91 12:29:54 BST Subject: Distributed vs. Localist Representations Message-ID: <4210.9106201129@fal.aipna.ed.ac.uk> One aspect of this issue which seems implicit in much of this discussion is the notion that distributed representation can be considered a *relative* property. Thus the "room schema" network is "distributed" relative to rooms, but "localist" relative to ovens. Likewise, the Jets and Sharks model, which is considered to be strictly localist in the sense that each unit explictly represents a single concept (eg. "is-in-thirties"), does produce representations that are distributed relative to individual gang members. Andy Clark notes this in Microcognition. Does this seem correct? Is anyone uncomfortable with calling the Jets and Sharks a "distributed" model since each individual is represented by a pattern over the units (one unit active in each competition network), even though each unit can be clearly labelled in a localist fashion? 
Note that this notion of relativity in distributed representation is (I believe) distinct from its continuous aspects (seen in references to "partially-" or "semi-" distributed representations), which may be quantifiable using e.g. Tim Van Gelder's proposal of degree of superimposition. -Randall Stark --------------------------------------------------------------------------- Randall Stark TEL: (+44)-31-650-2725 | Dept of Artificial Intelligence JANET: rstark at uk.ac.ed.aipna | 80, South Bridge ARPA: rstark%uk.ac.ed.aipna at nsfnet-relay | University of Edinburgh UUCP: ...!uunet!mcsun!ukc!aipna!rstark | Edinburgh, EH1 1HN, UK --------------------------------------------------------------------------- From haffner at lannion.cnet.fr Fri Jun 21 11:36:19 1991 From: haffner at lannion.cnet.fr (Haffner Patrick) Date: 21 Jun 91 17:36:19+0200 Subject: POST-DOCTORAL VACANCY : Connectionism and Oral Dialogue Message-ID: <9106211536.AA02620@lsun26> Applications are invited for research assistantship(s) for post-doctoral or sabbatical candidates. Funding at the French National Telecommunications Research Centre (Centre National d'Etudes des Telecommunications, CNET) will commence in September '91 for a two-year period; the work location will be Lannion, Brittany, France. Experience is required in Natural Language Processing, especially Oral Dialogue Processing, by Connectionist methods. Applicants should specify the period between Sept '91 and Sept '93 which interests them. Applications, including CV/Resume, should be sent to: Mme Christel Sorin CNET LAA/TSS/RCP BP 40 22301 LANNION CEDEX FRANCE TEL: +33 96-05-31-40 FAX: +33 96-05-35-30 E-MAIL: sorin at lannion.cnet.fr From ITGT500 at INDYCMS.BITNET Thu Jun 20 11:55:32 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Thu, 20 Jun 91 10:55:32 EST Subject: Distributed Representations In-Reply-To: Your message of Wed, 19 Jun 91 04:39:39 EDT Message-ID: ----------------------------Original message---------------------------- Two days ago I mentioned (strict) local representation, binary distributed representation, and analog distributed representation. As an attempt to answer Ali Minai's question, I will try to give my understanding of representations as follows: (1). In my opinion, the key points underlying the definitions of representations are the correspondences between the objects (or items, entities, etc.) to be represented and the units (or nodes, neurons, etc.) of the network. The objects can be classified according to the properties they have. More than one object can possess the same property; in this case, these objects should be classified into the same group with this property. The units can represent different properties of the objects, or different objects within the same property group. As mentioned in my mail two days ago, there are four kinds of correspondences for the relationships between objects and units: one-to-one, multiple-to-one, one-to-multiple, and multiple-to-multiple. If we define (strict) local representation as the one that represents the one-to-one correspondence only, then all the other three correspondences can be called distributed representations. However, since there are three different correspondences in distributed representation, "distributed representation" will probably be too broad or too general a concept if we try to use one definition to refer to all three correspondences.
It is perhaps this overly general term that brought about the confusion on the advantages and disadvantages of local representation vs. distributed representation. (2). In an attempt to clarify these confusions, I think it is necessary to give more specific definitions to all four of these correspondences. The following is my attempt to define these representations: Local Representation ---- The one-to-one correspondence in which each object is represented by one unit, and each unit represents only one object. Units in local representation always take binary values. Binary Distributed Representation ---- The one-to-multiple correspondence in which each object is represented by multiple units and each unit is employed to represent only one object. The unit takes only binary values here; because it represents only one object, there is no need for it to take analog values. Analog Distributed Representation ---- The multiple-to-one correspondence in which multiple objects with the same property are represented by one unit and each unit represents multiple objects with the same property only. Here the unit takes different analog values for different objects within this property group. Different analog values are used to differentiate these different objects within the same property group. Mixed Distributed Representation ---- The multiple-to-multiple correspondence in which multiple objects of multiple properties are represented by one unit and each unit represents multiple objects with multiple properties. Here, the units take either binary or analog values depending on the properties and the object they represent. I am not sure whether the above definitions clarify these concepts and reduce the confusion on these problems. Your comments on the above statements are welcome. Bo Xu Dept. of Physiology and Biophysics School of Medicine Indiana University ITGT500 at INDYCMS.BITNET From hwang at pierce.ee.washington.edu Fri Jun 21 14:54:47 1991 From: hwang at pierce.ee.washington.edu ( J. N. Hwang) Date: Fri, 21 Jun 91 11:54:47 PDT Subject: IJCNN'91 Presidents' Forum (new announcement from Prof. Marks) Message-ID: <9106211854.AA13350@pierce.ee.washington.edu.> News release IEEE NEURAL NETWORKS COUNCIL IS SPONSORING A PRESIDENTS' FORUM AT IJCNN `91 IN SEATTLE, WASHINGTON Robert J. Marks II, Professor at the University of Washington and President of the IEEE Neural Networks Council (NNC), has announced that for the first time the IEEE/NNC will be sponsoring a Presidents' Forum during IJCNN `91 in Seattle, Washington, July 8-12, 1991. The participants of the Presidents' Forum will be the Presidents of the major artificial neural network societies of the world, including the China Neural Networks Committee, the Joint European Neural Network Initiative, the Japanese Neural Networks Society and the Russian Neural Networks Society. The Forum will be open to conference attendees and the press on Wednesday evening, 6:30-8:30 pm, July 10, 1991, at the Washington State Convention Center in Seattle. Each President will give a short (15-20 minute) presentation of the activities of their society, followed by a short question/answer period. Robert J. Marks II will be this year's moderator.
From aam9n at honi4.acc.virginia.edu Thu Jun 20 17:39:38 1991 From: aam9n at honi4.acc.virginia.edu (aam9n) Date: Thu, 20 Jun 91 17:39:38 EDT Subject: Distributed Representations Message-ID: <9106202139.AA00551@honi4.acc.Virginia.EDU> Bo Xu presents a very interesting classification of representations in terms of their distribution over representational units. The definitions of each class are internally clear enough, but I have some comments about how "distributivity" is defined, and where it leads. Let's take the definitions that Bo Xu gives: >Local Representation ---- The one-to-one correspondence in which each object > is represented by one unit, and each unit represents only one object. > Units in local representation always take binary values. No quarrel about this one being a local representation. >Binary Distributed Representation ---- The one-to-multiple correspondence > in which each object is represented by multiple units and each unit > is employed to represent only one object. The unit takes only binary > values here because it represents only one object, there is no need > for it to take analog values. Suppose I have two objects --- an apple and a pear --- and six representational units r1.....r6. Then, if I read this definition correctly, a distributed representation might be 000111 <-> apple and 111000 <-> pear. Since the units are binary, they are presumably "on" if the object is present and "off" if it is not. No reference is made to "properties" defining the object, and so there is no semantic content in any unit beyond that of mere signification: each unit is, ideally, identical. The question is: why have three units signifying one object when they work as one? One reason might be to achieve redundancy, and consequent fault-tolerance, through a voting scheme (e.g. 101001 <-> pear). Is this a distributed representation, though? To decide that, I must have an *external* definition of what it means for a representation to be distributed. Tentatively, I say that "a representation is distributed over a group of units if no single unit's correct operation is critical to the representation". This certainly holds in the above example. It holds, indeed, in all error-correcting codes. In a binary distributed representation, then, I can define the "degree of distributivity" as the minimum Hamming distance of the code. This is quite consistent, if rather disappointingly mundane. >Analog Distributed Representation ---- The multiple-to-one correspondence > in which multiple objects with the same property are represented by > one unit and each unit represents multiple objects with the same > property only. Here the unit takes different analog values for > different objects within this property group. Different analog > values are used to differentiate these different objects within the > same property group. Here, under the obvious reading of this definition, I have two categories (units) called "fruits" and "vegetables". Each represents many objects with different values, but mutually exclusively. Thus, I might have apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code 0.1,0.1. This is obviously equivalent to a binary representation with each unit replaced by, say, n binary units. The question is: does this code embody the principle of dispensability? Not necessarily. One wrong bit could change an apple into a lemon, or even lose all information about the category of the object.
Thus, in the general case, such a representation is "distributed" only in the physical sense of activating (or not activating) units in a group. Each unit is still functionally critical. >Mixed Distributed Representation ---- The multiple-to-multiple correspondence > in which multiple objects of multiple properties are represented by > one unit and each unit represents multiple objects with multiple > properties. Here, the units take either binary or analog values > depending on the properties and the object they represent. Now here we have what most people mean by "distributed representations". We have many properties, each represented by a unit, and many objects. Each object can be encoded in terms of its properties. If the set of properties does not have enough discrimination, multiple objects could have the same code. Even if the property set is sufficient for unique representation, it is possible that the malfunction of one unit may change one object to another. The question then is: is this dependency small or large? Does a small malfunction in a unit cause catastrophic change in the semantic content of the whole group of units? I can "distribute" my representation over all the atoms in the universe, but if that doesn't give me some protection from point failures, I have not truly "distributed" things at all --- merely multiplied the local representation. Now, of course, in the "real" world where things are uniformly or normally distributed and errors are uncorrelated, increasing the size of a representation over a set of independent units will almost always confer some degree of protection from catastrophic point failures. An important issue is how to *maximize* this. And to do that, we must be able to measure it. One way would be to minimize the average information each representational unit conveys about the represented objects, which is a simple maximum entropy formulation. This requirement must, of course, be balanced by an adequate representation imperative. Other formulations are certainly possible, and probably much better. In any case, many of the more interesting issues in distributed representation arise when the "object" being represented is only implicitly available, or when the representation is distributed over a hierarchy of units, not all of which are directly observable, and not all of which count in the final encoding. Comments? Ali Minai aam9n at Virginia.EDU
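The two quantitative handles suggested above - the minimum Hamming distance of a binary code and the average information a single unit carries about object identity - can be sketched in a few lines of Python. The toy codes below are illustrative inventions, not anything from the posting; note that for a deterministic code the mutual information between one unit and the object identity reduces to the entropy of that unit's marginal activity, which is why minimizing it must be balanced against an adequate-representation requirement, as noted above.

from itertools import combinations
from math import log2

# Toy binary codes for three objects over six units (invented for illustration).
codes = {
    "apple": (0, 0, 0, 1, 1, 1),
    "pear":  (1, 1, 1, 0, 0, 0),
    "lemon": (1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Minimum pairwise Hamming distance: no single-unit fault can turn one code
# into another as long as this is at least 2.
min_dist = min(hamming(a, b) for a, b in combinations(codes.values(), 2))

def avg_bits_per_unit(codes):
    # Average information (bits) one unit carries about which object is
    # present, assuming equiprobable objects.  Because the code is
    # deterministic, this equals the entropy of the unit's marginal activity.
    names = list(codes)
    n_units = len(next(iter(codes.values())))
    total = 0.0
    for u in range(n_units):
        p_on = sum(codes[n][u] for n in names) / len(names)
        for p in (p_on, 1.0 - p_on):
            if p > 0.0:
                total -= p * log2(p)
    return total / n_units

print("minimum Hamming distance:", min_dist)
print("average bits per unit:", round(avg_bits_per_unit(codes), 3))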
From ITGT500 at INDYCMS.BITNET Sat Jun 22 11:38:17 1991 From: ITGT500 at INDYCMS.BITNET (Bo Xu) Date: Sat, 22 Jun 91 10:38:17 EST Subject: Distributed Representations Message-ID: <29E19BB296800064@BITNET.CC.CMU.EDU> Ali Minai presented a good example of apple and pear. I am going to answer some questions he raised. Let's look at his statements first. >is not. No reference is made to "properties" defining the object, and so there >is no semantic content in any unit beyond that of mere signification: each This is a very good question. Generally speaking, there are many properties that exist at the same time for each object. Let's take the apple as an example. An apple can be classified according to its taste, color, size, shape, or whether it is a fruit or not (as Ali Minai chose), etc. Different people will choose different criteria to meet the purpose of their applications.
Thus, I might have >apple <-> 0.1,0 and squash <-> 0,0.1, but no object will have the code >0.1,0.1. This is obviously equivalent to a binary representation with >each unit replaced by, say, n binary units. The question is: does this >code embody the principle of dispensibility? Not necessarily. One wrong bit >could change an apple into a lemon, or even lose all information about the >category of the object. Thus, in the general case, such a representation >is "distributed" only in the physical sense of activating (or not activating) >units in a group. Each unit is still functionally critical. It is true if there is a bit of error, the apple will change to lemon etc. However, the key point here is that the neural net's fault-tolerance characteristic exists only after it is trained and has reached an accuracy criterion. If we are dealing with many objects and use 0.1 as a value to differentiate different objects, we will train the net to reach a criterion at least smaller than 0.1 (otherwise, the net will be of no use). Thus, for seen patterns, the error will not be so big that an apple will turn into a lemon. For unseen patterns, bigger errors probably will occur, and apples probably will turn to lemons or whatsoever. However, this time we may not attribute the problem to the representation used only. This is related to the generalizability of the net, and the learning algorithm, units responsive characteristics and even the topology of the net all probably are playing roles for the generalizability of the net. >Now here we have what most people mean by "distributed representations". We >nother. The question then is: is this dependency small or large? Does >small malfunction in a unit cause catastrophic change in the semantic >content of the whole group of units? I can "distribute" my representation When talking about the representations, the graceful degradation of brain is introduced as a criterion. However, since the neural net is still far away from a real brain model, some cautiousness should be taken when relating the neural net to brain. The first thing to be made clear is that which layer of neural net we are refering to. Most people refer to the interface layers (the input and output layers) of neural net when they talk about the local/distributed representations. However, they refer to all layers (both the interface layers and hidden layers) when they talk about the graceful degradation. However, what are the justices for the interface layers to possess graceful degradation? If we say that neural net resembles brain in some aspects, then the resemblance most likely lies in the hidden layers instead of the interface layers. The criterion of graceful degradation should be made on the hidden layers instead of the interface layers. In most of current nets, the hidden layers are using mixed distributed representation, and thus possess the graceful degradation characteristics. As to the interface layers (input/output layers), we can demand them to possess the graceful degradation characteristics too. However, in my opinion, this will lead to many additional problems and confusions. The mixed distributed representation is good for hidden layers, not for interface layers. I think for the interface layers, the analog distributed representation works best because: (1) Considerations at the interface layers should be practicality instead of graceful degradation. There is no justice and no need for the interface layers to possess the graceful degradation. (2). 
The analog distributed representation classifies the objects to be represented: objects with the same property are placed in the same group, and the differences between objects within a group are represented by different analog values of the unit representing that property group. (For example, suppose there are four apples and three pears. In an analog distributed representation, two units would be used: unit A for apples and unit P for pears. The four apples can be represented by letting unit A take four different analog values, and the three pears by letting unit P take three different analog values.) This is the most natural approach when we deal with many objects. Why should we sacrifice this natural approach (the analog distributed representation) for graceful degradation (which may not belong to the interface layers anyway; the hidden layers use a mixed distributed representation and already possess graceful degradation) when we are considering the interface layers?

We used the analog distributed representation in a parabolic problem (a task mapping the parabola curve, which we used to compare the performance of BPNN and PPNN) and found that it was the best and most natural representation for problems (such as the parabolic problem) that have continuous and effectively infinite training/test patterns (objects).

In sum, I think we should be more specific when we talk about the representations and the brain-like characteristics of neural nets: (1) For the interface layers (input/output layers), the analog distributed representation is the best choice, because the priority at the interface layers is practicality, and the analog distributed representation is the most natural and the easiest to use when dealing with many objects. (2) For the hidden layers, the mixed distributed representation is the best choice, because graceful degradation is the priority to be taken into account there. Fortunately, most current network architectures already ensure this for the hidden layers.

Bo Xu
ITGT500 at INDYCMS.BITNET

From aam9n at hagar3.acc.Virginia.EDU Sat Jun 22 21:49:33 1991
From: aam9n at hagar3.acc.Virginia.EDU (Ali Ahmad Minai)
Date: Sat, 22 Jun 91 21:49:33 EDT
Subject: Distributed Representations
Message-ID: <9106230149.AA00465@hagar3.acc.Virginia.EDU>

Bo Xu raises some questions about distributed representations in the context of feed-forward neural networks, particularly with regard to graceful degradation. I do not agree that to require graceful degradation is to imply "brain-like" networks. In my opinion, the very notion of distribution is fundamentally linked to the requirement that each representational unit be minimally loaded, and that each representation be as homogeneously distributed over all representational units as possible. That this produces graceful degradation is partly true (only to the first order, given the non-linearity of the system), but that is incidental.

Speaking of which layers to apply the definition to, I think that in a feed-forward associative network (analog or binary), the hidden neurons (or all the weights) are the representational units. The input neurons merely distribute the prior part of the association, and the output neurons merely produce the posterior part. The latter are thus a "recovery mechanism" designed to "decode" the distributed representation of the hidden units and recover the "original" item.
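Schematically, this division of labour might look like the following toy fragment (an illustrative sketch with random, untrained weights and invented sizes, not a model taken from any of the postings): the hidden activations are treated as the stored distributed code, and the output weights act as the decoder.

    # Toy picture of "hidden units as the representation": input units only
    # pass the prior in, hidden activations hold the code, output decodes it.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 4, 8, 4
    W1 = rng.normal(size=(n_hid, n_in))    # input -> hidden (encoding weights)
    W2 = rng.normal(size=(n_out, n_hid))   # hidden -> output (decoding weights)

    def represent(u):
        """The input units merely distribute u; the hidden units hold the code."""
        return np.tanh(W1 @ u)

    def recover(h):
        """The output units decode the hidden code into the posterior pattern."""
        return np.tanh(W2 @ h)

    u = rng.normal(size=n_in)    # the "prior" part of an association
    h = represent(u)             # the distributed representation (hidden layer)
    v = recover(h)               # the recovered posterior (arbitrary here)
    print(h.round(2), v.round(2))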
Of course, in a heteroassociative system, the "recovered original" is not the same as the "stored original". I realize that this is stretching the definition of "representation", but it seems quite natural to me.

The issue of a "recovery mechanism" is quite fundamental to the question of representational distribution. Without a requirement for adequate recoverability, any finite medium could be "distributedly" loaded with a potentially infinite number of representations, without being able to reproduce any of them. To ensure adequate recoverability, however, representations must be "distinct", or mutually non-interacting, in some sense. Given the countervailing requirement of distributedness, the obvious route of separation by localization is not available, and we must arrive at some compromise principle of minimum mutual disturbance, such as a requirement for orthogonality or linear independence (rather artificial, if you ask me).

My point is that defining distributed representations only in terms of unconstrained characteristics is a partial solution. Internal and external constraining factors must be included in the formulation to adequately ground the definition. These are provided by the requirements of maximum dispensability and adequate recoverability. Zillions of issues remain unaddressed by this formulation too, especially those of consistent measurement. I feel that each domain and situation will have to supply its own specifics.

I am not sure I understand Bo Xu's assertion that analog representations are "more natural". Certainly, to approximate a parabola (which I have done hundreds of times with different neural nets) would imply using an analog representation, but it is not clear if that is so natural for classifying apples and pears. Using different analog values to indicate intra-class variations is reasonable and, under specific circumstances, might even be provably better than a binary representation. But I would be very hesitant to generalize over all possible circumstances. In any case, a global characterization of distributed representation should depend on specifics only for details, and should apply to both discrete and analog representations.

Ali Minai
University of Virginia
aam9n at Virginia.EDU

From ross at psych.psy.uq.oz.au Sun Jun 23 01:52:51 1991
From: ross at psych.psy.uq.oz.au (Ross Gayler)
Date: Sun, 23 Jun 1991 15:52:51 +1000
Subject: Distributed vs Localist Representations
Message-ID: <9106230552.AA02343@psych.psy.uq.oz.au>

Randall Stark (rstark at aipna.edinburgh.ac.uk) writes:

>One aspect of this issue which seems implicit in much of this discussion
>is the notion that distributed representation can be considered
>a *relative* property. Thus the "room schema" network is "distributed"
>relative to rooms, but "localist" relative to ovens.

A related point was raised by Paul Smolensky in his work on variable binding using tensor representations. By his definition a representation is distributed if entities of external interest (objects, attributes, values or whatever) are represented as patterns across multiple units. The point Paul makes is that in much connectionist work the variables are localised while the values are distributed. That is, the set of units is typically divided into disjoint groups that function as registers or variables. Each variable is able to hold a pattern of activations that is a distributed value.
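In schematic terms, that conventional arrangement looks something like this (a made-up miniature, not Smolensky's notation or anyone's actual code): each variable is a fixed block of units, and a value is bound to it simply by writing a distributed pattern into that block.

    # Variables as disjoint blocks of units ("registers"), values as
    # distributed patterns written into those blocks.  All names invented.
    import numpy as np

    n = 4                                        # units per register
    registers = {"agent": slice(0, n),           # variable = a fixed block
                 "action": slice(n, 2 * n)}
    values = {"John": np.array([1., 0., 1., 0.]),   # value = distributed pattern
              "runs": np.array([0., 1., 1., 0.])}

    units = np.zeros(2 * n)
    units[registers["agent"]] = values["John"]   # bind by writing the pattern
    units[registers["action"]] = values["runs"]  # into the variable's block
    print(units)  # variables are local (blocks), values are distributed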
He proposed a mechanism in which the variables are not disjoint sets of units but instead are patterns that are bound to the patterns representing values. Using this scheme, a binding of a variable with a value is itself represented as a pattern distributed over units, and multiple bindings can be simultaneously represented on the same units. The nice point about this is that it puts variables and values on an equal footing: both are patterns. In fact, the system does not need to distinguish between them from a processing perspective. Whether something is a variable or a value is a question of how it is used, not how it is represented or implemented.

Ross Gayler
ross at psych.psy.uq.oz.au

From aarons at cogs.sussex.ac.uk Sun Jun 23 16:13:31 1991
From: aarons at cogs.sussex.ac.uk (Aaron Sloman)
Date: Sun, 23 Jun 91 21:13:31 +0100
Subject: Varieties of intelligence (long)
Message-ID: <1666.9106232013@csrn.cogs.susx.ac.uk>

A friend, Gerry Martin, is interested in "achievers": how they differ, and the conditions that create them or enable them to achieve. I offered to try to find out if anyone knew of relevant work on different kinds of (human) intelligence, how they develop, what they are, and what (social) mechanisms, if any, enable them to be matched with opportunities for development or fulfilment. There's a collection of related questions.

1. To what extent does evolution produce variation in intellectual capabilities, motivations, etc.? How far is the observable variation due to environmental factors? This is an old question, of course, and very ill-defined (e.g. there is probably no meaningful metric for the contributions of genetic and environmental factors to individual development). It is clear that physical variability is inherent in evolutionary mechanisms: without this there could not be (Darwinian) evolution. The same must presumably be true for "mental" variability. Do genetic factors produce different kinds of differences: in intellectual capabilities, motivational patterns, perceptual abilities, memory abilities, problem solving abilities, etc.?

I think it was Waddington who offered the metaphor of the "epigenetic landscape" genetically determining the opportunities for development of an individual. The route actually taken through the landscape would depend on the individual's environment. So our question is: how different are the landscapes (the sets of possible developmental routes) with which each human child is born, and to what extent do they determine different opportunities for mental as well as physical development? (Obviously the two are linked: a blind child won't as easily become a great painter.) (Piaget suggested that all the human landscapes have a common structure, with well defined stages. I suspect this view will not survive close analysis.)

For intelligent social animals, mental variability is more important than physical variability: a social system has more diversity of intellectual and motivational requirements in its "jobs" than diversity of physical requirements. (Perhaps not if you include the "jobs" done for us by other animals, plants, microorganisms, machines, etc., without which our society could not survive.) Anyhow, without variation in mental properties (whether produced genetically or not) it could be hard to achieve the division of labour that enables a complex social system to work. Aldous Huxley's book "Brave New World" takes this idea towards an unpalatable conclusion.
The need for mental variability goes beyond infrastructure: without such variability all artists would be painters, or all would be composers, or all would be poets, and all scientists would be physicists, or biologists... Division of labour is required not only for the enabling mechanisms of society, but also for cultural richness.

2. What is the form of this variability? Folk psychology has it that there are different kinds of genius - musical geniuses, mathematical geniuses, geniuses in biology, great actors and actresses, etc. Could any of these have excelled in any other field? Would the right education have turned Mozart into a great mathematician, or would his particular "gifts" never have engaged with advanced mathematics? Could a suitable background have made Newton a great composer? Does anyone have any insight into the genetic requirements for different kinds of creative excellence?

We can distinguish two broad questions: (a) is there wide variability in DEGREE in innate capabilities? (b) is there also wide variability in KIND (domain, field of application, or whatever)? In either case it would be interesting to know what kinds of mechanisms account for the differences. Could they be quantitative (as many naive scientists have supposed -- e.g. number of brain cells, number of connections, speed of transmission of signals, etc.) or are the relevant differences more likely to be structural -- i.e. differences in hardware or software organisation?

It looks as if many ordinary human learning capabilities need specific pre-determined structures, providing the basis for learning abilities: e.g. learning languages with complex syntax, learning music, learning to control limbs, learning to see structured objects, learning to play games, learning mathematics, and so on. (Some of the structures creating these capabilities might be shared between different kinds of potential.) If these enabling structures are not "all-or-nothing" systems there could sometimes be partial structures at birth, giving some individuals subsets of "normal" capabilities. Are these all a result of pre-natal damage, or might the gene pool INHERENTLY generate such variety? (An unpalatable truth?) Does the gene pool also produce some individuals with powerful supersets of what is relatively common? Are there importantly different supersets, corresponding to distinct "gifts"? (E.g. Mozart, Newton, Shakespeare.) What are the additional mechanisms these individuals have? Can those born without be given them artificially? (E.g. through special training, hormone treatment, etc.)

3. To what extent do different approaches to AI (I include connectionism as a sub-field of AI) provide tools to model different sorts of mentalities? As far as I know, although there has been much empirical research (e.g. on twins) to find out what is and what is not determined genetically, there has been very little discussion of mechanisms that might be related to such variability.

From an AI standpoint it is easy to speculate about ways in which learning systems could be designed that are initially highly sensitive to minor and subtle environmental differences and which, through various kinds of positive feedback, amplify differences so that even individuals that start off very similar could, in a rich and varied environment, end up very different.
This sort of thing could be a consequence of multi-layered self-modifying architectures with thresholds of various kinds that get modified by "experience" and which thereby change the behaviour of systems which cause other thresholds to be modified. Even without thresholds, hierarchies of condition-action rules, where some of the actions create or alter other rules, would also provide for enormous variability. (As could hierarchies of pdp networks, some of which change the topology of others.) Cascades of such changes could produce huge qualitative variation in various kinds of intellectual capabilities as well as variation in motivational, emotional and personality traits, aesthetic tastes, etc. Such architectures might allow relatively small genetic differences as well as small environmental differences to produce vast differences in adult capabilities.

Variation in tastes in food, or preferences for mates, despite common biological needs, seems to be partly a result of cultural feedback through such developmental mechanisms. But is it all environmental? I gather there are genetic factors that stop some people liking the tastes of certain foods. What about a taste for mathematics, or a general taste for intellectual achievement?

4. Does anyone have any notion of the kinds of differences in implementation that could account for differences in tastes, capabilities, etc.? Would it require: (a) differences in underlying physical architectures (e.g. different divisions of brains into cooperative sub-nets, or different connection topologies among neurones?), (b) differences in the contents of "knowledge bases", "plan databases", skill databases, etc. (by "database" I include what can be stored in a trainable network), (c) differences in numerical parameters, or something quite different? I suspect there's a huge variety of distinct ways in which qualitative differences in capability can emerge: some closer to hardware differences, some closer to software differences. The latter might in principle be easier to change, but not in practice if, for example, it requires de-compiling a huge and messy system.

The only AI-related work that I know of that explicitly deals not only with the design or development of a single agent, but with variable populations, is work on genetic algorithms, which can produce a family of slightly different design solutions. Of course, it is premature for anyone to consider modelling evolutionary processes that would produce collections of "complete" intelligent agents (as opposed to collections of solutions to simple problems like planning problems, recognition problems, or whatever). But has anyone investigated general principles involved in mechanisms that could produce populations of agents with important MENTAL differences? Are there any general principles? (Are the mental epigenetic landscapes for a species importantly different in structure from the physical ones? Perhaps for some organisms, e.g. ants, there's a lot less difference than for others, e.g. chimpanzees?)

5. There are related questions about the need for or possibility of social engineering. (The questions are fraught with political and ethical problems.) In particular, if truly gifted individuals have narrowly targetted potential, are there mechanisms that enable such potential to be matched with appropriate opportunities for development and application? Do rare needs have a way of "attracting" those with the rare ability to tackle them?
What mechanisms can help to match individuals with unusual combinations of motives and capabilities with tasks or roles that require those combinations? In a crude and only partly successful way the educational system and career advisory services attempt to do this. Special schools or special lessons for gifted children attempt to enhance the match-making. However, these formal institutions work only insofar as there are fairly broad and widely-recognized categories of individuals and of tasks. They don't address the problem of matching the potentially very high achievers to very specific opportunities and tasks that need them. Some job advertisements and recruitment services attempt to do this, but there's no guarantee that they make contact with really suitable candidates, and we all know how difficult selection is. Also, these mechanisms assume that the need has been identified. There was no institution that identified the need for a theory of gravity and recruited Newton, provided him with opportunities, etc. Was it pure chance then that he was "found"? Or were there many others who might have achieved what he did? Or were there unrecognized social mechanisms that "arranged" the match? If so, how far afield could he have been born without defeating the match-making?

If the potentially very high achievers only have very small areas in which their potential can be realized, and if each type is very rare, there may be no general way to set up conditions that bring them into the appropriate circumstances. An important example might turn out to be the problem of matching the particular collection of talents, knowledge, and opportunity that would enable a cure for AIDS to be found. In a homogeneous global culture with richly integrated (electronic?) information systems it might be possible to reduce the risks of such lost opportunities, but only if there are ways of recognizing in advance that a particular individual is likely to be well suited to a particular task. The more narrowly defined and rare the task and the capabilities, the less likely it is that the match can be recognized in advance.

Is the idea that there are important but extremely difficult tasks and challenges that only a very few individuals have the potential to cope with just a romantic myth? Or is every solvable problem, every achievable goal, solvable by a large subset of humanity, given the right training and opportunity? (Will we ever know whether nobody but Fermat had what it takes to prove his "last" theorem?) Even if the "romantic myth" is close to the truth, there may be no way of setting up social mechanisms with a good chance of bringing important opportunities and appropriately gifted individuals together: social systems are so complex that all attempts to control them, however well-meaning, invariably have a host of unintended, often undesirable, consequences, some of them long term and far less obvious than missiles that hit the wrong target.

Could some variant of AI help here? It seems unlikely that connectionist pattern recognition techniques could work. (E.g. where would training sets come from?) Could some more abstract sort of expert system help? Neither could inform us that the person capable of solving a particular problem is an unknown child in a remote underdeveloped community.
Perhaps there is nothing for it but to rely on chance, co-incidence, or whatever combination of ill-understood biological and social processes has worked up to now in enabling humankind to achieve what distinguishes us from ants and apes (including our extremes of ecological vandalism).

-----------------------------------------------------------------------

I don't know if I have captured Gerry's questions well: he hasn't seen this message. But if you have any relevant comments, including pointers to literature, information about work in progress, criticisms of the presuppositions of the questions, conjectures about the answers, etc., I'll be interested to receive them and to pass them on. I'll post this to connectionists and the comp.ai newsgroup. (Should it go to others?) Apologies for length.

Aaron Sloman, School of Cognitive and Computing Sciences, Univ of Sussex, Brighton, BN1 9QH, England
EMAIL aarons at cogs.sussex.ac.uk
After 18th July 1991: School of Computer Science, The University of Birmingham, UK. Email: A.Sloman at cs.bham.ac.uk

From ITGT500 at INDYCMS.BITNET Mon Jun 24 10:45:52 1991
From: ITGT500 at INDYCMS.BITNET (Bo Xu)
Date: Mon, 24 Jun 91 09:45:52 EST
Subject: Distributed Representations
Message-ID:

Ali Minai raised a good point about where the representations are to be considered. Let's look at his message first:

>Speaking of which layers to apply the definition to, I think that in a
>feed-forward associative network (analog or binary), the hidden neurons
>(or all the weights) are the representational units. The input neurons
>merely distribute the prior part of the association, and the output neurons
>merely produce the posterior part. The latter are thus a "recovery mechanism"
>designed to "decode" the distributed representation of the hidden units and
>recover the "original" item.

I think that, according to the criterion of where representations exist, representations can be classified into two different types:

(1). External representations ---- the representations that exist at the interface layers (input and/or output layers). They are responsible for the information transmission between the network and the outside world (coding the input information at the input layer and decoding the output information at the output layer).

(2). Internal representations ---- the representations that exist at the hidden layers. These representations are used to encode the mappings from the input field to the output field. The mappings are the core of the neural net.

If I understand correctly, Ali Minai is referring to the internal representations only, and neglecting the external representations. The internal representations are very important representations. However, these representations are determined by the topology of the network, and we cannot change them unless we change the network topology. Most of the current networks' topology ensures that the internal representations are mixed distributed representations (as I pointed out several days ago). Their working mechanisms are still a black box.

Without changing the topology of the network, what we can choose and select are the external representations only. They should not be neglected.

>Zillions of issues remain unaddressed by this formulation too, especially
>those of consistent measurement.
>I feel that each domain and situation
>will have to supply its own specifics.

>I am not sure I understand Bo Xu's assertion that analog representations
>are "more natural". Certainly, to approximate a parabola (which I have
>done hundreds of times with different neural nets) would imply using an
>analog representation, but it is not clear if that is so natural for
>classifying apples and pears. Using different analog values to indicate
>intra-class variations is reasonable and, under specific circumstances,
>might even be provably better than a binary representation. But I would
>be very hesitant to generalize over all possible circumstances. In any
>case, a global characterization of distributed representation should depend
>on specifics only for details, and should apply to both discrete and analog
>representations.

It's true that there will be zillions of issues in practical applications. However, precisely because of this, it would be very difficult (if not impossible) to study all of these issues before drawing any conclusions. Some generalization based on limited studies is probably necessary, and helpful, in such a situation.

I want to thank Ali Minai for his comments. All of his comments are very valuable and thought-provoking.

Bo Xu
Indiana University
ITGT500 at INDYCMS.BITNET

From aam9n at hagar2.acc.Virginia.EDU Mon Jun 24 22:29:34 1991
From: aam9n at hagar2.acc.Virginia.EDU (Ali Ahmad Minai)
Date: Mon, 24 Jun 91 22:29:34 EDT
Subject: Distributed Representations
Message-ID: <9106250229.AA00528@hagar2.acc.Virginia.EDU>

This is in response to Bo Xu's last posting regarding distributed representations. I think one of the problems is a basic incompatibility in our notions of "representations" and where they exist. I would like to clarify my earlier posting somewhat on this point. I wrote:

>>Speaking of which layers to apply the definition to, I think that in a
>>feed-forward associative network (analog or binary), the hidden neurons
>>(or all the weights) are the representational units. The input neurons
>>merely distribute the prior part of the association, and the output neurons
>>merely produce the posterior part. The latter are thus a "recovery mechanism"
>>designed to "decode" the distributed representation of the hidden units and
>>recover the "original" item. Of course, in a heteroassociative system, the
>>"recovered original" is not the same as the "stored original". I realize that
>>this is stretching the definition of "representation", but it seems quite
>>natural to me.

To which Bo replied:

>I think that, according to the criterion of where representations exist,
>representations can be classified into two different types:
>
>(1). External representations ---- the representations that exist at the
>     interface layers (input and/or output layers). They are
>     responsible for the information transmission between the network
>     and the outside world (coding the input information at the input
>     layer and decoding the output information at the output layer).
>
>(2). Internal representations ---- the representations that exist at the
>     hidden layers. These representations are used to encode the
>     mappings from the input field to the output field. The mappings
>     are the core of the neural net.
>
>If I understand correctly, Ali Minai is referring to the internal
>representations only, and neglecting the external representations. The internal
>representations are very important representations.
>However, these
>representations are determined by the topology of the network, and we cannot
>change them unless we change the network topology. Most of the current
>networks' topology ensures that the internal representations are mixed
>distributed representations (as I pointed out several days ago). Their
>working mechanisms are still a black box.
>
>Without changing the topology of the network, what we can choose and
>select are the external representations only. They should not be neglected.

First, let me state what I meant by the "stored" and "recovered" representations in the heteroassociative case. We can see the process of the heteroassociation of an input vector U and output vector V in a feed-forward network as a process of encoding a representation of the vector UV over the hidden units of the network. This is what I call "storage". There is a special requirement here that, given U, a mechanism should be able to produce V over the output units, thus "completing the pattern". The process of doing this is what I call "recovery" (or "recall").

The way I see it (and, I believe, the way most other connectionists see it too) is that the representational part of the network consists of its "internals" --- either the weights, or the hidden units. Far from being uncontrollable, as Bo Xu states, these are *precisely* the things that we *do* control --- not in a micro sense, but through complex global schemes such as training algorithms. The prior to be stored, which Bo takes to be the representation, is, to me, just a given that has been through some unspecified preprocessing. It is the "object" to be represented (though I agree that all objects are themselves representations).

From rosauer at ira.uka.de Tue Jun 25 14:27:57 1991
From: rosauer at ira.uka.de (Bernd Rosauer)
Date: Tue, 25 Jun 91 14:27:57 MET DST
Subject: genetic algorithms + neural networks
Message-ID:

I am interested in any kind of combination of genetic algorithms and neural network training. I am aware of the papers presented at
* Connectionist Models Summer School, 1990
* First International Workshop on Parallel Problem Solving from Nature, 1990
* Third International Conference on Genetic Algorithms, 1989
* Advances in Neural Information Processing Systems 2, 1989.
Please let me know if there is any further work on that topic. Post to , so I will summarize here. Thanks a lot

Bernd

From stork at GUALALA.CRC.RICOH.COM Mon Jun 24 20:36:49 1991
From: stork at GUALALA.CRC.RICOH.COM (David Stork)
Date: Mon, 24 Jun 91 17:36:49 -0700
Subject: Job offer
Message-ID: <9106250036.AA11456@cache.CRC.Ricoh.Com>

The Ricoh California Research Center has an opening for a staff programmer or researcher in neural networks and connectionism. This opening is for a B.S. or possibly M.S.-level graduate in Physics, Computer Science, Math, Electrical Engineering, Cognitive Science, Psychology, or a related field. A background in some hardware design is a plus. The Ricoh California Research Center is located in Menlo Park, about one mile from Stanford University.

Contact:
Dr. David G. Stork
Ricoh California Research Center
2882 Sand Hill Road #115
Menlo Park, CA 94025-7022
stork at crc.ricoh.com

From issnnet at park.bu.edu Tue Jun 25 15:39:29 1991
From: issnnet at park.bu.edu (issnnet@park.bu.edu)
Date: Tue, 25 Jun 91 15:39:29 -0400
Subject: Call For Votes: comp.org.issnnet
Message-ID: <9106251939.AA04607@copley.bu.edu>

CALL FOR VOTES
----------------

GROUP NAME: comp.org.issnnet
STATUS: unmoderated
CHARTER: The newsgroup shall serve as a medium for discussions pertaining to the International Student Society for Neural Networks (ISSNNet), Inc., and to its activities and programs as they pertain to the role of students in the field of neural networks. Details were posted in the REQUEST FOR DISCUSSION, and can be requested from .
VOTING PERIOD: JUNE 25 - JULY 25, 1991

******************************************************************************

VOTING PROCESS

If you wish to vote for or against the creation of comp.org.issnnet, please send your vote to: issnnet at park.bu.edu

To facilitate collection and sorting of votes, please include one of these lines in your "subject:" entry:

If you favor creation of comp.org.issnnet, your subject should read: YES - comp.org.issnnet

If you DO NOT favor creation of comp.org.issnnet, use the subject: NO - comp.org.issnnet

YOUR VOTE ONLY COUNTS IF SENT DIRECTLY TO THE ABOVE ADDRESS.

-----------------------------------------------------------------------

For more information, please send e-mail to issnnet at park.bu.edu (ARPANET) or write to: ISSNNet, Inc. PO Box 557, New Town Br. Boston, MA 02258 USA

ISSNNet, Inc. is a non-profit corporation in the Commonwealth of Massachusetts.

NOTE -- NEW SURFACE ADDRESS:
ISSNNet, Inc.
P.O. Box 15661
Boston, MA 02215 USA

From koch at CitIago.Bitnet Thu Jun 27 06:12:08 1991
From: koch at CitIago.Bitnet (Christof Koch)
Date: Thu, 27 Jun 91 03:12:08 PDT
Subject: Phase-locking without oscillations
Message-ID: <910627031202.20402f6a@Iago.Caltech.Edu>

The following paper is available by anonymous FTP from Ohio State University, in the pub/neuroprose directory. The file is called "koch.syncron.ps.Z".

A SIMPLE NETWORK SHOWING BURST SYNCHRONIZATION WITHOUT FREQUENCY-LOCKING

Christof Koch and Heinz Schuster

ABSTRACT: The dynamic behavior of a network model consisting of all-to-all excitatory coupled binary neurons with global inhibition is studied analytically and numerically. It is shown that for random input signals, the output of the network consists of synchronized bursts with apparently random intermissions of noisy activity. We introduce the fraction of simultaneously firing neurons as a measure of synchrony and prove that its temporal correlation function displays, besides a delta peak at zero indicating random processes, strongly damped oscillations. Our results suggest that synchronous bursts can be generated by a simple neuronal architecture which amplifies incoming coincident signals. This synchronization process is accompanied by damped oscillations which, by themselves, however, do not play any constructive role in this process and can therefore be considered an epiphenomenon.

Key words: neuronal networks / stochastic activity / burst synchronization / phase-locking / oscillations
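For readers who want a feel for the kind of behaviour being described, here is a toy simulation in a broadly similar spirit (this is NOT the model analysed in the paper; the update rule and every parameter below are invented purely for illustration): strong recurrent excitation amplifies coincident input, and a slowly decaying global inhibition terminates each burst.

    # Toy burst-synchronization demo -- not the Koch & Schuster model.
    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 200, 400
    J, g, theta, eps = 2.0, 1.5, 0.3, 0.4   # excitation, inhibition, threshold, noise
    s = np.zeros(N)                         # binary neuron states
    inh = 0.0                               # slow global inhibition
    fraction = []                           # fraction of simultaneously firing cells

    for t in range(T):
        a = s.mean()                                     # current network activity
        drive = J * a - g * inh + eps * rng.random(N)    # excitation - inhibition + input
        s = (drive > theta).astype(float)
        inh = 0.8 * inh + a                              # inhibition tracks past activity
        fraction.append(s.mean())

    print("first burst peaks:", [t for t, f in enumerate(fraction) if f > 0.9][:10])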
For comments, send e-mail to koch at iago.caltech.edu.

Christof

P.S. And this is how you can FTP and print the file:

unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose (actually, cd neuroprose)
ftp> binary
ftp> get koch.syncron.ps.Z
ftp> quit
unix> uncompress koch.syncron.ps.Z
unix> lpr koch.syncron.ps

Read and be illuminated.

From nowlan at helmholtz.sdsc.edu Thu Jun 27 14:38:58 1991
From: nowlan at helmholtz.sdsc.edu (Steven J. Nowlan)
Date: Thu, 27 Jun 91 11:38:58 MST
Subject: Thesis/TR available
Message-ID: <9106271838.AA27191@bose>

The following technical report version of my thesis is now available from the School of Computer Science, Carnegie Mellon University:

-------------------------------------------------------------------------------

Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures

CMU-CS-91-126

Steven J. Nowlan
School of Computer Science
Carnegie Mellon University

ABSTRACT

In this thesis, we consider learning algorithms for neural networks which are based on fitting a mixture probability density to a set of data. We begin with an unsupervised algorithm which is an alternative to the classical winner-take-all competitive algorithms. Rather than updating only the parameters of the ``winner'' on each case, the parameters of all competitors are updated in proportion to their relative responsibility for the case. Use of such a ``soft'' competitive algorithm is shown to give better performance than the more traditional algorithms, with little additional cost.

We then consider a supervised modular architecture in which a number of simple ``expert'' networks compete to solve distinct pieces of a large task. A soft competitive mechanism is used to determine how much an expert learns on a case, based on how well the expert performs relative to the other expert networks. At the same time, a separate gating network learns to weight the output of each expert according to a prediction of its relative performance based on the input to the system. Experiments on a number of tasks illustrate that this architecture is capable of uncovering interesting task decompositions and of generalizing better than a single network with small training sets.

Finally, we consider learning algorithms in which we assume that the actual output of the network should fall into one of a small number of classes or clusters. The objective of learning is to make the variance of these classes as small as possible. In the classical decision-directed algorithm, we decide that an output belongs to the class it is closest to and minimize the squared distance between the output and the center (mean) of this closest class. In the ``soft'' version of this algorithm, we minimize the squared distance between the actual output and a weighted average of the means of all of the classes. The weighting factors are the relative probability that the output belongs to each class. This idea may also be used to model the weights of a network, to produce networks which generalize better from small training sets.
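A minimal sketch of the ``soft'' competitive update described in the first paragraph of the abstract (illustrative only; the Gaussian responsibility model, learning rate and data below are my own assumptions, not code from the thesis): every competitor moves toward each data point in proportion to its relative responsibility for that point, instead of the winner taking the whole update.

    # Soft competitive update sketch: all units adapt, weighted by responsibility.
    import numpy as np

    rng = np.random.default_rng(0)
    means = rng.normal(size=(3, 2))          # three competing units in 2-D
    lr = 0.1

    def responsibilities(x, means, var=1.0):
        """Relative responsibility of each unit for x (normalized Gaussians)."""
        d2 = ((means - x) ** 2).sum(axis=1)
        p = np.exp(-d2 / (2 * var))
        return p / p.sum()

    def soft_update(x, means):
        """Every unit moves toward x in proportion to its responsibility."""
        r = responsibilities(x, means)
        return means + lr * r[:, None] * (x - means)

    for x in rng.normal(size=(500, 2)) + np.array([2.0, 0.0]):
        means = soft_update(x, means)
    print(means.round(2))                    # all units migrate toward the data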
-------------------------------------------------------------------------------

Unfortunately there is NOT an electronic version of this TR. Copies may be ordered by sending a request for TR CMU-CS-91-126 to:

Computer Science Documentation
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 USA

There will be a charge of $10.00 U.S. for orders from the U.S., Canada or Mexico and $15.00 U.S. for overseas orders to cover copying and mailing costs (the TR is 314 pages in length). Checks and money orders should be made payable to Carnegie Mellon University. Note that if your institution is part of the Carnegie Mellon Technical Report Exchange Program there will be NO charge for this TR.

REQUESTS SENT DIRECTLY TO MY E-MAIL ADDRESS WILL BE FILED IN /dev/null.

- Steve

(P.S. Please note my new e-mail address is nowlan at helmholtz.sdsc.edu).

------- End of Forwarded Message

From D.M.Shumsheruddin at computer-science.birmingham.ac.uk Thu Jun 27 06:06:51 1991
From: D.M.Shumsheruddin at computer-science.birmingham.ac.uk (Dean Shumsheruddin)
Date: Thu, 27 Jun 91 11:06:51 +0100
Subject: Request for references on navigation
Message-ID: <961.9106271006@christopher-robin.cs.bham.ac.uk>

I am looking for references to work on neural nets for navigation in graph-structured environments. I've already found the papers by Pomerleau and Bachrach in NIPS 3. I would greatly appreciate information about related work. If there is sufficient interest I'll post a summary to the list.

Dean Shumsheruddin
University of Birmingham, UK
dms at cs.bham.ac.uk

From russ at oceanus.mitre.org Fri Jun 28 10:47:09 1991
From: russ at oceanus.mitre.org (Russell Leighton)
Date: Fri, 28 Jun 91 10:47:09 EDT
Subject: Aspirin/MIGRAINES v4.0 Users
Message-ID: <9106281447.AA13459@oceanus.mitre.org>

Aspirin/MIGRAINES v4.0 Users

Could those groups presently using the Aspirin/MIGRAINES v4.0 neural network simulator from MITRE please reply to this message. A brief description of your motivation for using this software would be useful but not necessary. We are compiling a list of users so that we may more easily distribute the next release of software (Aspirin/MIGRAINES v5.0). Thank you.

Russell Leighton
INTERNET: russ at dash.mitre.org

Russell Leighton
MITRE Signal Processing Lab
7525 Colshire Dr.
McLean, Va. 22102 USA