KOHONEN MAPS

ananth sankar sankar at caip.rutgers.edu
Fri Mar 31 15:14:12 EST 1989


I initiated a discussion on Kohonen's maps two weeks ago, and apart from
the many replies I (and, I suspect, many others) received, there were
requests that I post the responses. It would be a good idea to go through
this material and then discuss further.

>From pastor at prc.unisys.com Thu Mar 16 16:58:47 1989
Received: from PRC-GW.PRC.UNISYS.COM by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA03401; Thu, 16 Mar 89 16:58:40 EST
Received: from bigburd.PRC.Unisys.COM by burdvax.PRC.Unisys.COM (5.61/Domain/jpb/2.9) 
	id AA11739; Thu, 16 Mar 89 16:58:28 -0500
Received: by bigburd.PRC.Unisys.COM (5.61/Domain/jpb/2.9) 
	id AA24449; Thu, 16 Mar 89 16:58:23 -0500
From: pastor at prc.unisys.com (Jon Pastor)
Message-Id: <8903162158.AA24449 at bigburd.PRC.Unisys.COM>
Received: from Xerox143 by bigburd.PRC.Unisys.COM with PUP; Thu, 16 Mar 89 16:58 EST
To: ananth sankar <sankar at caip.rutgers.edu>
Date: 16 Mar 89 16:56 EST (Thursday)
Subject: Re: questions on kohonen's maps
In-Reply-To: ananth sankar <sankar at caip.rutgers.edu>'s message of Thu, 16 Mar 89 09:42:44 EST
Cc: pastor at bigburd.prc.unisys.com
Status: R

I am in the process of implementing a Kohonen-style system, and if I
actually get it running and obtain any results I'll let you know.  If
you get any responses, please let me know.

Thanks.

>From Connectionists-Request at q.cs.cmu.edu Thu Mar 16 16:59:58 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA03426; Thu, 16 Mar 89 16:59:52 EST
Received: from cs.cmu.edu by Q.CS.CMU.EDU id aa11454; 16 Mar 89 9:44:34 EST
Received: from CAIP.RUTGERS.EDU by CS.CMU.EDU; 16 Mar 89 09:42:55 EST
Received: by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA14983; Thu, 16 Mar 89 09:42:44 EST
Date: Thu, 16 Mar 89 09:42:44 EST
From: ananth sankar <sankar at caip.rutgers.edu>
Message-Id: <8903161442.AA14983 at caip.rutgers.edu>
To: connectionists at cs.cmu.edu
Subject: questions on kohonen's maps
Status: R

I am interested in the subject of Self Organization and have some
questions with regard to Kohonen's algorithm for Self Organizing Feature
Maps. I have tried to duplicate Kohonen's results for the two-dimensional
uniform input case, i.e. two inputs. I used a 10 x 10 output grid. The maps
that resulted were not as good as reported in the papers.

Questions:
	
	
1	Is there any analytical expression for the neighbourhood and gain
	functions? I have seen a simulation where careful tweaking after
	every so many iterations produces a correctly evolving map. This
	is obviously not a proper approach.

2	Even if good results are available for particular functions for
	the uniform distribution input case, it is not clear to me that these
	same functions would result in good classification for some other
	problem. I have attempted to use these maps for word classification
	using LPC coeffs as features.

3	In Kohonen's book "Self Organization and Associative Memory", Ch 5
	the algorithm for weight adaptation does not produce normalized
	weights. Thus the output nodes cannot function as simply as taking
	a dot product of inputs and weights. They have to execute a distance
	calculation.

4	I have not seen as yet in the literature any reports on
	how the fact that neighbouring nodes respond to similar patterns
	from a feature space can be exploited.

5	Can the net become disordered after ordering is achieved at any
	particular iteration? 


I would appreciate any comments, suggestions, etc. on the above. Also, so that
net mail clutter may be reduced, please respond to

sankar at caip.rutgers.edu

Thank you.

Ananth Sankar
Department of Electrical Engineering
Rutgers University, NJ


>From regier at cogsci.berkeley.edu Thu Mar 16 17:07:20 1989
Received: from cogsci.Berkeley.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA03562; Thu, 16 Mar 89 17:07:16 EST
Received: by cogsci.berkeley.edu (5.61/1.29)
	id AA13666; Thu, 16 Mar 89 14:07:18 -0800
Date: Thu, 16 Mar 89 14:07:18 -0800
From: regier at cogsci.berkeley.edu (Terry Regier)
Message-Id: <8903162207.AA13666 at cogsci.berkeley.edu>
To: sankar at caip.rutgers.edu
Subject: Kohonen request
Status: R


	Hi, I'm interested in the responses to your recent Kohonen posting
on Connectionists.  Do you suppose you could post the results once all the
replies are in?  Thanks,
						-- Terry


>From ken at phyb.ucsf.edu Thu Mar 16 20:11:35 1989
Received: from cgl.ucsf.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA09101; Thu, 16 Mar 89 20:11:32 EST
Received: from phyb.ucsf.EDU by cgl.ucsf.EDU (5.59/GSC4.15)
	id AA01036; Thu, 16 Mar 89 17:11:23 PST
Received: by phyb (1.2/GSC4.15)
	id AA11601; Thu, 16 Mar 89 17:11:17 pst
Date: Thu, 16 Mar 89 17:11:17 pst
From: ken at phyb.ucsf.edu (Ken Miller)
Message-Id: <8903170111.AA11601 at phyb>
To: sankar at caip.rutgers.edu
Subject: kohonen
Status: R

re your point 3: 
the algorithm

du_{ij}/dt = a(t)[e_j(t) - u_{ij}(t)], i in N_c

where u = weights, e is the input pattern, and N_c is the topological
neighborhood of the maximally responding unit, should actually be written

du_{ij}/dt = a(t)[ e_j(t)/\sum_k(e_k(t)) - u_{ij}(t)/\sum_k(u_{ik}(t)) ], 
i in N_c.

That is, the change should be such as to move the jth synaptic weight on the
ith cell, as a PROPORTION of all the synaptic weights on the ith cell, in the
direction of matching the PROPORTION of input which was incoming on the jth
line.  Note that in this case \sum_j du_{ij}(t)/dt = 0, so weights remain
normalized in the sense that sum over each cell remains constant.

If you normalize your inputs to sum to 1 (\sum_k(e_k(t)) = 1) and start with
weights normalized to sum to 1 on each cell (\sum_j(u_{ij}(t)) = 1 for all
i), then the weights will remain normalized to sum to 1; hence the two sums in
the denominators are both just 1 and can be left out.  Kohonen was, I
believe, assuming these normalizations and hence dispensing with the sums.
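
A minimal numerical sketch of this proportional update, in Python with numpy
(the function and variable names here are only illustrative, not from the book):

import numpy as np

def normalized_update(U, e, winners, a):
    # U: (n_cells, n_inputs) weight matrix u_{ij}
    # e: (n_inputs,) input pattern e_j
    # winners: indices i in the neighborhood N_c of the best-matching cell
    # a: gain a(t)
    e_prop = e / e.sum()                      # e_j / sum_k e_k
    for i in winners:
        u_prop = U[i] / U[i].sum()            # u_ij / sum_k u_ik
        U[i] += a * (e_prop - u_prop)         # this change sums to 0 over j
    return U

Since the per-cell change sums to zero, each cell's weight sum is left
unchanged, which is the normalization property described above.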

ken miller (ken at phyb.ucsf.edu)
ucsf dept. of physiology

>From tds at wheaties.ai.mit.edu Thu Mar 16 23:26:42 1989
Received: from life.ai.mit.edu by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA12489; Thu, 16 Mar 89 23:26:39 EST
Received: from mauriac.ai.mit.edu by life.ai.mit.edu; Thu, 16 Mar 89 22:48:15 EST
Received: from localhost by mauriac.ai.mit.edu; Thu, 16 Mar 89 22:48:06 est
Date: Thu, 16 Mar 89 22:48:06 est
From: tds at wheaties.ai.mit.edu
Message-Id: <8903170348.AA19015 at mauriac.ai.mit.edu>
To: sankar at caip.rutgers.edu
Subject: Kohonen maps
Status: R

I share some of your confusion about Kohonen maps.  My main question is 
#4: are they really doing anything useful?  The mapping demonstrated in
Kohonen's 1982 paper (Biol. Cyb.) only shows mappings from a 2D manifold
in 3-space onto a two-dimensionally arranged set of units.  The book
talks about dimensionality issues in more detail, but so far as I can
tell what the network does (after training) is to map three numbers into
about 100 numbers.  Since the mapping is linear, I don't see how anything
at all is gained.  
  If the network is unable to generate an ordering, that may be one way to
tell that the data does not lie on a 2D manifold.  But there are many other
ways to do this that are more efficient!  Also, this is not robust if the
manifold folds back on itself (so that two distinct points on the surface
are in the same direction from the origin).
  
  Let me know if you find out the true significance of this widely-known
work,
		Terry

>From lwyse at bucasb.bu.edu Fri Mar 17 17:42:18 1989
Received: from BU-IT.BU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA05821; Fri, 17 Mar 89 17:42:12 EST
Received: from COCHLEA.BU.EDU by bu-it.BU.EDU (5.58/4.7)
	id AA17739; Fri, 17 Mar 89 17:38:02 EST
Received:  by cochlea.bu.edu (4.0/4.7)
	id AA02692; Fri, 17 Mar 89 17:38:21 EST
Date:  Fri, 17 Mar 89 17:38:21 EST
From: lwyse at bucasb.bu.edu
Message-Id:  <8903172238.AA02692 at cochlea.bu.edu>
To: sankar at caip.rutgers.edu
Subject: re:questions on Kohonen maps
Status: R


I would be surprised if there was some analytical expression for the
neighborhood and gain functions that was useful in practical applications.
I have found different "best functions" for different input vector
distributions, initial weight distributions, etc.

A related question to yours: What does "ordering" mean when mapping
across different-dimensional spaces? An excerpt from a report on my 
experiences with Kohonen maps:


When the input space and the neighborhood space of the weight vectors are of
different dimension, however, what "ordered" means becomes a sticky wicket.
For example, in Fig. 5.17, Kohonen shows a one-dimensional neighborhood of
weight vectors approximating a triangular distribution of inputs with what
he terms a "Peano-like" curve. But this type of curve folds in on itself in
an attempt to fill the space, and thus places points that may be far from each
other in their one-D neighborhood where they are maximally responsive to very
close input points. Is this "ordered"? He doesn't seem to address this point
directly. A point I would like to bring out is that in these situations,
where the dimension of the input space and the dimension of the neighborhood
differ, whether or not the weight-vector chain crosses itself is {\em not}
necessarily the important metric for measuring the ability of the weights
to approximate the input space. That is, there is not necessarily a correlation
between neighborhood-chain crossings and the mean squared error of the
weight-vector approximations of the input points. It is true, however, that
if the neighborhood chain crosses itself, then {\em there exists} a better
approximation to the input space.


-lonce

>From risto at cs.ucla.edu Sat Mar 18 02:59:46 1989
Received: from Oahu.CS.UCLA.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA14191; Sat, 18 Mar 89 02:59:35 EST
Return-Path: <risto>
Received: by oahu.cs.ucla.edu (Sendmail 5.59/2.16)
	id AA02486; Fri, 17 Mar 89 23:14:45 PST
Date: Fri, 17 Mar 89 23:14:45 PST
From: risto at cs.ucla.edu (Risto Miikkulainen)
Message-Id: <8903180714.AA02486 at oahu.cs.ucla.edu>
To: sankar at caip.rutgers.edu
In-Reply-To: ananth sankar's message of Thu, 16 Mar 89 09:42:44 EST <8903161442.AA14983 at caip.rutgers.edu>
Subject: questions on kohonen's maps
Reply-To: risto at cs.ucla.edu
Organization: UCLA Computer Science Department
Physical-Address: 3677 Boelter Hall
Status: R


   Date: Thu, 16 Mar 89 09:42:44 EST
   From: ananth sankar <sankar at caip.rutgers.edu>

   1	Is there any analytical expression for the neighbourhood and gain
	   functions? I have seen a simulation where careful tweaking after
	   every so many iterations produces a correctly evolving map. This
	   is obviously not a proper approach.
The trick is to start with a neighborhood large enough. For 10x10, a
radius of 8 units might be appropriate. Then reduce the radius
gradually (e.g. over a few thousand inputs) to 1 or even to 0.
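
One way to realize such a schedule, as a sketch only (the linear form and the
particular numbers are merely an example):

def radius(t, n_steps, r_start=8.0, r_end=1.0):
    # shrink the neighborhood radius linearly over the training run
    frac = min(t / float(n_steps), 1.0)
    return r_start + frac * (r_end - r_start)

# e.g. over 5000 inputs: radius(0, 5000) -> 8.0, radius(5000, 5000) -> 1.0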


   3	In Kohonen's book "Self Organization and Associative Memory", Ch 5
	   the algorithm for weight adaptation does not produce normalized
	   weights. Thus the output nodes cannot function as simply as taking
	   a dot product of inputs and weights. They have to execute a distance
	   calculation.
True. The original idea was to form the "activity bubble" with lateral
inhibition and change the weights by "redistribution of synaptic
resources". This neurologically plausible algorithm gave way to an
abstraction which uses distance, global selection and difference.
(I did some work comparing these two algorithms; I can send you the tech
report if you want to look at it. At least it has the parameters that work)
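
A rough sketch of one step of that abstract algorithm (the schedules and
parameter values below are illustrative assumptions, not the ones from the
tech report):

import numpy as np

def som_step(W, grid, x, t, n_steps, a0=0.5, r0=8.0):
    # W: (n_units, dim) weight vectors; grid: (n_units, 2) unit coordinates
    frac = t / float(n_steps)
    a = a0 * (1.0 - frac)                            # gain decays toward 0
    r = max(r0 * (1.0 - frac), 1.0)                  # neighborhood shrinks
    c = np.argmin(np.linalg.norm(W - x, axis=1))     # global selection by distance
    near = np.linalg.norm(grid - grid[c], axis=1) <= r
    W[near] += a * (x - W[near])                     # difference update
    return W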


   5	Can the net become disordered after ordering is achieved at any
	   particular iteration? 
Kohonen proved (in ch 5) that this cannot happen (in the 1-d case) for
the abstract algorithm. This is a big problem for the biologically
plausible algorithm though.


>From djb at flash.bellcore.com Sat Mar 18 23:38:41 1989
Received: from flash.bellcore.com by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA27190; Sat, 18 Mar 89 23:38:32 EST
Received: by flash.bellcore.com (5.58/1.1)
	id AA06742; Sat, 18 Mar 89 23:38:10 EST
Date: Sat, 18 Mar 89 23:38:10 EST
From: djb at flash.bellcore.com (David J Burr)
Message-Id: <8903190438.AA06742 at flash.bellcore.com>
To: sankar at caip.rutgers.edu
Subject: Feature Map Learning
Status: R

Your questions regarding the feature map algorithm are ones that have also
concerned me.  I have been experimenting with a form of this elastic mapping
algorithm since about 1979. My early experiments were focussed on using such
an adaptive process to map handwritten characters onto reference characters
in an attempt to automate a form of elastic template matching.  The algorithm
I came up with was one which used nearest-neighbor "attractors" to "pull" 
an elastic map into shape by an iterative process.  I defined a window
or smoothing kernel which had a Gaussian shape as opposed to the box
shape commonly used in self-organized mapping. My algorithm resembled
the Kohonen feature map classifier that you referred to in your email.

The Gaussian kernel has advantages over the box kernel in that aliasing
distortion can be reduced.  This is similar to the use of Hamming windows
in the design of fast Fourier transforms.
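
In code, the two windows over grid distance d from the winning unit might look
like this (a sketch; the radius and sigma parameters are whatever one chooses):

import numpy as np

def box_window(d, radius):
    # all-or-nothing neighborhood: 1 inside the radius, 0 outside
    return (d <= radius).astype(float)

def gaussian_window(d, sigma):
    # smooth neighborhood that falls off gradually with grid distance
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))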

With regard to your first and second questions, we have found that the 
actual window size and gain parameters can take on a number of different 
schedule shapes and give similar results.  It is important that window
size decrease very gradually to avoid too early a commitment to a particular
vector.  This is particularly important in the mapping of highly distorted
characters where a rapid schedule could cause a feature in one character
to map to the "wrong" feature in the reference character.  Gaussian
windows were the choice for that problem, since they guaranteed very
smooth maps.

You are right that a parameter schedule that works for one problem
may be poorly suited to a different problem.  We have recently applied
the feature map model to the traveling salesman problem and reported
some of our results at ICNN-88.  A one-dimensional version of the elastic
map (a rubber band) seems best suited to this problem.  We found that
there was a particular analytic form of the gain schedule which worked
well for this problem.  Window size, on the other hand, seemed to benefit
most from a feedback schedule in which the degree of progress toward the
solution served as input to set an appropriate window size.  I have
results from some 700 different learning trials on 30-100 city
problems using this method.  Performance is considerably better than
the Hopfield-Tank solution.
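
For concreteness, a toy version of the one-dimensional ("rubber band") map on
city coordinates; the schedules here are simple illustrative choices, not the
analytic gain schedule or feedback window rule from the ICNN-88 paper:

import numpy as np

def ring_tsp(cities, n_nodes=None, n_iters=20000, seed=0):
    # cities: (n_cities, 2) array of coordinates
    rng = np.random.default_rng(seed)
    n_cities = len(cities)
    n = n_nodes or 2 * n_cities
    # start the ring near the centroid of the cities
    ring = cities.mean(axis=0) + 0.01 * rng.standard_normal((n, 2))
    idx = np.arange(n)
    for t in range(n_iters):
        frac = t / float(n_iters)
        gain = 0.8 * (1.0 - frac)
        sigma = max((n / 10.0) * (1.0 - frac), 1.0)
        city = cities[rng.integers(n_cities)]
        w = np.argmin(np.linalg.norm(ring - city, axis=1))    # winning node
        d = np.minimum(np.abs(idx - w), n - np.abs(idx - w))  # distance around ring
        h = np.exp(-(d ** 2) / (2.0 * sigma ** 2))            # Gaussian window
        ring += gain * h[:, None] * (city - ring)
    return ring  # visiting cities in the order of their nearest ring nodes gives a tour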

Yes, it seems as though one needs distance calculation as the input for
this model, rather than dot product as used in back-propagation nets.

I would be happy to mail you some papers describing my implementation of the
feature map learning model.  The first article appeared in Computer Graphics
and Image Processing Journal, 1981, entitled "A Dynamic Model for Image
Registration".  The recent work on traveling salesman was also reported
at last year's Snowbird meeting in addition to ICNN-88.  Please feel free
to correspond with me as I consider this a very interesting topic.

Best Wishes,
D. J. Burr
djb at bellcore.com

>From @relay.cs.net:tony at ifi.unizh.ch Mon Mar 20 03:12:51 1989
Received: from RELAY.CS.NET by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA02795; Mon, 20 Mar 89 03:12:46 EST
Received: from relay2.cs.net by RELAY.CS.NET id ab08738; 20 Mar 89 4:55 EST
Received: from switzerland by RELAY.CS.NET id ae29120; 20 Mar 89 4:48 EST
Received: from ean by scsult.SWITZERLAND.CSNET id a011717; 20 Mar 89 9:45 WET
Date: 19 Mar 89 21:45 +0100
From: tony bell <tony at ifi.unizh.ch>
To: sankar at caip.rutgers.edu
Mmdf-Warning:  Parse error in original version of preceding line at SWITZERLAND.CSNET
Message-Id: <342:tony at ifi.unizh.ch>
Subject: Top Maps
Status: R

You should see Ritter & Schulten's paper in IEEE ICNN proceedings 1988
(San Diego) for expressions answering question 1. Another paper from Helge Ritter
deals with the convergence properties. This was submitted to Biol. Cybernetics,
but maybe you should write to him at the University of Illinois where he
is now.

Tony Bell, Univ of Zurich


>From djb at flash.bellcore.com Mon Mar 20 17:51:22 1989
Received: from flash.bellcore.com by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA18086; Mon, 20 Mar 89 17:51:14 EST
Received: by flash.bellcore.com (5.58/1.1)
	id AA25760; Mon, 20 Mar 89 17:51:18 EST
Date: Mon, 20 Mar 89 17:51:18 EST
From: djb at flash.bellcore.com (David J Burr)
Message-Id: <8903202251.AA25760 at flash.bellcore.com>
To: sankar at caip.rutgers.edu
Subject: Self-Organized Mapping
Status: R

There has been interest on the net recently in some of the questions that
you posed in your recent mail.  I have personally received comments
regarding the neighborhood functions and whether there is an appropriate
analytic form.  My comments were summarized in my recent mailing to you.
If you get additional responses, I would certainly appreciate hearing about
peoples' experiences.  Would you consider posting a summary to the net?

I did not comment on your questions 4 and 5.  It seems that the neighbors-
matching-to-neighbors observation comes about as a result rather than as
an input constraint.  In my 1981 paper on elastic matching of images I
used a more extended pattern matcher (an area template instead of a point-to-
point nearest neighbor) for gray scale images.  This tended to enforce the
constraint that you observed at the input level.  Unfortunately, I am not
sure what its generalization would be for non-image patterns (N-D instead of 2-D).

I have done all my experiments on elastic mapping of fixed patterns as opposed
to point distributions.  There was no problem of a map being undone after it
converged.  Have you had such problems with your speech data?  I have been
told that when the distributions are stochastic or sampled, there is an
even stronger need to proceed slowly.  Apparently one sampled point can
pull the map in one direction and this must be counterbalanced by opposing
samples pulling the other way to maintain stability of the map.  This 
unfortunately takes lots of computer cycles.

Hoping to hear from you.

Dave Burr

>From Connectionists-Request at q.cs.cmu.edu Mon Mar 20 18:01:41 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA18228; Mon, 20 Mar 89 18:01:34 EST
Received: from cs.cmu.edu by Q.CS.CMU.EDU id aa23263; 20 Mar 89 14:41:25 EST
Received: from XEROX.COM by CS.CMU.EDU; 20 Mar 89 14:39:19 EST
Received: from Semillon.ms by ArpaGateway.ms ; 20 MAR 89 11:26:12 PST
Date: 20 Mar 89 11:25 PST
From: chrisley.pa at xerox.com
Subject: Re: questions on kohonen's maps
In-Reply-To: ananth sankar <sankar at caip.rutgers.edu>'s message of Thu, 16
 Mar 89 09:42:44 EST
To: ananth sankar <sankar at caip.rutgers.edu>
Cc: connectionists at cs.cmu.edu, chrisley.pa at xerox.com
Message-Id: <890320-112612-6136 at Xerox>
Status: R

Ananth Sankar recently asked some questions about Kohonen's feature maps.
As I have worked on these issues with Kohonen, I feel like I might be able
to give some answers, but standard disclaimers apply:  I cannot be certain
that Kohonen would agree with all of the following.  Also, I do not have my
copy of his book with me, so I cannot be more specific about references.

Questions:
	
	
1	Is there any analytical expression for the neighbourhood and gain
	functions? I have seen a simulation where careful tweaking after
	every so many iterations produces a correctly evolving map. This
	is obviously not a proper approach.

Although there is probably more than one correct, task-independent gain or
neighborhood function, Kohonen does mention constraints that all of them
should meet.  For example, both functions should decrease to zero over
time.  I do not know of any tweaking; Kohonen usually determines a number
of iterations and then decreases the gain linearly.  If you call this
tweaking, then your idea of domain-independent parameters might be a sort
of holy grail, since it does not seem likely that we are going to find a
completely parameter-free learning algorithm that will work in every
domain.

2	Even if good results are available for particular functions for
	the uniform distribution input case, it is not clear to me that these
	same functions would result in good classification for some other
	problem. I have attempted to use these maps for word classification
	using LPC coeffs as features.

As far as I know, Kohonen has used the same type of gain and neighborhood
functions for all of his map demonstrations.  These demonstrations, which
have been shown via an animated film at several major conferences, show
maps learning the distribution in cases where 1) the dimensionality of
the network topology and that of the input space mismatch, e.g.,
where the network is 2d and the distribution is a 3d 'cactus'; 2) the
distribution is not uniform.  The algorithm was developed with these 2
cases in mind, so it is no surprise that the results are good for them as
well.

3	In Kohonen's book "Self Organization and Associative Memory", Ch 5
	the algorithm for weight adaptation does not produce normalized
	weights. Thus the output nodes cannot function as simply as taking
	a dot product of inputs and weights. They have to execute a distance
	calculation.

That's right.  And Kohonen usually uses the Euclidean distance metric,
although other ones can be used (which he discusses in the book).
Furthermore, there have been independent efforts to normalize the weights in
Kohonen maps so that the dot product measure can be used.  If you have any
doubts about the suitability of the Euclidean metric, as your question
seems to imply, express them.  It is an interesting issue.
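
The connection between the two measures is easy to check numerically: with
unit-length inputs and weights, ||x - w||^2 = 2 - 2 x.w, so the unit nearest
in Euclidean distance is also the one with the largest dot product (a small
sketch, with made-up sizes):

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 8))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm weight vectors
x = rng.standard_normal(8)
x /= np.linalg.norm(x)                          # unit-norm input
assert np.argmin(np.linalg.norm(W - x, axis=1)) == np.argmax(W @ x)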

4	I have not seen as yet in the literature any reports on
	how the fact that neighbouring nodes respond to similar patterns
	from a feature space can be exploited.

The primary interest in maps, I believe, came from a desire to display
high-dimensional information in low dimensional spaces, which are more
easily apprehended.  But there is evidence that there are other uses as
well:  1) Kohonen has published results on using maps for phoneme
recognition, where the topology-preservation plays a significant role (such
maps are used in the Otaniemi Phonetic Typewriter featured in, I think,
Computer magazine a year or two ago); 2)  work has been done on using the
topology to store sequential information, which seems to be a good idea if
you are dealing with natural signals that can only temporally shift from a
state to similar states; 3)  several people have followed Kohonen's
suggestion of using maps for adaptive kinematic representations for robot
control (the work on Murphy, mentioned on this net a month or so ago, and
the work being done at Carleton University by Darryl Graf are two good
examples).  In short, just look at some ICNN or INNS proceedings, and
you'll find many papers where researchers found Kohonen maps to be a good
place from which to launch their own studies.

5	Can the net become disordered after ordering is achieved at any
	particular iteration? 

Of course, this is theoretically possible, and is almost certain if at some
point the distribution of the mapped function changes.  But this brings up
the difficult question:  what is the proper ordering in such a case?
Should a net try to integrate both past and present distributions, or
should it throw away the past and concentrate on the present?  I think most
nn researchers would want a little of both, with maybe some kind of
exponential decay in the weights.  But in many applications of maps, there
is no chance of the distribution changing:  it is fixed, and iterations are
over the same test data each time.  In this case, I would guess that the
ordering could not become disrupted (at least for simple distributions and
a net of adequate size), but I realise that there is no proof of this, and
the terms 'simple' and 'adequate' are lacking definition.  But that's life
in nnets for you!

If anyone has any more questions, feel free.

Ron Chrisley

Xerox PARC System Sciences Lab
3333 Coyote Hill Road
Palo Alto, CA 94304
USA

chrisley.pa at xerox.com
tel: (415) 494-4728

OR

New College
Oxford OX1 3BN
UK

chrisley at vax.oxford.ac.uk
tel: (865) 279-492


>From chrisley.pa at xerox.com Thu Mar 23 15:00:13 1989
Received: from Xerox.COM by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA22224; Thu, 23 Mar 89 15:00:04 EST
Received: from Semillon.ms by ArpaGateway.ms ; 23 MAR 89 11:35:27 PST
Date: 23 Mar 89 11:35 PST
From: chrisley.pa at xerox.com
Subject: Re: questions on kohonen's maps
In-Reply-To: ananth sankar <sankar at caip.rutgers.edu>'s message of Thu, 16
 Mar 89 09:42:44 EST
To: ananth sankar <sankar at caip.rutgers.edu>
Cc: connectionists at cs.cmu.edu
Message-Id: <890323-113527-4949 at Xerox>
Status: R

One further note about Ananth Sankar's questions about Kohonen maps:

A friend of mine, Tony Bell, tells me (and Ananth) that Helge Ritter has 
a "neat set of expressions for the learning rate and neighbourhood size
parameters... and he also proves something about congergence elsewhere."

Unfortunately, I do not as yet have a reference for the papers, but I have
liked Ritter's work in the past, so I thought people on the net might be
interested.

>From Connectionists-Request at q.cs.cmu.edu Fri Mar 24 11:52:18 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA20326; Fri, 24 Mar 89 11:52:13 EST
Received: from ri.cmu.edu by Q.CS.CMU.EDU id aa17597; 24 Mar 89 8:48:01 EST
Received: from BU-IT.BU.EDU by RI.CMU.EDU; 24 Mar 89 08:41:54 EST
Received: from COCHLEA.BU.EDU by bu-it.BU.EDU (5.58/4.7)
	id AA06449; Tue, 21 Mar 89 13:58:32 EST
Received:  by cochlea.bu.edu (4.0/4.7)
	id AA04927; Tue, 21 Mar 89 13:59:02 EST
Date:  Tue, 21 Mar 89 13:59:02 EST
From: lwyse at bucasb.bu.edu
Message-Id:  <8903211859.AA04927 at cochlea.bu.edu>
To: connectionists at ri.cmu.edu
In-Reply-To: connectionists at c.cs.cmu.edu's message of 20 Mar 89 23:47:09 GMT
Subject: Re: questions on kohonen's maps
Organization: Center for Adaptive Systems, B.U.
Status: R


What does "ordering" mean when your projecting inputs to a lower dimensional
space? For example, the "Peano" type curves that result from a one-D
neighborhood learning a 2-D input distribution, it is obviously NOT 
true that nearby points in the input space maximally activate nearby
points on the neighborhood chain. In this case, it is not even clear
that "untangling" the neighborhood is of utmost importance, since a 
tangled chain can still do a very good job of divvying up the space
almost equally between its nodes. 

-lonce



>From @relay.cs.net:tony at ifi.unizh.ch Fri Mar 24 13:30:26 1989
Received: from RELAY.CS.NET by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA23163; Fri, 24 Mar 89 13:30:12 EST
Received: from relay2.cs.net by RELAY.CS.NET id ab09426; 24 Mar 89 12:01 EST
Received: from switzerland by RELAY.CS.NET id aa01417; 24 Mar 89 11:55 EST
Received: from ean by scsult.SWITZERLAND.CSNET id a011335; 24 Mar 89 17:53 WET
Date: 24 Mar 89 17:51 +0100
From: tony bell <tony at ifi.unizh.ch>
To: sankar at caip.rutgers.edu
Mmdf-Warning:  Parse error in original version of preceding line at SWITZERLAND.CSNET
Message-Id: <352:tony at ifi.unizh.ch>
Status: R

In case anyone else asks (or Ron sends any more vague messages to the
net), here are all the refs I have on Helge Ritter's work on topological maps:

[1]"Kohonen's Self-Organizing Maps: exploring their computational capabilities"
in Proc. IEEE ICNN 1988, San Diego.

[2]"Convergence Properties of Kohonen's Topology Conserving Maps: fluctuations,
stability and dimension selection" submitted to Biol. Cybernetics.

[3] "Extending Kohonen's self-organising mapping algorithm to learn Ballistic
Movements" in the book "Neural Computers" Eckmiller & von der Malsburg (eds)

[4] "Topology conserving mappings for learning motor tasks" in the book "Neural
Networks for Computing" Denker (ed) AIP Conf. proceedings, Snowbird, 1986.

The second one in particular uses some heavy statistical techniques (the inputs
are seen as a Markov process and a Fokker-Planck equation describes the
learning) in order to prove that the map will reach equilibrium when the
learning rate is time dependent (i.e. it decays). Ritter's PhD thesis covers all
his work, but it's in German. Now, Ritter is at the University of Illinois. I hope
this helps you and I don't mind if you post this to the net if you think
people are interested enough.

yours,

Tony Bell.


>From Connectionists-Request at q.cs.cmu.edu Fri Mar 24 22:07:14 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA23834; Fri, 24 Mar 89 22:07:06 EST
Received: from ri.cmu.edu by Q.CS.CMU.EDU id aa22170; 24 Mar 89 13:28:20 EST
Received: from MAUI.CS.UCLA.EDU by RI.CMU.EDU; 24 Mar 89 13:26:10 EST
Return-Path: <gblee at CS.UCLA.EDU>
Received: by maui.cs.ucla.edu (Sendmail 5.59/2.16)
	id AA25252; Fri, 24 Mar 89 10:25:07 PST
Date: Fri, 24 Mar 89 10:25:07 PST
From: Geunbae Lee <gblee at cs.ucla.edu>
Message-Id: <8903241825.AA25252 at maui.cs.ucla.edu>
To: lwyse at bucasb.bu.edu
Subject: Re: questions on konhonen's map
Cc: connectionists at ri.cmu.edu
Status: R

>What does "ordering" mean when you're projecting inputs to a lower dimensional
>space? 
It means topological ordering.

>For example, for the "Peano"-type curves that result from a one-D
>neighborhood learning a 2-D input distribution, it is obviously NOT 
>true that nearby points in the input space maximally activate nearby
>points on the neighborhood chain. 
It depends on what you mean by "nearby." If it is nearby in a
relative sense (in topological relation), not an absolute sense, then
nearby points in the input space DO maximally activate nearby
points on the neighborhood chain.

--Geunbae Lee
  AI Lab, UCLA


>From Connectionists-Request at q.cs.cmu.edu Sat Mar 25 02:26:12 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA26264; Sat, 25 Mar 89 02:26:06 EST
Received: from ri.cmu.edu by Q.CS.CMU.EDU id aa25584; 24 Mar 89 17:55:35 EST
Received: from XEROX.COM by RI.CMU.EDU; 24 Mar 89 17:53:44 EST
Received: from Semillon.ms by ArpaGateway.ms ; 24 MAR 89 14:53:32 PST
Date: 24 Mar 89 14:53 PST
From: chrisley.pa at xerox.com
Subject: Re: questions on kohonen's maps
In-Reply-To: lwyse at bucasb.BU.EDU's message of Tue, 21 Mar 89 13:59:02 EST
To: lwyse at bucasb.bu.edu
Cc: connectionists at ri.cmu.edu
Message-Id: <890324-145332-8519 at Xerox>
Status: R

Lonce (lwyse at bucasb.BU.EDU) writes:

"What does "ordering" mean when your projecting inputs to a lower
dimensional space? For example, the "Peano" type curves that result from a
one-D neighborhood learning a 2-D input distribution, it is obviously NOT 
true that nearby points in the input space maximally activate nearby
points on the neighborhood chain." 

It is not true that nearby points in input space are always mapped to
nearby points in the output space when the mapping is dimensionality
reducing, agreed.  But 'ordering' still makes sense.  The map is
topology-preserving if the dependency is in the other direction, i.e., if
nearby points in output space are always activated by nearby points in
input space.
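
One crude way to check that direction of the dependency is to compare, in
input space, the weight vectors of grid-adjacent units against those of
arbitrary unit pairs (an illustrative diagnostic, not a standard published
measure):

import numpy as np

def neighbor_input_distances(W, rows, cols):
    # W: (rows*cols, dim) weight vectors of a rectangular map
    Wg = W.reshape(rows, cols, -1)
    d_right = np.linalg.norm(Wg[:, 1:] - Wg[:, :-1], axis=-1).ravel()
    d_down = np.linalg.norm(Wg[1:, :] - Wg[:-1, :], axis=-1).ravel()
    return np.concatenate([d_right, d_down])
# if the map is ordered, these distances should be small compared with
# distances between randomly chosen pairs of units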

Lonce goes on to say:

"In this case, it is not even clear that "untangling" the neighborhood is
of utmost importance, since a tangled chain can still do a very good job of
divvying up the space almost equally between its nodes."

I agree that topology preservation is not necessarily of utmost importance,
but it may be useful in some applications, such as the ones I mentioned a
few messages back (phoneme recognition, inverse kinematics, etc.).  Also,
there is 1) the interest in properties of self-organizing systems in
themselves, even though an application can't be immediately found; and 2)
the observation that for some reason the brain seems to use topology
preserving maps (with the one-way dependency I mentioned above), which,
although they *could* be computationally unnecessary or even
disadvantageous, are probably in fact, nature being what she is, good
solutions to tough real time problems. 

Ron Chrisley
After April 14th, please send personal email to Chrisley at vax.ox.ac.uk

>From Connectionists-Request at q.cs.cmu.edu Sun Mar 26 03:40:59 1989
Received: from Q.CS.CMU.EDU by caip.rutgers.edu (5.59/SMI4.0/RU1.0/3.03) 
	id AA19433; Sun, 26 Mar 89 03:40:47 EST
Received: from cs.cmu.edu by Q.CS.CMU.EDU id aa07032; 26 Mar 89 1:22:01 EST
Received: from CGL.UCSF.EDU by CS.CMU.EDU; 26 Mar 89 01:18:16 EST
Received: from phyb.ucsf.EDU by cgl.ucsf.EDU (5.59/GSC4.16)
	id AA07448; Sat, 25 Mar 89 22:18:01 PST
Received: by phyb (1.2/GSC4.15)
	id AA08352; Sat, 25 Mar 89 22:17:59 pst
Date: Sat, 25 Mar 89 22:17:59 pst
From: Ken Miller <ken at phyb.ucsf.edu>
Message-Id: <8903260617.AA08352 at phyb>
To: Connectionists at cs.cmu.edu
Subject: Normalization of weights in Kohonen algorithm
Status: R

re point 3 of recent posting about Kohonen algorithm: 

"3	In Kohonen's book "Self Organization and Associative Memory", Ch 5
	the algorithm for weight adaptation does not produce normalized
	weights."

the algorithm

du_{ij}/dt = a(t)[e_j(t) - u_{ij}(t)], i in N_c

where u = weights, e is the input pattern, and N_c is the topological
neighborhood of the maximally responding unit, should I believe be written

du_{ij}/dt = a(t)[ e_j(t)/\sum_k(e_k(t)) - u_{ij}(t)/\sum_k(u_{ik}(t)) ], 
i in N_c.

That is, the change should be such as to move the jth synaptic weight on the
ith cell, as a PROPORTION of all the synaptic weights on the ith cell, in the
direction of matching the PROPORTION of input which was incoming on the jth
line.  Note that in this case \sum_j du_{ij}(t)/dt = 0, so weights remain
normalized in the sense that sum over each cell remains constant.

If inputs are normalized to sum to 1 (\sum_k(e_k(t)) = 1) then the first
denominator can be omitted.  If weights begin normalized to sum to 1 on each
cell ( \sum_k(u_{ik}(t)) = 1 for all i) then weights will remain normalized
to sum to 1, hence the second denominator can be omitted.  Perhaps Kohonen
was assuming these normalizations and hence dispensing with the denominators?

ken miller (ken at phyb.ucsf.edu)


