questions on kohonen's maps

chrisley.pa at Xerox.COM
Mon Mar 20 14:25:00 EST 1989


Ananth Sankar recently asked some questions about Kohonen's feature maps.
As I have worked on these issues with Kohonen, I feel like I might be able
to give some answers, but standard disclaimers apply:  I cannot be certain
that Kohonen would agree with all of the following.  Also, I do not have my
copy of his book with me, so I cannot be more specific about references.

Questions:

1	Is there any analytical expression for the neighbourhood and gain
	functions? I have seen a simulation where careful tweaking after
	every so many iterations produces a correctly evolving map. This
	is obviously not a proper approach.

Although there is probably more than one correct, task-independent gain or
neighborhood function, Kohonen does mention constraints that all of them
should meet.  For example, both functions should decrease to zero over
time.  I do not know of any tweaking; Kohonen usually determines a number
of iterations and then decreases the gain linearly.  If you call this
tweaking, then your idea of domain-independent parameters might be a sort
of holy grail, since it does not seem likely that we are going to find a
completely parameter-free learning algorithm that will work in every
domain.
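To make those constraints concrete, here is a minimal sketch in Python.  The
linear forms and the starting values are my own illustrative choices, not
necessarily the schedules Kohonen uses; the only property I am relying on from
the above is that both functions decrease to zero over the training run.

```python
# Sketch of gain and neighbourhood schedules meeting the stated
# constraint: both decrease to zero as training proceeds.
# The linear form and the starting values are illustrative assumptions.

def gain(t, n_iters, alpha0=0.5):
    """Learning rate (gain) at iteration t, decreasing linearly to 0."""
    return alpha0 * (1.0 - t / n_iters)

def neighbourhood_radius(t, n_iters, r0=5.0):
    """Neighbourhood radius at iteration t, shrinking linearly to 0."""
    return r0 * (1.0 - t / n_iters)
```

Any pair of functions with the same limiting behaviour (e.g. exponential decay)
should satisfy the same constraint.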

2	Even if good results are available for particular functions for
	the uniform distribution input case, it is not clear to me that these
	same functions would result in good classification for some other
	problem. I have attempted to use these maps for word classification
	using LPC coeffs as features.

As far as I know, Kohonen has used the same type of gain and neighborhood
functions for all of his map demonstrations.  These demonstrations, which
have been shown via an animated film at several major conferences,
demonstrate maps learning the distribution in cases where 1) the
dimensionality of the network topology and the input space mismatch, e.g.,
where the network is 2d and the distribution is a 3d 'cactus'; 2) the
distribution is not uniform.  The algorithm was developed with these 2
cases in mind, so it is no surprise that the results are good for them as
well.
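For what it is worth, here is a rough sketch of the whole training loop as I
understand it, applied to a non-uniform distribution.  The grid size, the
linear schedules, and the Manhattan-distance neighbourhood are my own
illustrative assumptions, not Kohonen's exact settings.

```python
import random

def train_som(data, rows, cols, dim, n_iters=1000, alpha0=0.5, r0=None):
    """Minimal Kohonen map sketch: find the winner by Euclidean
    distance, then move the winner and its grid neighbours toward the
    input, with gain and radius both decreasing to zero.
    Illustrative parameter choices; not Kohonen's exact settings."""
    if r0 is None:
        r0 = max(rows, cols) / 2.0
    # random initial weights on a rows x cols grid
    w = [[[random.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for t in range(n_iters):
        x = random.choice(data)
        alpha = alpha0 * (1.0 - t / n_iters)      # gain -> 0
        radius = r0 * (1.0 - t / n_iters)         # neighbourhood -> 0
        # find the best-matching unit (squared Euclidean distance)
        bi, bj, best = 0, 0, float('inf')
        for i in range(rows):
            for j in range(cols):
                d = sum((w[i][j][k] - x[k]) ** 2 for k in range(dim))
                if d < best:
                    bi, bj, best = i, j, d
        # update the winner and its grid neighbours
        for i in range(rows):
            for j in range(cols):
                if abs(i - bi) + abs(j - bj) <= radius:
                    for k in range(dim):
                        w[i][j][k] += alpha * (x[k] - w[i][j][k])
    return w
```

Fed a non-uniform distribution (say, 2-d points concentrated in one quadrant),
the weight vectors end up concentrated where the data is, which is the
behaviour the demonstrations above show.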

3	In Kohonen's book "Self Organization and Associative Memory", Ch 5
	the algorithm for weight adaptation does not produce normalized
	weights. Thus the output nodes cannot function simply by taking
	a dot product of inputs and weights; they have to execute a distance
	calculation.

That's right.  And Kohonen usually uses the Euclidean distance metric,
although other ones can be used (which he discusses in the book).
Furthermore, there have been independent efforts to normalize weights in
Kohonen maps so that the dot product measure can be used.  If you have any
doubts about the suitability of the Euclidean metric, as your question
seems to imply, express them.  It is an interesting issue.
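To illustrate why normalization makes the dot product usable: since
||x - w||^2 = ||x||^2 + ||w||^2 - 2 x.w, once every weight vector has the same
norm, minimizing Euclidean distance and maximizing the dot product pick the
same winner.  A small sketch (the function names are my own):

```python
import math

def euclidean_winner(x, weights):
    """Index of the weight vector closest to x in Euclidean distance."""
    return min(range(len(weights)),
               key=lambda i: math.dist(x, weights[i]))

def dot_winner(x, weights):
    """Index of the weight vector with the largest dot product with x."""
    return max(range(len(weights)),
               key=lambda i: sum(a * b for a, b in zip(x, weights[i])))

def normalize(v):
    """Scale v to unit length, so the two winner rules agree."""
    n = math.hypot(*v)
    return [a / n for a in v]
```

With unnormalized weights the two rules can of course disagree, which is why
the distance calculation is needed in the general case.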

4	I have not seen as yet in the literature any reports on
	how the fact that neighbouring nodes respond to similar patterns
	from a feature space can be exploited.

The primary interest in maps, I believe, came from a desire to display
high-dimensional information in low dimensional spaces, which are more
easily apprehended.  But there is evidence that there are other uses as
well:  1) Kohonen has published results on using maps for phoneme
recognition, where the topology-preservation plays a significant role (such
maps are used in the Otaniemi Phonetic Typewriter featured in, I think,
Computer magazine a year or two ago); 2)  work has been done on using the
topology to store sequential information, which seems to be a good idea if
you are dealing with natural signals that can only temporally shift from a
state to similar states; 3)  several people have followed Kohonen's
suggestion of using maps for adaptive kinematic representations for robot
control (the work on Murphy, mentioned on this net a month or so ago, and
the work being done at Carleton University by Darryl Graf are two good
examples).  In short, just look at some ICNN or INNS proceedings, and
you'll find many papers where researchers found Kohonen maps to be a good
place from which to launch their own studies.

5	Can the net become disordered after ordering is achieved at any
	particular iteration? 

Of course, this is theoretically possible, and is almost certain if at some
point the distribution of the mapped function changes.  But this brings up
the difficult question:  what is the proper ordering in such a case?
Should a net try to integrate both past and present distributions, or
should it throw away the past and concentrate on the present?  I think most
nn researchers would want a little of both, with maybe some kind of
exponential decay in the weights.  But in many applications of maps, there
is no chance of the distribution changing:  it is fixed, and iterations are
over the same test data each time.  In this case, I would guess that the
ordering could not become disrupted (at least for simple distributions and
a net of adequate size), but I realise that there is no proof of this, and
the terms 'simple' and 'adequate' are lacking definition.  But that's life
in nnets for you!

If anyone has any more questions, feel free.

Ron Chrisley

Xerox PARC System Sciences Lab
3333 Coyote Hill Road
Palo Alto, CA 94304
USA

chrisley.pa at xerox.com
tel: (415) 494-4728

OR

New College
Oxford OX1 3BN
UK

chrisley at vax.oxford.ac.uk
tel: (865) 279-492


