Case frequency versus case distance during learning
Michael Egmont-Petersen
michael at dasy.cbs.dk
Thu Nov 14 08:59:03 EST 1991
Dear Connectionists,
During learning, gradient methods use a mix of case similarity
(Euclidean) and case frequency to direct each step in weight
space. Each weight is changed by a certain magnitude; for
Back-propagation, the weight between hidden unit j and output
unit k is adjusted by

    delta W_jk = eta * (Target_k - Output_k) * f'(Act_k) * f(Act_j)

where eta is the learning rate and f(Act_j) is the activation of
unit j in the (last) hidden layer. I have wondered how much
emphasis Back-propagation puts on "case similarity" (Euclidean)
when determining delta W.
The underlying problem is the following:
* How large a role does the number of cases (in each
  category) play, compared with their (Euclidean)
  (dis)similarity, in adjusting the weights?
It is a relevant question to pose because other learning
algorithms, such as ID3, rely *only* on case frequencies and NOT
on the distance between patterns within a cluster, nor on the
distance between patterns belonging to different clusters.
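To see how the two factors interact, here is a small sketch of a
batch (per-epoch) update, again my own illustration under the
usual summed-gradient assumption: a pattern that occurs k times
in the batch contributes k identical terms, so case frequency
enters the update linearly, while the error factor lets each
case's contribution grow with its distance from the target:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def batch_delta_w(cases, W, eta=0.1):
        # cases: list of (hidden_out, target) pairs; duplicates in the
        # list model case frequency, while the (target - out) factor
        # models each case's distance from its target
        total = np.zeros_like(W)
        for hidden_out, target in cases:
            out = sigmoid(hidden_out @ W)
            delta = (target - out) * out * (1.0 - out)
            total += eta * np.outer(hidden_out, delta)
        return total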
My question might already have been answered by someone in a
paper. If that is the case, then don't bother the other
connectionists with it, but mail me directly. Otherwise, it is a
highly relevant question to pose, because the input
representation then plays a role in how fast a network learns
and, furthermore, in its ability to generalize.
Best regards
Michael Egmont-Petersen
Institute for Computer and Systems Sciences
Copenhagen Business School
DK-1925 Frb. C.
Denmark
E-mail: michael at dasy.cbs.dk