Case frequency versus case distance during learning
Michael Egmont-Petersen
michael at dasy.cbs.dk
Thu Nov 14 08:59:03 EST 1991
Dear Connectionists,
During learning, gradient methods use a mix of case similarity
(Euclidean) and case frequency to direct each step in weight
space. Each weight is changed by a certain magnitude; for
Back-propagation, the weight between hidden unit j and output
unit k is adjusted by

    delta W_jk = eta * (Target_k - Output_k) * f'(Act_k) * f(Act_j)

where eta is the learning rate and f(Act_j) is the activation of
unit j in the (last) hidden layer. I have wondered how much
emphasis Back-propagation puts on "case similarity" (Euclidean)
when determining delta W.
The underlying problem is the following:
* How large a role does the number of cases (in each
  category) play, compared with their (Euclidean)
  (dis)similarity, in adjusting the weights?
It is a relevant question to pose because other learning
algorithms, such as ID3, rely *only* on case frequencies and NOT
on the distance between patterns within a cluster, nor on the
distance between patterns belonging to different clusters.
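To see how the two factors interact, here is a small sketch of a
batch (per-epoch) update, again my own illustration under the
usual summed-gradient assumption: a pattern that occurs k times
in the batch contributes k identical terms, so case frequency
enters the update linearly, while the error factor lets each
case's contribution grow with its distance from the target:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def batch_delta_w(cases, W, eta=0.1):
        # cases: list of (hidden_out, target) pairs; duplicates in the
        # list model case frequency, while the (target - out) factor
        # models each case's distance from its target
        total = np.zeros_like(W)
        for hidden_out, target in cases:
            out = sigmoid(hidden_out @ W)
            delta = (target - out) * out * (1.0 - out)
            total += eta * np.outer(hidden_out, delta)
        return total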
My question might already have been answered by someone in a
paper. If that is the case, then don't bother the other
connectionists with it, but mail me directly. Otherwise, it is a
highly relevant question to pose, because the input
representation then plays a role in how fast a network learns
and, furthermore, in its ability to generalize.
Best regards
Michael Egmont-Petersen
Institute for Computer and Systems Sciences
Copenhagen Business School
DK-1925 Frb. C.
Denmark
E-mail: michael at dasy.cbs.dk