Separability and unbalanced data discussion

Richard Rohwer rr%eusip.edinburgh.ac.uk at NSS.Cs.Ucl.AC.UK
Fri Nov 11 06:29:38 EST 1988


In a close inspection of convergence ailments afflicting a multilayer net,
I found that the problem boiled down to a layer which needed to learn the
separable AND function, but was failing to do so.  So I had a close look at the
LMS error function for AND, in terms of the weights from each of the two inputs,
the bias weight, and the multiplicities of each of the 4 exemplars in the 
truth table.  It turns out that the error cannot be made exactly 0 (with
finite weights), so minimization of the error involves a tradeoff between
the contributions of the 4 exemplars, and this tradeoff is strongly influenced
by the multiplicities.  It is not difficult to find the minimum analytically
in this problem, so I was able to verify that with my highly unbalanced
training data, the actual minimum was precisely where the LMS algorithm had
terminated, miles away from a reasonable solution for AND.  I also found that
balanced data puts the minimum where it "belongs".
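A minimal sketch of the effect — assuming a single linear unit, 0/1 targets thresholded at 0.5, and illustrative multiplicities, not necessarily the setup described above:

```python
# Sketch only: a single linear unit y = w1*x1 + w2*x2 + b fit to the AND
# truth table by gradient descent on the multiplicity-weighted squared
# error.  The linear unit, 0/1 targets, 0.5 threshold, and the
# multiplicities [10, 10, 10, 1] are illustrative assumptions.

AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def fit_lms(mults, lr=0.01, steps=100_000):
    """Minimise sum_i m_i * (y_i - t_i)**2 over (w1, w2, b)."""
    w1 = w2 = b = 0.0
    total = float(sum(mults))
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for ((x1, x2), t), m in zip(AND_TABLE, mults):
            e = w1 * x1 + w2 * x2 + b - t   # signed error on this exemplar
            g1 += m * e * x1
            g2 += m * e * x2
            gb += m * e
        w1 -= lr * g1 / total
        w2 -= lr * g2 / total
        b -= lr * gb / total
    return w1, w2, b

def truth_table(w1, w2, b):
    """Thresholded outputs for the four input patterns."""
    return [int(w1 * x1 + w2 * x2 + b > 0.5) for (x1, x2), _ in AND_TABLE]

# Balanced multiplicities: the minimum sits where it "belongs",
# and thresholding recovers AND.
print(truth_table(*fit_lms([1, 1, 1, 1])))
# Unbalanced multiplicities: the rare (1,1) exemplar is swamped by the
# other three, and the minimum misclassifies it.
print(truth_table(*fit_lms([10, 10, 10, 1])))
```

With balanced data the minimum is at w1 = w2 = 0.5, b = -0.25, so (1,1) comes out at 0.75 and the threshold recovers AND; with multiplicities [10, 10, 10, 1] it moves to where (1,1) comes out at about 0.23, well below threshold.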

The relative importance of the different exemplars in the LMS error function
runs as the square root of the ratio of their multiplicities.  So I solved
my particular problem by turning to a quartic error function, for which it
is the 4th root of this ratio that matters.  (The p-norm, p-th root of the
sum of the p-th powers, approaches the MAX norm as p approaches infinity, and
4 is much closer to infinity than 2.)
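Concretely, since sum_i m_i e_i^p = sum_i (m_i^(1/p) e_i)^p, exemplar i enters a p-th-power error with its error effectively scaled by m_i^(1/p), so raising p flattens the influence of the multiplicities.  A sketch of the contrast — again assuming a linear unit, 0/1 targets thresholded at 0.5, and illustrative multiplicities [10, 10, 10, 1], not the original experiment:

```python
# Sketch only: a linear unit y = w1*x1 + w2*x2 + b fit to the AND truth
# table with a general even p-th-power error sum_i m_i * (y_i - t_i)**p.
# All setup choices here are illustrative assumptions.

AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def fit_pnorm(mults, p, lr=0.01, steps=200_000):
    """Minimise sum_i m_i * (y_i - t_i)**p by gradient descent (even p)."""
    w1 = w2 = b = 0.0
    total = float(sum(mults))
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for ((x1, x2), t), m in zip(AND_TABLE, mults):
            e = w1 * x1 + w2 * x2 + b - t
            g = m * p * e ** (p - 1)        # derivative of m * e**p
            g1 += g * x1
            g2 += g * x2
            gb += g
        w1 -= lr * g1 / total
        w2 -= lr * g2 / total
        b -= lr * gb / total
    return w1, w2, b

def truth_table(w1, w2, b):
    return [int(w1 * x1 + w2 * x2 + b > 0.5) for (x1, x2), _ in AND_TABLE]

unbalanced = [10, 10, 10, 1]
# Quadratic (LMS) error: the rare (1,1) exemplar loses the tradeoff.
print(truth_table(*fit_pnorm(unbalanced, p=2)))
# Quartic error: the 4th-root weighting is gentle enough that AND survives.
print(truth_table(*fit_pnorm(unbalanced, p=4)))
```

At this 10:1 imbalance the quadratic minimum puts the (1,1) output at about 0.23, below threshold, while the quartic minimum puts it at about 0.58, above it — the same data, rescued by a larger p.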

   ---Richard Rohwer, CSTR, Edinburgh


More information about the Connectionists mailing list