backprop for classification

Ron Chrisley chrisley at parc.xerox.com
Mon Aug 20 13:35:08 EDT 1990


Xiru, you wrote:

"While we trained a standard backprop network for some classification task
(one output unit for each class), we found that when the classes are not
evenly distributed in the training set, e.g., 50% of the training data belong
to one class, 10% belong to another, ... etc., then the network was always
biased towards the classes that have the higher percentage in the training set.
Thus, we had to post-process the output of the network, giving more weights
to the classes that occur less frequently (in reverse proportion to their
population)."

My suggestion:  most BP classification paradigms will work best if you are
using the same distribution for training as for testing.  So only worry
about uneven distribution of classes in the training data if the input on
which the network will have to perform does not have that distribution.  If
rocks are 1000 times more common than mines, then given that something is
completely qualitatively ambiguous with respect to the rock/mine
distinction, it is best (in terms of minimizing # of misclassifications) to
guess that the thing is a rock.  So being biased toward rock
classifications is a valid way to minimize misclassification.  (Of course,
once you start factoring in cost, this will be skewed dramatically:  it is
much better to have a false alarm about a mine than to falsely think a mine
is a rock.)
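
To make the arithmetic concrete, here is a small sketch of the rock/mine
decision (Python, with priors and costs I have invented for illustration):
with the class-conditional evidence ambiguous, the prior alone favors
"rock", while a large enough cost for missing a mine shifts the decision
the other way.

    # Assumed numbers: rocks 1000 times more common than mines, and an
    # input that is completely ambiguous between the two classes.
    p_rock, p_mine = 1000 / 1001, 1 / 1001
    lik_rock = lik_mine = 0.5

    # Posteriors via Bayes' rule.
    post_rock = lik_rock * p_rock / (lik_rock * p_rock + lik_mine * p_mine)
    post_mine = 1.0 - post_rock

    # Minimizing the number of misclassifications: take the larger posterior.
    print("argmax posterior:", "rock" if post_rock > post_mine else "mine")

    # Factoring in cost (also invented): a missed mine is far worse than a
    # false alarm, so we minimize expected cost instead.
    cost_missed_mine, cost_false_alarm = 10000.0, 1.0
    risk_say_rock = cost_missed_mine * post_mine
    risk_say_mine = cost_false_alarm * post_rock
    print("argmin expected cost:", "rock" if risk_say_rock < risk_say_mine else "mine")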

In summary, uneven distributions aren't, in themselves, bad for training,
nor do they require any post-processing.  However, distributions that
differ from real-world ones will require some sort of post-processing, as
you have done.
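
As a sketch of that kind of post-processing (the priors and activations
below are made-up numbers, and it assumes the output activations
approximate the class posteriors under the training-set priors): divide
out the training-set frequencies, multiply in the priors you expect in the
real data, and renormalize.

    import numpy as np

    train_priors = np.array([0.5, 0.1, 0.4])   # class frequencies in the training set
    true_priors  = np.array([0.2, 0.6, 0.2])   # priors expected in the real data

    outputs = np.array([0.55, 0.15, 0.30])     # raw network activations for one input

    # Rescale by the ratio of true to training priors, then renormalize.
    corrected = outputs * (true_priors / train_priors)
    corrected /= corrected.sum()
    print(corrected)

With uniform true priors this reduces to weighting each output in inverse
proportion to its frequency in the training set, which is essentially the
correction you describe.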

But there is another issue here, I think.

How were you using the network for classification?  From your message, it
sounds like you were training and interpreting the network in such a way
that the activations of the output nodes were supposed to correspond to the
conditional probabilities of the different classes, given the input.  This
would explain what you meant by your last sentence in the above quote.
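
To be concrete about that reading, here is a minimal sketch (a one-layer
softmax net trained with cross-entropy in numpy; the toy data are
invented): a single, deliberately ambiguous input pattern belongs to class
1 only 10% of the time, and the trained output activations converge toward
the class frequencies, i.e., exactly the bias toward the more frequent
class that you observed.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.zeros((1000, 1))                    # one ambiguous input pattern, repeated
    y = (rng.random(1000) < 0.1).astype(int)   # class 1 appears only 10% of the time

    W = np.zeros((1, 2))
    b = np.zeros(2)
    for _ in range(2000):                      # plain gradient descent on cross-entropy
        logits = X @ W + b
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)      # softmax: one output unit per class
        grad = (p - np.eye(2)[y]) / len(X)     # gradient of the cross-entropy loss
        W -= X.T @ grad
        b -= grad.sum(axis=0)

    print(p[0])   # roughly [0.9, 0.1]: the outputs track the training frequencies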

But there are other ways of using back-propagation.  For instance, if one
does not constrain the network to estimate conditional probabilities, but
instead has it solve the more general problem of minimizing classification
error, then it is possible that the network will come up with a solution
that is not affected by differences between the prior probabilities of the
classes in the training and testing data.  Since it is not solving the
problem by classifying via maximum likelihood, its solutions will be based
on the
frequency-independent, qualitative structure of the inputs.

In fact, humans often do something like this.  The phenomenon is called
"base rate neglect", and it is notorious: when the qualitative differences
between a rare and a common class are not very marked, humans will
systematically over-classify inputs into the rare class.  That is, if the
symptoms a patient has even *slightly* indicate a rare tropical disease
over a common cold, humans will give the rare disease diagnosis, even
though it is extremely unlikely that the patient has that disease.  Of
course, the issue of cost is again being ignored here.  (See Gluck and Bower
for a look at the relation between neural networks and base rate neglect.)
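
For a rough sense of the numbers (invented here, and again ignoring cost):
suppose the base rate of the tropical disease is 1 in 10,000 and the
symptoms fit it only slightly better than they fit a cold.  Bayes' rule
still leaves the rare diagnosis almost certainly wrong.

    p_disease, p_cold = 0.0001, 0.9999   # assumed base rates
    p_symptoms_given_disease = 0.9       # symptoms fit the disease slightly better
    p_symptoms_given_cold = 0.7

    post_disease = (p_symptoms_given_disease * p_disease) / (
        p_symptoms_given_disease * p_disease + p_symptoms_given_cold * p_cold)
    print(post_disease)   # ~0.00013: the common cold remains the likely answer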

Such limitations aside, classification via means other than conditional
probability estimation may be desirable for certain applications: for
example, those in which you do not know the priors, or in which the priors
change dramatically and unpredictably, and/or those in which there is a
strong qualitative division between members of the classes.

In such cases, you might get good classification performance, even when
the distributions differ, by relying more on qualitative differences in the
inputs than on the frequencies of the classes.

Does this sound right?

Ron Chrisley	chrisley at csli.stanford.edu
Xerox PARC SSL					New College
Palo Alto, CA 94304				Oxford OX1 3BN, UK
(415) 494-4728					(865) 793-484






