backprop for classification

John.Hampshire@SPEECH2.CS.CMU.EDU
Sun Aug 19 13:48:06 EDT 1990


Xiru Zhang of Thinking Machines Corp. writes:

> While we trained a standard backprop network for some classification task
> (one output unit for each class), we found that when the classes are not
> evenly distributed in the training set, e.g., 50% of the training data belong
> to one class, 10% belong to another, ... etc., then the network was always
> biased towards the classes that have the higher percentage in the training
> set.  Thus, we had to post-process the output of the network, giving more
> weight to the classes that occur less frequently (in inverse proportion to
> their population).
> 
> I wonder if other people have encountered the same problem, and if there
> are better ways to deal with this problem.

Indeed, one can show the following: given an asymptotically large
number of statistically independent training samples, any classifier
with sufficient functional capacity to model the class-conditional
densities of the random vector X being classified (e.g., an MLP
with sufficient connectivity to perform the input-to-output
functional mapping necessary for robust classification), trained
with a "reasonable error measure" (a term originated by
B. Pearlmutter), will yield outputs that are accurate estimates
of the a posteriori probabilities of X.  Examples of "reasonable
error measures" are mean-squared error (the one used by Xiru Zhang),
cross entropy, maximum mutual information, Kullback-Leibler
distance, maximum likelihood...
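As a toy illustration of why mean-squared error behaves this way
(a sketch with made-up numbers, not anything from a real network):
for a fixed input x whose class label is 1 with probability p, the
constant output that minimizes MSE is the mean of the targets,
i.e. an estimate of P(class | x).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: for one fixed input x, the target is 1 with
# probability p = 0.3 (the true a posteriori probability).
p = 0.3
targets = (rng.random(100_000) < p).astype(float)

# Search a grid of candidate outputs for the MSE minimizer.
candidates = np.linspace(0.0, 1.0, 101)
mse = [np.mean((targets - y) ** 2) for y in candidates]
best = candidates[np.argmin(mse)]
# best lands near p: the MSE-optimal output estimates P(class | x).
```

The same argument goes through for the other "reasonable error
measures" listed above, which is why they all produce a posteriori
probability estimates in the limit of infinite data.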

Unfortunately, one never has enough training data, and
it's not always clear what constitutes sufficient but not
excessive functional capacity in the classifier.  So one
ends up *estimating* the a posterioris with one's
"reasonable error measure"-trained classifier.  If one trains
one's classifier with a disproportionately high number of
samples belonging to one particular class, one will get
precisely the behavior Xiru Zhang describes.

**************
This is because the a posterioris depend on the class priors
(you can prove this easily using Bayes' rule).  If you
bias the priors, you will bias the a posterioris accordingly.
Your classifier will therefore learn to estimate the biased
a posterioris.
**************
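The Bayes'-rule dependence is easy to see numerically.  Here is a
minimal sketch (the likelihood and prior values are illustrative
assumptions): with the class-conditional likelihoods held fixed,
changing the priors changes the a posterioris.

```python
import numpy as np

def posterior(likelihood, priors):
    """Bayes' rule: P(c|x) = p(x|c) P(c) / sum_c' p(x|c') P(c')."""
    joint = likelihood * priors
    return joint / joint.sum()

# Hypothetical class-conditional likelihoods p(x|c) for one input x.
likelihood = np.array([0.3, 0.1])

balanced = posterior(likelihood, np.array([0.5, 0.5]))  # true priors
skewed   = posterior(likelihood, np.array([0.9, 0.1]))  # biased training set
# The same x gets very different a posterioris under the two priors,
# so a classifier trained on the skewed set learns the skewed values.
```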

The best way to fix the problem if you're using a
"reasonable error measure" to train your classifier
is to have a training set that reflects the true class
priors.  If this isn't possible,
then you can post-process the classifier's outputs by
correcting for the biased priors.  Whether or not this fix
really works depends a lot on the classifier you're using.
MLPs tend to be over-parameterized, so they tend to yield nearly
binary (saturated) outputs, which this kind of post-processing
cannot correct.
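For classifiers whose outputs are not saturated, the prior
correction is just the Bayes'-rule argument run in reverse.  A
sketch, with illustrative numbers (the function name and priors are
my own assumptions, not from the original post):

```python
import numpy as np

def correct_priors(outputs, train_priors, true_priors):
    """Re-weight posterior estimates learned under biased priors.

    The network outputs estimate P_train(c|x); dividing out the
    training-set priors and multiplying in the true priors recovers
    (up to normalization) the posteriors under the true priors.
    """
    corrected = outputs * (true_priors / train_priors)
    return corrected / corrected.sum()

outputs      = np.array([0.7, 0.3])   # estimates biased toward class 0
train_priors = np.array([0.8, 0.2])   # class 0 over-represented in training
true_priors  = np.array([0.5, 0.5])   # assumed true priors

corrected = correct_priors(outputs, train_priors, true_priors)
# The under-represented class 1 is boosted after correction.
```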

Another approach might be to avoid using "reasonable error
measures" to train your classifier.  I have more info regarding
such alternatives if anyone cares, but I've already blabbed too much.
If you want refs., please send me email directly.

Cheers,

John









