Adding noise to training data

John.Hampshire@SPEECH2.CS.CMU.EDU
Fri Nov 1 17:41:47 EST 1991


As a follow-on to Geoff's post on this topic...

Adding noise to the training set of any classifier (connectionist or
other, linear or non-linear) has the statistical effect of convolving
the PDF of the noise with the class-conditional densities of the RV
that generated the training samples (assuming the noise and the RV
are independent).  This can (in principle) help generalization,
because we typically have training sets so puny that we don't begin
to have a sufficient sample size to estimate, with any precision, the
a-posteriori class distributions of the RV we're trying to classify.
As a result, the estimated a-posteriori distribution for a training
set of size n is usually a sum of n scaled Dirac delta functions
distributed over feature space (continuous RV case).  For discrete
RVs the estimated distributions are made up of Kronecker deltas...
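A minimal numpy sketch (illustrative only; the names and values are
arbitrary, not from the original argument) of what those deltas look
like: with n samples, the "estimated" density puts mass 1/n on each
observed training vector and zero everywhere else.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    samples = rng.normal(loc=0.0, scale=1.0, size=n)  # true density: N(0, 1)

    def empirical_mass(x, samples, tol=1e-12):
        # The empirical estimate: 1/n wherever x coincides with a training
        # sample, 0 everywhere else, i.e. a sum of n scaled (Dirac) deltas.
        return np.isclose(samples, x, atol=tol).sum() / len(samples)

    print(empirical_mass(samples[0], samples))  # 0.1
    print(empirical_mass(0.123456, samples))    # 0.0 (almost surely)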

OK, so if you add noise to that, you're convolving the deltas with the
PDF of the noise (in the limit that you create an infinite number of
noisy versions of each original training vector).  This means that
you have fabricated a NEW set of a-posteriori class distributions ---
one that you hope will yield classification boundaries that are
better estimates of the TRUE a-posteriori class distributions
than all those original deltas.  Whether or not you succeed depends
critically on your choice of the functional form of the noise PDF AND
of its covariance matrix.  In most cases that choice comes down to a
largely arbitrary guess.
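In symbols: the smoothed estimate is
p_hat(x) = (1/n) sum_i p_noise(x - x_i), the empirical deltas
convolved with the noise PDF.  That is exactly a Parzen-window
(kernel) density estimate whose kernel is the noise PDF and whose
bandwidth is set by the noise covariance.  A small numpy sketch
(illustrative; a 1-D case with Gaussian noise assumed) checking the
equivalence numerically:

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.normal(size=10)  # the original training vectors (1-D)
    sigma = 0.5                    # std of the zero-mean Gaussian noise
    copies = 100_000               # noisy versions made of each vector

    # Noise injection: many noisy copies of every training vector.
    noisy = (samples[:, None]
             + sigma * rng.standard_normal((len(samples), copies))).ravel()

    # Parzen-window estimate: average of Gaussian bumps of width sigma
    # centred on the samples, i.e. the deltas convolved with N(0, sigma^2).
    def parzen(x, samples, sigma):
        z = (x[:, None] - samples[None, :]) / sigma
        return np.exp(-0.5 * z**2).mean(axis=1) / (sigma * np.sqrt(2 * np.pi))

    hist, edges = np.histogram(noisy, bins=200, range=(-4, 4), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # The two estimates agree up to sampling noise in the histogram.
    print(np.abs(hist - parzen(centres, samples, sigma)).max())  # small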

So adding noise to improve generalization is something of an act of
desperation in the face of uncertainty... uncertainty about what kind
and how complex a classifier to build, uncertainty about the PDF of the
data being classified... uncertainty about lots of things.

John


