Adding noise to training data

Tom English english at sun1.cs.ttu.edu
Mon Nov 4 09:42:53 EST 1991


At first blush, it seems there's a close relationship between Parzen
estimation (Duda & Hart 1973) and training with noise added to the
samples.  If we were to use the noise function as the window function
in Parzen estimation of the distribution from which the training set
was drawn, wouldn't we obtain precisely the noisy-sample
distribution?
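
For concreteness, here is a small numerical sketch of that point
(Python/NumPy; the toy 1-D training set, Gaussian noise density, and
grid below are made up for illustration): drawing a training point at
random and perturbing it with noise of density K gives samples
distributed according to the Parzen estimate that uses K as its
window function.

# Sketch: noisy-sample distribution vs. Parzen estimate (Gaussian window).
# The training set and noise level below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
train = np.array([-2.0, -1.5, 0.0, 0.3, 1.0, 2.2])  # toy 1-D training set
sigma = 0.5                                          # noise std. dev. = window width

# (a) Noisy-sample distribution: pick a training point at random, add noise.
n = 200_000
noisy = rng.choice(train, size=n) + rng.normal(0.0, sigma, size=n)
hist, edges = np.histogram(noisy, bins=80, range=(-4.0, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# (b) Parzen estimate of the training density, window = the noise PDF.
kernel = np.exp(-0.5 * ((centers[:, None] - train[None, :]) / sigma) ** 2)
parzen = kernel.mean(axis=1) / (sigma * np.sqrt(2.0 * np.pi))

# The two should agree up to Monte-Carlo and binning error.
print("max abs difference:", np.abs(hist - parzen).max())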

And wouldn't a network minimizing squared error for the noisy training
set asymptotically (i.e., as the number of noisy-sample presentations
approaches infinity) realize the Parzen estimator?  The results of
Hampshire and Pearlmutter (1990) seem to be relevant here.
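
To make the second question concrete, here is another rough sketch
(again Python/NumPy, with made-up 1-D classes and Gaussian noise; a
per-bin least-squares fit stands in for the network): with 0/1
targets, the squared-error minimizer at x is the conditional mean of
the target under the noisy-sample distribution, which is exactly the
posterior computed from class-wise Parzen estimates.

# Sketch: squared-error fit on noisy samples vs. Parzen-window posterior.
# Two made-up 1-D classes; per-bin target means stand in for a trained net.
import numpy as np

rng = np.random.default_rng(1)
class0 = np.array([-2.0, -1.2, -0.8])   # toy training points, class 0
class1 = np.array([0.5, 1.0, 1.8])      # toy training points, class 1
sigma = 0.6                             # noise std. dev. = window width

# Noisy training stream: sample a training point (either class), add
# noise, and attach the 0/1 class target.
n = 400_000
both = np.concatenate([class0, class1])
labels = np.concatenate([np.zeros(len(class0)), np.ones(len(class1))])
idx = rng.integers(len(both), size=n)
x = both[idx] + rng.normal(0.0, sigma, size=n)
t = labels[idx]

# "Network" output: the per-bin mean of the targets minimizes squared
# error within each bin and approaches E[t | x] as n grows.
edges = np.linspace(-4.0, 4.0, 81)
centers = 0.5 * (edges[:-1] + edges[1:])
bins = np.digitize(x, edges) - 1
ok = (bins >= 0) & (bins < len(centers))
num = np.bincount(bins[ok], weights=t[ok], minlength=len(centers))
den = np.bincount(bins[ok], minlength=len(centers))
fit = num / np.maximum(den, 1)

# Parzen-window posterior for class 1 (equal priors, since the two
# classes have the same number of training points).
def parzen(points, z):
    return np.exp(-0.5 * ((z[:, None] - points[None, :]) / sigma) ** 2).mean(axis=1)

p0, p1 = parzen(class0, centers), parzen(class1, centers)
posterior = p1 / (p0 + p1)

# The fit should approach the Parzen posterior, up to Monte-Carlo and
# binning error.
print("max abs difference:", np.abs(fit - posterior).max())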

> So adding noise to improve generalization is something of an act of
> desperation in the face of uncertainty... uncertainty about what kind
> and how complex a classifier to build, uncertainty about the PDF of the
> data being classified... uncertainty about lots of things.

I agree.  But perhaps the "act of desperation" is of a familiar sort.

Tom English

Duda, R. O., and P. E. Hart.  1973.  Pattern Classification and Scene
Analysis.  New York:  John Wiley & Sons.

Hampshire, J. B., and B. A. Pearlmutter.  1990.  Equivalence proofs for
multi-layer perceptron classifiers and the Bayesian discriminant function.
In Proc. 1990 Connectionist Models Summer School.  [Publisher?]

