Methods for improving generalization (was Re: some questions on ...)

hicks@cs.titech.ac.jp
Sun Feb 6 17:22:17 EST 1994


Dear Mr. Grossman,

	I read with great interest your analysis of overlearning and your
research into achieving better generalization with less data.

	However, I would only like to point out an omission in your background
description.  In the abstract of your paper "Use of Bad Training Data For
Better Predictions" you write:

>Use of noise sensitivity signatures is distinctly different from other schemes
>to avoid overtraining, such as cross-validation, which uses only part of the
>training data, or various penalty functions, which are not data-adaptive.
>Noise sensitivity signature methods use all of the training data and
>are manifestly data-adaptive and non-parametric.

When you say penalty functions, the first thing that comes to mind is a
penalty on the sum of squared weights.  That method is indeed not
data-adaptive.  However, an interesting article by Nowlan and Hinton in
Neural Computation 4, pp. 473-493, "Simplifying Neural Networks by Soft
Weight-Sharing", proposes a weight penalty method which is adaptive.
Basically, the weights are softly grouped into Gaussian clusters whose means
and variances are allowed to adapt to the data.  The experimental results
they published show improvement over both cross-validation and weight decay.
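
For concreteness, here is a minimal sketch (my own reading of the method,
written in present-day Python/NumPy, not the authors' code) contrasting a
fixed sum-of-squares penalty with the adaptive mixture-of-Gaussians penalty;
the cluster parameters below are illustrative values that would in practice
be trained along with the weights:

# Sketch of a soft weight-sharing penalty: the weights are treated as
# samples from a mixture of Gaussians whose mixing proportions, means,
# and variances are themselves adapted during training, which is what
# makes this penalty data-adaptive, unlike plain weight decay.
import numpy as np

def weight_decay_penalty(weights, lam=1e-3):
    """Non-adaptive penalty: lambda times the sum of squared weights."""
    return lam * np.sum(weights ** 2)

def soft_weight_sharing_penalty(weights, mix, means, stds):
    """Negative log-likelihood of the weights under a Gaussian mixture.

    weights : 1-D array of all network weights
    mix     : mixing proportions pi_j (sum to 1)
    means   : cluster means mu_j
    stds    : cluster standard deviations sigma_j
    """
    w = weights[:, None]                      # shape (n_weights, 1)
    norm = mix / (np.sqrt(2 * np.pi) * stds)  # shape (n_clusters,)
    dens = norm * np.exp(-0.5 * ((w - means) / stds) ** 2)
    p_w = dens.sum(axis=1) + 1e-12            # p(w_i) = sum_j pi_j N(w_i | mu_j, sigma_j^2)
    return -np.sum(np.log(p_w))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.5, size=200)
    mix   = np.array([0.6, 0.4])    # e.g. a "near zero" and a "large weight" cluster
    means = np.array([0.0, 0.8])
    stds  = np.array([0.1, 0.3])
    print("weight decay :", weight_decay_penalty(weights))
    print("soft sharing :", soft_weight_sharing_penalty(weights, mix, means, stds))

During training, the total cost would be the data error plus this penalty,
and its gradient would be taken with respect to the weights and the cluster
parameters (mix, means, stds) alike.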

I am looking forward to reading your paper when it is available.

Yours Respectfully,


	Craig Hicks

Craig Hicks           hicks at cs.titech.ac.jp | Kore ya kono  Yuku mo kaeru mo
Ogawa Laboratory, Dept. of Computer Science | Wakarete wa   Shiru mo shiranu mo
Tokyo Institute of Technology, Tokyo, Japan |  	    Ausaka no seki        
lab:03-3726-1111 ext.2190 home:03-3785-1974 |  (from hyaku-nin-issyu)
fax: +81(3)3729-0685 (from abroad) 
     03-3729-0685  (from Japan)



