choose your own randomized regularizer

Grace Wahba wahba at stat.wisc.edu
Wed Jan 13 23:32:29 EST 1993


Very interesting request!!
I'm convinced (as you seem to be) that some interesting 
results are to be obtained using CV or GCV in the 
context of neural nets. In my book are brief discussions 
of how GCV can be used in certain nonlinear inverse 
problems (Sect 8.3), and when one is doing penalized 
likelihood with non-Gaussian data (Sect 9.2). 
(No theory is given, however). 
Finbarr O'Sullivan (finbarr at stat.washington.edu) 
has further results on problems like those in Sect 8.3. 
However, I have not seen any theoretical results in the 
context of sigmoidal feedforward networks (but that 
sure would be interesting!!). In any case, if you make 
a local quadratic approximation to an optimization 
problem to get a local linear approximation to the 
influence operator (which plays the role of A(\lambda)), 
then you have to decide where you are going to take 
your derivatives. In my book, on page 113 (equation (9.2.19)),
I make a suggestion as to where to 
take the derivatives, but I later 
became convinced that that was not the best way 
to do it.  Chong Gu, `Cross-Validating Non-Gaussian Data', 
J. Computational and Graphical Statistics 1, 169-179, June, 1992 
has a discussion of what he (and I) believe is a better way, 
in that context. That context doesn't look at all like
neural nets; I only mention this in case you 
get into some proofs in the neural network context. 
In that event I think you may have to worry about
where you differentiate, and Gu's arguments may be valid 
more generally.
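
For reference, in the purely linear case the influence matrix A(\lambda) is exact, so the question of where to take derivatives does not arise. The following is a minimal sketch of that linear case only, assuming ridge regression with A(\lambda) = X (X'X + n \lambda I)^{-1} X' and the standard GCV function V(\lambda) = (1/n)||(I - A(\lambda))y||^2 / [(1/n) tr(I - A(\lambda))]^2. The function names `gcv_score` and `choose_lambda`, the simulated data, and the grid of \lambda values are illustrative assumptions, not anything taken from the book or from Gu's paper.

```python
import numpy as np

def gcv_score(X, y, lam):
    """GCV score V(lam) for ridge regression (linear case, assumed here).

    A(lam) = X (X^T X + n*lam*I)^{-1} X^T maps the data y to the fitted values.
    V(lam) = (1/n) * ||(I - A(lam)) y||^2 / [(1/n) * tr(I - A(lam))]^2
    """
    n, p = X.shape
    # Influence matrix, computed via a linear solve rather than an explicit inverse.
    A = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

def choose_lambda(X, y, grid):
    """Return the grid value minimizing V(lam), together with all scores."""
    scores = np.array([gcv_score(X, y, lam) for lam in grid])
    return grid[int(np.argmin(scores))], scores

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

grid = np.logspace(-6, 1, 30)
best_lam, scores = choose_lambda(X, y, grid)
print("lambda chosen by GCV:", best_lam)
```

In a nonlinear problem such as a sigmoidal network, A(\lambda) would have to be replaced by a linearization of the fit around some point, which is exactly where the choice discussed above enters.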

As for missing any theoretical results by not having my 
book: the only theoretical cross-validation result discussed 
in any detail is that in Craven and Wahba (1979), which
has been superseded by the work of Li, Utreras and Andrews.

As for circulating your request to the net, do go right 
ahead. I will be very interested in any answers you get!!



\bibitem[Wahba 1990]
Wahba, Grace. 1990. 
"Spline Models for Observational Data" 
v. 59 in the CBMS-NSF Regional Conference
Series in Applied Mathematics, 
SIAM, Philadelphia, PA, March 1990. 
Softcover, 169 pages, bibliography, author index.
ISBN 0-89871-244-0

ORDER INFO FOR WAHBA 1990:
==========================

List Price $24.75, SIAM or CBMS* Member Price $19.80
(Domestic 4th class postage free, UPS or Air extra)

May be ordered from SIAM by mail, electronic mail, or phone:

SIAM
P. O. Box 7260
Philadelphia, PA 19101-7260
USA

service at siam.org

Toll-Free 1-800-447-7426 (8:30-4:45 Eastern Standard Time, 
   the US only).
Regular phone:  (215)382-9800
FAX (215)386-7999

May be ordered on American Express, Visa or Mastercard,
or paid by check or money order in US dollars, 
or may be billed (extra charge).

CBMS member organizations include AMATYC, AMS, ASA, ASL, ASSM, 
IMS, MAA, NAM, NCSM, ORSA, SOA and TIMS.

============================================================



REFERENCES:
===========

\bibitem[Li 86]
Li, Ker-Chau. 1986.
``Asymptotic optimality of $C_{L}$ and generalized
cross-validation in ridge regression with
application to spline smoothing.''
{\em The Annals of Statistics}.
{\bf 14}, 3, 1101-1112.
 
\bibitem[Li 87]
Li, Ker-Chau. 1987.
``Asymptotic optimality for $C_{p}$, $C_{L}$,
cross-validation, and generalized cross-validation:
discrete index set.''
{\em The Annals of Statistics}.
{\bf 15}, 3, 958-975.
 
\bibitem[Utreras 87]
Utreras, Florencio I.  1987.
``On generalized cross-validation for
multivariate smoothing spline functions.''
{\em SIAM J. Sci. Stat. Comput.}
{\bf 8}, 4, July 1987.
 
\bibitem[Andrews 91]
Andrews, Donald W.K. 1991.
``Asymptotic optimality of generalized
$C_{L}$, cross-validation, and generalized
cross-validation in regression with heteroskedastic
errors.''
{\em Journal of Econometrics}. {\bf 47} (1991) 359-377.
North-Holland.
 
\bibitem[Bowman 80]
Bowman, Adrian W.  1980.
``A note on consistency of the kernel method for
the analysis of categorical data.''
{\em Biometrika} (1980), {\bf 67}, 3, pp. 682-4.
 
\bibitem[Hall 83]
Hall, Peter.  1983.
``Large sample optimality of least squares cross-validation
in density estimation.''
{\em The Annals of Statistics}.
{\bf 11}, 4, 1156-1174.
 

\bibitem[Stone 84]
Stone, Charles J. 1984.
``An asymptotically optimal window selection rule
for kernel density estimates.''
{\em The Annals of Statistics}.
{\bf 12}, 4, 1285-1297.

\bibitem[Stone 59]
Stone, M. 1959.
``Application of a measure of information
to the design and comparison of regression experiments.''
{\em Annals Math. Stat.} {\bf 30} 55-69

\bibitem[Marron 87]
Marron, J. S. 1987.
``A comparison of cross-validation techniques in density estimation.''
{\em The Annals of Statistics}.
{\bf 15}, 1, 152-162.

\bibitem[Bowman etal 84]
Bowman, Adrian W., Peter Hall, D.M. Titterington.  1984.
``Cross-validation in nonparametric estimation of
probabilities and probability densities.''
{\em Biometrika} (1984), {\bf 71}, 2, pp. 341-51.

\bibitem[Bowman 84]
Bowman, Adrian W. 1984.
``An alternative method of cross-validation for the
smoothing of density estimates.''
{\em Biometrika} (1984), {\bf 71}, 2, pp. 353-60.

\bibitem[Stone 77]
Stone, M. 1977.
``An asymptotic equivalence of choice of model by
cross-validation and Akaike's criterion.''
{\em J. Roy. Stat. Soc. Ser B}, {\bf 39}, 1, 44-47.

\bibitem[Stone 76]
Stone, M. 1976.
``Asymptotics for and against cross-validation.''
??


