choose your own randomized regularizer
Grace Wahba
wahba at stat.wisc.edu
Wed Jan 13 23:32:29 EST 1993
Very interesting request!!
I'm convinced (as you seem to be) that some interesting
results are to be obtained using CV or GCV in the
context of neural nets. In my book are brief discussions
of how GCV can be used in certain nonlinear inverse
problems (Sect 8.3), and when one is doing penalized
likelihood with non-Gaussian data (Sect 9.2).
(No theory is given, however).
Finbarr O'Sullivan (finbarr at stat.washington.edu)
has further results on problems like those in Sect 8.3.
However, I have not seen any theoretical results in the
context of sigmoidal feedforward networks (but that
sure would be interesting!!). Note that if you make
a local quadratic approximation to an optimization
problem to get a local linear approximation to the
influence operator (which plays the role of A(\lambda)),
then you have to decide where you are going to take
your derivatives. In my book on page 113 (equation (9.2.19))
I make a suggestion as to where to
take the derivatives, but I later
became convinced that that was not the best way
to do it. Chong Gu, `Cross-Validating Non-Gaussian Data',
J. Computational and Graphical Statistics 1, 169-179, June, 1992
has a discussion of what he (and I) believe is a better way,
in that context. That context doesn't look at all like
neural nets; I only mention this in case you
get into some proofs in the neural network context.
In that event I think you may have to worry about
where you differentiate, and Gu's arguments may be valid
more generally.
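For readers without the book at hand, here is a sketch of where A(\lambda) enters.
In the linear case the fitted values are A(\lambda) y, and the GCV function
is usually written as below. The symbols n (number of observations), y (data
vector) and V are notation assumed here for illustration; they do not appear
in the message above.

```latex
\[
V(\lambda) \;=\;
\frac{\frac{1}{n}\,\bigl\| \bigl(I - A(\lambda)\bigr)\, y \bigr\|^{2}}
     {\Bigl[\frac{1}{n}\,\operatorname{tr}\bigl(I - A(\lambda)\bigr)\Bigr]^{2}}
\]
```

In a nonlinear problem A(\lambda) would be replaced by the linearized
influence operator discussed above, so the value of V(\lambda) can depend on
the point at which the derivatives are taken.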
As far as missing any theoretical result due to not having my
book, the only theoretical cross validation result discussed
in any detail is that in Craven and Wahba (1979), which
has been superseded by the work of Li, Utreras and Andrews.
As far as circulating your request to the net, do go right
ahead. I will be very interested in any answers you get!!
\bibitem[Wahba 1990]
Wahba, Grace. 1990.
``Spline Models for Observational Data.''
v. 59 in the CBMS-NSF Regional Conference
Series in Applied Mathematics,
SIAM, Philadelphia, PA, March 1990.
Softcover, 169 pages, bibliography, author index.
ISBN 0-89871-244-0
ORDER INFO FOR WAHBA 1990:
==========================
List Price $24.75, SIAM or CBMS* Member Price $19.80
(Domestic 4th class postage free, UPS or Air extra)
May be ordered from SIAM by mail, electronic mail, or phone:
SIAM
P. O. Box 7260
Philadelphia, PA 19101-7260
USA
service at siam.org
Toll-Free 1-800-447-7426 (8:30-4:45 Eastern Standard Time,
the US only).
Regular phone: (215)382-9800
FAX (215)386-7999
May be ordered on American Express, Visa or Mastercard,
or paid by check or money order in US dollars,
or may be billed (extra charge).
*CBMS member organizations include AMATC, AMS, ASA, ASL, ASSM,
IMS, MAA, NAM, NCSM, ORSA, SOA and TIMS.
============================================================
REFERENCES:
===========
\bibitem[Li 86]
Li, Ker-Chau. 1986.
``Asymptotic optimality of $C_{L}$ and generalized
cross-validation in ridge regression with
application to spline smoothing.''
{\em The Annals of Statistics}.
{\bf 14}, 3, 1101-1112.
\bibitem[Li 87]
Li, Ker-Chau. 1987.
``Asymptotic optimality for $C_{p}$, $C_{L}$,
cross-validation, and generalized cross-validation:
discrete index set.''
{\em The Annals of Statistics}.
{\bf 15}, 3, 958-975.
\bibitem[Utreras 87]
Utreras, Florencio I. 1987.
``On generalized cross-validation for
multivariate smoothing spline functions.''
{\em SIAM J. Sci. Stat. Comput.}
{\bf 8}, 4, July 1987.
\bibitem[Andrews 91]
Andrews, Donald W.K. 1991.
``Asymptotic optimality of generalized
$C_{L}$, cross-validation, and generalized
cross-validation in regression with heteroskedastic
errors.''
{\em Journal of Econometrics}. {\bf 47} (1991) 359-377.
North-Holland.
\bibitem[Bowman 80]
Bowman, Adrian W. 1980.
``A note on consistency of the kernel method for
the analysis of categorical data.''
{\em Biometrika} (1980), {\bf 67}, 3, pp. 682-4.
\bibitem[Hall 83]
Hall, Peter. 1983.
``Large sample optimality of least squares cross-validation
in density estimation.''
{\em The Annals of Statistics}.
{\bf 11}, 4, 1156-1174.
\bibitem[Stone 84]
Stone, Charles J. 1984.
``An asymptotically optimal window selection rule
for kernel density estimates.''
{\em The Annals of Statistics}.
{\bf 12}, 4, 1285-1297.
\bibitem[Stone 59]
Stone, M. 1959.
``Application of a measure of information
to the design and comparison of regression experiments.''
{\em Annals Math. Stat.} {\bf 30}, 55-69.
\bibitem[Marron 87]
Marron, J. S. 1987.
``A comparison of cross-validation techniques in density estimation.''
{\em The Annals of Statistics}.
{\bf 15}, 1, 152-162.
\bibitem[Bowman etal 84]
Bowman, Adrian W., Peter Hall, D.M. Titterington. 1984.
``Cross-validation in nonparametric estimation of
probabilities and probability densities.''
{\em Biometrika} (1984), {\bf 71}, 2, pp. 341-51.
\bibitem[Bowman 84]
Bowman, Adrian W. 1984.
``An alternative method of cross-validation for the
smoothing of density estimates.''
{\em Biometrika} (1984), {\bf 71}, 2, pp. 353-60.
\bibitem[Stone 77]
Stone, M. 1977.
``An asymptotic equivalence of choice of model by
cross-validation and Akaike's criterion.''
{\em J. Roy. Stat. Soc. Ser B}, {\bf 39}, 1, 44-47.
\bibitem[Stone 76]
Stone, M. 1976.
``Asymptotics for and against cross-validation.''
??