SS-ANOVA for `soft classification'
Grace Wahba
wahba at stat.wisc.edu
Tue Feb 28 21:17:22 EST 1995
Announcing:
Smoothing Spline ANOVA for Exponential Families, with
Application to the Wisconsin Epidemiological Study of
Retinopathy. by Grace Wahba, Yuedong Wang, Chong Gu,
Ronald Klein, MD and Barbara Klein, MD. UWisconsin-
Madison Statistics Dept TR 940, Dec. 1994 (WWGKK)
ftp: ftp.stat.wisc.edu/pub/wahba/exptl.ssanova.ps.gz
Mosaic: http://www.stat.wisc.edu/~wahba/wahba.html -
then click on ftp
.....
GRKPACK: Fitting Smoothing Spline ANOVA Models for
Exponential Families. by Yuedong Wang. UWisconsin-
Madison Statistics Dept TR 942, Jan. 1995. (GRKPACK-doc)
ftp: ftp.stat.wisc.edu/pub/wahba/grkpack.ps.gz
Mosaic: http://www.stat.wisc.edu/~wahba/wahba.html -
then click on ftp
......
In WWGKK we develop Smoothing Spline ANOVA (SS-ANOVA)
models for estimating the probability that an instance
(subject) will be in class 1 as opposed to class 0,
given a vector of predictor variables t (`soft' classification).
We observe {y_i, t(i), i = 1,...,n}
where y_i is 1 or 0 according as subject i's response
is `success' or `failure', and t(i) is a vector of
predictor variables for the i-th subject. Letting
p(t) be the probability that a subject whose predictor variables are t
has a `success' response, we estimate p(t) = exp{f(t)}/(1 + exp{f(t)})
from this data using a smoothing spline ANOVA representation
of f. An ANOVA representation gives f as a sum of functions
of one variable (main effects) plus sums of functions
of two variables (two-factor interactions), etc.
This representation provides an interpretable alternative
to a neural net. The following issues are addressed in this paper:
(1) Methods for deciding which terms in the ANOVA decomposition
to include (model selection),
(2) Methods for choosing good values of the regularization
(smoothing) parameters, which control the bias-variance tradeoff,
(3) Methods for making confidence statements concerning the
estimate,
(4) Numerical algorithms for the calculations,
and, finally,
(5) Public software (GRKPACK).
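As a rough illustration of the model form described above (not the
GRKPACK code, and not the fitted WESDR model), the following Python
sketch builds f as a constant plus two main effects plus one
two-factor interaction, and maps it to p(t) through the logistic
transform. The component functions, their coefficients, and the
constant mu are hypothetical placeholders; in an SS-ANOVA fit they
would be smooth functions estimated from the data. The last function
is the standard Bernoulli negative log-likelihood written in terms of
f; the smoothing penalty that the regularization parameters would
weight is not shown.

```python
import math


def logistic(f):
    """Map a logit value f to p = exp(f) / (1 + exp(f))."""
    # Branch on the sign of f so that exp() never overflows.
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


# Hypothetical component functions, chosen only for illustration.
def main_effect_1(t1):
    return 0.8 * t1


def main_effect_2(t2):
    return -0.5 * t2 ** 2


def interaction_12(t1, t2):
    return 0.3 * t1 * t2


def f(t, mu=-1.0):
    """ANOVA-style logit: constant + main effects + two-factor interaction."""
    t1, t2 = t
    return mu + main_effect_1(t1) + main_effect_2(t2) + interaction_12(t1, t2)


def p(t):
    """Probability of a `success' response at predictor vector t."""
    return logistic(f(t))


def neg_log_likelihood(ys, ts):
    """Bernoulli negative log-likelihood of 0/1 responses ys at predictors ts."""
    total = 0.0
    for y, t in zip(ys, ts):
        ft = f(t)
        # -log L_i = log(1 + exp(f)) - y * f for a 0/1 response y.
        total += math.log1p(math.exp(ft)) - y * ft
    return total
```

For instance, p((0.0, 0.0)) applies the logistic transform to
f = mu = -1, which equals 1/(1 + e). Because f is a sum of separate
terms, each term can be inspected on its own, which is the sense in
which the representation is interpretable.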
The overall scheme is applied to data from
the Wisconsin Epidemiologic Study of Diabetic Retinopathy
(WESDR) to model the risk of progression of diabetic retinopathy
{`success'} as a function of glycosylated hemoglobin,
duration of diabetes and body mass index {t}. Cross sectional
plots provide interpretable information about these risk factors.
This paper provided the basis for Grace Wahba's Neyman Lecture.
A preliminary version appeared in NIPS-6.
GRKPACK-doc provides documentation for the code GRKPACK,
which implements (2)-(4) above.
The code for GRKPACK is available in netlib in the file
gcv/grkpack.shar. It is recommended that
it be retrieved via Mosaic (http://www.netlib.org: go to
The Netlib Repository, then go to gcv)
rather than via the robot mailserver, which may subdivide the file.
GRKPACK includes several examples, including the
analysis described in WWGKK and the WESDR data.
Please send comments and suggestions concerning the code
to Yuedong Wang, yuedong at umich.edu.