SS-ANOVA for `soft classification'

Grace Wahba wahba at stat.wisc.edu
Tue Feb 28 21:17:22 EST 1995


Announcing:

Smoothing Spline ANOVA for Exponential Families, with 
Application to the Wisconsin Epidemiological Study of 
Retinopathy. by Grace Wahba, Yuedong Wang, Chong Gu, 
Ronald Klein, MD and Barbara Klein, MD. UWisconsin-
Madison Statistics Dept TR 940, Dec. 1994 (WWGKK)

ftp:  ftp.stat.wisc.edu/pub/wahba/exptl.ssanova.ps.gz
Mosaic: http://www.stat.wisc.edu/~wahba/wahba.html - 
	   then click on ftp
.....
GRKPACK: Fitting Smoothing Spline ANOVA Models for 
Exponential Families. by  Yuedong Wang. UWisconsin-
Madison Statistics Dept TR 942, Jan. 1995. (GRKPACK-doc)

ftp:  ftp.stat.wisc.edu/pub/wahba/grkpack.ps.gz
Mosaic: http://www.stat.wisc.edu/~wahba/wahba.html - 
	   then click on ftp
......
In WWGKK we develop Smoothing Spline ANOVA (SS-ANOVA) 
models for estimating the probability that an instance 
(subject)  will be in class 1 as opposed to class 0, 
given a vector of predictor variables t (`soft' classification). 
We observe {y_i, t(i), i = 1,..,n}
where y_i is 1 or 0 according as subject i's response 
is `success' or `failure', and t(i) is a vector of 
predictor variables for the i-th subject. Letting 
p(t) be the probability that a subject whose predictor variables are t
has a `success' response, we estimate p(t) = exp{f(t)}/(1 + exp{f(t)}) 
from these data using a smoothing spline ANOVA representation 
of f. An ANOVA representation gives f as a sum of functions 
of one variable (main effects) plus sums of functions 
of two variables (two-factor interactions), and so on. 
This representation provides an interpretable alternative 
to a neural net. The following issues are addressed in this paper:
 (1) Methods for deciding which terms in the ANOVA decomposition
  to include (model selection),
 (2) Methods for choosing good values of the regularization 
 (smoothing) parameters, which control the bias-variance tradeoff,
 (3) Methods for making confidence statements concerning the 
     estimate,
 (4) Numerical algorithms for the calculations, 
and, finally, 
 (5) Public software (GRKPACK). 
The overall scheme is applied to data from 
the Wisconsin Epidemiologic Study of Diabetic Retinopathy
(WESDR) to model the risk of progression of diabetic retinopathy
(`success') as a function of glycosylated hemoglobin, 
duration of diabetes and body mass index (t). Cross-sectional 
plots provide interpretable information about these risk factors.
This paper provided the basis for Grace Wahba's Neyman Lecture.
A preliminary version appeared in NIPS-6.
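
As a rough illustration of the model form described above (not of
GRKPACK itself), the following Python sketch builds f(t) as a sum of
two main effects and one two-factor interaction and maps it to p(t)
through the logistic formula. The component functions and their
coefficients are hypothetical placeholders; in WWGKK they are smooth
functions estimated from the data, which this sketch does not attempt.

```python
import math

def logistic(f):
    """Return exp(f) / (1 + exp(f)), computed stably for large |f|."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

# Hypothetical component functions, chosen only for illustration.
def main_effect_1(t1):
    return 0.5 * t1

def main_effect_2(t2):
    return -0.25 * t2 ** 2

def interaction_12(t1, t2):
    return 0.1 * t1 * t2

def f(t):
    """ANOVA-style logit: main effects plus a two-factor interaction."""
    t1, t2 = t
    return main_effect_1(t1) + main_effect_2(t2) + interaction_12(t1, t2)

def p(t):
    """Probability of a `success' response given predictors t."""
    return logistic(f(t))

def neg_log_likelihood(ys, ts):
    """Bernoulli negative log likelihood: sum of log(1 + exp(f)) - y * f."""
    total = 0.0
    for y, t in zip(ys, ts):
        ft = f(t)
        # Stable evaluation of log(1 + exp(ft)).
        softplus = max(ft, 0.0) + math.log1p(math.exp(-abs(ft)))
        total += softplus - y * ft
    return total
```

Because f is a sum of low-dimensional pieces, each piece can be
plotted or inspected separately, which is the sense in which the
representation is interpretable.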

GRKPACK-doc provides documentation for the code GRKPACK, 
which implements (2)-(4) above.

The code for GRKPACK is available in netlib in the file
gcv/grkpack.shar. We recommend retrieving it via Mosaic
(http://www.netlib.org, go to The Netlib Repository, then gcv)
rather than via the robot mailserver, which may subdivide the file.

GRKPACK includes several examples, among them the 
analysis described in WWGKK and the WESDR data. 
Please send comments and suggestions concerning the code 
to Yuedong Wang, yuedong at umich.edu.

