submission
Grace Wahba
wahba at stat.wisc.edu
Sun Jan 31 20:20:30 EST 1993
I would like to submit the following to connectionists -
thanks much!
****************
This is to announce two papers in the neuroprose archive:
1) Soft Classification, a.k.a. Penalized Log Likelihood and
Smoothing Spline Analysis of Variance
by Grace Wahba, Chong Gu, Yuedong Wang and Rick Chappell
to appear in the proceedings of the Santa Fe Workshop on Supervised
Machine Learning, August 1992, D. Wolpert and A. Lapedes, eds.
also partly presented at CLNL*92.
2) Smoothing Spline ANOVA with Component-Wise Bayesian
`Confidence Intervals'
by Chong Gu and Grace Wahba,
to appear, J. Computational and Graphical Statistics
wahba at stat.wisc.edu, chong at pop.stat.purdue.edu
wang at stat.wisc.edu, chappell at stat.wisc.edu
Below are the abstracts followed by instructions for retrieving the papers.
Grace Wahba
----------------------------------------------------------------------
Soft Classification, a.k.a. Penalized Log Likelihood and Smoothing
Spline Analysis of Variance
G. Wahba, C. Gu, Y. Wang and R. Chappell
We discuss a class of methods for the problem of `soft' classification
in supervised learning. In `hard' classification, it is assumed that
any two examples with the same attribute vector will always be in the
same class (or have the same outcome), whereas in `soft' classification
two examples with the same attribute vector do not necessarily have the
same outcome, but the *probability* of a particular outcome does depend
on the attribute vector. In this paper we describe a family of
methods that are well suited to estimating this probability.
The method we describe will produce, for any value in a (reasonable)
region of the attribute space, an estimate of the probability that
the next example with that value of its attribute vector
will be in class 1. Underlying these methods is an assumption
that this probability varies in a smooth way (to be defined)
as the predictor variables vary. The method combines results from
Penalized log likelihood estimation, Smoothing splines, and
Analysis of variance, to get the PSA class of methods. In the process
of describing PSA we discuss some issues concerning the computation of
degrees of freedom for signal, which has wider ramifications for the
minimization of generalization error in machine learning. As an
illustration we apply the method to the Pima-Indian Diabetes data set
in the UCI Repository, and compare the results to Smith et al. (1988),
who used the ADAP learning algorithm on this same data set to forecast
the onset of diabetes mellitus. If the probabilities we obtain are
thresholded to produce a hard classification for comparison with the
hard classification of Smith et al., the results are very similar;
however, the intermediate probabilities that we obtain provide useful
and interpretable information on how the risk of diabetes varies with
some of the risk factors.
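The soft-versus-hard distinction described above can be sketched with a
toy penalized log-likelihood fit. The code below is not the paper's PSA
method: it uses a plain logistic model with an L2 penalty as a simple
stand-in for the spline-based estimator, on invented one-dimensional
data, and then thresholds the estimated probabilities at 0.5 to recover
a hard classification.

```python
import numpy as np

# Hypothetical synthetic data: the probability of class 1 rises
# smoothly with a single attribute x (nothing here comes from the
# paper's Pima-Indian data; it is purely illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
p_true = 1.0 / (1.0 + np.exp(-1.5 * x))          # true smooth probability
y = (rng.uniform(size=200) < p_true).astype(float)

# Penalized log-likelihood fit by gradient ascent: a logistic model
# with an L2 penalty on the slope stands in for the smoothness
# penalty of the spline methods in the paper.
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
lam = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p) - lam * np.array([0.0, beta[1]])
    beta += 0.05 * grad / len(y)

# "Soft" output: estimated probabilities in (0, 1).
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
# "Hard" output: threshold at 0.5, discarding the probability scale.
hard = (p_hat > 0.5).astype(int)
```

The point of the sketch is that `p_hat` carries strictly more
information than `hard`: two attribute values can receive the same hard
label while having very different estimated risks.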
...........................
Smoothing Spline ANOVA with Component-Wise Bayesian `Confidence
Intervals'
C. Gu and G. Wahba
We study a multivariate smoothing spline estimate of a function of
several variables, based on an ANOVA decomposition as sums of main
effect functions (of one variable), two-factor interaction functions
(of two variables), etc. We derive the Bayesian `confidence intervals'
of Wahba (1983) for the components of this decomposition and demonstrate
that, even with multiple smoothing parameters, they can be efficiently
computed using the publicly available code RKPACK, which was originally
designed just to compute the estimates. We carry out a small Monte
Carlo study to see how closely the actual properties of these
component-wise confidence intervals match their nominal confidence
levels. Lastly, we analyze some lake acidity data as a function of
calcium concentration, latitude, and longitude, using both polynomial
and thin plate spline main effects in the same model.
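The ANOVA decomposition the abstract refers to can be illustrated on a
function tabulated on a grid: a constant term, main effects that each
average to zero, and a residual two-factor interaction. The sketch
below uses simple grid averaging in place of smoothing splines, and the
function and grid are invented for illustration.

```python
import numpy as np

# Hypothetical function of two variables on a 50 x 50 grid.
n = 50
x1 = np.linspace(0, 1, n)
x2 = np.linspace(0, 1, n)
F = np.sin(2 * np.pi * x1)[:, None] + x2[None, :] ** 2 + 0.5 * np.outer(x1, x2)

mu = F.mean()                                 # constant term
f1 = F.mean(axis=1) - mu                      # main effect of x1
f2 = F.mean(axis=0) - mu                      # main effect of x2
f12 = F - mu - f1[:, None] - f2[None, :]      # two-factor interaction

# The components sum back to F exactly, and the side conditions hold:
# each main effect averages to zero, as does the interaction along
# either coordinate.
recon = mu + f1[:, None] + f2[None, :] + f12
```

The component-wise intervals in the paper attach uncertainty to each of
these pieces (`f1`, `f2`, `f12`) separately rather than only to the
reconstructed surface as a whole.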
-----------------------------------------------------------------------------
To retrieve these files from the neuroprose archive:
unix> ftp archive.cis.ohio-state.edu
Name (archive.cis.ohio-state.edu:wahba): anonymous
Password: (use your email address)
ftp> binary
ftp> cd pub/neuroprose
ftp> get wahba.soft-class.ps.Z
200 PORT command successful.
150 Opening BINARY mode data connection for wahba.soft-class.ps.Z
.
ftp> get wahba.ssanova.ps.Z
.
221 Goodbye.
unix> uncompress wahba.soft-class.ps.Z
unix> lpr wahba.soft-class.ps
unix> uncompress wahba.ssanova.ps.Z
unix> lpr wahba.ssanova.ps
..
Thanks to Jordan Pollack for maintaining the archive.