submission

Grace Wahba wahba at stat.wisc.edu
Sun Jan 31 20:20:30 EST 1993


I would like to submit the following to connectionists - 
thanks much! 

****************
This is to announce two papers in the neuroprose archive: 

1) Soft Classification, a.k.a. Penalized Log Likelihood and 
   Smoothing Spline Analysis of Variance

   by Grace Wahba, Chong Gu, Yuedong Wang and Rick Chappell
 
   to appear in the proceedings of the Santa Fe Workshop on Supervised 
   Machine Learning, August 1992, D. Wolpert and A. Lapedes, eds. 
   also partly presented at CLNL*92. 


2) Smoothing Spline ANOVA with Component-Wise Bayesian 
   `Confidence Intervals' 
          
   by Chong Gu and Grace Wahba, 

   to appear, J. Computational and Graphical Statistics

   wahba at stat.wisc.edu, chong at pop.stat.purdue.edu 
   wang at stat.wisc.edu,  chappell at stat.wisc.edu

Below are the abstracts followed by instructions for retrieving the papers. 
   Grace Wahba
----------------------------------------------------------------------

Soft Classification, a.k.a. Penalized Log Likelihood and Smoothing 
Spline Analysis of Variance

       G. Wahba, C. Gu, Y. Wang and R. Chappell

We discuss a class of methods for the problem of `soft' classification
in supervised learning. In `hard' classification, it is assumed that 
any two examples with the same attribute vector will always be in the 
same class (or have the same outcome), whereas in `soft' classification
two examples with the same attribute vector do not necessarily have the 
same outcome, but the *probability* of a particular outcome does depend 
on the attribute vector.  In this paper we describe a family of 
methods that are well suited for the estimation of this probability. 
The method we describe will produce, for any value in a (reasonable)
region of the attribute space, an estimate of the probability that 
the next example with that value of its attribute vector
will be in class 1. Underlying these methods is an assumption
that this probability varies in a smooth way (to be defined)
as the predictor variables vary. The method combines results from 
Penalized log likelihood estimation, Smoothing splines, and 
Analysis of variance to yield the PSA class of methods. In the process
of describing PSA we discuss some issues concerning the computation of 
degrees of freedom for signal, which has wider ramifications for the 
minimization of generalization error in machine learning.  As an 
illustration we apply the method to the Pima Indian Diabetes data set
in the UCI Repository, and compare the results to those of Smith et al.
(1988), who used the ADAP learning algorithm on this same data set to 
forecast the onset of diabetes mellitus.  If the probabilities we 
obtain are thresholded to make a hard classification for comparison 
with the hard classification of Smith et al., the results are very 
similar; however, the intermediate probabilities that we obtain provide 
useful and interpretable information on how the risk of diabetes 
varies with some of the risk factors.
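
For readers who want a concrete feel for penalized log likelihood
estimation of such probabilities, here is a minimal, illustrative
sketch (not the authors' PSA code).  Logits are estimated at grid
points in a one-dimensional attribute space, and smoothness is imposed
through a squared second-difference penalty, a crude discrete stand-in
for a smoothing-spline penalty.  The function name
`fit_soft_classifier` and all parameter choices are assumptions made
for this illustration.

```python
import numpy as np

def fit_soft_classifier(x, y, grid, lam=0.5, steps=2000, lr=0.05):
    """Estimate p(x) = P(class 1 | x) by penalized log likelihood.

    Logits f are estimated at the grid points by gradient descent on
    the negative log likelihood plus lam * sum of squared second
    differences of f (a discrete roughness penalty).  Illustrative
    sketch only -- not the PSA method of the paper.
    """
    # assign each observation to its nearest grid point
    idx = np.argmin(np.abs(x[:, None] - grid[None, :]), axis=1)
    f = np.zeros(len(grid))
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-f))
        # gradient of the negative log likelihood, summed per grid point
        g = np.bincount(idx, weights=p[idx] - y, minlength=len(grid))
        # gradient of lam * sum_j (f[j] - 2 f[j+1] + f[j+2])^2
        d2 = np.diff(f, n=2)
        pen = np.zeros_like(f)
        pen[:-2] += 2.0 * lam * d2
        pen[1:-1] += -4.0 * lam * d2
        pen[2:] += 2.0 * lam * d2
        f -= lr * (g + pen)
    return 1.0 / (1.0 + np.exp(-f))   # estimated probabilities on the grid
```

Thresholding the returned probabilities at 0.5 gives a hard
classification, but, as the abstract notes, the probabilities
themselves carry the interpretable information.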
...........................

Smoothing Spline ANOVA with Component-Wise Bayesian `Confidence
Intervals'

		  C. Gu and G. Wahba

We study a multivariate smoothing spline estimate of a function of
several variables, based on an ANOVA decomposition as sums of main
effect functions (of one variable), two-factor interaction functions
(of two variables), etc. We derive the Bayesian `confidence intervals'
of Wahba (1983) for the components of this decomposition and demonstrate
that, even with multiple smoothing parameters, they can be efficiently 
computed using the publicly available code RKPACK, which was originally
designed just to compute the estimates.  We carry out a small Monte
Carlo study to see how closely the actual properties of these
component-wise confidence intervals match their nominal confidence
levels.  Lastly, we analyze some lake acidity data as a function of
calcium concentration, latitude, and longitude, using both polynomial
and thin plate spline main effects in the same model.  
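
The ANOVA decomposition into a mean, main effects, and interactions
can be illustrated for a function tabulated on a two-way grid, where
the averaging operators are plain grid means.  This is only a sketch
of the classical decomposition; the paper's SS-ANOVA components are
defined by averaging operators tied to the spline model, and the
function name `anova_decompose` is an assumption for illustration.

```python
import numpy as np

def anova_decompose(F):
    """Classical ANOVA decomposition of a function on a 2-D grid.

    F[i, j] = f(x1_i, x2_j).  Returns (mean, main1, main2, interaction)
    satisfying the usual side conditions: each component averages to
    zero over each of its arguments, so the pieces are identifiable.
    """
    mu = F.mean()                          # grand mean
    main1 = F.mean(axis=1) - mu            # main effect of x1
    main2 = F.mean(axis=0) - mu            # main effect of x2
    inter = F - mu - main1[:, None] - main2[None, :]   # two-factor interaction
    return mu, main1, main2, inter
```

By construction, mu + main1 + main2 + inter reproduces F exactly, and
each component integrates (here, averages) to zero over its arguments.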

-----------------------------------------------------------------------------
To retrieve these files from the neuroprose archive:

unix> ftp archive.cis.ohio-state.edu
Name (archive.cis.ohio-state.edu:wahba): anonymous
Password: (use your email address)
ftp> binary
ftp> cd pub/neuroprose
ftp> get wahba.soft-class.ps.Z
200 PORT command successful.
150 Opening BINARY mode data connection for wahba.soft-class.ps.Z
.
ftp> get wahba.ssanova.ps.Z
.
221 Goodbye.
unix> uncompress wahba.soft-class.ps.Z
unix> lpr wahba.soft-class.ps
unix> uncompress wahba.ssanova.ps.Z
unix> lpr wahba.ssanova.ps
..
Thanks to Jordan Pollack for maintaining the archive.


