Connectionists: A Framework for Kernel Regularization, TR 1107
Grace Wahba
wahba at stat.wisc.edu
Fri May 6 19:04:30 EDT 2005
Announcing the paper:
A Framework for Kernel Regularization with Application
to Protein Clustering
Fan Lu, Sunduz Keles, Stephen J. Wright and Grace Wahba
University of Wisconsin-Madison Statistics Dept TR 1107, May 2005.
available at:
http://www.stat.wisc.edu/~wahba -> TRLIST
or
http://www.stat.wisc.edu/~wahba/ftp1/tr1107.pdf
Abstract
We develop and apply a novel framework which is
designed to extract information in the form of a positive
definite kernel matrix from possibly crude, noisy, incomplete,
inconsistent dissimilarity information between pairs of objects,
obtainable in a variety of contexts. Any positive definite
kernel defines a consistent set of distances, and
the fitted kernel provides a set of coordinates in Euclidean space which
attempt to respect the information available, while controlling
for complexity of the kernel. The resulting set of coordinates
is highly appropriate for visualization and as input
to classification and clustering algorithms.
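
To make the link between a kernel, its distances, and Euclidean coordinates concrete, here is a minimal sketch. It is an illustration only and is not taken from the report: the 3 x 3 matrix K, the function names, and the use of a Cholesky factor to obtain coordinates are all assumptions chosen for brevity. An eigendecomposition truncated to the leading components would be the more usual route to low-dimensional coordinates such as a 3D view.

```python
import math

def kernel_to_sq_distances(K):
    """Squared distances implied by a kernel: d2[i][j] = K[i][i] + K[j][j] - 2*K[i][j]."""
    n = len(K)
    return [[K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n)] for i in range(n)]

def cholesky_coordinates(K):
    """Rows x_i of the lower-triangular factor L satisfy <x_i, x_j> = K[i][j].

    Hypothetical helper: requires K to be strictly positive definite.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = K[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("K is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def sq_euclidean(x, y):
    """Ordinary squared Euclidean distance between two coordinate vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Made-up positive definite kernel for three objects (illustrative values only).
K = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]

D2 = kernel_to_sq_distances(K)   # distances defined by the kernel
X = cholesky_coordinates(K)      # one choice of Euclidean coordinates
```

Under these assumptions, the squared Euclidean distance between rows of X reproduces the corresponding entry of D2, which is the sense in which a positive definite kernel defines a consistent set of distances.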
The framework is formulated in terms of a class
of optimization problems which can be solved efficiently
using modern convex cone programming software.
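
The abstract does not spell out the objective. As a hedged illustration, one representative member of such a class of problems could be written as follows, where every symbol is an assumption introduced here: \Omega is the set of object pairs with observed dissimilarities d_{ij}, w_{ij} are nonnegative weights, and \lambda \ge 0 is a tuning parameter that controls kernel complexity through the trace.

```latex
\min_{K \succeq 0} \;
  \sum_{(i,j) \in \Omega} w_{ij} \,\bigl| d_{ij} - \hat{d}_{ij}(K) \bigr|
  \;+\; \lambda \, \operatorname{trace}(K),
\qquad
\hat{d}_{ij}(K) = K_{ii} + K_{jj} - 2 K_{ij}
```

In a problem of this shape, \hat{d}_{ij}(K) is linear in K, the absolute values can be replaced by linear inequality constraints, and K \succeq 0 is a semidefinite cone constraint, which is why general-purpose convex cone solvers apply.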
The power of the method is illustrated in the context of protein
clustering based on primary sequence data. An application to the globin family of
proteins resulted in a readily visualizable 3D sequence space of globins, where
several sub-families and sub-groupings consistent with
the literature were easily identifiable. Included in the framework
is an algorithm for placing new objects in the coordinate
space of the training set.
Keywords: Regularized Kernel Estimation, positive definite matrices,
noisy dissimilarity data, modern convex cone programming, protein
clustering, globin family, support vector machines, classification.