new paper in JMLR: The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces

David 'Pablo' Cohn David.Cohn at acm.org
Wed Nov 13 10:55:15 EST 2002


[cross-posted to connectionists at the request of the authors - for
information on subscribing to the jmlr-announce mailing list, please visit
www.jmlr.org]

The Journal of Machine Learning Research is pleased to announce the
availability of a new paper online at http://www.jmlr.org.

----------------------------------------
The Subspace Information Criterion for
Infinite Dimensional Hypothesis Spaces
Masashi Sugiyama and Klaus-Robert Müller
JMLR 3(Nov):323-359, 2002

Abstract

A central problem in learning is the selection of an appropriate model. This
is typically done by estimating the unknown generalization errors of a set of
candidate models and then choosing the model with the minimal generalization
error estimate. In this article, we discuss the problem of
model selection and generalization error estimation in the context of kernel
regression models, e.g., kernel ridge regression, kernel subset regression
or Gaussian process regression. Previously, a non-asymptotic generalization
error estimator called the subspace information criterion (SIC) was
proposed, which could be successfully applied to finite dimensional subspace
models. SIC is an unbiased estimator of the generalization error for the
finite sample case under the conditions that the learning target function
belongs to a specified reproducing kernel Hilbert space (RKHS) H and the
reproducing kernels centered on training sample points span the whole space
H. These conditions hold only if dim H <= l, where l < infinity is the number
of training examples. Therefore, SIC could be applied only to finite
dimensional RKHSs. In this paper, we extend the range of applicability of
SIC, and show that even if the reproducing kernels centered on training
sample points do not span the whole space H, SIC is an unbiased estimator of
an essential part of the generalization error. Our extension allows the use
of any RKHS, including infinite dimensional ones, i.e., richer function
classes commonly used in Gaussian processes, support vector machines or
boosting. We further show that when the kernel matrix is invertible, SIC can
be expressed in a much simpler form, making its computation highly
efficient. In computer simulations on ridge parameter selection with real
and artificial data sets, SIC compares favorably with other standard
model selection techniques such as leave-one-out cross-validation or an
empirical Bayesian method.

----------------------------------------
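
For readers who want to experiment with the model selection setting described
in the abstract, below is a minimal Python/NumPy sketch of kernel ridge
regression with the ridge parameter chosen by leave-one-out cross-validation,
one of the baseline methods mentioned above. It does not implement the paper's
SIC (whose formula is not given in the abstract); the Gaussian kernel, its
width, and the candidate ridge values are illustrative assumptions only.

    import numpy as np

    def gaussian_kernel(X, Z, width=1.0):
        # K[i, j] = exp(-||x_i - z_j||^2 / (2 * width^2))
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    def loo_error(K, y, ridge):
        # Closed-form leave-one-out residuals for a linear smoother:
        # H = K (K + ridge*I)^{-1}, residual_i = (y_i - yhat_i) / (1 - H_ii)
        n = len(y)
        H = K @ np.linalg.solve(K + ridge * np.eye(n), np.eye(n))
        residuals = (y - H @ y) / (1.0 - np.diag(H))
        return np.mean(residuals ** 2)

    def select_ridge(X, y, candidates=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
        # Pick the candidate ridge parameter with the smallest LOO error.
        K = gaussian_kernel(X, X)
        return min(candidates, key=lambda r: loo_error(K, y, r))

    # Toy usage with artificial data.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
    print("selected ridge parameter:", select_ridge(X, y))
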

This is the 13th paper in Volume 3. It and all previous papers are
available electronically at http://www.jmlr.org/ in PostScript and PDF
formats. Many are also available in HTML. The papers of Volumes 1 and 2 are
also available in hardcopy from the MIT Press; please see
http://mitpress.mit.edu/JMLR for details.

 -David Cohn, <David.Cohn at acm.org>
  Managing Editor, Journal of Machine Learning Research




