Connectionists: NIPS'05 Workshop on The Accuracy-Regularization Frontier
Nathan Srebro
nati at mit.edu
Fri Sep 23 13:23:43 EDT 2005
NIPS Workshop on
The Accuracy-Regularization Frontier
Friday, December 9th, 2005
Westin Resort and Spa, Whistler, BC, Canada
http://www.cs.toronto.edu/~nati/Front/
CALL FOR CONTRIBUTIONS
A prevalent approach in machine learning for achieving good
generalization performance is to seek a predictor that, on one hand,
attains low empirical error and, on the other hand, is "simple" as
measured by some regularizer, and is therefore guaranteed to
generalize well. Consider, for example, support vector machines,
where one seeks a linear classifier with low empirical error and low
L2-norm (corresponding to a large geometric margin). The precise
trade-off between the empirical error and the regularizer (e.g. the
L2-norm) is not known. But since we would like to minimize both, we
can restrict our attention to extreme solutions, i.e. classifiers for
which no other classifier achieves both lower empirical error and a
lower regularizer (norm). Considering the set of attainable
(error, norm) combinations, we are interested only in the extreme
"frontier" (or "regularization path") of this set. The typical
approach is to evaluate classifiers along the frontier on held-out
validation data (or to cross-validate) and choose the classifier
minimizing the validation error.
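To make the frontier concrete, here is a minimal Python sketch (an
illustration on our part, not workshop material) that filters a pool
of candidate classifiers, each represented by its (empirical error,
norm) value, down to the extreme frontier points; pareto_frontier is
a hypothetical helper name:

    import numpy as np

    def pareto_frontier(points):
        """points: (n, 2) array of (empirical error, regularizer)
        values, one row per candidate classifier. Returns the rows
        for which no other row is at least as good in both
        coordinates and strictly better in one."""
        points = np.asarray(points, dtype=float)
        keep = []
        for i, p in enumerate(points):
            dominated = np.any(np.all(points <= p, axis=1)
                               & np.any(points < p, axis=1))
            if not dominated:
                keep.append(i)
        return points[keep]

    # (0.12, 1.5) is dominated by (0.10, 1.0); the other three
    # points form the frontier.
    print(pareto_frontier([(0.10, 1.0), (0.05, 2.0),
                           (0.12, 1.5), (0.01, 5.0)]))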
Classifiers along the frontier are typically found by minimizing some
parametric combination of the empirical error and the regularizer,
e.g. norm^2 + C*err for varying C in the case of SVMs. Different
values of C yield different classifiers along the frontier, and C can
be thought of as parameterizing the frontier. This particular
parametric function of the empirical error and the regularizer is
chosen because it leads to a convenient optimization problem, but
minimizing any other monotone function of the empirical error and the
regularizer (in this case, the L2-norm) would also yield classifiers
on the frontier.
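As a concrete illustration of this practice, here is a minimal sketch
(ours, not part of the workshop material) assuming scikit-learn,
whose LinearSVC minimizes an objective of exactly this norm^2 + C*err
form, with the hinge loss standing in for the empirical error:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=400, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    frontier = []  # (C, empirical error, norm, validation error)
    for C in np.logspace(-3, 3, 13):  # each C yields one frontier point
        clf = LinearSVC(C=C, max_iter=10000).fit(X_tr, y_tr)
        frontier.append((C,
                         1 - clf.score(X_tr, y_tr),    # empirical error
                         np.linalg.norm(clf.coef_),    # regularizer
                         1 - clf.score(X_val, y_val))) # held-out error

    # The standard practice described above: choose the frontier
    # point with the lowest validation error.
    best = min(frontier, key=lambda t: t[3])
    print("chosen C = %g, validation error = %.3f" % (best[0], best[3]))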
Recently, methods have been proposed for obtaining the entire frontier
in computation time comparable to that of obtaining a single
classifier along the frontier.
The proposed workshop is concerned with optimization and statistical
issues related to viewing the entire frontier, rather than a single
predictor along it, as an object of interest in machine learning.
Specific issues to be addressed include:
1. Characterizing the "frontier" in a way that is independent of any
specific trade-off, and studying its properties as such, e.g.
convexity, smoothness, and piecewise-linear or piecewise-polynomial
behavior (piecewise linearity is illustrated in the first sketch
following this list).
2. What parametric trade-offs capture the entire frontier? Minimizing
any monotone trade-off leads to a predictor on the frontier, but what
conditions must be met to ensure all predictors along the frontier
are obtained when the regularization parameter is varied? Study of
this question is motivated by scenarios in which minimizing a
non-standard parametric trade-off leads to a more convenient
optimization problem.
3. Methods for obtaining the frontier:
3a. Direct methods relying on such a characterization, e.g. Hastie et
al.'s (2004) work on the entire regularization path of support
vector machines.
3b. Warm-restart continuation methods (slightly changing the
regularization parameter and initializing the optimizer at the
solution for the previous value of the parameter; see the second
sketch following this list). How should one vary the regularization
parameter in order to guarantee never straying too far from the true
frontier? In a standard optimization problem, one ensures a solution
within some desired distance of the optimal solution. Analogously,
when recovering the entire frontier, it would be desirable to obtain
an approximate frontier that is everywhere within some desired
distance, in the (error, regularizer) space, of the true frontier.
3c. Predictor-corrector methods: when the frontier is a
differentiable manifold, warm-restart methods can be improved by
using a first-order approximation of the manifold to predict where
the frontier should be for an updated value of the frontier
parameter (see the last sketch following this list).
4. Interesting generalizations or uses of the frontier, e.g.:
- The frontier across different kernels
- Higher dimensional frontiers when more than two parameters are
considered
5. Formalizing, and providing guarantees for, the standard practice
of picking a classifier along the frontier using a hold-out set (this
is especially important when there are more than two objectives). In
some regression settings, detailed inference along the frontier is
possible: for ridge regression this is well established, while for
the Lasso, Efron et al. (2004) and, more recently, Zou et al. (2004)
establish degrees of freedom along the frontier, yielding estimates
of the generalization error.
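Regarding issue 1, a well-known instance of piecewise linearity is
the Lasso, whose coefficient path is piecewise linear in the
regularization parameter. A minimal sketch (ours), assuming
scikit-learn's lars_path, which implements Efron et al.'s LARS and
recovers the knots of the path exactly:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import lars_path

    X, y = make_regression(n_samples=100, n_features=10, noise=1.0,
                           random_state=0)
    alphas, _, coefs = lars_path(X, y, method='lasso')

    # Between consecutive alphas every coefficient moves linearly,
    # so these knots characterize the entire frontier.
    print("linear segments:", len(alphas) - 1)
    print("coefficient path shape:", coefs.shape)  # (features, knots)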
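Regarding issue 3b, a minimal warm-restart continuation sketch
(ours), again assuming scikit-learn; the grid of C values below is an
arbitrary choice, and how to space it so as to guarantee staying
close to the true frontier is exactly the open question raised above:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, random_state=0)

    clf = LogisticRegression(warm_start=True, max_iter=1000)
    path = []
    for C in np.logspace(-3, 3, 25):  # a fine grid keeps successive
        clf.set_params(C=C)           # solutions close, so each warm-
        clf.fit(X, y)                 # started solve converges quickly
        path.append((C, np.linalg.norm(clf.coef_)))
    print(path[0], path[-1])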
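Regarding issue 3c, a predictor-corrector sketch (ours) in plain
numpy, using ridge regression, where the path
w(lam) = (X'X + lam*I)^{-1} X'y is smooth and its tangent
dw/dlam = -(X'X + lam*I)^{-1} w is available in closed form:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
    G, b = X.T @ X, X.T @ y

    lam, dlam = 1.0, 0.5
    w = np.linalg.solve(G + lam * np.eye(5), b)
    for _ in range(10):
        # Predictor: first-order step along the path's tangent.
        dw = np.linalg.solve(G + lam * np.eye(5), -w)
        w_pred = w + dlam * dw
        lam += dlam
        # Corrector: re-solve at the new lam (for ridge, a single
        # Newton step is the exact solve) and compare to w_pred.
        w = np.linalg.solve(G + lam * np.eye(5), b)
        print("lam=%.1f  predictor error=%.2e"
              % (lam, np.linalg.norm(w_pred - w)))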
The main goal of the workshop is to open up research in these
directions, establish the important questions and issues to be
addressed, and introduce the NIPS community to relevant approaches
from multi-objective optimization.
CONTRIBUTIONS
We invite presentations addressing any of the above issues, or other
related issues. We welcome presentations of completed work or work in
progress, as well as position statements, papers discussing potential
research directions, and surveys of recent developments.
SUBMISSION INSTRUCTIONS
If you would like to present in the workshop, please send an abstract
in plain text (preferred), PostScript, or PDF (Microsoft Word
documents will not be opened) to frontier at cs.toronto.edu as soon
as possible, and no later than October 23rd, 2005.
The final program will be posted in early November.
Workshop organizing committee:
Nathan Srebro, University of Toronto
Alexandre d'Aspremont, Princeton University
Francis Bach, Ecole des Mines de Paris
Massimiliano Pontil, University College London
Saharon Rosset, IBM T.J. Watson Research Center
Katya Scheinberg, IBM T.J. Watson Research Center
For further information, please email
frontier at cs.toronto.edu
or visit
http://www.cs.toronto.edu/~nati/Front