Connectionists: NIPS'05 Workshop on The Accuracy-Regularization Frontier

Nathan Srebro nati at mit.edu
Fri Sep 23 13:23:43 EDT 2005


                           NIPS Workshop on

                 The Accuracy-Regularization Frontier

                      Friday, December 9th, 2005
             Westin Resort and SPA, Whistler, BC, Canada

                http://www.cs.toronto.edu/~nati/Front/

                        CALL FOR CONTRIBUTIONS

A prevalent approach in machine learning for achieving good
generalization performance is to seek a predictor that, on one hand,
attains low empirical error, and on the other hand, is "simple", as
measured by some regularizer, and so guaranteed to generalize well.
Consider, for example, support vector machines, where one seeks a
linear classifier with low empirical error and low L2-norm
(corresponding to a large geometrical margin).  The precise trade-off
between the empirical error and the regularizer (e.g. L2-norm) is not
known.  But since we would like to minimize both, we can restrict our
attention to extreme solutions, i.e. classifiers for which one cannot
reduce both the empirical error and the regularizer (norm) at the same
time.
Considering the set of attainable (error,norm) combinations, we are
interested only in the extreme "frontier" (or "regularization path")
of this set.  The typical approach is to evaluate classifiers along
the frontier on held-out validation data (or cross validate) and
choose the classifier minimizing the validation error.
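
As a concrete illustration of this practice, here is a minimal Python
sketch (not part of the workshop material; it assumes the scikit-learn
and NumPy libraries and a synthetic data set) that traces an
approximate frontier by solving a linear SVM for a grid of trade-off
values C, records the (empirical error, L2-norm) pair of each
solution, and then picks the classifier with the lowest error on a
held-out validation set:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # Synthetic data and a held-out validation split (illustrative only).
    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

    frontier = []  # (empirical error, L2-norm, validation error, C)
    for C in np.logspace(-3, 3, 25):
        clf = LinearSVC(C=C, max_iter=10000).fit(X_tr, y_tr)
        emp_err = np.mean(clf.predict(X_tr) != y_tr)    # empirical error
        norm = np.linalg.norm(clf.coef_.ravel())        # L2-norm (regularizer)
        val_err = np.mean(clf.predict(X_val) != y_val)  # validation error
        frontier.append((emp_err, norm, val_err, C))

    # Standard practice: choose the classifier along the frontier that
    # minimizes the validation error.
    best = min(frontier, key=lambda t: t[2])
    print("chosen C=%g, empirical error=%.3f, ||w||=%.3f"
          % (best[3], best[0], best[1]))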

Classifiers along the frontier are typically found by minimizing some
parametric combination of the empirical error and the regularizer,
e.g. norm^2+C*err, for varying C, in the case of SVMs.  Different
values of C yield different classifiers along the frontier and C can be
thought of as parameterizing the frontier.  This particular parametric
function of the empirical error and the regularizer is chosen because
it leads to a convenient optimization problem, but minimizing any
other monotone function of the empirical error and regularizer (in
this case, the L2-norm) would also lead to classifiers on the
frontier.
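
To spell out the monotonicity argument (a one-line sketch, not part of
the original announcement): write err(w) for the empirical error,
||w|| for the regularizer, and let g be strictly increasing in each of
its two arguments.  Any minimizer

    \hat{w} \in \arg\min_{w} \; g\bigl(\mathrm{err}(w),\, \|w\|\bigr)

lies on the frontier: a classifier that improved both err and the norm
(at least one of them strictly) would attain a strictly smaller value
of g, contradicting optimality.  The usual SVM trade-off is the
special case g(e, n) = n^2 + C e.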

Recently, methods have been proposed for obtaining the entire frontier
in computation time comparable to that of obtaining a single
classifier along the frontier.

The proposed workshop is concerned with optimization and statistical
issues related to viewing the entire frontier, rather than a single
predictor along it, as an object of interest in machine learning.
Specific issues to be addressed include:

 1. Characterizing the "frontier" in a way independent of any specific
 trade-off, and studying its properties as such, e.g. convexity,
 smoothness, and piecewise-linear or piecewise-polynomial behavior.

 2. What parametric trade-offs capture the entire frontier? Minimizing
 any monotone trade-off leads to a predictor on the frontier, but what
 conditions must be met to ensure all predictors along the frontier
 are obtained when the regularization parameter is varied?  Study of
 this question is motivated by scenarios in which minimizing a
 non-standard parametric trade-off leads to a more convenient
 optimization problem.

 3.  Methods for obtaining the frontier:

  3a. Direct methods relying on a characterization, e.g. Hastie et
  al.'s (2004) work on the entire regularization path of Support
  Vector Machines.

  3b. Warm-restart continuation methods (slightly changing the
  regularization parameter and initializing the optimizer at the
  solution for the previous value of the parameter); see the code
  sketch after this list.  How should one vary the regularization
  parameter so as to guarantee that one is never too far from the
  true frontier?  In a standard optimization problem, one ensures a
  solution within some desired distance of the optimal solution.
  Analogously, when recovering the entire frontier, it would be
  desirable to compute an approximate frontier that is always within
  some desired distance, in the (error, regularizer) space, of the
  true frontier.

  3c. Predictor-corrector methods: when the frontier is a
  differentiable manifold, warm-restart methods can be improved by
  using a first-order approximation of the manifold to predict where
  the frontier should lie for an updated value of the frontier
  parameter.

 4.  Interesting generalizations or uses of the frontier, e.g.:

  - The frontier across different kernels

  - Higher dimensional frontiers when more than two parameters are
    considered

 5.  Formalizing and providing guarantees for the standard practice of
 picking a classifier along the frontier using a hold-out set (this is
 especially important when there are more than two objectives).  In
 some regression settings, detailed inference along the frontier is
 possible: for ridge regression this is well established, and for the
 Lasso, Efron et al (2004), and more recently Zou et al (2004),
 establish the degrees of freedom along the frontier, yielding
 generalization error estimates.
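
As an illustration of the warm-restart continuation idea in item 3b
above, here is a minimal Python sketch (again not part of the workshop
material; it uses the Lasso from scikit-learn rather than an SVM, and
relies on the library's warm_start option, which reuses the previous
coefficients as the starting point of the next fit) that sweeps the
regularization parameter over a decreasing grid and records the
(error, regularizer) pair at each step:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic regression data (illustrative only).
    X, y = make_regression(n_samples=200, n_features=50, noise=5.0,
                           random_state=0)

    # warm_start=True makes each call to fit() start from the previous
    # solution, so each continuation step along the path is cheap.
    model = Lasso(warm_start=True, max_iter=10000)

    path = []  # (alpha, empirical error, L1-norm of the coefficients)
    for alpha in np.logspace(1, -3, 40):  # slowly decrease regularization
        model.alpha = alpha
        model.fit(X, y)
        mse = np.mean((model.predict(X) - y) ** 2)  # empirical error
        l1 = np.sum(np.abs(model.coef_))            # regularizer
        path.append((alpha, mse, l1))

    # 'path' now traces an approximation of the frontier in the
    # (error, regularizer) plane, from heavy to light regularization.

How finely the grid must be spaced for such an approximation to stay
uniformly close to the true frontier is exactly the question raised in
item 3b.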

The main goal of the workshop is to open up research in these
directions, establishing the important questions and issues to be
addressed, and introducing relevant approaches from multi-objective
optimization to the NIPS community.

                            CONTRIBUTIONS

We invite presentations addressing any of the above issues, or other
related issues.  We welcome presentations of completed work or
work-in-progress, as well as position statements, papers discussing
potential research directions and surveys of recent developments.

                       SUBMISSION INSTRUCTIONS

If you would like to present in the workshop, please send an abstract
in plain text (preferred), postscript or PDF (Microsoft Word documents
will not be opened) to frontier at cs.toronto.edu as soon as possible,
and no later than October 23rd, 2005.

The final program will be posted in early November.


Workshop organizing committee:

Nathan Srebro, University of Toronto
Alexandre d'Aspremont, Princeton University
Francis Bach, Ecole des Mines de Paris
Massimiliano Pontil, University College London
Saharon Rosset, IBM T.J. Watson Research Center
Katya Scheinberg, IBM T.J. Watson Research Center

For further information, please email
    frontier at cs.toronto.edu
or visit
    http://www.cs.toronto.edu/~nati/Front


