Connectionists: NIPS workshop "Theoretical Foundations of Clustering"

Ulrike von Luxburg ulrike.luxburg at ipsi.fraunhofer.de
Tue Oct 4 10:34:38 EDT 2005


################################################################

              CALL FOR CONTRIBUTIONS

                 NIPS workshop on

      THEORETICAL FOUNDATIONS OF CLUSTERING

              Saturday, December 10, 2005
     Westin Resort and SPA, Whistler, BC, Canada


http://www.ipsi.fraunhofer.de/~ule/clustering_workshop_nips05/clustering_workshop_nips05.htm


Submission deadline: October 30, 2005

################################################################

Organizers:
------------

  * Shai Ben-David, University of Waterloo, Canada,
    http://www.cs.uwaterloo.ca/~shai
  * Ulrike von Luxburg, Fraunhofer IPSI, Darmstadt, Germany
    http://www.ipsi.fraunhofer.de/~ule
  * John Shawe-Taylor, University of Southampton, UK
    http://www.ecs.soton.ac.uk/people/jst
  * Naftali Tishby, Hebrew University, Jerusalem, Israel
    http://www.cs.huji.ac.il/~tishby


Clustering is one of the most widely used techniques for
exploratory data analysis. Across all disciplines, from social
sciences over biology to computer science, people try to get a first
intuition about their data by identifying meaningful groups among the
data points. In the past five decades, a wide variety of clustering
algorithms have been developed and applied to a wide range of practical
problems. Despite this large number of algorithms and applications,
the goal of clustering and its proper interpretation remains fuzzy and
vague. There are in fact many different problems that are clustered
together under this single term, from quantization with low distortion
for compression, through various techniques for graph partitioning
whose goals are not fully specified, to methods for revealing hidden
structure and unobserved features in complex data. We are clearly not
talking about a single well defined problem. Moreover, the theoretical
foundations of clustering seem to be distressingly meager, covering
only some sub-domains and failing to address some of the most basic
general aspects of the area. There is not even an agreement among the
researchers on the correct questions to pose, let alone which tools
and analysis techniques should be used to answer those questions. In
our opinion there is an urgent need to initiate a concerted discussion
on these issues, in order to move towards a consolidation of the
theoretical basis for - at least some of the aspects of -
clustering. One prospective benefit of building a theoretical
framework for clustering may come from enabling the transfer of tools
developed in other related domains, such as machine learning and
information theory, where the usefulness of having a general
mathematical framework have been impressively demonstrated. We have
the impression that recently many researchers have become aware of
this need and agree on the importance of these issues.


Questions we wish to address:
-----------------------------

    1. What is clustering? How can it be defined and how can we sort the
       different types of clustering and their goals? In particular:
           * Is the main purpose to use the partition to discover new
             features in the data?
           * Or the other way around, is the main purpose to simplify our
             data by building groups, thus getting rid of unimportant
             information?
           * Is clustering just data compression?
           * Is clustering just estimating modes of a density?
           * Is clustering related to human perception?
           * Can one come up with a meaningful taxonomy of clustering
             tasks?
           * Can we formulate the intuitive notion of "revealing hidden
             structure and properties"?

    2. How should prior knowledge be encoded? As a pair-wise
       similarity/distance function over domain points? As a set of
       relevant features? Should data be embedded in some richer
       structure (Hilbert space, topology) ?

    3. Is there a principled way to measure the quality of a clustering
       on particular data set?
           * Can every clustering task be expressed as an optimization of
             some explicit readily-computable associated objective cost
             function?
           * Can stability be considered a first principle for meaningful
             clustering?

    4. Is there a principled way to measure the quality of a clustering
        algorithm?
           * Necessary conditions
           * Can we come up with sufficient conditions for reasonable
             clustering?
           * Stability conditions
           * Richness conditions
           * What type of performance guarantees can one hope to provide?

    5. What are principled and meaningful ways of measuring the
       similarity (or degree of agreement) between different clusterings?

    6. Can one distinguish "clusterable" data from "structureless" data?

    7. What are the tools we should try to import from other areas such
       as classification prediction, density estimation, data
       compression, computational geometry, other relevant areas?


Contributions:
--------------

We invite presentations addressing one or several of the questions
raised above. To keep the workshop lively we intend to keep the
individual presentations short, at most 15 minutes. We welcome
presentations about work in all "stages of completion", ranging from
completed work over work in progress to discussing potential
directions of future research. In particular we encourage position
papers. We would like to stress that the focus of this workshop is on
*foundations* of clustering. We are not interested in contributions
about "yet another ad-hoc clustering algorithm".

Please submit an extended abstract (at most two pages) summarizing your
potential contribution to clustering_workshop at ipsi.fraunhofer.de.

    *** The deadline is the 30th October 2005. ***

The organizers will review all submissions. You will be notified by
November 11 whether your contribution is accepted. The final program
of the workshop will be posted on the workshop webpage.


-- 

------------------------------------------------------
Dr. Ulrike von Luxburg
Data Mining Group, Fraunhofer IPSI
Dolivostrasse 15, 64293 Darmstadt, Germany
Phone: +49 6151 869-844 and +49-170-9669432
Fax: +49 6151 869-989
http://www.ipsi.fraunhofer.de/~ule
------------------------------------------------------




More information about the Connectionists mailing list