[Research] no lab mtg during regular hours, but we will have a special meeting this week!
Artur Dubrawski
awd at cs.cmu.edu
Mon Mar 13 11:28:38 EST 2006
It is going to be on Friday, March 17th at 2pm in NSH 3305
and it will be our own Ting Liu's thesis defense!
She's done tons of cool research into nonparametric classification
and clustering (as is evident from the abstract of her talk below).
Let's crowd the room and give Ting our support.
Artur
Abstract:
Nonparametric methods have become increasingly popular in the
statistics and probabilistic AI communities. One simple and
well-known nonparametric method is k-nearest-neighbor, or k-NN.
Despite its simplicity, k-NN and its variants have been successful in
a large number of machine learning problems.
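For concreteness, here is a minimal brute-force k-NN classifier in
Python; this is only an illustrative sketch with made-up names and
data, not code from the thesis.

    import numpy as np
    from collections import Counter

    def knn_classify(query, points, labels, k=5):
        # Brute force: compute the distance from the query to every
        # training point, which costs O(n * d) per query.
        dists = np.linalg.norm(points - query, axis=1)
        # Indices of the k smallest distances.
        nearest = np.argsort(dists)[:k]
        # Predict the majority class among those k neighbors.
        return Counter(labels[nearest]).most_common(1)[0][0]

    # Hypothetical usage on made-up data:
    # X = np.random.randn(10000, 64); y = np.random.randint(0, 3, 10000)
    # print(knn_classify(np.zeros(64), X, y, k=7))

Every query scans all n training points; that per-query cost is what
the spatial methods discussed below aim to avoid.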
However, k-NN and many related nonparametric methods remain
hampered by their computational complexity. Many spatial data
structures, such as metric-trees, have been proposed to reduce the
computational cost, but their effectiveness degrades as the
dimensionality of the feature vectors grows. A complementary line of
work seeks approximate answers, and some approximate methods perform
well in a number of applications. However, when the data have
hundreds or thousands of dimensions, many of these algorithms still
do not work well in practice.
I propose four new spatial methods for fast k-NN and its variants,
namely KNS2, KNS3, IOC, and spill-tree. The first three algorithms are
designed to speed up k-NN classification, and they all share the same
insight: finding the majority class among the k nearest neighbors of a
query point q does not require explicitly finding those k neighbors.
Spill-tree is designed for approximate nearest-neighbor search. By
adapting metric-trees into a more flexible data structure, spill-tree
automatically adapts to the distribution of the data and scales well
even to huge, high-dimensional data sets.
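To make the overlapping-split idea concrete, below is a toy Python
sketch of a spill-tree-style structure with a single-descent
("defeatist", no-backtracking) search. The splitting rule, class and
parameter names, and default values are assumptions made for
illustration only, not the data structure as implemented in the
thesis.

    import numpy as np

    class SpillNode:
        # Toy spill node: the two children overlap by a buffer of
        # width 2*tau, so points near the splitting boundary are
        # stored in BOTH children.
        def __init__(self, points, leaf_size=16, tau=0.1):
            self.points = points
            self.left = self.right = None
            if len(points) <= leaf_size:
                return
            # Crude split: the coordinate with the largest spread.
            dim = np.argmax(points.max(0) - points.min(0))
            proj = points[:, dim]
            mid = np.median(proj)
            self.dim, self.mid = dim, mid
            # Overlapping ("spilled") partitions around the median.
            left_pts = points[proj <= mid + tau]
            right_pts = points[proj >= mid - tau]
            # Stop splitting if the overlap swallows almost everything.
            if len(left_pts) < len(points) and len(right_pts) < len(points):
                self.left = SpillNode(left_pts, leaf_size, tau)
                self.right = SpillNode(right_pts, leaf_size, tau)

        def approx_nn(self, q):
            # Defeatist search: follow a single root-to-leaf path with
            # no backtracking, then scan only that leaf.
            if self.left is None:
                dists = np.linalg.norm(self.points - q, axis=1)
                return self.points[np.argmin(dists)]
            child = self.left if q[self.dim] <= self.mid else self.right
            return child.approx_nn(q)

The point of the overlap is that a query falling near a splitting
boundary still has its true near neighbors stored in whichever child
the single descent happens to visit, so skipping backtracking costs
little accuracy while keeping each query cheap.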
The new methods have been applied to a range of real-world
applications, including video segmentation, drug discovery, and the
clustering of 1.5 billion images. Significant efficiency improvements
have been observed in all of these applications.