[Research] no lab mtg during regular hours, but we will have a special meeting this week!

Artur Dubrawski awd at cs.cmu.edu
Mon Mar 13 11:28:03 EST 2006


It is going to be on Friday, March 17th at 2pm in NSH 3305
and it will be our own Ting Liu's thesis defense!
She's done tons of cool research into nonparametric classification
and clustering (as evident from the abstract of her talk below).
Let's crowd the room and give Ting our support.

Artur



  Abstract:

      Nonparametric methods have become increasingly popular in the 
statistics and probabilistic AI communities. One simple and 
well-known nonparametric method is called k-nearest-neighbor or k-NN. 
Despite its simplicity, k-NN and its variants have been successful in 
a large number of machine learning problems.
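
For readers less familiar with k-NN, the baseline can be sketched in a
few lines. This is only the textbook brute-force version, not anything
from the thesis; the function name knn_classify and the toy data are
made up for illustration.

```python
import math
from collections import Counter

def knn_classify(points, labels, q, k):
    """Brute-force k-NN: label q by majority vote among its k nearest points.

    Hypothetical helper for illustration; it scans every point, which is
    the cost that faster methods try to avoid.
    """
    # Sort point indices by Euclidean distance to the query q.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], q))
    # Count the class labels of the k closest points.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters (invented for this example).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, (0.2, 0.2), 3))
```

Each query costs a full pass over the data, so the work grows with both
the number of points and the number of dimensions.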
      However, k-NN and many related nonparametric methods remain 
hampered by their computational complexity. Many spatial methods, such 
as metric-trees, have been proposed to alleviate the computational 
cost, but the effectiveness of these methods decreases as the number of 
dimensions of feature vectors increases. From another direction, 
researchers are trying to develop ways to find approximate answers, and 
some approximate methods perform well in a number of applications. 
However, when facing hundreds or thousands of dimensions, many 
algorithms do not work well in practice.
     I propose four new spatial methods for fast k-NN and its variants, 
namely KNS2, KNS3, IOC and spill-tree. The first three algorithms are 
designed to speed up k-NN classification, and they all share the same 
insight: finding the majority class among the k nearest neighbors of a 
query point q does not require explicitly finding those neighbors. 
Spill-tree is 
designed for approximate-nearest-neighbor search. By adapting 
metric-trees to a more flexible data structure, spill-tree is able to 
automatically adapt to the distribution of data and it scales well even 
for huge high-dimensional data sets.
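
As a rough illustration of the approximate-search idea, here is a
minimal sketch of a tree whose two children are allowed to share points
near the split, and which is searched without backtracking. The
abstract does not spell out how spill-tree is built, so the
axis-aligned median split, the overlap width tau, and the names build
and search are all assumptions made for this sketch, not Ting's
implementation.

```python
import math

def build(points, tau, leaf_size=4):
    """Build a tree with overlapping children (hypothetical sketch).

    Points within tau of the split value are copied into both children.
    """
    if len(points) <= leaf_size:
        return {"leaf": points}
    # Split on the coordinate with the largest spread, at its median.
    dims = range(len(points[0]))
    d = max(dims, key=lambda j: max(p[j] for p in points) - min(p[j] for p in points))
    vals = sorted(p[d] for p in points)
    mid = vals[len(vals) // 2]
    left = [p for p in points if p[d] <= mid + tau]
    right = [p for p in points if p[d] >= mid - tau]
    # If the overlap swallows everything, stop splitting to guarantee termination.
    if len(left) == len(points) or len(right) == len(points):
        return {"leaf": points}
    return {"dim": d, "mid": mid,
            "left": build(left, tau, leaf_size),
            "right": build(right, tau, leaf_size)}

def search(node, q):
    """Descend to a single leaf (no backtracking) and scan it for the closest point."""
    while "leaf" not in node:
        node = node["left"] if q[node["dim"]] <= node["mid"] else node["right"]
    return min(node["leaf"], key=lambda p: math.dist(p, q))

# Toy grid of 36 points (invented for this example).
points = [(x, y) for x in range(6) for y in range(6)]
tree = build(points, tau=0.5)
print(search(tree, (2.4, 7.0)))
```

Because the search never backtracks, the answer is approximate: the
true nearest neighbor can sit in a leaf that is not visited. A larger
tau makes that less likely, at the price of storing more duplicated
points. The sketch assumes a non-empty point set.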
     The new methods have been applied to many real-world problems, 
including video segmentation, drug discovery and image clustering for 
1.5 billion images. Significant efficiency improvements have been 
observed in all these applications.



