Announcing KlustaKwik 1.5

Ken Harris kdharris at andromeda.rutgers.edu
Wed Feb 19 17:55:56 EST 2003


Dear Connectionists,

I am pleased to announce the release of version 1.5 of KlustaKwik, a program
for fast automatic clustering using a mixture of Gaussians model.  It can be
downloaded from http://osiris.rutgers.edu/Buzsaki/software. Further details
follow below.

Best regards,
Ken Harris.
-----------------------------------------------

KlustaKwik is a program for unsupervised classification of multidimensional
continuous data. It arose from a specific need, the automatic sorting of
neuronal action potential waveforms (see KD Harris et al., Journal of
Neurophysiology 84:401-414, 2000), but it works for any type of data. We
needed a program that would:

1) Fit a mixture of Gaussians with unconstrained covariance matrices
2) Automatically choose the number of mixture components
3) Be robust against noise
4) Reduce the problem of local minima
5) Run fast on large data sets (up to 100000 points, 48 dimensions)

Speed in particular was essential.  KlustaKwik is based on the CEM algorithm
of Celeux and Govaert (which is faster than the standard EM algorithm), and
also uses several tricks to improve execution speed while maintaining good
performance.  On our data, it runs at least 10 times faster than Autoclass.
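
As an illustration of the CEM idea mentioned above, here is a minimal sketch
in Python. It is not KlustaKwik's code: it is restricted to one-dimensional
data for brevity (the program itself fits full covariance matrices), and the
names cem_1d, log_gauss and min_var are hypothetical. Each iteration
re-estimates the weight, mean and variance of every cluster from the current
hard partition, then reassigns each point to the single cluster that gives it
the highest weighted density. That hard assignment is what distinguishes CEM
from standard EM, which keeps fractional memberships.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cem_1d(data, init_labels, n_clusters, max_iter=100, min_var=1e-6):
    """Classification EM on 1-D data, starting from a hard partition.

    Returns (labels, params), where params maps a cluster index to
    (weight, mean, variance). min_var is an assumed floor that keeps a
    single-point cluster from collapsing to zero variance.
    """
    labels = list(init_labels)
    params = {}
    for _ in range(max_iter):
        # M-step: estimate each non-empty cluster from its current members.
        params = {}
        for k in range(n_clusters):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if not members:
                continue  # a cluster that lost all its points drops out
            mean = sum(members) / len(members)
            var = sum((x - mean) ** 2 for x in members) / len(members)
            params[k] = (len(members) / len(data), mean, max(var, min_var))

        # E- and C-steps: score every point under every cluster, then
        # assign it to the single best-scoring cluster.
        new_labels = [
            max(
                params,
                key=lambda k: math.log(params[k][0])
                + log_gauss(x, params[k][1], params[k][2]),
            )
            for x in data
        ]
        if new_labels == labels:
            break  # the partition is stable, so the fit has converged
        labels = new_labels
    return labels, params


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, -0.1, 5.0, 5.1, 4.9, 5.2]
    labels, params = cem_1d(points, [0, 1, 0, 1, 0, 1, 0, 1], n_clusters=2)
    print(labels)
    print(params)
```

Because each point contributes to exactly one cluster, the M-step is a single
pass over the data per cluster, which is one plausible reason a hard-assignment
scheme runs faster than standard EM on large data sets.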

The main improvement in version 1.5 is a cluster splitting feature.
KlustaKwik allows a variable number of clusters to be fit, penalized by
AIC. The program periodically checks whether splitting any cluster would
improve the overall score.  It also checks whether deleting any cluster and
reallocating its points would improve the overall score.  The splitting and
deletion features often allow the program to escape from local minima, which
reduces both the sensitivity to the initial number of clusters and the total
number of starts needed for a data set.
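
To make the AIC penalty concrete, here is a small Python sketch of how a
penalized score could be computed and used to accept or reject a split or a
deletion. It rests on assumptions not stated in this announcement: the usual
textbook form AIC = 2p - 2 log L with lower scores being better, and a
parameter count p for Gaussians with unconstrained covariance matrices. The
function names are hypothetical, and KlustaKwik's own bookkeeping may differ.

```python
def n_free_params(n_clusters, n_dims):
    """Free parameters in a mixture of full-covariance Gaussians.

    Each cluster has n_dims mean values and n_dims * (n_dims + 1) / 2
    distinct covariance entries. The mixing weights add n_clusters - 1
    more, since they must sum to one.
    """
    per_cluster = n_dims + n_dims * (n_dims + 1) // 2
    return n_clusters * per_cluster + (n_clusters - 1)


def aic_score(log_likelihood, n_clusters, n_dims):
    """Textbook AIC: twice the parameter count minus twice the log likelihood."""
    return 2 * n_free_params(n_clusters, n_dims) - 2 * log_likelihood


def accept_move(current_score, proposed_score):
    """Keep a split or deletion only if it lowers the penalized score."""
    return proposed_score < current_score


if __name__ == "__main__":
    # In 48 dimensions every extra cluster costs 1225 parameters under this
    # count, so a split must raise the log likelihood by more than 1225 to
    # be worth keeping.
    print(n_free_params(2, 48) - n_free_params(1, 48))
```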
