Thesis on data exploration with SOMs available
Sami Kaski
sami at guillotin.hut.fi
Fri Apr 4 07:56:49 EST 1997
The following Dr.Tech. thesis is available at:
http://nucleus.hut.fi/~sami/thesis/thesis.html (html-version)
http://nucleus.hut.fi/~sami/thesis.ps.gz (compressed postscript, 300K)
http://nucleus.hut.fi/~sami/thesis.ps (postscript, 2M)
The articles that belong to the thesis can be accessed through the page:
http://nucleus.hut.fi/~sami/thesis/node3.html
---------------------------------------------------------------
Data Exploration Using Self-Organizing Maps
Samuel Kaski
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200 (Rakentajanaukio 2C)
FIN-02015 HUT, Finland
Finding structures in vast multidimensional data sets, be they
measurement data, statistics, or textual documents, is difficult and
time-consuming. Interesting, novel relations between the data items
may be hidden in the data. The self-organizing map (SOM) algorithm of
Kohonen can be used to aid the exploration: the structures in the data
sets can be illustrated on special map displays.
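
For readers unfamiliar with the SOM, the following is a minimal sketch of
the basic incremental algorithm in plain Python. It is not code from the
thesis. The function names (train_som, best_matching_unit), the rectangular
grid, the Gaussian neighbourhood, and the linearly decaying learning rate
and radius are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose model vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): model_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every unit's model vector with a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        # Present the samples in a fresh random order each epoch.
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floor 0.5
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Units close to the winner on the grid are pulled more strongly.
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, each data item can be placed on the map display at the grid
position returned by best_matching_unit(weights, x).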
In this work, the methodology of using SOMs for exploratory data
analysis or data mining is reviewed and developed further. The
properties of the maps are compared with the properties of related
methods intended for visualizing high-dimensional multivariate data
sets. In a set of case studies the SOM algorithm is applied to
analyzing electroencephalograms, to illustrating structures of the
standard of living in the world, and to organizing full-text document
collections.
Measures are proposed for evaluating the quality of different types of
maps in representing a given data set, and for measuring the
robustness of the illustrations the maps produce. The same measures
may also be used for comparing the knowledge that different maps
represent.
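
As a simple point of reference, one widely used quantity for judging how
well a set of model vectors represents a data set is the average
quantization error. The sketch below is only an illustration of that
standard quantity; it is not claimed to be one of the measures proposed in
the thesis, and the function name quantization_error is an assumption.

```python
import math


def quantization_error(codebook, data):
    """Mean Euclidean distance from each sample to its nearest model vector."""
    total = 0.0
    for x in data:
        total += min(math.dist(w, x) for w in codebook)
    return total / len(data)
```

A lower value means the model vectors lie closer to the data on average. It
says nothing about how well the map preserves neighbourhood relations.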
Feature extraction must in general be tailored to the application, as
is done in the case studies. There exists, however, an algorithm
called the adaptive-subspace self-organizing map, recently developed
by Kohonen, which may be of help. It extracts invariant features
automatically from a data set. The algorithm is here characterized in
terms of an objective function, and demonstrated to be able to
identify input patterns subject to different transformations.
It could also aid in feature exploration: the kernels that
the algorithm creates to achieve invariance can be illustrated on map
displays similar to those that are used for illustrating the data
sets.