Papers on learning metrics

Sami Kaski sami.kaski at hut.fi
Tue May 22 09:22:46 EDT 2001


Dear connectionists,

There are papers on learning metrics available at

http://www.cis.hut.fi/projects/mi/

The methods learn, based on auxiliary data, to measure distances along
relevant or important local directions in a data space. The approach
has connections to discriminative learning, distributional clustering,
information geometry, and maximization of mutual information.

So far we have incorporated the metrics into a clustering algorithm
and the SOM, and applied the methods to the analysis of gene
expression data, text documents, and financial statements of
companies.

Best regards,
Samuel Kaski

-----
Abstracts of two papers:

(1) Samuel Kaski, Janne Sinkkonen, and Jaakko Peltonen. Bankruptcy
analysis with self-organizing maps in learning metrics. IEEE
Transactions on Neural Networks, 2001. Accepted for publication.

We introduce a method for deriving a metric into the data space,
based locally on the Fisher information matrix. A Self-Organizing Map
is computed in the new metric to explore financial statements of
enterprises. The metric measures local distances in terms of changes
in the distribution of an auxiliary random variable that reflects what
is important in the data. In this paper the variable indicates
bankruptcy within the next few years. The conditional density of the
auxiliary variable is first estimated, and the change in the estimate
resulting from local displacements in the primary data space is
measured using the Fisher information matrix. When a Self-Organizing
Map is computed in the new metric, it still visualizes the data space
in a topology-preserving fashion, but represents the (local)
directions in which the probability of bankruptcy changes the most.
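
To give a concrete flavor of the metric (this is not the authors'
code), here is a minimal Python/NumPy sketch: given some estimate of
the conditional density p(c|x), e.g. a separately fitted classifier,
the local Fisher information matrix is assembled from gradients of
log p(c|x), and squared local distances become quadratic forms
dx^T J(x) dx. The finite-difference gradient and all function names
are illustrative assumptions, not part of the paper.

import numpy as np

def fisher_metric(x, cond_prob, eps=1e-5):
    # Local Fisher information matrix J(x) for an estimated p(c|x).
    # cond_prob maps a point x of shape (d,) to class probabilities
    # of shape (n_classes,); probabilities are assumed strictly positive.
    d = x.shape[0]
    p = cond_prob(x)
    grads = np.zeros((p.shape[0], d))     # rows: d/dx log p(c|x)
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        grads[:, j] = (np.log(cond_prob(x + step))
                       - np.log(cond_prob(x - step))) / (2 * eps)
    # J(x) = sum_c p(c|x) * grad log p(c|x) grad log p(c|x)^T
    return (grads.T * p) @ grads

def local_squared_distance(x, dx, cond_prob):
    # Squared length of a small displacement dx from x in the new metric.
    return float(dx @ fisher_metric(x, cond_prob) @ dx)

In the paper the Self-Organizing Map is then trained using such local
distances; the sketch only shows how the metric itself could be
evaluated at a point.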



(2) Janne Sinkkonen and Samuel Kaski. Clustering based on conditional
distributions in an auxiliary space. Neural Computation, 2001.
Accepted for publication.

We study the problem of learning groups or categories that are local
in the continuous primary space, but homogeneous with respect to the
distributions of an associated auxiliary random variable over a
discrete auxiliary space. Assuming that the variation in the auxiliary
space is meaningful, the categories will emphasize similarly
meaningful aspects of the primary
space. From a data set consisting of pairs of primary and auxiliary
items, the categories are learned by minimizing a Kullback-Leibler
divergence-based distortion between (implicitly estimated)
distributions of the auxiliary data, conditioned on the primary
data. Still, the categories are defined in terms of the primary
space. An on-line algorithm resembling traditional Hebb-type
competitive learning is introduced for learning the
categories. Minimizing the distortion criterion turns out to be
equivalent to maximizing the mutual information between the categories
and the auxiliary data. In addition, connections to density estimation
and to the distributional clustering paradigm are outlined. The method
is demonstrated by clustering yeast gene expression data from DNA
chips, with biological knowledge about the functional classes of the
genes as the auxiliary data.
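
As a rough illustration of the distortion criterion and its relation
to mutual information (again, not the authors' algorithm), the sketch
below evaluates both quantities for a hard partition of the data,
with each cluster's prototype distribution set to the mean of its
members' conditional distributions. The paper itself uses smooth,
parameterized membership functions that keep the clusters local in
the primary space; the hard partition and all names here are
simplifying assumptions, and the estimates p(c|x) are assumed
strictly positive.

import numpy as np

def kl_distortion(p_cond, assign, n_clusters):
    # Mean KL distortion E_x[ D_KL( p(c|x) || psi_j(x) ) ] for a hard partition.
    # p_cond: (n, C) estimated conditional distributions p(c|x_i)
    # assign: (n,) cluster index of each point
    n = p_cond.shape[0]
    total = 0.0
    for j in range(n_clusters):
        members = p_cond[assign == j]
        if len(members) == 0:
            continue
        psi = members.mean(axis=0)          # prototype distribution of cluster j
        total += (members * np.log(members / psi)).sum()
    return total / n

def mutual_information(p_cond, assign, n_clusters):
    # I(cluster; auxiliary class) for the same hard partition.
    n, C = p_cond.shape
    p_jc = np.zeros((n_clusters, C))
    for j in range(n_clusters):
        members = p_cond[assign == j]
        if len(members) > 0:
            p_jc[j] = members.sum(axis=0) / n    # joint p(j, c)
    p_j = p_jc.sum(axis=1, keepdims=True)
    p_c = p_jc.sum(axis=0, keepdims=True)
    nz = p_jc > 0
    return (p_jc[nz] * np.log(p_jc[nz] / (p_j @ p_c)[nz])).sum()

With these prototype choices the two quantities differ only by a term
that does not depend on the partition, so decreasing the distortion
increases the mutual information, which is the equivalence stated
above.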



