TR available on "Mixture model for image understanding and the EM algorithm"

Thu Apr 13 13:10:46 EDT 1995

Re: the paper "Mixture model for image understanding and
the EM algorithm" recently announced by Shotaro Akaho.

I believe the idea of using a mixture model parameterized by scale and
translation parameters is very similar to the "constrained mixture
models" we have been using for character recognition for some years.

Basically, the idea is to build a template from a number of
Gaussians. In the character recognition case the Gaussians are
spaced out along the stroke, and each one acts like a spray-can ink
generator. This template can be scaled or translated by applying the
transformation to each of the Gaussian centres.
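As a rough illustration of such a template, here is a minimal sketch (not our actual code) that assumes isotropic Gaussians of a common width sigma and equal mixing weights. It spaces centres along a straight stroke, transforms only the centres, and scores ink points under the resulting mixture. The helper names stroke_template, transform and log_likelihood are hypothetical.

```python
import math

def stroke_template(p0, p1, k):
    """Hypothetical helper: k Gaussian centres evenly spaced along the stroke p0 -> p1."""
    return [(p0[0] + (p1[0] - p0[0]) * i / (k - 1),
             p0[1] + (p1[1] - p0[1]) * i / (k - 1)) for i in range(k)]

def transform(centres, A, t):
    """Map every centre m to A m + t; pure scaling by s is A = [[s, 0], [0, s]]."""
    return [(A[0][0] * x + A[0][1] * y + t[0],
             A[1][0] * x + A[1][1] * y + t[1]) for x, y in centres]

def log_likelihood(points, centres, sigma):
    """Log-likelihood of ink points under an equal-weight mixture of isotropic Gaussians."""
    k = len(centres)
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for px, py in points:
        density = sum(norm * math.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2.0 * sigma ** 2))
                      for cx, cy in centres) / k
        total += math.log(density)
    return total

# Usage: a vertical stroke, scaled by 2 and translated by (5, 1).
template = stroke_template((0.0, 0.0), (0.0, 1.0), 5)
scaled = transform(template, [[2.0, 0.0], [0.0, 2.0]], (5.0, 1.0))
ink = [(5.0, 1.2), (5.0, 2.1), (5.0, 2.9)]
print(log_likelihood(ink, scaled, 0.3), log_likelihood(ink, template, 0.3))
```

Because only the centres move, the same function scores the data under any candidate transformation. That is what makes fitting the transformation by EM natural.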

In fact our model went further than this: it allowed deformable
templates, so that the Gaussians could be moved away from their "home
locations" in the object-based frame, at a cost.  We also allowed a
full 2x2 affine transformation plus translation, rather than just
translation and scaling, and used a "noise model" to reject
outlier/noise data points.

We used a method based on the EM algorithm for fitting these templates to 
data. For the non-deformable case there is (as Akaho points out) a direct
EM algorithm. This is mentioned in Appendix B of my thesis.
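For readers who want to see what a direct EM fit for the non-deformable case can look like, here is a minimal sketch written under our own simplifying assumptions rather than copied from Appendix B. It assumes isotropic Gaussians with a fixed, hand-picked sigma, equal mixing weights, and a uniform "noise" component over an assumed area with a fixed prior. Under these assumptions the E-step gives soft assignments, and the M-step for the affine matrix A and translation t is a weighted least-squares problem with a closed-form solution. The names em_affine_fit and solve3 and all default parameter values are hypothetical.

```python
import math

def solve3(M, v):
    """Solve the 3x3 system M x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [vi] for row, vi in zip(M, v)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def em_affine_fit(points, template, sigma=0.2, noise_prior=0.1, area=100.0, iters=50):
    """Fit data ~ A m_k + t by EM with fixed sigma and a fixed uniform noise component."""
    A, t = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # start from the untransformed template
    K = len(template)
    gauss_norm = (1.0 - noise_prior) / K / (2.0 * math.pi * sigma ** 2)
    noise_density = noise_prior / area
    for _ in range(iters):
        centres = [(A[0][0] * mx + A[0][1] * my + t[0],
                    A[1][0] * mx + A[1][1] * my + t[1]) for mx, my in template]
        # E-step: Gaussian responsibilities; the noise share is the remainder.
        resp = []
        for px, py in points:
            w = [gauss_norm * math.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2.0 * sigma ** 2))
                 for cx, cy in centres]
            total = sum(w) + noise_density
            resp.append([wk / total for wk in w])
        # M-step: weighted least squares for [A_d1, A_d2, t_d], one row per coordinate d.
        M = [[0.0] * 3 for _ in range(3)]
        b = [[0.0] * 3 for _ in range(2)]
        for (px, py), r in zip(points, resp):
            for (mx, my), rk in zip(template, r):
                phi = (mx, my, 1.0)
                for i in range(3):
                    for j in range(3):
                        M[i][j] += rk * phi[i] * phi[j]
                    b[0][i] += rk * px * phi[i]
                    b[1][i] += rk * py * phi[i]
        rows = [solve3(M, b[0]), solve3(M, b[1])]
        A = [[rows[0][0], rows[0][1]], [rows[1][0], rows[1][1]]]
        t = [rows[0][2], rows[1][2]]
    return A, t

# Usage: an "L"-shaped template (non-collinear, so A is identifiable) plus one outlier.
template = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 0.0), (2.0, 0.0)]
A_true, t_true = [[1.1, 0.05], [0.0, 0.95]], [0.2, -0.1]
ink = [(A_true[0][0] * mx + A_true[0][1] * my + t_true[0],
        A_true[1][0] * mx + A_true[1][1] * my + t_true[1]) for mx, my in template]
ink.append((10.0, 10.0))  # should be absorbed by the noise component
A, t = em_affine_fit(ink, template)
print(A, t)
```

Keeping sigma and the noise prior fixed keeps the M-step to a single linear solve. A fuller treatment would presumably re-estimate them too. As the next paragraph notes for the two-object case, EM of this kind only finds the nearby optimum, so the starting point matters.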

The fitting algorithm converged to the desired solution (i.e.
one which looks correct -- this is the advantage of working in 2d :-))
in around 99% of the cases when only a single character was present.
In some experiments with two templates and two objects, we found
that the fit converged to the desired solution only if the
starting point was rather close to it.

I should also mention that Eric Mjolsness and his colleagues have
been doing some similar work, although they have used an explicit match
matrix to encode the correspondence between datapoints and model points;
the mixture model can be obtained by integrating out one of the two
(row or column) constraints.
[ref: e.g. Chien-Ping Lu and Eric Mjolsness, NIPS 6, 985-992; also NIPS 7
(forthcoming), and earlier work back to a TR YALEU-DCS-TR-854 in 1990]
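To make the comparison concrete, here is an illustrative sketch that is not taken from either group's papers. Start from a matrix of Gaussian affinities between datapoints (rows) and model points (columns). The mixture model normalises only the rows, so each datapoint distributes one unit of assignment over the model points. Alternating row and column normalisation (Sinkhorn's method, used here as an assumed stand-in for an explicit match-matrix formulation) also forces each model point to receive one unit. The function names are hypothetical.

```python
import math

def scores(points, centres, sigma):
    """Gaussian affinity between every datapoint (row) and model point (column)."""
    return [[math.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2.0 * sigma ** 2))
             for cx, cy in centres] for px, py in points]

def mixture_responsibilities(S):
    """Row constraint only: each datapoint's assignments sum to 1 (mixture-model E-step)."""
    return [[s / sum(row) for s in row] for row in S]

def sinkhorn(S, iters=100):
    """Row and column constraints: alternate normalisations towards a doubly stochastic matrix."""
    M = [row[:] for row in S]
    for _ in range(iters):
        M = [[m / sum(row) for m in row] for row in M]
        col_sums = [sum(row[k] for row in M) for k in range(len(M[0]))]
        M = [[m / col_sums[k] for k, m in enumerate(row)] for row in M]
    return M

# Usage: two datapoints that both lie near model point 0.
S = scores([(0.0, 0.0), (0.1, 0.0)], [(0.0, 0.0), (1.0, 0.0)], 0.5)
R = mixture_responsibilities(S)
print([R[0][k] + R[1][k] for k in range(2)])  # column sums are unconstrained
print(sinkhorn(S))
```

In the mixture version model point 0 can absorb most of both datapoints. The match-matrix version instead forces a one-to-one (soft) correspondence.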

Our Refs:

[early paper]

@incollection(hinton-williams-revow-92,
author  =       "Hinton, G.~E. and Williams, C.~K.~I. and Revow, M.~D.",
title   =       "Adaptive elastic models for hand-printed character
recognition",
editor  =       "J. E. Moody and S. J. Hanson and R. P. Lippmann",
booktitle=      "Advances in Neural Information Processing Systems 4",
year    =       "1992",
publisher=      "Morgan Kaufmann",
address =       "San Mateo, CA"
)

[up to date work]

* a paper submitted to IEEE Transactions on Pattern Analysis and Machine 
Intelligence in 1994: pami.ps.Z (36 pages, 0.3 Mb)

* my PhD thesis: thesis.ps.Z (95 pages, 0.6 Mb)

both available from the Toronto ftp server

unix> ftp ftp.cs.toronto.edu  (or 128.100.3.6, or 128.100.1.105)

   (log in as "anonymous", e-mail address as password)

  ftp> binary
  ftp> cd pub/ckiw
  ftp> get thesis.ps.Z
  ftp> get pami.ps.Z
  ftp> quit

Regards

Chris Williams
c.k.i.williams at aston.ac.uk

Department of Computer Science and Applied Maths
Aston University
Birmingham B4 7ET
England

tel: +44 121 359 3621 x 4382  fax: +44 121 333 6215