Tech reports from CBCL at MIT

Reza Shadmehr reza at ai.mit.edu
Thu Feb 17 09:03:53 EST 1994


Hello,

Following is a list of recent technical reports from the Center for
Biological and Computational Learning at M.I.T.  These reports are 
available via anonymous FTP (see the end of this message for details).

--------------------------------
:CBCL Paper #78/AI Memo #1405
:author Amnon Shashua
:title On Geometric and Algebraic Aspects of 3D Affine and Projective
Structures from Perspective 2D Views
:date July 1993
:pages 14
:keywords visual recognition, structure from motion, projective
geometry, 3D reconstruction

We investigate the differences --- conceptually and algorithmically
--- between affine and projective frameworks for the tasks of visual
recognition and reconstruction from perspective views.  It is shown
that an affine invariant exists between any view and a fixed view
chosen as a reference view. This implies that for tasks for which a
reference view can be chosen, such as in alignment schemes for visual
recognition, projective invariants are not really necessary.  We then
use the affine invariant to derive new algebraic connections between
perspective views. It is shown that three perspective views of an
object are connected by certain algebraic functions of image
coordinates alone (no structure or camera geometry needs to be
involved).

--------------
:CBCL Paper #79/AI Memo #1390
:author  Jose L. Marroquin and Federico Girosi
:title Some Extensions of the K-Means Algorithm for Image Segmentation
and Pattern Classification
:date  January 1993
:pages 21
:keywords K-means, clustering, vector quantization, segmentation,
classification

We present some extensions to the k-means algorithm for vector
quantization that permit its efficient use in image segmentation and
pattern classification tasks. We show that by introducing a certain
set of state variables it is possible to find the representative
centers of the lower dimensional manifolds that define the boundaries
between classes; this permits one, for example, to find class
boundaries directly from sparse data or to efficiently place centers
for pattern classification. The same state variables can be used to
determine adaptively the optimal number of centers for clouds of data
with space-varying density. Some examples of the application of these
extensions are also given.
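
As a point of reference, the baseline that these extensions build on is
plain k-means; a minimal NumPy sketch follows (the function name and
details are illustrative, not from the memo):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means for vector quantization: alternate between
    assigning points to their nearest center and moving each center
    to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distances from every point to every center, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep a center in place if it lost all points.
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The memo's state variables augment exactly this loop, steering centers
toward class boundaries or adapting k to space-varying density.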

--------------
:CBCL Paper #80/AI Memo #1431
:title Example-Based Image Analysis and Synthesis
:author David Beymer, Amnon Shashua and Tomaso Poggio
:date November 1993
:pages 21
:keywords computer graphics, networks, computer vision,
teleconferencing, image compression, computer interfaces 

Image analysis and graphics synthesis can be achieved with learning
techniques that use image examples directly, without physically-based 3D
models.  In our technique:  1) the mapping from novel images to a vector of 
``pose'' and ``expression'' parameters can be learned from a small set of 
example images using a function approximation technique that we call an 
analysis network; 2) the inverse mapping from  input ``pose'' and 
``expression'' parameters to output images can be synthesized from a small
set of example images and used to produce new images using a similar synthesis 
network.  The techniques described here have several applications in computer
graphics, special effects, interactive multimedia and very low bandwidth 
teleconferencing.
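
Both mappings are learned with a function approximation technique.  As an
illustrative stand-in for the memo's analysis/synthesis networks, here is a
minimal Gaussian RBF interpolation network of the kind studied at CBCL; the
function names, sigma, and the ridge term are assumptions for the sketch,
not details from the memo:

```python
import numpy as np

def rbf_fit(X, Y, sigma=0.5):
    """Fit an interpolating Gaussian RBF network mapping X -> Y.

    X: (n, d) example inputs (e.g. pose/expression parameters),
    Y: (n, m) example outputs (e.g. flattened image vectors).
    Returns a function that maps a new (d,) input to an (m,) output.
    """
    # Kernel (Gram) matrix of pairwise Gaussian similarities.
    G = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1) / (2 * sigma**2))
    # Tiny ridge term for numerical stability of the solve.
    W = np.linalg.solve(G + 1e-8 * np.eye(len(X)), Y)
    return lambda x: np.exp(-np.square(x - X).sum(-1) / (2 * sigma**2)) @ W
```

The same machinery runs in either direction: fit parameters-to-images for
synthesis, or images-to-parameters for analysis.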

--------------
:CBCL Paper #81/AI Memo #1432
:title Conditions for Viewpoint Dependent Face Recognition
:author Philippe G. Schyns and Heinrich H. Bülthoff
:date August 1993
:pages 6
:keywords face recognition, RBF networks, symmetry

Face recognition stands out as a singular case of object recognition:  
although most faces are very much alike, people discriminate between many 
different faces with outstanding efficiency.  Even though little is known 
about the mechanisms of face recognition, viewpoint dependence, a recurrent 
characteristic of much research on faces, could inform algorithms and 
representations.  Poggio and Vetter's symmetry argument predicts that learning 
only one view of a face may be sufficient for recognition, if this view allows 
the computation of a symmetric, "virtual," view.  More specifically, as faces 
are roughly bilaterally symmetric objects, learning a side-view---which always 
has a symmetric view--- should give rise to better generalization performance 
than learning the frontal view.  It is also predicted that among all new 
views, a virtual view should be best recognized.  We ran two psychophysical 
experiments to test these predictions.  Stimuli were views of 3D models of 
laser-scanned faces.  Only shape was available for recognition; all other face 
cues--- texture, color, hair, etc.--- were removed from the stimuli.  The first
experiment tested which single views of a face give rise to the best 
generalization performance.  The results were compatible with the symmetry 
argument: face recognition from a single view is always better when the 
learned view allows the computation of a symmetric view.

--------------
:CBCL Paper #82/AI Memo #1437
:author Reza Shadmehr and Ferdinando A. Mussa-Ivaldi
:title Geometric Structure of the Adaptive Controller of the Human Arm
:date  July 1993
:pages 34
:keywords Motor learning, reaching movements, internal models, force fields, 
virtual environments, generalization, motor control

The objects with which the hand interacts may significantly change the 
dynamics of the arm.  How does the brain adapt control of arm movements
to these new dynamics?  We show that adaptation occurs via composition of a 
model of the task's dynamics.  By exploring generalization capabilities 
of this adaptation we infer some of the properties of the computational 
elements with which the brain formed this model:
the elements have broad receptive fields and encode the learned 
dynamics as a map structured in an intrinsic coordinate system closely related 
to the geometry of the skeletomusculature.  The low-level nature of 
these elements suggests that they may represent a set of primitives 
with which movements are represented in the CNS.

--------------
:CBCL Paper #83/AI Memo #1440
:author Michael I. Jordan and Robert A. Jacobs
:title Hierarchical Mixtures of Experts and the EM Algorithm
:date August 1993
:pages 29
:keywords supervised learning, statistics, decision trees, neural
networks

We present a tree-structured architecture for supervised learning.  The 
statistical model underlying the architecture is a hierarchical mixture model 
in which both the mixture coefficients and the mixture components are 
generalized linear models (GLIM's).  Learning is treated as a maximum 
likelihood problem; in particular, we present an Expectation-Maximization (EM) 
algorithm for adjusting the parameters of the architecture.  We also develop 
an on-line learning algorithm in which the parameters are updated 
incrementally.  Comparative simulation results are presented in the robot 
dynamics domain.
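
The E-step/M-step pattern that the memo generalizes to hierarchical
mixtures of GLIM experts can be seen in miniature on the simplest mixture
model, a one-dimensional Gaussian mixture.  This sketch is illustrative
only (the memo's architecture uses gated expert networks, not plain
Gaussians), and the function name and initialization are assumptions:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=200):
    """EM for a 1-D Gaussian mixture with k components."""
    # Spread the initial means across the data with quantiles.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
               / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
        pi = n / len(x)
    return mu, var, pi
```

In the memo both the gate (mixture coefficients) and the experts (mixture
components) are generalized linear models, so each M-step becomes a
weighted GLIM fit rather than the closed-form Gaussian update above.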

--------------
:CBCL Paper #84/AI Memo #1441
:title On the Convergence of Stochastic Iterative Dynamic Programming 
Algorithms
:author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh
:date August 1993
:pages 15
:keywords reinforcement learning, stochastic approximation,
convergence, dynamic programming

Recent developments in the area of reinforcement learning have yielded a 
number of new algorithms for the prediction and control of Markovian 
environments.  These algorithms, including the TD(lambda) algorithm of Sutton 
(1988) and the Q-learning algorithm of Watkins (1989), can be motivated 
heuristically as approximations to dynamic programming (DP).  In this paper 
we provide a rigorous proof of convergence of these DP-based learning 
algorithms by relating them to the powerful techniques of stochastic 
approximation theory via a new convergence theorem.  The theorem establishes 
a general class of convergent algorithms to which both TD(lambda) and 
Q-learning belong.
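
For readers unfamiliar with the algorithms the theorem covers, here is a
minimal tabular Q-learning sketch on a toy MDP.  The 1/n step sizes
satisfy the stochastic-approximation conditions (step sizes sum to
infinity, their squares sum to a finite value) that the convergence proof
relies on.  The function name and the toy MDP are invented for
illustration, not taken from the memo:

```python
import numpy as np

def q_learning(P, R, gamma=0.9, steps=20000, seed=0):
    """Tabular Q-learning on an MDP with transition probabilities
    P[s, a, s'] and rewards R[s, a], using uniform exploration."""
    rng = np.random.default_rng(seed)
    nS, nA = R.shape
    Q = np.zeros((nS, nA))
    visits = np.zeros((nS, nA))
    s = 0
    for _ in range(steps):
        a = rng.integers(nA)                  # explore uniformly
        s2 = rng.choice(nS, p=P[s, a])        # sample next state
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]            # decaying step size
        # Stochastic-approximation update toward the Bellman target.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q
```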

--------------
:CBCL Paper #86/AI Memo #1449
:title Formalizing Triggers:  A Learning Model for Finite Spaces
:author Partha Niyogi and Robert Berwick
:pages 14
:keywords language learning, parameter systems, Markov chains,
convergence times, computational learning theory
:date November 1993

In a recent seminal paper, Gibson and Wexler (1993) take important
steps toward formalizing the notion of language learning in a (finite)
space whose grammars are characterized by a finite number of {\it
parameters\/}. They introduce the Triggering Learning Algorithm (TLA)
and show that even in finite space convergence may be a problem due to
local maxima. In this paper we explicitly formalize learning in finite
parameter space as a Markov structure whose states are parameter
settings. We show that this captures the dynamics of TLA completely
and allows us to explicitly compute the rates of convergence for TLA
and other variants of TLA, e.g., random walk. Also included in the paper
are a corrected version of GW's central convergence proof, a list of
``problem states'' in addition to local maxima, and batch and
PAC-style learning bounds for the model.

--------------
:CBCL Paper #87/AI Memo #1458
:title Convergence Results for the EM Approach to Mixtures of Experts 
Architectures
:author Michael Jordan and Lei Xu
:pages 33
:date September 1993

The Expectation-Maximization (EM) algorithm is an iterative approach to
maximum likelihood parameter estimation.  Jordan and Jacobs (1993) recently
proposed an EM algorithm for the mixture of experts architecture of Jacobs,
Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts
architecture of Jordan and Jacobs (1992).  They showed empirically that the
EM algorithm for these architectures yields significantly faster convergence
than gradient ascent.  In the current paper we provide a theoretical analysis
of this algorithm.  We show that the algorithm can be regarded as a variable
metric algorithm whose search direction has a positive projection on the
gradient of the log likelihood.  We also analyze the convergence of the
algorithm and provide an explicit expression for the convergence rate.  In
addition, we describe an acceleration technique that yields a significant
speedup in simulation experiments.

--------------
:CBCL Paper #89/AI Memo #1461
:title Face Recognition under Varying Pose
:author David J. Beymer
:pages 14
:date December 1993
:keywords computer vision, face recognition, facial feature detection,
template matching

While researchers in computer vision and pattern recognition have
worked on automatic techniques for recognizing faces for the last 20
years, most systems specialize in frontal views of the face.  We
present a face recognizer that works under varying pose, the difficult
part of which is to handle face rotations in depth.  Building on
successful template-based systems, our basic approach is to represent
faces with templates from multiple model views that cover different
poses from the viewing sphere.  Our system has achieved a recognition
rate of 98% on a database of 62 people containing 10 testing and 15
modelling views per person.
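
The matching primitive in template-based recognizers of this kind is
typically normalized correlation between a stored model-view template and
an image patch; identity is then the model whose best-matching view scores
highest.  A minimal sketch (the function names and data layout are
illustrative, not the memo's system):

```python
import numpy as np

def normalized_correlation(template, patch):
    """Normalized correlation in [-1, 1] between a stored model-view
    template and an equally sized image patch (both 2-D arrays)."""
    t = template - template.mean()
    p = patch - patch.mean()
    return float((t * p).sum() / (np.linalg.norm(t) * np.linalg.norm(p)))

def best_model(patch, model_views):
    """Identify the patch: for each person, take the maximum score over
    that person's stored views (covering different poses), then return
    the person with the highest such score."""
    scores = {name: max(normalized_correlation(v, patch) for v in views)
              for name, views in model_views.items()}
    return max(scores, key=scores.get)
```

Storing several views per person is what lets the max-over-views step
absorb rotations in depth that a single frontal template cannot.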

--------------
:CBCL Paper #90/AI Memo #1452
:title Algebraic Functions for Recognition
:author Amnon Shashua
:pages 11
:date January 1994

In the general case, a trilinear relationship between three perspective views
is shown to exist.  The trilinearity result is shown to be of much practical
use in visual recognition by alignment --- yielding a direct method that cuts
through the computations of camera transformation, scene structure and epipolar
geometry.  The proof of the central result may be of further interest as it
demonstrates certain regularities across homographies of the plane and 
introduces new view invariants.  Experiments on simulated and real image data
were conducted, including a comparative analysis with epipolar intersection
and the linear combination methods, with results indicating a greater degree
of robustness in practice and higher level of performance in re-projection 
tasks.

============================

How to get a copy of a report:

The files are in compressed PostScript format and are named by their 
AI Memo number.  They are placed in a directory named for the year
in which the paper was written.  

Here is the procedure for ftp-ing:

unix> ftp publications.ai.mit.edu (128.52.32.22, log-in as anonymous)
ftp>  cd ai-publications/1993
ftp>  binary
ftp>  get AIM-number.ps.Z
ftp>  quit
unix> zcat AIM-number.ps.Z | lpr


Best wishes,

Reza Shadmehr
Center for Biological and Computational Learning
M. I. T.
Cambridge, MA 02139





