Tech reports from CBCL at MIT
Reza Shadmehr
reza at ai.mit.edu
Thu Feb 17 09:03:53 EST 1994
Hello,
Following is a list of recent technical reports from the Center for
Biological and Computational Learning at M.I.T. These reports are
available via anonymous FTP (see the end of this message for details).
--------------------------------
:CBCL Paper #78/AI Memo #1405
:author Amnon Shashua
:title On Geometric and Algebraic Aspects of 3D Affine and Projective
Structures from Perspective 2D Views
:date July 1993
:pages 14
:keywords visual recognition, structure from motion, projective
geometry, 3D reconstruction
We investigate the differences --- conceptually and algorithmically
--- between affine and projective frameworks for the tasks of visual
recognition and reconstruction from perspective views. It is shown
that an affine invariant exists between any view and a fixed view
chosen as a reference view. This implies that for tasks for which a
reference view can be chosen, such as in alignment schemes for visual
recognition, projective invariants are not really necessary. We then
use the affine invariant to derive new algebraic connections between
perspective views. It is shown that three perspective views of an
object are connected by certain algebraic functions of image
coordinates alone (no structure or camera geometry needs to be
involved).
--------------
:CBCL Paper #79/AI Memo #1390
:author Jose L. Marroquin and Federico Girosi
:title Some Extensions of the K-Means Algorithm for Image Segmentation
and Pattern Classification
:date January 1993
:pages 21
:keywords K-means, clustering, vector quantization, segmentation,
classification
We present some extensions to the k-means algorithm for vector
quantization that permit its efficient use in image segmentation and
pattern classification tasks. We show that by introducing a certain
set of state variables it is possible to find the representative
centers of the lower dimensional manifolds that define the boundaries
between classes; this permits one, for example, to find class
boundaries directly from sparse data or to efficiently place centers
for pattern classification. The same state variables can be used to
determine adaptively the optimal number of centers for clouds of data
with space-varying density. Some examples of the application of these
extensions are also given.
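The memo builds on the standard k-means (Lloyd) iteration. As context, here is a minimal sketch of that base algorithm in Python; the state variables for placing centers on class boundaries are the memo's contribution and are not reproduced here:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centers
```

On two well-separated clouds this recovers the cloud means; the memo's extensions address exactly the cases this plain version handles poorly, such as sparse boundary data and space-varying density.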
--------------
:CBCL Paper #80/AI Memo #1431
:title Example-Based Image Analysis and Synthesis
:author David Beymer, Amnon Shashua and Tomaso Poggio
:date November 1993
:pages 21
:keywords computer graphics, networks, computer vision,
teleconferencing, image compression, computer interfaces
Image analysis and graphics synthesis can be achieved with learning
techniques that use image examples directly, without physically based 3D
models. In our technique: 1) the mapping from novel images to a vector of
``pose'' and ``expression'' parameters can be learned from a small set of
example images using a function approximation technique that we call an
analysis network; 2) the inverse mapping from input ``pose'' and
``expression'' parameters to output images can be synthesized from a small
set of example images and used to produce new images using a similar synthesis
network. The techniques described here have several applications in computer
graphics, special effects, interactive multimedia and very low bandwidth
teleconferencing.
--------------
:CBCL Paper #81/AI Memo #1432
:title Conditions for Viewpoint Dependent Face Recognition
:author Philippe G. Schyns and Heinrich H. Bülthoff
:date August 1993
:pages 6
:keywords face recognition, RBF networks, symmetry
Face recognition stands out as a singular case of object recognition:
although most faces are very much alike, people discriminate between many
different faces with outstanding efficiency. Even though little is known
about the mechanisms of face recognition, viewpoint dependence, a recurrent
characteristic of much research on faces, could inform algorithms and
representations. Poggio and Vetter's symmetry argument predicts that learning
only one view of a face may be sufficient for recognition, if this view allows
the computation of a symmetric, "virtual," view. More specifically, as faces
are roughly bilaterally symmetric objects, learning a side view---which always
has a symmetric counterpart---should give rise to better generalization
performance than learning the frontal view. It is also predicted that among
all new views, a virtual view should be best recognized. We ran two
psychophysical experiments to test these predictions. Stimuli were views of
3D models of laser-scanned faces. Only shape was available for recognition;
all other face cues---texture, color, hair, etc.---were removed from the
stimuli. The first experiment tested which single views of a face give rise
to the best generalization performance. The results were compatible with the
symmetry argument: face recognition from a single view is always better when
the learned view allows the computation of a symmetric view.
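The symmetry argument can be made concrete: for a bilaterally symmetric object, reflecting the feature points of one learned view across the sagittal plane yields the "virtual" view at no extra cost. A toy sketch, with an invented point-list representation rather than the authors' actual stimuli:

```python
def virtual_view(points):
    """Reflect 3D feature points across the x = 0 sagittal plane.

    For a bilaterally symmetric object such as a face, the reflected
    point set is what the mirror-image viewpoint would see, so a single
    real side view supplies a second, "virtual" training view.
    """
    return [(-x, y, z) for (x, y, z) in points]
```

For a perfectly symmetric face, reflecting a pair of mirrored landmarks (e.g., the two eye corners) returns the same set with left/right roles swapped, which is why a side view "always has a symmetric counterpart."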
--------------
:CBCL Paper #82/AI Memo #1437
:author Reza Shadmehr and Ferdinando A. Mussa-Ivaldi
:title Geometric Structure of the Adaptive Controller of the Human Arm
:date July 1993
:pages 34
:keywords Motor learning, reaching movements, internal models, force fields,
virtual environments, generalization, motor control
The objects with which the hand interacts may significantly change the
dynamics of the arm. How does the brain adapt control of arm movements
to these new dynamics? We show that adaptation proceeds via composition
of a model of the task's dynamics. By exploring the generalization
capabilities of this adaptation we infer some of the properties of the
computational elements with which the brain formed this model: the
elements have broad receptive fields and encode the learned dynamics as
a map structured in an intrinsic coordinate system closely related to
the geometry of the skeletomusculature. The low-level nature of these
elements suggests that they may represent a set of primitives with
which movements are represented in the CNS.
--------------
:CBCL Paper #83/AI Memo #1440
:author Michael I. Jordan and Robert A. Jacobs
:title Hierarchical Mixtures of Experts and the EM Algorithm
:date August 1993
:pages 29
:keywords supervised learning, statistics, decision trees, neural
networks
We present a tree-structured architecture for supervised learning. The
statistical model underlying the architecture is a hierarchical mixture model
in which both the mixture coefficients and the mixture components are
generalized linear models (GLIM's). Learning is treated as a maximum
likelihood problem; in particular, we present an Expectation-Maximization (EM)
algorithm for adjusting the parameters of the architecture. We also develop
an on-line learning algorithm in which the parameters are updated
incrementally. Comparative simulation results are presented in the robot
dynamics domain.
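The EM iteration at the heart of the memo can be illustrated on its simplest instance, a two-component 1D Gaussian mixture with unit variances; the memo's E and M steps are the GLIM analogues of these two updates, not this toy model itself:

```python
import math

def em_gaussian_mixture(data, iters=100):
    """EM for a two-component 1D Gaussian mixture, unit variances.

    E-step: posterior responsibility of component 1 for each point.
    M-step: re-estimate the two means and the mixing weight.
    """
    mu = [min(data), max(data)]   # crude but serviceable initialization
    pi = 0.5                      # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibilities under the current parameters.
        resp = []
        for x in data:
            p0 = (1 - pi) * math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = pi * math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: responsibility-weighted maximum likelihood updates.
        n1 = sum(resp)
        n0 = len(data) - n1
        mu[0] = sum((1 - r) * x for r, x in zip(resp, data)) / n0
        mu[1] = sum(r * x for r, x in zip(resp, data)) / n1
        pi = n1 / len(data)
    return mu, pi
```

Each iteration provably increases the likelihood, which is the property the memo exploits when comparing EM against gradient ascent.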
--------------
:CBCL Paper #84/AI Memo #1441
:title On the Convergence of Stochastic Iterative Dynamic Programming
Algorithms
:author Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh
:date August 1993
:pages 15
:keywords reinforcement learning, stochastic approximation,
convergence, dynamic programming
Recent developments in the area of reinforcement learning have yielded a
number of new algorithms for the prediction and control of Markovian
environments. These algorithms, including the TD(lambda) algorithm of Sutton
(1988) and the Q-learning algorithm of Watkins (1989), can be motivated
heuristically as approximations to dynamic programming (DP). In this paper
we provide a rigorous proof of convergence of these DP-based learning
algorithms by relating them to the powerful techniques of stochastic
approximation theory via a new convergence theorem. The theorem establishes
a general class of convergent algorithms to which both TD (lambda) and
Q-learning belong.
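The stochastic-approximation view is easy to demonstrate: with stepsizes satisfying the Robbins-Monro conditions (the sum of the stepsizes diverges while the sum of their squares converges), the tabular Q-learning update averages away reward noise. A toy sketch on an invented one-decision MDP, not an example from the memo:

```python
import random

def q_learning(episodes=20000, gamma=0.9, seed=1):
    """Tabular Q-learning on a toy one-decision MDP.

    State 0 has two actions, both of which end the episode:
      action 0 pays a noisy reward with mean 1.0,
      action 1 pays a noisy reward with mean 0.2.
    With stepsizes alpha_t = 1/t per action (Robbins-Monro),
    Q(0, a) converges to the mean reward of action a.
    """
    rng = random.Random(seed)
    Q = [0.0, 0.0]
    visits = [0, 0]
    for _ in range(episodes):
        a = rng.randrange(2)                 # explore both actions
        mean = 1.0 if a == 0 else 0.2
        r = mean + rng.uniform(-0.5, 0.5)    # noisy observed reward
        visits[a] += 1
        alpha = 1.0 / visits[a]
        # Next state is terminal, so the bootstrap term gamma*max Q is 0.
        Q[a] += alpha * (r + gamma * 0.0 - Q[a])
    return Q
```

With a constant stepsize the estimates would keep fluctuating with the noise; the decaying schedule is exactly what the convergence theorem requires.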
--------------
:CBCL Paper #86/AI Memo #1449
:title Formalizing Triggers: A Learning Model for Finite Spaces
:author Partha Niyogi and Robert Berwick
:pages 14
:keywords language learning, parameter systems, Markov chains,
convergence times, computational learning theory
:date November 1993
In a recent seminal paper, Gibson and Wexler (1993) take important
steps toward formalizing the notion of language learning in a (finite)
space whose grammars are characterized by a finite number of {\it
parameters\/}. They introduce the Triggering Learning Algorithm (TLA)
and show that even in finite space convergence may be a problem due to
local maxima. In this paper we explicitly formalize learning in finite
parameter space as a Markov structure whose states are parameter
settings. We show that this captures the dynamics of TLA completely
and allows us to explicitly compute the rates of convergence for TLA
and other variants of TLA, e.g., random walk. Also included in the paper
are a corrected version of GW's central convergence proof, a list of
``problem states'' in addition to local maxima, and batch and
PAC-style learning bounds for the model.
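The Markov formulation makes convergence rates directly computable: raise the transition matrix over parameter settings to the n-th power and read off the probability of having reached the target grammar. A sketch with an invented three-state chain, not GW's actual parameter space:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def p_converged(P, start, target, steps):
    """Probability the learner sits at `target` after `steps` transitions."""
    n = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        M = mat_mul(M, P)          # M becomes P raised to the step count
    return M[start][target]

# Invented chain: state 2 (the target grammar) is absorbing; states 0
# and 1 drift toward it with some probability per triggering datum.
P = [[0.5, 0.3, 0.2],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
```

A local maximum in the TLA sense would show up here as a second absorbing state, so the mass at the target would plateau below one; the matrix powers expose that immediately.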
--------------
:CBCL Paper #87/AI Memo #1458
:title Convergence Results for the EM Approach to Mixtures of Experts
Architectures
:author Michael Jordan and Lei Xu
:pages 33
:date September 1993
The Expectation-Maximization (EM) algorithm is an iterative approach to
maximum likelihood parameter estimation. Jordan and Jacobs (1993)
recently proposed an EM algorithm for the mixture of experts
architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the
hierarchical mixture of experts architecture of Jordan and Jacobs
(1992). They showed empirically that the EM algorithm for these
architectures yields significantly faster convergence than gradient
ascent. In the current paper we provide a theoretical analysis of this
algorithm. We show that the algorithm can be regarded as a variable
metric algorithm with its searching direction having a positive
projection on the gradient of the log likelihood. We also analyze the
convergence of the algorithm and provide an explicit expression for the
convergence rate. In addition, we describe an acceleration technique
that yields a significant speedup in simulation experiments.
--------------
:CBCL Paper #89/AI Memo #1461
:title Face Recognition under Varying Pose
:author David J. Beymer
:pages 14
:date December 1993
:keywords computer vision, face recognition, facial feature detection,
template matching
While researchers in computer vision and pattern recognition have
worked on automatic techniques for recognizing faces for the last 20
years, most systems specialize in frontal views of the face. We
present a face recognizer that works under varying pose, the difficult
part of which is to handle face rotations in depth. Building on
successful template-based systems, our basic approach is to represent
faces with templates from multiple model views that cover different
poses from the viewing sphere. Our system has achieved a recognition
rate of 98% on a data base of 62 people containing 10 testing and 15
modelling views per person.
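The template-matching core of such systems is typically normalized cross-correlation of a probe against stored model views, with recognition going to the best-scoring identity. A minimal sketch under that assumption; the flat-list patch format and the names below are illustrative, not the system's actual implementation:

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size patches.

    Invariant to affine changes in overall brightness and contrast,
    which is why it is a common template-matching score.
    """
    n = len(patch)
    ma, mb = sum(patch) / n, sum(template) / n
    num = sum((a - ma) * (b - mb) for a, b in zip(patch, template))
    da = math.sqrt(sum((a - ma) ** 2 for a in patch))
    db = math.sqrt(sum((b - mb) ** 2 for b in template))
    return num / (da * db)

def recognize(probe, model_views):
    """Return the identity whose best-matching stored view scores highest.

    `model_views` maps an identity to its list of view templates, one
    per stored pose, so matching sweeps the sampled viewing sphere.
    """
    return max(model_views,
               key=lambda name: max(ncc(probe, v)
                                    for v in model_views[name]))
```

Storing several templates per person at different poses is what lets a correlation-based matcher tolerate rotations in depth: the probe only has to match the nearest sampled pose, not a canonical frontal view.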
--------------
:CBCL Paper #90/AI Memo #1452
:title Algebraic Functions for Recognition
:author Amnon Shashua
:pages 11
:date January 1994
In the general case, a trilinear relationship between three perspective views
is shown to exist. The trilinearity result is shown to be of much practical
use in visual recognition by alignment --- yielding a direct method that cuts
through the computations of camera transformation, scene structure and epipolar
geometry. The proof of the central result may be of further interest as it
demonstrates certain regularities across homographies of the plane and
introduces new view invariants. Experiments on simulated and real image data
were conducted, including a comparative analysis with epipolar intersection
and the linear combination methods, with results indicating a greater degree
of robustness in practice and higher level of performance in re-projection
tasks.
============================
How to get a copy of a report:
The files are in compressed PostScript format and are named by their
AI memo number. They are stored in a directory named after the year
in which the paper was written.
Here is the procedure for ftp-ing:
unix> ftp publications.ai.mit.edu (128.52.32.22, log-in as anonymous)
ftp> cd ai-publications/1993
ftp> binary
ftp> get AIM-number.ps.Z
ftp> quit
unix> zcat AIM-number.ps.Z | lpr
Best wishes,
Reza Shadmehr
Center for Biological and Computational Learning
M. I. T.
Cambridge, MA 02139