Face Image Analysis by Unsupervised Learning

Marian Stewart Bartlett mbartlet at san.rr.com
Mon Jul 16 19:42:59 EDT 2001


I am pleased to announce the following new book:

Face Image Analysis by Unsupervised Learning, by Marian Stewart
Bartlett. Foreword by Terrence J. Sejnowski. Kluwer International Series
on Engineering and Computer Science, V. 612. Boston: Kluwer Academic
Publishers, 2001.

Please see http://inc.ucsd.edu/~marni for more information. The book can
be ordered at http://www.wkap.nl/book.htm/0-7923-7348-0.
 
Book Jacket:

Face Image Analysis by Unsupervised Learning explores adaptive
approaches to face image analysis. It draws upon principles of
unsupervised learning and information theory to adapt processing to the
immediate task environment. In contrast to more traditional approaches
to image analysis in which relevant structure is determined in advance
and extracted using hand-engineered techniques, [this book] explores
methods that have roots in biological vision and/or learn about the
image structure directly from the image ensemble. Particular attention
is paid to unsupervised learning techniques for encoding the statistical
dependencies in the image ensemble. 

The first part of this volume reviews unsupervised learning, information
theory, independent component analysis, and their relation to biological
vision. Next, a face image representation is developed using independent
component analysis (ICA), an unsupervised learning technique based on
optimal information transfer between neurons. The ICA
representation is compared to a number of other face representations
including eigenfaces and Gabor wavelets on tasks of identity recognition
and expression analysis. Finally, methods for learning features that are
robust to changes in viewpoint and lighting are presented. These studies
provide evidence that encoding input dependencies through unsupervised
learning is an effective strategy for face recognition. 
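The "optimal information transfer between neurons" behind the ICA representation refers to the Bell-Sejnowski infomax algorithm. As a rough sketch only (synthetic toy sources rather than face images; the learning rate, iteration count, and mixing matrix are arbitrary illustrative choices, not the book's settings), the natural-gradient infomax update on whitened data looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an image ensemble: two independent super-Gaussian
# sources mixed linearly (all data here is synthetic).
n = 5000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Center and whiten the observations.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# Infomax with the natural-gradient update for super-Gaussian sources:
#   W <- W + lr * (I - tanh(U) U^T / n) W
W = np.eye(2)
lr = 0.1
for _ in range(500):
    U = W @ Xw
    W += lr * (np.eye(2) - np.tanh(U) @ U.T / n) @ W

U = W @ Xw  # estimated independent components (up to sign and order)
```

Each recovered row of `U` should correlate strongly with one of the original sources, up to permutation and sign, which is all ICA can promise.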

Face Image Analysis by Unsupervised Learning is suitable as a secondary
text for a graduate-level course, and as a reference for researchers and
practitioners in industry.

"Marian Bartlett's comparison of ICA with other algorithms on the
recognition of facial expressions is perhaps the most thorough analysis
we have of the strengths and limits of ICA as a preprocessing stage for
pattern recognition." 

- T.J. Sejnowski, The Salk Institute 

Table of Contents: http://www.cnl.salk.edu/~marni/contents.html

1. SUMMARY 

----------------------------------------------------------------

2. INTRODUCTION 
   1. Unsupervised learning in object representations 
      1. Generative models 
      2. Redundancy reduction as an organizational principle
      3. Information theory 
      4. Redundancy reduction in the visual system 
      5. Principal component analysis 
      6. Hebbian learning 
      7. Explicit discovery of statistical dependencies 
   2. Independent component analysis 
      1. Decorrelation versus independence 
      2. Information maximization learning rule 
      3. Relation of sparse coding to independence
   3. Unsupervised learning in visual development 
      1. Learning input dependencies: Biological evidence 
      2. Models of receptive field development based on 
        correlation sensitive learning mechanisms 
   4. Learning invariances from temporal dependencies
      1. Computational models 
      2. Temporal association in psychophysics and biology 
   5. Computational Algorithms for Recognizing Faces in Images 

----------------------------------------------------------------

3. INDEPENDENT COMPONENT REPRESENTATIONS FOR FACE RECOGNITION 
   1. Introduction 
      1. Independent component analysis (ICA)
      2. Image data 
   2. Statistically independent basis images 
      1. Image representation: Architecture 1 
      2. Implementation: Architecture 1 
      3. Results: Architecture 1 
   3. A factorial face code 
      1. Independence in face space versus pixel space 
      2. Image representation: Architecture 2 
      3. Implementation: Architecture 2 
      4. Results: Architecture 2 
   4. Examination of the ICA Representations 
      1. Mutual information 
      2. Sparseness 
   5. Combined ICA recognition system 
   6. Discussion 

----------------------------------------------------------------

4. AUTOMATED FACIAL EXPRESSION ANALYSIS 
   1. Review of other systems 
      1. Motion-based approaches 
      2. Feature-based approaches 
      3. Model-based techniques 
      4. Holistic analysis 
   2. What is needed 
   3. The Facial Action Coding System (FACS) 
   4. Detection of deceit 
   5. Overview of approach 

----------------------------------------------------------------

5. IMAGE REPRESENTATIONS FOR FACIAL EXPRESSION ANALYSIS: 
  COMPARATIVE STUDY I 
   1. Image database 
   2. Image analysis methods 
      1. Holistic spatial analysis 
      2. Feature measurement 
      3. Optic flow 
      4. Human subjects 
   3. Results 
      1. Hybrid system 
      2. Error analysis 
   4. Discussion 

----------------------------------------------------------------

6. IMAGE REPRESENTATIONS FOR FACIAL EXPRESSION ANALYSIS: 
  COMPARATIVE STUDY II 
   1. Introduction 
   2. Image database 
   3. Optic flow analysis 
      1. Local velocity extraction 
      2. Local smoothing 
      3. Classification procedure 
   4. Holistic analysis 
      1. Principal component analysis: "EigenActions" 
      2. Local feature analysis (LFA) 
      3. "FisherActions" 
      4. Independent component analysis 
   5. Local representations 
      1. Local PCA 
      2. Gabor wavelet representation 
      3. PCA jets 
   6. Human subjects 
   7. Discussion 
   8. Conclusions 

----------------------------------------------------------------

7. LEARNING VIEWPOINT INVARIANT REPRESENTATIONS OF FACES 
   1. Introduction 
   2. Simulation 
      1. Model architecture 
      2. Competitive Hebbian learning of temporal relations 
      3. Temporal association in an attractor network 
      4. Simulation results 
   3. Discussion 

----------------------------------------------------------------

8. CONCLUSIONS AND FUTURE DIRECTIONS 

References 
Index 

----------------------------------------------------------------

Foreword by Terrence J. Sejnowski

Computers are good at many things that we are not good at, like sorting
a long list of numbers and calculating the trajectory of a rocket, but
they are not at all good at things that we do easily and without much
thought, like seeing and hearing. In the early days of computers, it was
not obvious that vision was a difficult
problem. Today, despite great advances in speed, computers are still
limited in what they can pick out from a complex scene and recognize.
Some progress has been made, particularly in the area of face
processing, which is the subject of this monograph.

Faces are dynamic objects that change shape rapidly, on the time scale
of seconds during changes of expression, and more slowly over time as we
age. We use faces to identify individuals, and we rely on facial
expressions to assess feelings and get feedback on how well we are
communicating. It is disconcerting to talk with someone whose face is a
mask. If we want computers to communicate with us, they will have to
learn how to make and assess facial expressions. A method for automating
the analysis of facial expressions would be useful in many psychological
and psychiatric studies as well as have great practical benefit in
business and forensics.

The research in this monograph arose through a collaboration with Paul
Ekman, which began 10 years ago. Dr. Beatrice Golomb, then a
postdoctoral fellow in my laboratory, had developed a neural network
called Sexnet, which could distinguish the sex of a person from a
photograph of their face (Golomb et al. 1991). This is a difficult
problem since no single feature can be used to reliably make this
judgment, but humans are quite good at it. This project was the starting
point for a major research effort, funded by the National Science
Foundation, to automate the Facial Action Coding System (FACS),
developed by Ekman and Friesen (1978). Joseph Hager made a major
contribution in the early stages of this research by obtaining a high
quality set of videos of experts who could produce each facial action.
Without such a large dataset of labeled images of each action it would
not have been possible to use neural network learning algorithms.

In this monograph, Dr. Marian Stewart Bartlett presents the results of
her doctoral research into automating the analysis of facial
expressions. When she began her research, one of the methods that she
used to study the FACS dataset, a new algorithm for Independent
Component Analysis (ICA), had only recently been developed, so she was
pioneering not only the automated analysis of facial expressions, but
also the initial exploration of ICA. Her comparison of ICA with other
algorithms on the recognition of facial expressions is perhaps the most
thorough analysis we have of the strengths and limits of ICA.

Much of human learning is unsupervised; that is, without the benefit of
an explicit teacher. The goal of unsupervised learning is to discover
the underlying probability distributions of sensory inputs (Hinton &
Sejnowski, 1999). Or as Yogi Berra once said, "You can observe a lot
just by watchin'." The identification of an object in an image nearly
always depends on the physical causes of the image rather than the pixel
intensities.  Unsupervised learning can be used to solve the difficult
problem of extracting the underlying causes, and decisions about
responses can be left to a supervised learning algorithm that takes the
underlying causes rather than the raw sensory data as its inputs.
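The two-stage idea described above, an unsupervised stage that extracts underlying causes followed by a supervised stage that makes decisions on those causes, can be illustrated with a toy pipeline. Here PCA stands in for the unsupervised stage and a nearest-centroid rule for the supervised one; the 20-dimensional "images" are synthetic stand-in data, not anything from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data: each "image" is a 20-dim vector whose class
# shifts the mean along a fixed direction.
n_per = 100
shift = np.linspace(0, 3, 20)
class0 = rng.normal(0.0, 1.0, (n_per, 20)) + shift
class1 = rng.normal(0.0, 1.0, (n_per, 20)) - shift
X = np.vstack([class0, class1])
y = np.array([0] * n_per + [1] * n_per)

# Unsupervised stage: PCA learns structure from the inputs alone,
# without ever seeing the labels.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T  # project onto the top 5 principal components

# Supervised stage: a simple nearest-centroid rule operates on the
# learned features rather than on raw pixels.
c0 = Z[y == 0].mean(axis=0)
c1 = Z[y == 1].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1)
        < np.linalg.norm(Z - c0, axis=1)).astype(int)
acc = (pred == y).mean()
```

The supervised rule stays trivial precisely because the unsupervised stage has already exposed the discriminative structure.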

Several types of input representation are compared here on the problem
of discriminating between facial actions. Perhaps the most intriguing
result is that two different input representations, Gabor filters and a
version of ICA, both gave excellent results that were roughly comparable
with trained humans. The responses of simple cells in the first stage of
processing in the visual cortex of primates are similar to those of
Gabor filters, which form a roughly statistically independent set of
basis vectors over a wide range of natural images (Bell & Sejnowski,
1997). The disadvantage of Gabor filters from an image processing
perspective is that they are computationally intensive. The ICA filters,
in contrast, are much more computationally efficient, since they were
optimized for faces. The disadvantage is that they are too specialized a
basis set and could not be used for other problems in visual pattern
discrimination.
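A Gabor filter of the kind compared above is simply a Gaussian-windowed sinusoid. The sketch below builds a small bank over four orientations; the size, wavelength, and sigma values are illustrative choices, not parameters from the studies in the book:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor filter: a sinusoid of the given
    wavelength and orientation, windowed by a circular Gaussian."""
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction theta.
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

# A minimal bank over 4 orientations at one scale.
bank = [gabor_kernel(21, wavelength=8.0, theta=t, sigma=4.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Each filter responds strongly to gratings matching its orientation and weakly to orthogonal ones, which is the orientation selectivity that makes the bank expensive: a full representation convolves every filter at every scale with the whole image, whereas the ICA filters are a small basis tuned to the face ensemble.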

One of the reasons why facial analysis is such a difficult problem in
visual pattern recognition is the great variability in the images of
faces.  Lighting conditions may vary greatly and the size and
orientation of the face make the problem even more challenging. The
differences between the same face under these different conditions are
much greater than the differences between the faces of different
individuals. Dr. Bartlett takes up this challenge in Chapter 7 and shows
that learning algorithms may also be used to help overcome some of these
difficulties.

The results reported here form the foundation for future studies on face
analysis, and the same methodology can be applied toward other problems
in visual recognition. Although there may be something special about
faces, we may have learned a more general lesson about the problem of
discriminating between similar complex shapes: A few good filters are
all you need, but each class of object may need a quite different set
for optimal discrimination.

-- 
 
Marian Stewart Bartlett, Ph.D.          marni at salk.edu
Institute for Neural Computation, 0523  http://inc.ucsd.edu/~marni
University of California, San Diego     phone: (858) 534-7368
La Jolla, CA 92093-0523                 fax: (858) 534-2014



