new dissertation available

Wed Jun 8 16:23:47 EDT 1994

FTP-host: cs.utexas.edu
FTP-filename: pub/neural-nets/papers/leow.diss.tar

The following dissertation is available through anonymous ftp.
It is also available on the WWW from the UTCS Neural Nets Research Group home
page http://www.cs.utexas.edu/~sirosh/nn.html under High-Level Vision
publications.
It contains 198 pages.

-----------------

VISOR: Learning Visual Schemas in Neural Networks for
Object Recognition and Scene Analysis

Wee Kheng Leow

Department of Computer Sciences
The University of Texas at Austin

Abstract

This dissertation describes a neural network system called VISOR for 
object recognition and scene analysis.
The research with VISOR aims at three general goals:
(1) to contribute to building robust, general vision systems that can be 
adapted to different applications,
(2) to contribute to a better understanding of the human visual system by 
modeling high-level perceptual phenomena, and
(3) to address several fundamental problems in neural network implementation 
of intelligent systems, including resource-limited representation, and 
representing and learning structured knowledge.
These goals lead to a schema-based approach to visual processing,
and focus the research on the representation and learning of visual schemas 
in neural networks.

Given an input scene,
VISOR focuses attention on one component of an object at a time, and
extracts the shape and position of the component.
The schemas, represented in a hierarchy of maps and connections between 
them, cooperate and compete to determine which one best matches the input.
VISOR keeps shifting attention to other parts of the scene, reusing the 
same schema representations to identify the objects one at a time, eventually 
recognizing what the scene depicts.
The recognition result consists of labels for the objects and the entire scene.
VISOR also learns to encode the schemas' spatial structures
through unsupervised modification of connection weights, and
uses reinforcement feedback from the environment to determine whether to 
adapt existing schemas or create new schemas to represent novel inputs.

VISOR's operation is based on cooperative, competitive, and parallel 
bottom-up and top-down processes that seem to underlie many human 
perceptual phenomena.
Therefore, VISOR can provide a computational account of many such phenomena,
including shifts of attention, priming effects, perceptual reversal, and 
circular reaction, and
may lead to a better understanding of how these processes are carried out 
in the human visual system.
Compared to traditional rule-based systems, VISOR shows remarkable robustness 
of recognition,
and is able to indicate the confidence of its analysis as the inputs differ 
increasingly from the schemas.
With such properties, VISOR is a promising first step towards a general 
vision system that can be used in different applications after learning the 
application-specific schemas.

---------------

The dissertation is contained in a tar file called leow.diss.tar,
which holds 5 compressed PostScript files (*.ps.Z), 198 pages in total.
To retrieve the files, do the following:

unix> ftp cs.utexas.edu
Name: anonymous
Password: <your login id here>
ftp> binary
ftp> cd pub/neural-nets/papers
ftp> get leow.diss.tar
ftp> quit
unix> tar xvf leow.diss.tar
unix> uncompress *.ps.Z
unix> lpr -P<your_printer_name> *.ps

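If the ps files are too large for the print spooler, you may have to use lpr
with the -s option, which prints through a symbolic link instead of copying
each file into the spool directory.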
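
The unpacking steps above can also be wrapped in a small shell function.
This is only a sketch under stated assumptions: unpack_diss is a
hypothetical helper name, the archive is assumed to hold its *.ps.Z files
at the top level (as the uncompress command above implies), and gzip -d is
used as a stand-in on systems where uncompress is not installed.

```shell
# Sketch only: unpack_diss is a hypothetical helper, not part of the
# original instructions. It assumes the .ps.Z files sit at the top level
# of the archive.
unpack_diss() {
    archive=$1
    dest=${2:-.}

    # Refuse to continue if the archive was not downloaded.
    [ -f "$archive" ] || { echo "missing archive: $archive" >&2; return 1; }
    mkdir -p "$dest" || return 1

    # List the members first so it is clear what will be written.
    tar tf "$archive" || return 1
    tar xf "$archive" -C "$dest" || return 1

    # Decompress any .ps.Z members; gzip -d also reads compress(1) output.
    for f in "$dest"/*.ps.Z; do
        [ -e "$f" ] || continue
        gzip -d "$f" || return 1
    done
}
```

After fetching the file with ftp in binary mode, a call such as
"unpack_diss leow.diss.tar" should leave the .ps files in the current
directory, ready for the lpr command shown above.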