new dissertation available
Wee Kheng Leow
leow at cs.utexas.edu
Wed Jun 8 16:23:47 EDT 1994
FTP-host: cs.utexas.edu
FTP-filename: pub/neural-nets/papers/leow.diss.tar
The following dissertation is available through anonymous ftp.
It is also available on the WWW from the UTCS Neural Nets Research Group home
page http://www.cs.utexas.edu/~sirosh/nn.html under High-Level Vision
publications.
It contains 198 pages.
-----------------
VISOR: Learning Visual Schemas in Neural Networks for
Object Recognition and Scene Analysis
Wee Kheng Leow
Department of Computer Sciences
The University of Texas at Austin
Abstract
This dissertation describes a neural network system called VISOR for
object recognition and scene analysis.
The research with VISOR aims at three general goals:
(1) to contribute to building robust, general vision systems that can be
adapted to different applications,
(2) to contribute to a better understanding of the human visual system by
modeling high-level perceptual phenomena, and
(3) to address several fundamental problems in neural network implementation
of intelligent systems, including resource-limited representation and the
representation and learning of structured knowledge.
These goals lead to a schema-based approach to visual processing,
and focus the research on the representation and learning of visual schemas
in neural networks.
Given an input scene,
VISOR focuses attention on one component of an object at a time, and
extracts the shape and position of the component.
The schemas, represented in a hierarchy of maps and connections between
them, cooperate and compete to determine which one best matches the input.
VISOR keeps shifting attention to other parts of the scene, reusing the
same schema representations to identify the objects one at a time, eventually
recognizing what the scene depicts.
The recognition result consists of labels for the objects and the entire scene.
VISOR also learns to encode the schemas' spatial structures
through unsupervised modification of connection weights.
Reinforcement feedback from the environment determines whether to
adapt existing schemas or create new schemas to represent novel inputs.
VISOR's operation is based on cooperative, competitive, and parallel
bottom-up and top-down processes that seem to underlie many human
perceptual phenomena.
Therefore, VISOR can provide a computational account of many such phenomena,
including shifting of attention, priming effect, perceptual reversal, and
circular reaction, and
may lead to a better understanding of how these processes are carried out
in the human visual system.
Compared to traditional rule-based systems, VISOR shows remarkable robustness
of recognition,
and is able to indicate the confidence of its analysis as the inputs differ
increasingly from the schemas.
With such properties, VISOR is a promising first step towards a general
vision system that can be used in different applications after learning the
application-specific schemas.
---------------
The dissertation is contained in a tar file called leow.diss.tar,
which holds 5 compressed PostScript files (*.ps.Z), with a total of 198 pages.
To retrieve the files, do the following:
unix> ftp cs.utexas.edu
Name: anonymous
Password: <your login id here>
ftp> binary
ftp> cd pub/neural-nets/papers
ftp> get leow.diss.tar
ftp> quit
unix> tar xvf leow.diss.tar
unix> uncompress *.ps.Z
unix> lpr -P<your_printer_name> *.ps
If the ps files are too large for the print spooler, you may have to use lpr
with the -s option, which links to each file instead of copying it to the
spool directory.
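
For readers who prefer a script to the interactive session, the unpacking
steps above can be sketched as a small shell function. This is only a sketch
under stated assumptions: it assumes leow.diss.tar has already been downloaded,
that the .ps.Z files sit at the top level of the archive (as the commands above
imply), and that your tar accepts -C. The name unpack_diss and the output
directory argument are hypothetical, not part of the original instructions.
Where uncompress is not installed, the sketch falls back to gzip -d, which can
also read compress-format .Z files.

```shell
#!/bin/sh
# Hypothetical helper: extract the tar file into a directory and
# decompress every top-level *.ps.Z file found there.
# usage: unpack_diss <tarfile> [output_dir]
unpack_diss() {
    tarfile=$1
    outdir=${2:-.}

    mkdir -p "$outdir" || return 1
    # Same effect as "tar xvf leow.diss.tar", but into $outdir.
    tar xf "$tarfile" -C "$outdir" || return 1

    for f in "$outdir"/*.ps.Z; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        if command -v uncompress >/dev/null 2>&1; then
            uncompress "$f" || return 1
        else
            # Fallback (assumption: gzip is installed).
            gzip -d "$f" || return 1
        fi
    done
}
```

For example, "unpack_diss leow.diss.tar ./leow-diss" would leave the .ps files
in ./leow-diss, ready for the lpr step shown above.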