Vision (What's wrong with Marr's model)

Tue Jan 8 08:28:18 EST 1991

Frank Smieja <gmdzi!smieja at relay.EU.net>  asks about recent objections
to Marr's theory of vision.  Here is my opinion.

David Marr's book  is  delightfully lucid and beautifully illustrated,
and  I thoroughly agree  with his   analysis of   the three levels  of
modeling.   Nevertheless I  believe that  there are two fatal flaws in
the philosophy of his vision model.

The first fatal flaw is the feedforward nature of this model, from the
raw primal sketch through the 2 1/2-D sketch to the 3-D model
representation.  Decades  of   "image   understanding"  and   "pattern
recognition" research have shown us that such  feed-forward processing
has a great deal of difficulty with natural imagery.  The problem lies
in the fact that  whenever "feature extraction" or "image enhancement"
are  performed, they  recognize  or enhance some features   but in the
process they  inevitably degrade others or  introduce artifacts.  With
successive levels  of  processing the artifacts accumulate and combine
until  at the  highest  levels of  processing  there   is no  way   to
distinguish the real features  from  the artifacts.  Even in  our  own
vision, with  all its sophistication,  we occasionally see things that
are not there.  The real problem with such  feedforward models is that
once  a stage of processing  is performed,  it  is never  reviewed  or
reconsidered.

Grossberg suggests how nature solves this problem,  by use of top-down
feedback.  Whenever a  feature is recognized at  any level, a  copy of
that feature is   passed back  DOWN  the  processing  hierarchy in  an
attempt to improve the match at  the lower levels.   If for instance a
set of disconnected edges suggests a larger continuous edge to a higher
level, that "hypothesis" is passed down to the local edge detectors to
see if they can  find  supporting evidence  for the missing  pieces by
locally lowering  their detection   thresholds.  If  a  faint edge  is
indeed found where expected, it is enhanced by  resonant feedback.  If
however there  is strong  local opposition to  the hypothesis then the
enhancement is NOT performed.  This is the cooperative / competitive
loop of the BCS (Boundary Contour System) model, which serves to
disambiguate the image by simultaneous matching at multiple levels.
This explains why, when we occasionally see something that isn't there,
we see it in full detail until a conflict occurs at a higher level, at
which point the apparition "pops" back to being something more
consistent with the global picture.
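
To make the feedback loop concrete, here is a toy one-dimensional
sketch of the idea.  It is NOT Grossberg's BCS and not my own
implementation; the function names, the thresholds and the per-pixel
"opposition" signal are all assumptions made up for illustration.
A bottom-up pass thresholds local edge strengths; a top-down pass then
treats each short gap flanked by detected edges as a hypothesized
continuous edge, and re-examines the gap with a lowered threshold,
unless local opposition vetoes it.  Enhancement is reduced here to
simply marking the pixel as an edge.

```python
def detect_edges(strengths, threshold):
    """Bottom-up pass: mark every position whose strength reaches threshold."""
    return [s >= threshold for s in strengths]


def feedback_pass(strengths, opposition, threshold=0.5, lowered=0.2,
                  max_gap=2, max_opposition=0.3):
    """One bottom-up pass followed by one top-down pass (toy model).

    All parameter values are illustrative assumptions, not BCS constants.
    """
    edges = detect_edges(strengths, threshold)
    result = list(edges)
    n = len(edges)
    i = 0
    while i < n:
        if edges[i]:
            i += 1
            continue
        # Find the end of the current run of undetected positions: [i, j).
        j = i
        while j < n and not edges[j]:
            j += 1
        # Higher-level hypothesis: a short gap with detected edges on both
        # sides belongs to one larger continuous edge.
        flanked = i > 0 and j < n
        if flanked and (j - i) <= max_gap:
            for k in range(i, j):
                # Top-down feedback: lower the local threshold, but only
                # accept the edge if local opposition is weak.
                if strengths[k] >= lowered and opposition[k] <= max_opposition:
                    result[k] = True
        i = j
    return result


strengths = [0.9, 0.8, 0.3, 0.7, 0.9, 0.1, 0.1, 0.1, 0.8]
no_opposition = [0.0] * len(strengths)
filled = feedback_pass(strengths, no_opposition)
```

With these inputs the faint response at position 2 is accepted as part
of the edge, while the three-position gap further along is too long to
be hypothesized as a missing piece and is left alone.  Raising the
opposition at position 2 above max_opposition suppresses the fill-in,
which corresponds to the case where the enhancement is NOT performed.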

The second fatal flaw in Marr's vision model is  related to the first.
In the finest  tradition of "AI", Marr's  3-D  model  is  an  abstract
symbolic representation of the visual input, totally divorced from the
lower level stimuli which  generated  it.  The   great advance of  the
connectionist perspective  is that manipulation of high  level symbols
is  meaningless  without regard to   the hierarchy   of   lower  level
representations to  which they are  attached.   When  you look at your
grandmother for instance, some high level node (or nodes) must fire in
recognition.  At the same time  however you are  very conscious of the
low  level details of  the image, the  strands   of hair, the wrinkles
around the eyes etc.  In fact, even in her absence the high level node
conjures up such low level features, without which that node would have
no real meaning.  It is only because that node rests on the pinnacle of
a hierarchy of such lower level nodes that it has the meaning of
"grandmother".  The perfectly grammatical sentence "Grandmother is
purple" is only recognized  as nonsense when  visualized at the lowest
level, illustrating that  logical processing  cannot be separated from
low level visualization.

Although I recognize Marr's valuable and historic  contribution to the
understanding of vision, I  believe that in this  fast moving field we
have  already progressed    to  new insights and  radically  different
models.  I would be delighted to  provide further information by email
to   interested parties   on   Grossberg's    BCS model,  and my   own
implementation of it for image processing applications.

(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)
(O)((O))(((               slehar at park.bu.edu               )))((O))(O)
(O)((O))(((    Steve Lehar Boston University Boston MA     )))((O))(O)
(O)((O))(((    (617) 424-7035 (H)   (617) 353-6741 (W)     )))((O))(O)
(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)