Summary (long): pattern recognition comparisons

Leonard Uhr uhr at cs.wisc.edu
Thu Aug 30 12:30:30 EDT 1990


A quick response to the responses to my comments on the gap between nets and
computer vision (I've been out of town, and am now trying to catch up on mail):

I certainly wasn't suggesting that the number of input nodes matters, but
simply that complex images must be resolved in enough detail to be
recognizable.  Gary Cottrell's 64x64 images may be adequate for faces (though
I suspect finer resolution is needed as more people are included, with many
different expressions (much less rotations) for each).  But the point is that
complete connectivity from layer to layer needs O(N**2) links, and the fact that
"a preprocessing step" reduced the 64x64 array to 80 nodes is a good example of
how complete connectivity dominates.  Once the preprocessor is handled by the
net itself, it will either need too many links or have ad hoc structure.
It's surely better to use partial connectivity (e.g., local connectivity, which
is a very general assumption motivated by physical interactions and brain
structure) than some inevitably ad hoc preprocessing steps of unknown value.
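A rough back-of-envelope comparison of link counts makes the difference
concrete; the few lines of Python below are only illustrative, and the 5x5
receptive field is an assumed size, not anything from Cottrell's system:

    # Back-of-envelope link counts for a 64x64 input image.
    n_input = 64 * 64                  # 4096 input nodes

    # Complete connectivity to a same-sized hidden layer: O(N**2) links.
    full_links = n_input * n_input     # 16,777,216 links

    # Complete connectivity after "a preprocessing step" to 80 nodes.
    preproc_links = n_input * 80       # 327,680 links

    # Local (partial) connectivity: each hidden node sees only an
    # (assumed) 5x5 patch of the image.
    local_links = n_input * 5 * 5      # 102,400 links

    print(full_links, preproc_links, local_links)
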
  Evaluation is tedious and unrewarding, but without it we simply can't make
claims or compare systems.  I'm not arguing against nets; on the contrary,
I think that highly parallel nets are the only possibility for handling really
hard problems like recognition, language handling, and reasoning.  But they'll
need much better structure (or the ability to evolve and generate needed
structures).  And I was asking for objective evidence that 3-layer feed-forward
nets with links between all nodes in adjacent layers actually handle complex
images better than some of the large and powerful computer vision systems.
True, we know that in theory they can do anything.  But that's no better than
knowing that random search through the space of all Turing machine programs
can do anything.

Len Uhr

