Distributed Representations

Tom Dietterich tgd at turing.cs.orst.edu
Fri Jun 7 12:45:26 EDT 1991


   Date: Thu 6 Jun 91 22:02:27-PDT
   From: Ken Laws <LAWS at ai.sri.com>


   I'm not sure this is the same concept, but there were several
   papers at the last IJCAI showing that neural networks worked
   better than decision trees.  The reason seemed to be that
   neural decisions depend on all the data all the time, whereas
   local decisions use only part of the data at one time.

This is not the same concept at all.  You are worrying about locality
in the input space, whereas distributed representations usually
concern (lack of) locality in the output space or in some intermediate
representation. I have applied decision trees to learn distributed
representations of output classes, and in all of my experiments,
the distributed representation performs better than either learning one
large tree (to make a k-way discrimination) or learning k separate trees.
I believe this is because a distributed representation is able
to correct for errors made in learning any individual output unit.
The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive
presents experimental support for this claim. 
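To make the idea concrete, here is a minimal sketch (not the code from that
paper) of learning a distributed, error-correcting output representation:
each class is assigned a binary codeword, one decision tree is trained per
codeword bit, and decoding picks the class whose codeword is nearest in
Hamming distance, so a few mistaken bits can still yield the right class.
The class count, code length, and tree depth below are arbitrary
illustrative choices.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def make_codewords(k, n_bits, rng):
    # Random binary codewords, one row per class.  (A real error-correcting
    # design would choose codewords with large pairwise Hamming separation.)
    return rng.integers(0, 2, size=(k, n_bits))

def fit_distributed(X, y, codewords):
    # One binary tree per codeword bit; tree i learns bit i of class y.
    trees = []
    for bit in range(codewords.shape[1]):
        t = DecisionTreeClassifier(max_depth=5)
        t.fit(X, codewords[y, bit])
        trees.append(t)
    return trees

def predict_distributed(X, trees, codewords):
    # Decode to the nearest codeword; disagreement on a few bits is
    # corrected by the remaining bits.
    bits = np.column_stack([t.predict(X) for t in trees])
    dists = (bits[:, None, :] != codewords[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)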

   I've never put much stock in the military reliability claims.
   A bullet through the chip or its power supply will be a real
   challenge.  Noise tolerance is important, though, and I suspect
   that neural systems really are more tolerant.

It isn't a neural vs. non-neural issue:  distributed representations
are more redundant, and hence, more resistant to (local) damage.
Noise tolerance is also not a neural vs. non-neural issue.  To achieve
noise tolerance, you must control over-fitting.  There are many ways
to do this:  low-dimensional representations, smoothness assumptions,
minimum description length methods, cross-validation, etc.
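To illustrate the last of those, a minimal cross-validation sketch for
choosing model complexity (here, polynomial degree; the data and function
names are made up) might look like this:

import numpy as np

def cv_mse(x, y, degree, n_folds=5):
    # Mean squared error on held-out folds for a polynomial of this degree.
    idx = np.random.permutation(len(x))
    folds = np.array_split(idx, n_folds)
    errs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        coef = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coef, x[test])
        errs.append(np.mean((pred - y[test]) ** 2))
    return np.mean(errs)

# Choose the degree with the lowest held-out error rather than the lowest
# training error, which would reward an over-fit model.
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.3 * np.random.randn(60)
best_degree = min(range(1, 10), key=lambda d: cv_mse(x, y, d))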

   Terry Sejnowski's original NETtalk work has always bothered me.
   He used a neural network to set up a mapping from an input
   bit string to 27 output bits, if I recall.  I have never seen
   a "control" experiment showing similar results for 27 separate
   discriminant analyses, or for a single multivariate discriminant.
   I suspect that the results would be far better.  The wonder of
   the net was not that it worked so well, but that it worked
   at all.

I think you should perform these studies before you make such claims.
I myself doubt them very much, because the NETtalk task violates the
assumptions of discriminant analysis.  In my experience,
backpropagation works quite well on the NETtalk task.  We have found
that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor
method) and generalized radial basis functions do better than
backpropagation, but everything else we have tried does worse
(decision trees, perceptrons, Fringe).
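For readers who have not seen the method, a generic distance-weighted
4-nearest-neighbor classifier (a rough sketch in the spirit of HERBIE, not
Wolpert's actual program) can be written in a few lines:

import numpy as np

def weighted_knn_predict(X_train, y_train, X_test, k=4, eps=1e-8):
    # Each of the k nearest neighbors votes with weight 1/distance;
    # the class with the largest total weight wins.
    n_classes = int(y_train.max()) + 1
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(d)[:k]
        votes = np.zeros(n_classes)
        for i in nearest:
            votes[y_train[i]] += 1.0 / (d[i] + eps)
        preds.append(votes.argmax())
    return np.array(preds)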

   I have come to believe strongly in "coarse-coded" representations,
   which are somewhat distributed.  (I have no insight as to whether
   fully distributed representations might be even better.  I suspect
   that their power is similar to adding quadratic and higher-order
   terms to a standard statistical model.)  The real win in
   coarse coding occurs if the structure of the code models
   structure in the data source (or perhaps in the problem
   to be solved).

                                           -- Ken Laws

The real win in any problem comes from good modelling, of course.  But
since we can't guarantee a priori that our representations are good
models, it is important to develop ways of recovering from
inappropriate models.  I believe distributed representations provide
one such way.
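
As an aside, the coarse coding Ken describes is easy to illustrate: a
scalar is represented by the graded response of several broad, overlapping
detectors, so nearby values share many active units and local damage or
noise degrades the code gracefully.  The centers and width below are
arbitrary choices for illustration.

import numpy as np

def coarse_code(value, centers, width):
    # Each unit has a broad, overlapping receptive field; the value is
    # represented by the pattern of activation across all the units.
    return np.exp(-((value - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(0.0, 1.0, 8)            # 8 overlapping detectors
code = coarse_code(0.37, centers, width=0.15)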

--Tom Dietterich




