Distributed Representations
Tom Dietterich
tgd at turing.cs.orst.edu
Fri Jun 7 12:45:26 EDT 1991
> Date: Thu 6 Jun 91 22:02:27-PDT
> From: Ken Laws <LAWS at ai.sri.com>
> I'm not sure this is the same concept, but there were several
> papers at the last IJCAI showing that neural networks worked
> better than decision trees. The reason seemed to be that
> neural decisions depend on all the data all the time, whereas
> local decisions use only part of the data at one time.
This is not the same concept at all. You are worrying about locality
in the input space, whereas distributed representations usually
concern (lack of) locality in the output space or in some intermediate
representation. I have applied decision trees to learn distributed
representations of output classes, and in all of my experiments,
the distributed representation performs better than learning either
one large tree (to make a k-way discrimination) or learning k separate
trees. I believe this is because a distributed representation is able
to correct for errors made in learning any individual output unit.
The paper "dietterich.error-correcting.ps.Z" in the neuroprose archive
presents experimental support for this claim.
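The error-correcting effect described above can be sketched in a few lines. This is a generic illustration, not the codewords or learners from the paper: each class gets an n-bit codeword, one binary learner is trained per bit, and a predicted bit vector is decoded to the class with the nearest codeword in Hamming distance.

```python
# Hypothetical 5-bit codewords for 3 classes, with pairwise Hamming
# distance >= 3, so any single erroneous output unit can be corrected.
CODEWORDS = {
    "a": (0, 0, 0, 0, 0),
    "b": (1, 1, 1, 0, 0),
    "c": (0, 0, 1, 1, 1),
}

def hamming(u, v):
    """Number of positions where two bit vectors disagree."""
    return sum(x != y for x, y in zip(u, v))

def decode(bits):
    """Map a predicted bit vector to the class with the nearest codeword."""
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], bits))

print(decode((1, 1, 1, 0, 0)))  # exact codeword for "b"
print(decode((1, 0, 1, 0, 0)))  # one output unit wrong, still decodes to "b"
```

In practice each bit would come from a separately learned decision tree; the decoding step is what lets one tree's mistake be outvoted by the others.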
> I've never put much stock in the military reliability claims.
> A bullet through the chip or its power supply will be a real
> challenge. Noise tolerance is important, though, and I suspect
> that neural systems really are more tolerant.
It isn't a neural vs. non-neural issue: distributed representations
are more redundant, and hence, more resistant to (local) damage.
Noise tolerance is also not a neural vs. non-neural issue. To achieve
noise tolerance, you must control over-fitting. There are many ways
to do this: low-dimensional representations, smoothness assumptions,
minimum description length methods, cross-validation, etc.
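One of those over-fitting controls, cross-validation, can be sketched as follows. The data, the k-nearest-neighbor smoother, and all names here are illustrative assumptions, not anything from the discussion: a smoothing parameter is chosen by holding out each fold in turn and picking the setting with the lowest held-out error.

```python
# Choose the smoothing parameter k of a k-nearest-neighbor regressor by
# 5-fold cross-validation on noisy synthetic data (a hypothetical example).
import random

random.seed(0)
xs = [i / 40 for i in range(40)]
ys = [x * x + random.gauss(0, 0.05) for x in xs]  # noisy quadratic
points = list(zip(xs, ys))

def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cv_error(data, k, folds=5):
    """Mean squared error of k-NN estimated by cross-validation."""
    err = 0.0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        err += sum((knn_predict(train, x, k) - y) ** 2 for x, y in test)
    return err / len(data)

best_k = min(range(1, 20), key=lambda k: cv_error(points, k))
print(best_k)
```

Too-small k over-fits the noise and too-large k over-smooths; the held-out error exposes both, which is why cross-validation serves as a noise-tolerance control regardless of whether the learner is neural.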
> Terry Sejnowski's original NETtalk work has always bothered me.
> He used a neural network to set up a mapping from an input
> bit string to 27 output bits, if I recall. I have never seen
> a "control" experiment showing similar results for 27 separate
> discriminant analyses, or for a single multivariate discriminant.
> I suspect that the results would be far better. The wonder of
> the net was not that it worked so well, but that it worked
> at all.
I think you should perform these studies before you make such claims.
I myself doubt them very much, because the NETtalk task violates the
assumptions of discriminant analysis. In my experience,
backpropagation works quite well on the NETtalk task. We have found
that Wolpert's HERBIE (which is a kind of weighted 4-nearest-neighbor
method) and generalized radial basis functions do better than
backpropagation, but everything else we have tried does worse
(decision trees, perceptrons, Fringe).
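The "weighted nearest-neighbor" family mentioned above can be sketched generically. This is not HERBIE itself, whose weighting scheme differs; it is a plain inverse-distance-weighted 4-nearest-neighbor classifier, with the metric and weights as illustrative choices.

```python
# A generic inverse-distance-weighted 4-nearest-neighbor classifier,
# in the spirit of (but not identical to) Wolpert's HERBIE.

def weighted_knn(train, x, k=4, eps=1e-9):
    """train: list of (feature_tuple, label) pairs.
    Returns the label carrying the most inverse-distance weight
    among the k nearest training points."""
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    nearest = sorted(train, key=lambda p: dist(p[0], x))[:k]
    votes = {}
    for feats, label in nearest:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist(feats, x) + eps)
    return max(votes, key=votes.get)

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(weighted_knn(train, (0.2, 0.2)))  # weight concentrates on class "a"
```

The inverse-distance weighting is what makes the method behave smoothly near class boundaries while still memorizing the training data exactly.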
> I have come to believe strongly in "coarse-coded" representations,
> which are somewhat distributed. (I have no insight as to whether
> fully distributed representations might be even better. I suspect
> that their power is similar to adding quadratic and higher-order
> terms to a standard statistical model.) The real win in
> coarse coding occurs if the structure of the code models
> structure in the data source (or perhaps in the problem
> to be solved).
>
> -- Ken Laws
The real win in any problem comes from good modelling, of course. But
since we can't guarantee a priori that our representations are good
models, it is important to develop ways for recovering from
inappropriate models. I believe distributed representations provide
one such way.
--Tom Dietterich
More information about the Connectionists mailing list