Tue Jun 6 06:52:25 EDT 2006


The question, for me, is how to distribute this given datum (association
vector or whatever) over my representational units so that I can recover it
from a partial stimulus.
The issue of how the given datum itself is represented is obviously *very*
important --- no quarrel on that --- but the question of "internal
representations" (as Bo Xu so appropriately calls them) seems more
immediate from a connectionist point of view because it relates *directly*
to the problem of learning.

As we all know, learning from a finite data set is ill-posed and, even
with a fixed network topology, can (and will) produce multiple "equally
good" solutions. Unlike Bo Xu, I am not at all convinced that "most of the
current networks' topology ensures that the internal representations are
mixed distributed representations" --- at least not "optimally" so. For the
last year or so, I've been working on the problem of classifying "equally
good" network solutions to approximation problems by their ability to
withstand internal perturbations gracefully. I have found that, while a
large enough network often does distribute responsibility, there is
considerable variation from net to net. My ideal distributed representation
would be one that is minimally degraded by the malfunction of individual
representative units (weights and neurons) so that it could withstand the
maximum trauma better than any other network in its class. Of course, this
is a theoretical ideal and the order of the effects I am talking about is
insane. However, I think that the internal interactions on which this
characterization depends are amenable to relatively simple empirical
analysis (!) under simplifying assumptions, and are a "black box" only
with respect to exact analysis. In any case, even if they were a black
box, the characterization would still be applicable --- we just wouldn't
be able to use it. In effect, what I am advocating is already present in
most estimation methods under the guise of regularization.
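
To make the kind of empirical analysis I have in mind concrete, here is a
rough sketch in Python (the two-layer tanh network and every name in it are
illustrative assumptions of mine, not a fixed procedure): jitter a trained
network's weights with small Gaussian noise and record the average rise in
task error. Among "equally good" networks, the one whose error rises least
is the one that degrades most gracefully in this crude sense.

    import numpy as np

    rng = np.random.default_rng(0)

    def forward(weights, x):
        # Illustrative two-layer tanh network; weights = (W1, W2).
        W1, W2 = weights
        return np.tanh(x @ W1) @ W2

    def mse(weights, X, Y):
        return float(np.mean((forward(weights, X) - Y) ** 2))

    def degradation(weights, X, Y, sigma=0.1, trials=200):
        # Average increase in error when all weights are perturbed by
        # Gaussian noise of scale sigma: a crude stand-in for the
        # internal "trauma" (malfunctioning weights and units) above.
        base = mse(weights, X, Y)
        rises = []
        for _ in range(trials):
            noisy = tuple(W + sigma * rng.standard_normal(W.shape)
                          for W in weights)
            rises.append(mse(noisy, X, Y) - base)
        return float(np.mean(rises))

Perturbing one weight or one unit at a time, instead of everything at once,
would give the finer-grained notion of individual relevance I alluded to.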

An interesting contrast, however, exists with regard to the various
"pruning" algorithms used to improve generalization. If things go well
and they succeed (most of them do, I think), then the networks they
produce have a near-minimal number of representational units, across
which the various associations are quite well-distributed. However,
precisely because of their minimality, each representational unit has
acquired maximal internal relevance, and is now minimally dispensable.
Had there been no pruning, and had some other method been used to force
well-distributedness, I think that good generalization could have been
obtained without losing robustness. In effect, I am saying that instead
of using a minimal number of resources as much as possible, we could use
all the available resources as little as possible. Both create highly
distributed representations, but the latter does so without losing any
robustness (indeed, maximizing it).
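
By way of illustration only (the penalty below is a hypothetical
formulation of mine, not an established algorithm): instead of deleting
units, one can add a term to the training loss that penalizes the
concentration of outgoing weight mass on any single hidden unit. For a
fixed total load, the sum of squared per-unit loads is smallest when the
load is spread evenly, so every unit gets used, each as little as possible.

    import numpy as np

    def spread_loss(weights, X, Y, lam=1e-3):
        # Task error plus a penalty on the concentration of per-unit
        # "relevance", measured here as the squared outgoing weight
        # norm of each hidden unit. Minimizing the sum of squared
        # loads (for a given total) favors an even spread of
        # responsibility over ALL hidden units.
        W1, W2 = weights
        err = np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)
        unit_load = np.sum(W2 ** 2, axis=1)  # one load per hidden unit
        return err + lam * np.sum(unit_load ** 2)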

>I want to thank Ali Minai for his comments.  All of his comments are very
>valuable and thought-stimulating.

Ditto for Bo Xu. I've been enjoying this discussion very much.

Ali Minai
University of Virginia
aam9n at Virginia.EDU

