Distributed Representations

Ali Ahmad Minai aam9n at hagar3.acc.Virginia.EDU
Sat Jun 22 21:49:33 EDT 1991


Bo Xu raises some questions about distributed representations in the
context of feed-forward neural networks, particularly with regard to
graceful degradation. I do not agree that to require graceful degradation
is to imply "brain-like" networks. In my opinion, the very notion of
distribution is fundamentally linked to the requirement that each
representational unit be minimally loaded, and that each representation
be as homogeneously distributed over all representational units as possible.
It is partly true that this produces graceful degradation (though only to
first order, given the non-linearity of the system), but that is incidental.
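
This load-sharing view can be made concrete with a toy memory (an illustrative
sketch, not anything from the original post; the pattern size, item count, and
damage levels are arbitrary assumptions). Each item is a dense random +/-1
pattern over many units, so every unit carries only a small part of every
representation, and recovery picks the stored item with the largest overlap:

```python
import random

random.seed(0)
N = 64  # representational units
# Two stored items, each a dense random +/-1 pattern, so every unit carries
# only a small part of each representation (minimal per-unit load).
items = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(2)]

def recover(probe):
    # "Recovery mechanism": report the stored item with the largest overlap,
    # along with that overlap normalized to [0, 1].
    scores = [sum(p * q for p, q in zip(probe, item)) for item in items]
    best = scores.index(max(scores))
    return best, scores[best] / N

# Damage item 0's representation by zeroing out a growing number of units:
# recovery stays correct while the match score falls off gradually.
results = []
for k in (0, 8, 16, 32):
    probe = [0] * k + items[0][k:]
    idx, overlap = recover(probe)
    results.append((k, idx, overlap))

print(results)
```

Since every unit contributes equally, knocking out k of the N units only
scales the match score by (N - k)/N; no single unit's loss is catastrophic.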

As for which layers the definition should apply to, I think that in a
feed-forward associative network (analog or binary), the hidden neurons
(or all the weights) are the representational units. The input neurons
merely distribute the prior part of the association, and the output neurons
merely produce the posterior part. The latter are thus a "recovery mechanism"
designed to "decode" the distributed representation of the hidden units and
recover the "original" item. Of course, in a heteroassociative system, the
"recovered original" is not the same as the "stored original". I realize that
this is stretching the definition of "representation", but it seems quite
natural to me.
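
This reading of the layers can be sketched as a small layered heteroassociator
built by construction (a hypothetical illustration; the choice of Hadamard
rows for cues and codes, and the random targets, are my assumptions). The
first weight layer maps each input cue to a dense hidden code — the
representation proper — and the output weights "decode" that code into the
associated item, which is not the same as the stored cue:

```python
import numpy as np

# Dense orthonormal patterns from an 8x8 Hadamard matrix: every unit is
# active in every pattern, yet the patterns do not interfere.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H8 = np.kron(np.kron(H2, H2), H2) / np.sqrt(8.0)  # 8 orthonormal dense rows

cues = H8[:3]     # prior parts of three associations (input layer activity)
codes = H8[3:6]   # hidden-layer representations (dense, distributed)
rng = np.random.default_rng(0)
targets = rng.standard_normal((3, 8))  # posterior parts to be recovered

# Outer-product construction: cue -> hidden code, then hidden code -> item.
W_hidden = sum(np.outer(h, x) for x, h in zip(cues, codes))
W_out = sum(np.outer(y, h) for h, y in zip(codes, targets))

# The output layer recovers each associated item from the hidden code alone.
ok = all(np.allclose(W_out @ (W_hidden @ x), y)
         for x, y in zip(cues, targets))
print(ok)
```

The input weights merely distribute the cue into the hidden code; all the
representational burden sits in the hidden activity, which the output layer
then decodes.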

The issue of a "recovery mechanism" is quite fundamental to the question
of representational distribution. Without a requirement for adequate
recoverability, any finite medium could be "distributedly" loaded with
a potentially infinite number of representations, without being able
to reproduce any of them. To ensure adequate recoverability, however,
representations must be "distinct", or mutually non-interacting, in some
sense. Given the countervailing requirement of distributedness, the
obvious route of separation by localization is not available, and we
must arrive at some compromise principle of minimum mutual disturbance,
such as a requirement for orthogonality or linear independence (rather
artificial, if you ask me).
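
The role of orthogonality as a minimum-mutual-disturbance principle can be
illustrated with a linear outer-product store (again an illustrative sketch
under assumed sizes and seeds, not a construction from the post): dense
orthonormal keys superimposed in one weight matrix are recovered exactly,
while non-orthogonal random keys leave crosstalk from the other stored items:

```python
import numpy as np

# Hadamard rows give keys that are fully dense (no separation by
# localization) yet mutually orthogonal -- the "compromise principle" above.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
keys = np.kron(np.kron(H2, H2), H2)[:3] / np.sqrt(8.0)  # 3 orthonormal keys

rng = np.random.default_rng(1)
vals = rng.standard_normal((3, 8))  # analog items to be recovered

# Superimpose all associations in a single outer-product weight matrix.
W = sum(np.outer(y, x) for x, y in zip(keys, vals))
exact = all(np.allclose(W @ x, y) for x, y in zip(keys, vals))

# The same storage rule with non-orthogonal random keys leaves crosstalk:
# each recovery is contaminated by the other stored items.
keys2 = rng.standard_normal((3, 8))
keys2 /= np.linalg.norm(keys2, axis=1, keepdims=True)
W2 = sum(np.outer(y, x) for x, y in zip(keys2, vals))
crosstalk = max(float(np.linalg.norm(W2 @ x - y))
                for x, y in zip(keys2, vals))

print(exact, crosstalk > 1e-6)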

My point is that defining distributed representations only in terms
of unconstrained characteristics is a partial solution. Internal
and external constraining factors must be included in the formulation
to adequately ground the definition. These are provided by the
requirements of maximum dispensability and adequate recoverability.
Zillions of issues remain unaddressed by this formulation too, especially
those of consistent measurement. I feel that each domain and situation
will have to supply its own specifics.

I am not sure I understand Bo Xu's assertion that analog representations
are "more natural". Certainly, to approximate a parabola (which I have
done hundreds of times with different neural nets) would imply using an
analog representation, but it is not clear if that is so natural for
classifying apples and pears. Using different analog values to indicate
intra-class variations is reasonable and, under specific circumstances,
might even be provably better than a binary representation. But I would
be very hesitant to generalize over all possible circumstances. In any
case, a global characterization of distributed representation should depend
on specifics only for details, and should apply to both discrete and analog
representations.
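
For concreteness, here is one way the parabola example might look (a
hypothetical sketch; the random-feature network, its sizes, and the seed are
my assumptions, not anything specified in the post). A one-hidden-layer net
with random tanh units produces analog hidden activations, and only the
output weights are fitted, by least squares, to approximate y = x^2 on
[-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2  # the target parabola

# Random-feature network: fixed random input weights and biases, analog
# tanh hidden activations, trainable linear output layer.
n_hidden = 30
a = rng.standard_normal(n_hidden)            # random input weights
b = rng.standard_normal(n_hidden)            # random biases
phi = np.tanh(np.outer(x, a) + b)            # hidden activations, 50 x 30
phi = np.column_stack([phi, np.ones_like(x)])  # append an output bias

# Fit the output weights by least squares over the training grid.
w, *_ = np.linalg.lstsq(phi, y, rcond=None)
max_err = float(np.max(np.abs(phi @ w - y)))
print(max_err < 0.05)
```

The hidden activations here are graded real values, i.e. an analog
distributed code for x, which is exactly the kind of representation the
parabola task seems to call for.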

Ali Minai
University of Virginia
aam9n at Virginia.EDU

