Connectionist symbol processing: any progress?

Lev Goldfarb goldfarb at unb.ca
Sat Aug 15 22:32:12 EDT 1998


On Sat, 15 Aug 1998, Mitsu Hadeishi wrote:

> Since you are using terms like "metric" extremely loosely, I was also doing
> so. 

Please note that although I am not always that precise, I have not used
terms like "metric" extremely loosely.

> What I mean here is that with certain connectionist schemes, for example
> those that use an error function of some kind, one could conceive of the error
> measure as a kind of distance function (however, it is not a metric formally
> speaking).  However, the error measure is far more useful and important than
> the "metric" you might impose on the input space when conceiving of it as a
> vector space, since the input space is NOT a vector space.

If the input and the "intermediate" spaces are not vector spaces, then what
is the advantage of the "connectionist" architectures?

> > If "the input space is NOT a vector space in the usual sense of the word",
> > then what is it? Are we talking about the formal concepts known in
> > mathematics or we don't care about such "trifle" things at all?
> > Remember, that "even" physicists care about such things, and I said
> > "even", because to model the inductive learning we will need more
> > abstract models.
> 
> As you (should) know, a vector space is supposed to have vector space
> symmetries.  For example, something should be preserved under rotations and
> translations of the input vectors.  However, what do you get when you do
> arbitrary rotations of the input to a connectionist network?  I don't mean
> rotations of, say, the visual field to a pattern recognition network, but
> rather taking the actual values of the inputs to each neuron in a network as
> coordinates of a vector, and then "rotating" them or translating them, or
> both.  What meaning does this have when used with a recurrent connectionist
> architecture?  It seems to me that it has very little meaning, if any.
> 
> > [metrics are equivalent if they induce the same topology, or the same
> > convergence]
> 
> Again, the only really important structure is that of the error function,
> not the "metric" on the input space conceived as a vector space, which
> isn't even a metric in the usual sense of the word.
> 
> > Can I give you a one-sentence answer? If you look very carefully at the
> > topologies induced on the set of strings (over an alphabet of size > 1) by
> > various symbolic distances (of the type given in the parity class problem),
> > then you will discover that they have hardly anything to do with the
> > continuous topologies we are used to from classical mathematics. In
> > this sense, the difficulties ANNs have with the parity problem are only
> > the tip of the iceberg.
> 
> I do not dispute the value of your work; I simply dispute your apparent
> conclusion that it dooms connectionist approaches, because your intuitive
> arguments against those approaches do not seem cogent to me.  While your
> work is probably quite valuable, and I think I understand what you are
> getting at, I see no reason why it would prevent a connectionist approach
> (based on a recurrent or more sophisticated architecture) from being able
> to discover the same symbolic metric.  Because, as I say, the input space
> is not in any meaningful sense a vector space, the recurrent architecture
> allows the "metric" of the learning algorithm, it seems to me, to acquire
> precisely the kind of structure that you need it to; or, at least, I do
> not see in principle why it cannot.  The reason is again that the input is
> spread out over multiple presentations to the network.
> 
> There are, however, good reasons, I believe, to use connectionist schemes
> as opposed to purely symbolic schemes.  For one, symbolic techniques are
> inevitably limited to highly discrete representations, whereas connectionist
> architectures can, at least in theory, combine both discrete and continuous
> representations.

The main reason we are developing the ETS model is precisely that we
believe it offers THE ONLY POSSIBLE NATURAL (and fundamentally new)
SYMBIOSIS of the discrete and the continuous FORMALISMS, as opposed to the
unnatural ones. I would definitely say (and you would probably agree) that,
if indeed this is the case, it is the most important consideration.

Moreover, it turns out that the concept of a fuzzy set, which was
originally introduced in a rather artificial manner that did not clarify
the underlying source of fuzziness (and this has caused an understandable
and substantial resistance to its introduction), emerges VERY naturally
within the ETS model: the definition of a class via the corresponding
distance function typically and naturally induces a fuzzy class boundary
and also reveals the source of the fuzziness, which includes the interplay
between the corresponding weighted operations and (in the case of noise in
the training set) a nonzero radius. Note that in the parity class problem
the parity class is not fuzzy, as reflected in the corresponding weighting
scheme and the radius of 0.
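
To make the last point concrete, here is a minimal sketch (in Python; it
is my own toy construction, NOT the ETS formalism itself) of a class
defined via a weighted distance and a radius. The alphabet {a, b}, the
operation weights, and the membership function below are all assumptions
chosen for illustration only: deleting/inserting an 'a' costs 0,
deleting/inserting a 'bb' pair costs 0, and a single leftover 'b' costs 1,
so the even-parity class is exactly the set of strings at distance 0 from
the empty string (radius 0, hence crisp), while a nonzero radius with
noisy weights would yield a graded, i.e. fuzzy, membership.

import math

def parity_distance(s: str) -> int:
    # Cheapest reduction of s to the empty string under the assumed
    # weights: all a's are deleted for free, b's cancel in free 'bb'
    # pairs, and at most one unpaired b remains, costing 1.
    return len(s.replace("a", "")) % 2   # = (number of b's) mod 2

def crisp_member(s: str, radius: float = 0.0) -> bool:
    # Class = all strings within `radius` of the prototype (here "").
    return parity_distance(s) <= radius

def fuzzy_member(s: str, radius: float) -> float:
    # An assumed membership grade that decays with the distance in
    # excess of the radius; with noisy weights the boundary blurs.
    return math.exp(-max(0.0, parity_distance(s) - radius))

for s in ["", "aa", "ab", "abba", "bab"]:
    print(f"{s!r:8} d={parity_distance(s)}  member={crisp_member(s)}  "
          f"grade(r=0.5)={fuzzy_member(s, 0.5):.2f}")

Note, incidentally, that under this weighting all even-parity strings sit
at mutual distance 0 (and likewise all odd-parity ones), so the induced
topology collapses to a two-point space and has nothing to do with the
familiar continuous topologies, which is the "tip of the iceberg" point
quoted above.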


>                      Two, it may be that the simplest or most efficient
> representation of a given set of rules may include both a continuous and a
> discrete component; consider, for example, issues such as the imprecise
> application of rules, the breaking of rules, and so forth.  For instance,
> consider poetic speech: the "rules" for interpreting poetry are clearly not
> easily enumerable, yet human beings can read poetry and get something out of
> it.  A purely symbolic approach may not be able to capture this easily,
> whereas it seems to me a connectionist approach has a better chance of
> dealing with this kind of situation.
> 
> I can see value in your approach, and things that connectionists can learn
> from it, but I do not see that it dooms connectionism by any means.

See the previous comment.


Cheers,
        Lev


