Connectionist symbol processing: any progress?

Mitsu Hadeishi mitsu at ministryofthought.com
Sat Aug 15 20:22:52 EDT 1998


Lev Goldfarb wrote:

> On Fri, 14 Aug 1998, Mitsu Hadeishi wrote:
> >     Your arguments so far seem to be focusing on the "metric" on the
> > input space, but this does not in itself mean anything at all about
> > the metric of the learning algorithm as a whole.
>
> What does "the metric of the learning algorithm as a whole" mean?
> There is no such concept as "the metric of the learning algorithm as a
> whole".

Since you are using terms like "metric" extremely loosely, I was doing so as
well.  What I mean here is that with certain connectionist schemes, for
example those that use an error function of some kind, one can conceive of
the error measure as a kind of distance function (though it is not, formally
speaking, a metric).  That error measure is far more useful and important
than the "metric" you might impose on the input space by conceiving of it as
a vector space, since the input space is NOT a vector space.
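To make the distinction concrete, here is a small sketch (purely my own toy
illustration, using a made-up squared-error measure): such an error behaves
like a distance on network outputs, but it can violate the triangle
inequality, so it is not a metric in the formal sense.

# Toy illustration only: a squared-error "distance" on outputs is not a
# metric, because the triangle inequality can fail.
def squared_error(u, v):
    """Sum of squared differences between two equal-length output vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

a, b, c = [0.0], [1.0], [2.0]
print(squared_error(a, c))                        # 4.0
print(squared_error(a, b) + squared_error(b, c))  # 2.0 < 4.0: triangle inequality fails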

> If "the input space is NOT a vector space in the usual sense of the word",
> then what is it? Are we talking about the formal concepts known in
> mathematics or we don't care about such "trifle" things at all?
> Remember, that "even" physicists care about such things, and I said
> "even", because to model the inductive learning we will need more
> abstract models.

As you (should) know, a vector space carries symmetries that are supposed to
mean something: something should be preserved under rotations and
translations of the input vectors.  However, what do you get when you apply
arbitrary rotations to the input of a connectionist network?  I don't mean
rotations of, say, the visual field presented to a pattern-recognition
network, but rather taking the actual values of the inputs to each neuron in
a network as the coordinates of a vector, and then "rotating" or translating
them, or both.  What meaning does this have when used with a recurrent
connectionist architecture?  It seems to me that it has very little meaning,
if any.
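As a toy illustration of what I mean (again my own made-up example, with
arbitrary weights, not anything from either of our posts): take a tiny
one-layer network, "rotate" its raw input coordinates, and the output simply
changes, so the rotation expresses nothing about the task.

# Toy example: a "rotation" of the raw input vector is meaningless to the
# network; the output just changes. Weights and inputs are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))           # a toy 4-input, 3-output layer
x = np.array([0.2, -1.0, 0.5, 0.3])   # an arbitrary input pattern

theta = np.pi / 6                     # rotate the first two input coordinates
R = np.eye(4)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]

print(np.tanh(W @ x))        # output for the original input
print(np.tanh(W @ (R @ x)))  # output for the rotated input: different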

> [metrics are equivalent if they induce the same topology, or the same
> convergence]

Again, the really important structure is that of the error function, not the
"metric" on the input space conceived as a vector space; and the error
function isn't even a metric in the usual sense of the word.

> Can I give you a one-sentence answer? If you look very carefully at the
> topologies induced on the set of strings (over an alphabet of size > 1) by
> various symbolic distances (of the type given in the parity class problem),
> then you will discover that they have hardly anything to do with the
> continuous topologies we are used to from classical mathematics. In
> this sense, the difficulties ANNs have with the parity problem are only
> the tip of the iceberg.

I do not dispute the value of your work; I dispute your apparent conclusion
that it dooms connectionist approaches, because your intuitive arguments
against them do not seem cogent to me.  While your work is probably quite
valuable, and I think I understand what you are getting at, I see no reason
why it would prevent a connectionist approach (based on a recurrent or more
sophisticated architecture) from discovering the same symbolic metric.  As I
say, the input space is not in any meaningful sense a vector space, and a
recurrent architecture allows the "metric" of the learning algorithm, it
seems to me, to acquire precisely the kind of structure you need it to; at
least, I do not see in principle why it cannot.  The reason is again that
the input is spread out over multiple presentations to the network.
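To illustrate what I mean by the input being spread out over multiple
presentations (a sketch with hand-set weights, not a trained network and not
anyone's published architecture), a tiny recurrent cell fed one bit of a
string at a time can carry exactly the discrete state the parity problem
requires:

# Hand-wired recurrent cell for parity (illustration only; the weights are
# chosen by hand, not learned). The string is presented one symbol at a
# time, and the recurrent state is the running parity of the bits seen.
def step(x):
    """Hard-threshold unit."""
    return 1.0 if x > 0 else 0.0

def parity_rnn(bits):
    s = 0.0                              # recurrent state
    for x in bits:                       # one presentation per symbol
        h_or  = step(s + x - 0.5)        # hidden unit computing s OR x
        h_and = step(s + x - 1.5)        # hidden unit computing s AND x
        s = step(h_or - h_and - 0.5)     # new state: s XOR x
    return int(s)

print(parity_rnn([1, 0, 1, 1]))     # 1 (three ones: odd)
print(parity_rnn([1, 0, 1, 1, 1]))  # 0 (four ones: even)

Whether such a cell can be found by learning is of course the real question,
but the representation itself is clearly within reach of a recurrent
architecture.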

However, I believe there are good reasons to use connectionist schemes as
opposed to purely symbolic ones.  For one, symbolic techniques are
inevitably limited to highly discrete representations, whereas connectionist
architectures can, at least in theory, combine discrete and continuous
representations.  Two, the simplest or most efficient representation of a
given set of rules may include both a continuous and a discrete component,
for instance when one considers imprecise application of rules, the breaking
of rules, and so forth.  Consider poetic speech: the "rules" for
interpreting poetry are clearly not easily enumerable, yet human beings can
read poetry and get something out of it.  A purely symbolic approach may not
capture this easily, whereas it seems to me a connectionist approach has a
better chance of dealing with this kind of situation.

I can see value in your approach, and things that connectionists can learn
from it, but I do not see that it dooms connectionism by any means.

Mitsu


>
>
> So, isn't it scientifically more profitable to work DIRECTLY with the
> symbolic topologies, i.e. the symbolic distance functions, by starting
> with some initial set of symbolic operations and then proceeding in a
> systematic manner to seek the optimal topology (i.e. the optimal set of
> weighted operations) for the training set? To simplify things, this is
> what the evolving transformation system model we are developing attempts
> to do. It appears that there are profound connections between the relevant
> symbolic topologies (and hardly any connections with the classical numeric
> topologies). Based on those connections, we are developing an efficient
> inductive learning model that will work with a MUCH SMALLER training set
> than has been the case in the past. The latter is possible due to the fact
> that, typically, computation of the distance between two strings involves
> many operations and the optimization function involves O(n*n)
> interdistances, where n is the size of the training set.
>
> Cheers,
>           Lev
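P.S. To make the kind of symbolic distance Lev describes above concrete,
here is a rough toy sketch of a weighted edit distance over strings (my own
illustration with made-up unit weights; it is not his evolving
transformation system, only the general flavor of a distance built from
weighted symbolic operations):

# Toy weighted edit distance (illustration only, not Lev's model). Each
# symbolic operation (insertion, deletion, substitution) carries a weight,
# and the distance is the minimum total weight of operations turning one
# string into the other.
def weighted_edit_distance(s, t, w_ins=1.0, w_del=1.0, w_sub=1.0):
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]   # d[i][j]: distance s[:i] -> t[:j]
    for i in range(1, m + 1):
        d[i][0] = i * w_del
    for j in range(1, n + 1):
        d[0][j] = j * w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,       # delete s[i-1]
                          d[i][j - 1] + w_ins,       # insert t[j-1]
                          d[i - 1][j - 1] + sub)     # substitute or match
    return d[m][n]

print(weighted_edit_distance("abab", "abba"))  # 2.0 with unit weights

In a learning setting one would then search over the operation weights,
which is where the O(n*n) interdistances over the training set come in.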




