Connectionist symbol processing: any progress?

Lev Goldfarb goldfarb at unb.ca
Fri Aug 14 09:58:44 EDT 1998


On Thu, 13 Aug 1998, Mitsu Hadeishi wrote:

> I'm not quite sure I agree with your analysis.  Since I haven't looked at it in
> great detail, I present this as a tentative critique of your presentation.
> 
> Since recurrent ANNs can be made to carry out any Turing operation (i.e., modulo
> their finite size, they are Turing equivalent), then unless you are saying that
> your representation cannot be implemented on a Turing machine, then it is clearly
> NOT the case that recurrent ANNs *cannot* learn arbitrary symbolic
> representations. Not having looked at your scheme in detail, of course, I don't
> know whether your scheme somehow is unimplementable even on a Turing machine, but
> it seems to me you must not be claiming this.

Right, I'm not claiming this. Moreover, I'm not discussing the SIMULATIONAL,
or computing, power of a learning model, since in a simulation of a Turing
machine one doesn't care how the actual "parameters" are constructed based
on a small training set, i.e. no learning is involved. Many present (and
future) computational models are "Turing equivalent".  That doesn't make
them learning models, does it?

In other words, as I mentioned in my original posting, IF YOU KNOW THE
SYMBOLIC CLASS STRUCTURE you can, of course, simulate, or encode, it on
many machines (a similar remark straight from the horse's mouth, i.e. from
the Siegelmann and Sontag paper "On the computational power of neural
nets", section 1.2: "Many other types of 'machines' may be used for
universality"). Again, the discovery of the symbolic class structure is a
fundamentally different matter, much less trivial than a simple encoding
of this structure using any other, including numeric, structure.
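
To make the encode-vs-discover distinction concrete, here is a minimal
Python sketch of my own (the class and the tiny sample are purely
illustrative and are not taken from any of the cited papers). Once the
symbolic class structure is KNOWN, say "binary strings with an even
number of 1's", encoding it as a recognizer is a one-line matter; the
inductive problem runs in the opposite direction, from a small labeled
sample to such a rule (or to the corresponding class distance).

    # Encoding a KNOWN symbolic class structure is trivial:
    def in_class(s):
        """Membership test for the (known) class: even number of 1's."""
        return s.count("1") % 2 == 0

    # Inductive learning starts from a small labeled sample instead, and
    # must DISCOVER a rule (or a class distance) that generalizes:
    small_training_set = [("0110", True), ("010", False), ("", True), ("100", False)]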
  
> You seem to be basing your argument on the notion that the input space to a
> recurrent ANN is a set of numbers, which you interpret as the coordinates of a
> vector.  However, this is only a kind of vague analogy, since the field
> operations of the vector space (addition, multiplication, etc.) have no clear
> meaning on the input space.  "Adding" two input vectors does not necessarily
> result in anything meaningful except in the sense that the recurrent ANN to be
> useful must be locally stable with respect to small variations in the input.
> However, the actual structure or metric of the input space is in some sense
> determined not a priori but by the state of the recurrent ANN itself, and can
> change over time both as a result of training and as a result of iteration.  The
> input space is numbers, yes, but that doesn't make it a vector space.  For
> example, what properties of the input would be preserved if I, say, added the
> vector (10^25, 10^25, ...) to the input?  If it is a "vector space" then that
> operation would yield something sensible, some symmetries, and yet it obviously
> does not.  Thus, while I sympathize with your claim that the vector field of R(n)
> does not admit to the structure necessary to make visible much symbolic
> structure, this in itself does not doom connectionist symbol processing by any
> means.

It appears that something VERY BASIC is missing from the above
description: how could a recurrent net learn without some metric, and, as
far as I know, without a metric equivalent to the Euclidean one? (All
"good" metrics on a finite-dimensional vector space are equivalent to the
Euclidean metric; see [1] in my original posting.)
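
For reference, the norm-equivalence fact I'm invoking can be stated in
one line (a restatement of the standard textbook theorem, not a quotation
from [1]): for any two norms on a finite-dimensional real vector space
there are constants 0 < c <= C with

    % all norms on R^n are equivalent:
    c\,\|x\|_{a} \;\le\; \|x\|_{b} \;\le\; C\,\|x\|_{a}
    \qquad \text{for all } x \in \mathbb{R}^{n},

so every such metric induces the same topology and, up to a bounded
distortion, the same judgment of which training points are "close".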

> Your argument does have weight when applied to a single-layer perceptron, which
> is, after all, just a thresholded/distorted linear transformation.  Although it
> seemed to take the early connectionist community by surprise, it should be no
> surprise at all that a single-layer perceptron cannot learn the parity problem,
> because obviously the parity problem is not linearly separable, and how could any
> linear discriminator possibly learn a non-linearly-separable problem?  However,
> we do not live in a world of single-layer perceptrons.  Because networks are more
> complex than this, arguments about the linearity of the input space seem to me
> rather irrelevant.  I suspect you mean something else, however.

Yes, I do.

> I think the intuitive point you are perhaps trying to make is that symbolic
> representations are arbitrarily nestable (recursively recombinable), and an input
> space which consists of a fixed number of dimensions cannot handle recursive
> combinations.  However, one can use time-sequence to get around this problem (as
> we all are doing when we read and write for example).  Rather than make our eyes,
> for example, capable of handling arbitrarily recombinable input all at once, we
> sequence the input to our eyes by reading material over time.  The same trick can
> be used with recurrent networks for example.

Mitsu,

I'm not making just this point.

The main point I'm making can be stated as follows. Inductive learning
requires some object dissimilarity and/or similarity measure(s). The
accumulated mathematical experience strongly suggests that the distance in
the input space must be consistent with the underlying operational, or
compositional, structure of the chosen object representation (e.g. a
topological group, a topological vector space, etc.). It turns out that
while the classical vector space (because of the "simple" compositional
structure of its objects) allows essentially one metric consistent with
the underlying algebraic structure [1], each symbolic "space" (e.g.
strings, trees, graphs) allows infinitely many of them. In the latter
case, inductive learning becomes the learning of the corresponding class
distance function (recall my parity example). Moreover, since some noise
is always present in the training set, I cannot imagine how a RELIABLE
symbolic inductive class structure can be learned from a SMALL training
set without the right symbolic bias and without the help of the
corresponding symbolic distance measures.
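
To illustrate, on the simplest symbolic space, what "infinitely many
metrics" means in practice, here is a sketch of my own in Python, using
the familiar weighted edit (Levenshtein) distance as the family of string
metrics. Nothing in it is specific to my formal model; the point is only
that the weights are free parameters, so every positive choice of them
yields a different, equally legitimate distance on the set of strings,
and the learner must in effect also select them.

    def weighted_edit_distance(a, b, w_indel=1.0, w_sub=1.0):
        """Weighted Levenshtein distance between strings a and b.

        Every choice of positive weights (w_indel, w_sub) gives a
        different metric on the set of strings, in contrast with the
        essentially unique (Euclidean-equivalent) metric of a
        finite-dimensional vector space.
        """
        m, n = len(a), len(b)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * w_indel              # delete all of a[:i]
        for j in range(1, n + 1):
            d[0][j] = j * w_indel              # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
                d[i][j] = min(d[i - 1][j] + w_indel,      # deletion
                              d[i][j - 1] + w_indel,      # insertion
                              d[i - 1][j - 1] + sub)      # substitution/match
        return d[m][n]

With vectors there is, in the above sense, essentially one reasonable
distance; with strings (and trees, graphs) the choice of distance is
itself part of what has to be learned.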



Cheers,
          Lev


