Connectionist symbol processing: any progress?

Mitsu Hadeishi mitsu at ministryofthought.com
Fri Aug 14 14:44:07 EDT 1998


Lev,

    Okay, so we agree on the following:

    Recurrent ANNs have the computational power required.  The only
thing at issue is the learning algorithm.

>How could a recurrent net learn without some metric and, as
>far as I know, some metric equivalent to the Euclidean metric?

    Your arguments so far seem to focus on the "metric" on the input
space, but this by itself says nothing about the metric of the
learning algorithm as a whole.  Clearly the input space is NOT a
vector space in the usual sense of the word, at least if you use the
metric defined by the whole system (learning algorithm, error
measure, state of the network).  So what you are now saying is that
the metric must be equivalent to (rather than equal to) the Euclidean
metric: but you do not define what you mean by this.

    The learning "metric" in the connectionist paradigm changes over
time: it is a function of the structure of the learning algorithm and
the state of the network, as I mentioned above.  The only sense in
which the metric is "equivalent" to the Euclidean metric is locally;
that is, due to the need to discriminate noise, this metric must be
locally stable, thus there is an open neighborhood around most points
in the topological input space for which the "metric" vanishes.
However, the metric can be quite complex: it can have singularities,
change over time, fold back on itself, and so on.
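    To make this concrete, here is a minimal sketch (the one-unit
"network" and its weights are hypothetical, purely for illustration)
of how the distance a network effectively imposes on its inputs
depends on the state of the network, and so changes as learning
changes the weights:

```python
import math

# Hypothetical one-unit "network": f_w(x) = tanh(w * x).  The distance the
# system effectively uses between two inputs is the distance between their
# internal representations, which depends on the weight w.
def f(w, x):
    return math.tanh(w * x)

def induced_dist(w, x, y):
    return abs(f(w, x) - f(w, y))

x, y = 0.5, 2.0
# Small weight: f is nearly linear, so the induced distance is close to a
# scaled Euclidean distance (the "local" equivalence).
print(induced_dist(0.1, x, y))
# Large weight: both inputs saturate and the induced distance nearly
# vanishes -- the effective "metric" has changed with the network state.
print(induced_dist(10.0, x, y))
```

The same inputs can be far apart or indistinguishable depending on
the weights, which is all I mean by the metric being a function of
the state of the network.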

    This local stability may not be of interest, however, since the
input may be coded so that each discrete possibility maps to exact
numbers which are separated in space.  In this case the input space
may not be a continuous space at all, but a discrete lattice or
something else.  If the input space is a lattice, then there are no
small open neighborhoods around the input points, and thus even this
similarity to the Euclidean metric no longer applies.  At least, so
far, your arguments do not seem to show anything beyond this.
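    As a small illustration of the lattice case (the four-symbol
alphabet and one-hot coding are my own hypothetical example, not
anything from your scheme): under such a coding every pair of
distinct symbols is exactly the same Euclidean distance apart, so the
Euclidean metric carries no similarity structure at all over the
input points:

```python
import math

# Hypothetical one-hot coding of a four-symbol alphabet: each symbol sits
# at an exact lattice point, with no "nearby" valid inputs.
symbols = {s: [1.0 if i == j else 0.0 for i in range(4)]
           for j, s in enumerate("abcd")}

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Every pair of distinct symbols is exactly sqrt(2) apart, so Euclidean
# distance says nothing about which symbols are "similar".
for pair in [("a", "b"), ("a", "c"), ("b", "d")]:
    print(pair, euclid(symbols[pair[0]], symbols[pair[1]]))
```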

>It turns out that while
>the classical vector space (because of the "simple"  compositional
>structure of its objects) allows essentially one metric consistent with
>the underlying algebraic structure [1], each symbolic "space"  (e.g.
>strings, trees, graphs) allows infinitely many of them.

    Recurrent networks spread the representation of a compound symbol
over time; thus, you can present a string of symbols to a recurrent
network and its internal state will change.  You have not shown, it
seems to me, that in this case the learning metric would look anything
like a Euclidean metric, or that there would be only "one" such
metric.  In fact it seems obvious to me that this would NOT be the
case.  I would like to hear why you might disagree.
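    A minimal sketch of what I mean (the numeric symbol codes and
weights are hypothetical): the string is presented one symbol per
time step and folded into the hidden state, so the final state
depends on the whole history rather than on fixed per-symbol
Euclidean coordinates:

```python
import math

# Hypothetical numeric codes for a two-symbol alphabet.
CODE = {"a": 0.3, "b": 0.7}

def encode(string, w_in=1.5, w_rec=0.8):
    """Minimal recurrent update: fold the string into one hidden state."""
    h = 0.0
    for s in string:
        h = math.tanh(w_in * CODE[s] + w_rec * h)
    return h

# "ab" and "ba" contain the same symbols, yet end in different states:
# the representation is spread over time, not given by a fixed vector of
# per-symbol coordinates.
print(encode("ab"), encode("ba"))
```

Nothing about the geometry of these final states looks like a
Euclidean metric on the space of strings, which is my point.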

Mitsu

Lev Goldfarb wrote:

> On Thu, 13 Aug 1998, Mitsu Hadeishi wrote:
>
> > I'm not quite sure I agree with your analysis.  Since I haven't looked at it in
> > great detail, I present this as a tentative critique of your presentation.
> >
> > Since recurrent ANNs can be made to carry out any Turing operation (i.e., modulo
> > their finite size, they are Turing equivalent), then unless you are saying that
> your representation cannot be implemented on a Turing machine, then it is clearly
> > NOT the case that recurrent ANNs *cannot* learn arbitrary symbolic
> > representations. Not having looked at your scheme in detail, of course, I don't
> > know whether your scheme somehow is unimplementable even on a Turing machine, but
> > it seems to me you must not be claiming this.
>
> Yes, I'm not claiming this. Moreover, I'm not discussing the SIMULATIONAL,
> or computing, power of a learning model, since in a simulation of a Turing
> machine one doesn't care how the actual "parameters" are constructed based
> on a small training set, i.e. no learning is involved. Many present (and
> future) computational models are "Turing equivalent".  That doesn't make
> them learning models, does it?
>
> In other words, as I mentioned in my original posting, IF YOU KNOW THE
> SYMBOLIC CLASS STRUCTURE of course you can simulate, or encode, it on
> many machines (a similar quote straight from the horse's mouth, i.e. from
> the Siegelmann and Sontag paper "On the computational power of neural nets",
> section 1.2:  "Many other types of 'machines' may be used for
> universality").  Again, the discovery of the symbolic class structure is a
> fundamentally different matter, much less trivial than just a simple
> encoding of this structure using any other, including numeric, structure.
>
> > You seem to be basing your argument on the notion that the input space to a
> > recurrent ANN is a set of numbers, which you interpret as the coordinates of a
> > vector.  However, this is only a kind of vague analogy, since the field
> > operations of the vector space (addition, multiplication, etc.) have no clear
> > meaning on the input space.  "Adding" two input vectors does not necessarily
> > result in anything meaningful except in the sense that the recurrent ANN to be
> > useful must be locally stable with respect to small variations in the input.
> > However, the actual structure or metric of the input space is in some sense
> > determined not a priori but by the state of the recurrent ANN itself, and can
> > change over time both as a result of training and as a result of iteration.  The
> > input space is numbers, yes, but that doesn't make it a vector space.  For
> > example, what properties of the input would be preserved if I, say, added the
> > vector (10^25, 10^25, ...) to the input?  If it is a "vector space" then that
> > operation would yield something sensible, some symmetries, and yet it obviously
> > does not.  Thus, while I sympathize with your claim that the vector field of R(n)
> > does not admit to the structure necessary to make visible much symbolic
> > structure, this in itself does not doom connectionist symbol processing by any
> > means.
>
> It appears that something VERY BASIC is missing from the above
> description: How could a recurrent net learn without some metric and, as
> far as I know, some metric equivalent to the Euclidean metric? (All "good"
> metrics on a finite-dimensional vector space are equivalent to the
> Euclidean, see [1] in my original posting.)
>
> > Your argument does have weight when applied to a single-layer perceptron, which
> > is, after all, just a thresholded/distorted linear transformation.  Although it
> > seemed to take the early connectionist community by surprise, it should be no
> > surprise at all that a single-layer perceptron cannot learn the parity problem,
> > because obviously the parity problem is not linearly separable, and how could any
> > linear discriminator possibly learn a non-linearly-separable problem?  However,
> > we do not live in a world of single-layer perceptrons.  Because networks are more
> > complex than this, arguments about the linearity of the input space seem to me
> > rather irrelevant.  I suspect you mean something else, however.
>
> Yes, I do.
>
> > I think the intuitive point you are perhaps trying to make is that symbolic
> > representations are arbitrarily nestable (recursively recombinable), and an input
> > space which consists of a fixed number of dimensions cannot handle recursive
> > combinations.  However, one can use time-sequence to get around this problem (as
> > we all are doing when we read and write for example).  Rather than make our eyes,
> > for example, capable of handling arbitrarily recombinable input all at once, we
> > sequence the input to our eyes by reading material over time.  The same trick can
> > be used with recurrent networks for example.
>
> Mitsu,
>
> I'm not making just this point.
>
> The main point I'm making can be stated as follows. Inductive learning
> requires some object dissimilarity and/or similarity measure(s). The
> accumulated mathematical experience strongly suggests that the distance in
> the input space must be consistent with the underlying operational, or
> compositional, structure of the chosen object representation (e.g.
> topological group, topological vector space, etc). It turns out that while
> the classical vector space (because of the "simple"  compositional
> structure of its objects) allows essentially one metric consistent with
> the underlying algebraic structure [1], each symbolic "space"  (e.g.
> strings, trees, graphs) allows infinitely many of them. In the latter
> case, the inductive learning becomes the learning of the corresponding
> class distance function (refer to my parity example).  Moreover, since
> some noise is always present in the training set, I cannot imagine how
> RELIABLE symbolic inductive class structure can be learned from a SMALL
> training set without the right symbolic bias and without the help of the
> corresponding symbolic distance measures.
>
> Cheers,
>           Lev




