Connectionist symbol processing: any progress?

Lev Goldfarb goldfarb at unb.ca
Sat Aug 15 19:29:10 EDT 1998


On Fri, 14 Aug 1998, Mitsu Hadeishi wrote:

> Lev,
> 
>     Okay, so we agree on the following:
> 
>     Recurrent ANNs have the computational power required.  The only
> thing at issue is the learning algorithm.

"The ONLY thing at issue" IS the MAIN thing at issue, because the
simulation of the Turing machine is just a clever game, while an adequate
model of inductive learning should, among many other things, change our
understanding of what science is (see, for example, Alexander Bird,
Philosophy of Science, McGill-Queen's University Press, 1998).

> >How could a recurrent net learn without some metric and, as
> >far as I know, some metric equivalent to the Euclidean metric?
> 
>     Your arguments so far seem to be focusing on the "metric" on the
> input space, but this does not in itself mean anything at all about
> the metric of the learning algorithm as a whole. 

What does "the metric of the learning algorithm as a whole" mean? There
is no such concept as "the metric of the learning algorithm as a whole".

>                                                    Clearly the input
> space is NOT a vector space in the usual sense of the word, at least
> if you use the metric which is defined by the whole system (learning
> algorithm, error measure, state of the network).

If "the input space is NOT a vector space in the usual sense of the word",
then what is it? Are we talking about the formal concepts known in
mathematics, or do we not care about such "trifling" things at all?
Remember that "even" physicists care about such things, and I said
"even" because, to model inductive learning, we will need more
abstract models.
  
>                                                 So, what you are now
> saying is that the metric must be equivalent to (rather than equal to)
> the Euclidean metric: you do not define what you mean by this.

[metrics are equivalent if they induce the same topology, i.e. the same
convergent sequences]
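As an aside (my own illustration, not part of the exchange above): the bracketed definition can be made concrete. The bounded metric d/(1+d) is equivalent to the Euclidean metric on the reals (same convergent sequences), while the discrete metric is not:

```python
# A sketch of metric equivalence: two metrics on the real line are
# equivalent if they yield the same convergent sequences. d2 = d1/(1+d1)
# is a standard bounded metric equivalent to the Euclidean one, while
# the discrete metric d3 induces a different (discrete) topology.

def d1(x, y):
    return abs(x - y)                    # Euclidean metric on R

def d2(x, y):
    return d1(x, y) / (1.0 + d1(x, y))   # bounded, topologically equivalent

def d3(x, y):
    return 0.0 if x == y else 1.0        # discrete metric: not equivalent

seq = [1.0 / n for n in range(1, 1001)]  # the sequence 1/n -> 0

# Under d1 and d2 the sequence converges to 0; under d3 it does not,
# since every term stays at distance 1 from the limit.
print(d1(seq[-1], 0.0) < 1e-2)   # True
print(d2(seq[-1], 0.0) < 1e-2)   # True
print(d3(seq[-1], 0.0) < 1e-2)   # False
```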

>     The learning "metric" in the connectionist paradigm changes over
> time: it is a function of the structure of the learning algorithm and
> the state of the network, as I mentioned above.  The only sense in
> which the metric is "equivalent" to the Euclidean metric is locally;
> that is, due to the need to discriminate noise, this metric must be
> locally stable, thus there is an open neighborhood around most points
> in the topological input space for which the "metric" vanishes.
> However, the metric can be quite complex, it can have singularities,
> it can change over time, it can fold back onto itself, etc.
> 
>     This local stability may not be of interest, however, since the
> input may be coded so that each discrete possibility is coded as exact
> numbers which are separated in space.  In this case the input space
> may not be a continuous space at all, but a discrete lattice or
> something else.  If the input space is a lattice, then there are no
> small open neighborhoods around the input points, and thus even this
> similarity to the Euclidean metric no longer applies.  At least, so
> far, your arguments do not seem to show anything beyond this.
> 
> >It turns out that while
> >the classical vector space (because of the "simple"  compositional
> >structure of its objects) allows essentially one metric consistent with
> >the underlying algebraic structure [1], each symbolic "space"  (e.g.
> >strings, trees, graphs) allows infinitely many of them.
> 
>     Recurrent networks spread the representation of a compound symbol
> over time; thus, you can present a string of symbols to a recurrent
> network and its internal state will change.  You have not shown, it
> seems to me, that in this case the learning metric would look anything
> like a Euclidean metric, or that there would be only "one" such
> metric.  In fact it seems obvious to me that this would NOT be the
> case.  I would like to hear why you might disagree.
 
Mitsu,

Forgive me for the analogy, but from the above, as well as from other
published sources, it appears to me that in "connectionist symbol
processing" one hopes to prepare a magic soup for inductive learning by
throwing into one model two ingredients that are, I strongly suggest,
INCOMPATIBLE: the vector space model and the symbolic operations. I
strongly believe that this is not a scientifically fruitful approach. Why?

Can I give you a one-sentence answer? If you look very carefully at the
topologies induced on the set of strings (over an alphabet of size > 1) by
various symbolic distances (of the type given in the parity class
problem), you will discover that they have hardly anything to do with the
continuous topologies we are used to from classical mathematics. In
this sense, the difficulties ANNs have with the parity problem are only
the tip of the iceberg.
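To make the parity point concrete (my sketch, not from the original exchange): under the Hamming metric on binary strings, EVERY nearest neighbour of a string belongs to the opposite parity class, so the parity function is "maximally discontinuous" in that topology:

```python
from itertools import product

# Why parity clashes with continuous topologies: flipping any single bit
# of a binary string changes its parity, so every string at Hamming
# distance 1 from s lies in the opposite class. No open neighbourhood of
# s (in the Hamming topology) is parity-constant.

def parity(s):
    return s.count("1") % 2

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

n = 6
for bits in product("01", repeat=n):
    s = "".join(bits)
    # all strings at Hamming distance exactly 1 from s
    neighbours = [s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
                  for i in range(n)]
    assert all(hamming(s, t) == 1 for t in neighbours)
    assert all(parity(t) != parity(s) for t in neighbours)

print("every distance-1 neighbour flips parity")
```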

So, isn't it scientifically more profitable to work DIRECTLY with the
symbolic topologies, i.e. the symbolic distance functions, by starting
with some initial set of symbolic operations and then proceeding in a
systematic manner to seek the optimal topology (i.e. the optimal set of
weighted operations) for the training set? Simplifying somewhat, this is
what the evolving transformation system model we are developing attempts
to do. It appears that there are profound connections between the relevant
symbolic topologies (and hardly any connections with the classical numeric
topologies). Based on those connections, we are developing an efficient
inductive learning model that will work with a MUCH SMALLER training set
than has been the case in the past. The latter is possible because,
typically, the computation of the distance between two strings involves
many operations, and the optimization function involves O(n*n)
interdistances, where n is the size of the training set.
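A crude sketch of the general idea (the uniform per-operation weights below are hypothetical placeholders, not the actual ETS formulation): a weighted edit distance over strings, whose topology is determined by the weights assigned to the insertion, deletion and substitution operations, evaluated over all O(n*n) interdistances of a training set:

```python
from itertools import combinations

def weighted_edit(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Weighted Levenshtein distance via dynamic programming.
    The weights (here unit values, a placeholder choice) determine the
    topology induced on the set of strings."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else w_sub)
            d[i][j] = min(d[i - 1][j] + w_del, d[i][j - 1] + w_ins, sub)
    return d[m][n]

# A toy training set: the optimization criterion ranges over all
# n*(n-1)/2 pairwise interdistances, each itself a nontrivial computation.
training = ["abab", "baba", "aabb", "abba", "bbaa"]
pairs = list(combinations(training, 2))
print(len(pairs))                     # 10 interdistances for n = 5
print(weighted_edit("abab", "baba"))  # 2.0 under unit weights
```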


Cheers,
          Lev 




