Connectionist symbol processing: any progress?

Lev Goldfarb goldfarb at unb.ca
Sun Aug 16 21:20:59 EDT 1998


On Sat, 15 Aug 1998, Mitsu Hadeishi wrote:

> Lev Goldfarb wrote:
> 
> > On Sat, 15 Aug 1998, Mitsu Hadeishi wrote:
> >
> > > Since you are using terms like "metric" extremely loosely, I was also doing
> > > so.
> >
> > Please, note that although I'm not that precise, I have not used the
> > "terms like 'metric' extremely loosely".
> 
> I am referring to this statement:
> 
> >How could a recurrent net learn without some metric and, as
> >far as I know, some metric equivalent to the Euclidean metric?
> 
> Here you are talking about the input space as though the Euclidean metric
> on that space is particularly key, when it is rather the structure of the
> whole network, the feedback scheme, the definition of the error measure,
> the learning algorithm, and so forth which actually create the relevant
> and important mathematical structure.

Mitsu, I'm afraid I fail to see what is wrong with my (quoted) question.
First, I suggested in it that to do inductive learning properly ONE MUST
HAVE AN EXPLICIT AND MEANINGFUL DISTANCE FUNCTION ON THE INPUT SPACE.
Second, given the latter plus the "foundations of connectionism" (e.g.
Michael Jordan's chapter 9 in PDP, Vol. 1): if one indeed wants to use an
n-tuple of real numbers as the input representation, then it is very
natural to assume (at least for a mathematician) that the input space is a
vector space, which forces an essentially unique metric on it (provided
the metric is consistent with the underlying vector-space structure, a
practically universal assumption in mathematics; see [2] in my first
posting).
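The "essentially unique metric" remark rests on the standard fact that all norms on a finite-dimensional vector space are equivalent (each bounded by a constant multiple of any other). A minimal numerical sketch of this, my own illustration rather than anything from the original exchange:

```python
# Check the standard norm-equivalence bounds on R^n numerically:
#   ||x||_2 <= ||x||_1 <= sqrt(n) * ||x||_2
# so any norm consistent with the vector-space structure gives the
# "same" metric up to constant factors.
import math
import random

def norm(x, p):
    """p-norm of a real n-tuple."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

n = 5
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    l1, l2 = norm(x, 1), norm(x, 2)
    assert l2 <= l1 + 1e-9                  # ||x||_2 <= ||x||_1
    assert l1 <= math.sqrt(n) * l2 + 1e-9   # ||x||_1 <= sqrt(n)*||x||_2
```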

>                                                In a sufficiently complex
> network, you can pretty much get any arbitrary map you like from the input space to
> the output, and the error measure is biased by the specific nature of the training
> set (for example), and is measured on the output of the network AFTER it has gone
> through what amounts to an arbitrary differentiable transformation.  By this time,
> the "metric" on the original input space can be all but destroyed.  Add recurrency
> and you even get rid of the fixed dimensionality of the input space.  In the quote
> above, it appears you are implying that there is some direct relationship between
> the metric on the initial input space and the operation of the learning algorithm.
> I do not see how this is the case.

YES, INDEED, I AM STRONGLY SUGGESTING THAT THERE MUST BE A DIRECT
CONNECTION "BETWEEN THE METRIC ON THE INITIAL INPUT SPACE AND THE
OPERATION OF THE LEARNING ALGORITHM". IN OTHER WORDS, THE SET OF CURRENT
OPERATIONS ON THE REPRESENTATION SPACE (WHICH, OF COURSE, CAN NOW BE
DYNAMICALLY MODIFIED DURING LEARNING) SHOULD ALWAYS BE USED FOR DISTANCE
COMPUTATION.

What is the point of first changing a symbolic representation into a
numeric one, and then applying to this numeric representation "very
strange", symbolic, operations? I absolutely fail to see the need for such
an artificial contortion.
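The idea that a set of weighted operations itself defines the distance can be made concrete with an ordinary weighted edit distance. This is my own minimal sketch in that spirit, not Goldfarb's ETS formalism: changing an operation's weight changes the metric directly, with no detour through a numeric re-encoding.

```python
# Weighted Levenshtein distance: the distance between two strings is the
# minimum total weight of insert/delete/substitute operations turning one
# into the other.  The operations (and their weights) ARE the metric.
def op_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele                       # delete all of a
    for j in range(1, n + 1):
        d[0][j] = j * ins                        # insert all of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,    # delete a[i-1]
                          d[i][j - 1] + ins,     # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match / substitute
    return d[m][n]

print(op_distance("cat", "bat"))            # unit weights -> 1.0
print(op_distance("cat", "bat", sub=1.5))   # costlier substitution -> 1.5
```

Reweighting the substitution operation changed the induced distance between the same two strings, which is the sense in which the current operation set "should always be used for distance computation".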
 
> > The main reason we are developing the ETS model is precisely related to
> > the fact that we believe it offers THE ONLY ONE POSSIBLE NATURAL (and
> > fundamentally new) SYMBIOSIS of the discrete and the continuous FORMALISMS
> > as opposed to the unnatural ones. I would definitely say (and you would
> > probably agree) that (if, indeed, this is the case) it is the most
> > important consideration.
> >
> > Moreover, it turns out that the concept of a fuzzy set, which was
> > originally introduced in a rather artificial manner that didn't clarify
> > the underlying source of fuzziness (and this has caused an understandable
> > and substantial resistance to its introduction), emerges VERY naturally
> > within the ETS model: the definition of the class via the corresponding
> > distance function typically and naturally induces the fuzzy class boundary
> > and also reveals the source of fuzziness, which includes the interplay
> > between the corresponding weighted operations and (in the case of noise in
> > the training set) a nonzero radius. Note that in the parity class problem,
> > the parity class is not fuzzy, as reflected in the corresponding weighting
> > scheme and the radius of 0.
> 
> Well, what one mathematician calls natural and the other calls artificial may be
> somewhat subject to taste as well as rational argument.  At this point one can get
> into the realm of mathematical aesthetics or philosophy rather than hard science.
> From my point of view, symbolic representations can be seen as merely emergent
> phenomena or patterns of behavior of physical feedback systems (i.e., looking at
> cognition as essentially a bounded feedback system---bounded under normal
> conditions, unless the system goes into seizure (explodes mathematically---well, it
> is still bounded but it tries to explode!), of course.)  From this point of view
> both symbols and fuzziness and every other conceptual representation are neither
> "true" nor "real" but simply patterns which tend to be, from an
> information-theoretic point of view, compact and useful or efficient
> representations.  But they are built on a physical substrate of a feedback system,
> not vice-versa.
> 
> However, it isn't the symbol, fuzzy or not, which is ultimately general, it is the
> feedback system, which is ultimately a physical system of course.  So, while we may
> be convinced that your formalism is very good, this does not mean it is more
> fundamentally powerful than a simulation approach.  It may be that your formalism is
> in fact better for handling symbolic problems, or even problems which require a
> mixture of fuzzy and discrete logic, etc., but what about problems which are not
> symbolic at all?  What about problems which are both symbolic and non-symbolic (not
> just fuzzy, but simply not symbolic in any straightforward way?)
> 
> The fact is, intuitively it seems to me that some connectionist approach is bound to
> be more general than a more special-purpose approach.  This does not necessarily
> mean it will be as good or fast or easy to use as a specialized approach, such as
> yours.  But it is not at all convincing to me that just because the input space to a
> connectionist network looks like R(n) in some superficial way, this would imply that
> somehow a connectionist model would be incapable of doing symbolic processing, or
> even using your model per se.


The last paragraphs betray your classical physical bias, grounded in our
present (incidentally, vector-space-based) mathematics. As you can see
from my home page, I no longer believe in it: we believe that the
(inductive) symbolic representation is the more basic and much more
adequate form of representation (the one that evolved during evolution),
while the numeric form is a very special case of it, in which the alphabet
consists of a single letter.
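The single-letter-alphabet remark has a simple concrete reading, sketched here as my own illustration: represent the natural number n as a unary string, allow only unit-weight insertions and deletions of that one letter, and the induced operation-based distance between the strings is exactly the ordinary distance |m - n| between the numbers.

```python
# Numeric representation as a special case of symbolic representation:
# over a one-letter alphabet, the operation-induced distance reduces to
# the usual metric on the natural numbers.
def unary(n):
    """Encode the natural number n as the string 'a' * n."""
    return "a" * n

def unary_distance(s, t):
    """Minimum number of single-letter insertions/deletions turning s
    into t; with one letter, only the lengths can differ."""
    return abs(len(s) - len(t))

for m in range(6):
    for n in range(6):
        assert unary_distance(unary(m), unary(n)) == abs(m - n)
```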

By the way, I'm not the only one to doubt the adequacy of the classical
form of representation. For example, here are two quotes from Erwin
Schrödinger's book "Science and Humanism" (Cambridge Univ. Press), a
substantial part of which is devoted to a popular explication of the
following ideas: 

"The observed facts (about particles and light and all sorts of radiation
and their mutual interaction) appear to be REPUGNANT to the classical
ideal of continuous description in space and time."

"If you envisage the development of physics in THE LAST HALF-CENTURY, you
get the impression that the discontinuous aspect of nature has been forced
upon us VERY MUCH AGAINST OUR WILL. We seemed to feel quite happy with the
continuum. Max Planck was seriously frightened by the idea of a
discontinuous exchange of energy . . ."

(capitals here mark italics in the original)


Cheers,
         Lev


