On the structure of connectionist models
Lev Goldfarb
goldfarb at unb.ca
Thu Feb 22 16:56:11 EST 1996
Dear connectionists:
Since posting the workshop announcement (What is inductive
learning?) several days ago, I have been asked to clarify what I meant
when I said that "one can show that inductive class representations (in
other words, representations of concepts and categories) cannot be
adequately specified within the classical (numeric) mathematical
models", including, of course, connectionist models. Here are some
general ideas from the paper that will be presented at the workshop.
The following observations about the STRUCTURE of inductive learning
models strongly suggest why the classical (numeric) mathematical models
can support only "weak" inductive learning models, i.e. models that
perform reasonably only in VERY rigidly delineated environments.
The questions I am going to address in this posting lie, on the one
hand, at the very foundations of connectionism and are, on the other
hand, relatively simple, provided one keeps in mind that we are
discussing the overall FORMAL STRUCTURE of the learning models (which
requires a relatively high level of abstraction).
Let's look at the structure of connectionist models through the very basic
problem of inductive learning. In order to arrive at a useful formulation
of the inductive learning problem and, at the same time, at a useful
framework for solving the problem, I propose to proceed as follows.
First and foremost, inductive learning involves a finite set of data
(objects from the class C), labeled either as positive and negative
examples (C+, C-) or, more generally, simply as examples C'. Since we
want to compare quite different classes of models (e.g. symbolic and
numeric), let us focus only on very general assumptions about the
nature of the object representation (input) space:
Postulate 1. Input space S satisfies a finite set A of axioms.
             (S, in fact, provides a formal specification of all
             the necessary data properties; compare with the
             concept of an abstract data type in computer science.)
Thus, for example, the vector (linear) space is defined by means of the
well-known set of axioms for vector addition and scalar multiplication.
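
(To make Postulate 1 concrete: it can be read exactly like the
specification of an abstract data type. Below is a minimal Python
sketch, purely my own illustration; the class Vec and its method names
are not part of any standard. The point is that the input space exposes
ONLY the operations licensed by the axioms in A, and nothing else.)

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Vec:
        """An element of the input space S.  The ONLY operations
        available are those licensed by the vector space axioms."""
        coords: Tuple[float, ...]

        def __add__(self, other):            # vector addition
            return Vec(tuple(a + b
                             for a, b in zip(self.coords, other.coords)))

        def scale(self, c):                  # scalar multiplication
            return Vec(tuple(c * a for a in self.coords))

    u, v = Vec((1.0, 2.0)), Vec((3.0, -1.0))
    print((u + v).scale(0.5))                # Vec(coords=(2.0, 0.5))
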
Next, let us attach the name "inductive class representation" (ICR) to
the formal description (specification) of the class C obtained in a
chosen model as a result of an inductive learning process:
Postulate 2. In a learning model, ICR is specified in some (which?)
formal manner.
---------------------------------------------------------------------
| My first main point connects Postulate 2 to Postulate 1: ICR |
| should be expressed in the "language" of the axioms from set A. |
---------------------------------------------------------------------
For example, in a vector space the ICR should be specified only in
terms of the given data set plus the operations of the vector space,
i.e. we are restricted to the spanned affine subspace or an
approximation of it.
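
To see what this restriction amounts to computationally, here is a
small numpy sketch (again my own illustration; the function name
affine_projection is not from any library). Given a finite data set,
the affine subspace it spans is the largest ICR expressible using only
the data and the vector space operations; anything outside it must be
projected back in.

    import numpy as np

    def affine_projection(data, x):
        """Project x onto the affine subspace spanned by the rows of
        data (assumes at least two data points)."""
        base = data[0]                        # shift so the subspace contains 0
        directions = data[1:] - base          # spanning directions
        _, s, vt = np.linalg.svd(directions, full_matrices=False)
        basis = vt[: int(np.sum(s > 1e-10))]  # orthonormal basis of the span
        y = x - base
        return base + basis.T @ (basis @ y)   # orthogonal projection + shift

    data = np.array([[0., 0., 1.], [1., 0., 1.], [0., 1., 1.]])
    print(affine_projection(data, np.array([2., 3., 5.])))  # [2. 3. 1.]

All three data points lie in the plane z = 1, and so, therefore, does
every vector this ICR can legitimately contain.
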
The reason is quite simple: the only relationships that can be
(mathematically) legitimately extracted from the input data are those that
are expressible in the language of the input space S. Otherwise, we are,
in fact, IMPLICITLY postulating some other relationships not specified in
the input space by Postulate 1, and, therefore, the "discovery" of such
implicit relationships in the data during the learning process is an
illusion: such relationships are not "visible" in S.
Thus, for example, "non-linear" relationships cannot be discovered from
a finite data set in a vector space, simply because a non-linear
relationship is not part of the linear structure and, therefore, cannot
be (mathematically) legitimately extracted from a finite input set of
vectors in the vector space.
What happens (of necessity) in a typical connectionist model is that,
in addition to the set A of vector space axioms, some non-linear
structure (determined by the class of non-linear functions chosen for
the internal nodes of the NN) is postulated IMPLICITLY from the very
beginning.
Question: What does this additional non-linear structure have to do
with the finite input set of vectors?
(In fact, there are uncountably many such non-linear structures
and, typically, none of them is directly related to the
structure of the vector space or the input set of vectors.)
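
(To make the question vivid, here is a toy sketch in Python/numpy; it
is my illustration, not anyone's actual model. The non-linearity tanh
is fixed BEFORE any data is seen; replacing it by, say, np.sin gives
one of the uncountably many alternative structures, and nothing in the
finite input set of vectors selects among them.)

    import numpy as np

    rng = np.random.default_rng(0)
    nonlinearity = np.tanh                 # postulated a priori, not learned

    W1 = rng.standard_normal((4, 2))       # input  -> hidden weights
    W2 = rng.standard_normal((1, 4))       # hidden -> output weights

    def net(x):
        """Learning adjusts only the numbers in W1 and W2; the form
        W2 @ nonlinearity(W1 @ x) is committed to in advance."""
        return W2 @ nonlinearity(W1 @ x)

    print(net(np.array([0.5, -1.0])))
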
-----------------------------------------------------------------------
| My second main point is this: if S is a vector space, then whether  |
| or not we postulate some non-linear structure (for the internal     |
| nodes) in addition to the vector space axioms, we are faced with    |
| the following important question: what are we learning during the   |
| learning process? Certainly, we are not learning any interesting    |
| ICR: the entire STRUCTURE is fixed before the learning process      |
| begins.                                                             |
-----------------------------------------------------------------------
It appears that this situation is inevitable if we choose one of the
classical (numeric) mathematical structures to model the input space S.
However, in an appropriately defined symbolic setting (i.e. one with an
appropriate dynamic metric structure; see my home page) the situation
changes fundamentally.
To summarize (though not all of the argument is before your eyes), the
"strong" (symbolic) inductive learning models offer ICRs that are much
more flexible than those offered by the classical (numeric) models. In
other words, the appropriate symbolic models offer true INDUCTIVE class
representations. [The latter is given by a subset of objects + a
constructed finite set of (weighted) operations that can transform
objects into objects; see the sketch below.]
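
To give the bracketed description some shape, here is a deliberately
tiny sketch (strings as objects, insertions as operations; all names
and weights are mine, chosen for illustration only; for the actual
formalism see my home page). The learned component is the set of
weights: they determine which transformations keep an object inside
the class.

    # A toy symbolic ICR: a stored subset of class objects plus a
    # finite set of weighted operations transforming objects into objects.
    OPERATIONS = {"insert_a": 0.2,   # cheap: 'a' varies freely in the class
                  "insert_b": 5.0}   # expensive: 'b' is foreign to the class
    CLASS_OBJECTS = {"ca", "caa"}

    def insert_only_cost(obj, s):
        """Weight of turning obj into s by appending characters;
        inf if s does not extend obj (illustration only)."""
        if not s.startswith(obj):
            return float("inf")
        return sum(OPERATIONS.get("insert_" + ch, float("inf"))
                   for ch in s[len(obj):])

    def membership_cost(s):
        return min(insert_only_cost(obj, s) for obj in CLASS_OBJECTS)

    print(membership_cost("caaaa"))   # 0.4 -- deep inside the class
    print(membership_cost("cab"))     # 5.0 -- effectively outside it

Unlike the fixed architecture above, here the distance structure itself
(the set of operations and their weights) is what the learning process
constructs.
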
Lev Goldfarb
http://wwwos2.cs.unb.ca/profs/goldfarb/goldfarb.htm