Connectionist symbolic processing
Michael J. Healy 425-865-3123
michael.j.healy at boeing.com
Mon Aug 24 14:25:57 EDT 1998
I've been doing research in connectionist symbol processing for some time,
so I'd like to contribute something to the discussion. I'll try to keep
it brief and just say what I'm ready to say. I am not prepared to address
Michael Arbib's question about real brain function at this time, although
it's possible to make a connection.
First, here are some references to the literature of rule extraction with
neural networks, which I have been following. The list omits a lot of
good work, but is meant to be representative:
Andrews, R., Diederich, J. & Tickle, A. B. (1995)
"Survey and critique of techniques for extracting rules
from trained artificial neural networks", Knowledge-Based Systems,
vol. 8, no. 6, 373-389.
Craven, M. W. & Shavlik, J. W. (1993)
"Learning Symbolic Rules using Artificial Neural Networks",
Proceedings of the 10th International Machine Learning Conference,
Amherst, MA, 73-80. San Mateo, CA: Morgan Kaufmann.
Healy, M. J. & Caudell, T. P. (1997)
"Acquiring Rule Sets as a Product of Learning in a Logical Neural
Architecture", IEEE Transactions on Neural Networks,
vol. 8, no. 3, 461-475.
Kasabov, N. K. (1996)
"Adaptable neuro production systems",
Neurocomputing, vol. 13, 95-117.
Setiono, R. (1997)
"Extracting Rules from Neural Networks by Pruning and Hidden-Unit
Splitting", Neural Computation, vol. 9, no. 1, 205-225.
Sima, J. (1995)
"Neural Expert Systems", Neural Networks, vol. 8, 261-271.
Most of the work is empirical, but it is accompanied by analyses of the
practical aspects of extracting knowledge from data and of incorporating
pre-existing knowledge along with the extracted knowledge. The supposed
knowledge here is mostly in the form of if-then rules which, to a greater
or lesser extent, represent propositional statements (a toy sketch follows
the reference below). There is also some recent work on mathematically
formalizing connectionist symbolic computations, for example:
Pinkas, G. (1995)
"Reasoning, nonmonotonicity and learning in connectionist networks
that capture propositional knowledge",
Artificial Intelligence, vol. 77, 203-247.
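Here is the toy sketch, in Python, of the kind of propositional if-then
rule these extraction methods produce. The rule and all names are
hypothetical, mine for illustration, not taken from any of the papers above:

    # A toy extracted rule: IF high_activation AND NOT inhibited
    # THEN class = "positive".  Entirely hypothetical.
    rule = {
        "antecedent": [("high_activation", True), ("inhibited", False)],
        "consequent": "class=positive",
    }

    def fires(rule, instance):
        """True when every antecedent literal holds in the instance."""
        return all(instance.get(atom, False) == sign
                   for atom, sign in rule["antecedent"])

    print(fires(rule, {"high_activation": True, "inhibited": False}))  # True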
I've been developing a formal semantic model for neural networks---
a mathematical model of concept representation in connectionist
memories and of learning by connectionist systems. I've found that such
a model requires an explicit semantics, one in which the "universe"
of things the concepts are about receives as much attention as the
concepts themselves. I think this is essential
for resolving the ambiguities that crop up in discussions about symbolic
processing and neural networks. For example, it allows me to make some
statements about issues brought up in the discussion of connectionist symbol
processing. Whether you agree with me or not, I'd certainly be interested
in further discussion.
I've been concentrating on geometric logic and its model theory
(different sense of the word "model"), mostly (so far) in the form of
point-set topology. The set-theoretic form is the simple version of
the semantics of geometric logic. It's really a categorical logic, so
the full semantic model requires category theory. Geometric logic is
very strict in what it takes to assert a statement. It is meant to
represent observational statements, ones whose positive instances can
be observed. Topology is commonly studied in its point-set version,
but the categorical form is better for formal semantics. Having said
that, I'll stick with sets in the following. Also, I'll refer to the
models of a theory as its instances.
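As a minimal set-level sketch of this semantics (the instance and formula
names are hypothetical), think of each formula as the open set of instances
that satisfy it; finite conjunction is then intersection and disjunction
is union:

    # Minimal sketch: a formula's extension is the open set of
    # instances (points) satisfying it.
    points = frozenset({"i1", "i2", "i3", "i4"})   # instances of the theory

    extension = {
        "red":   frozenset({"i1", "i2"}),
        "round": frozenset({"i2", "i3"}),
    }
    extension["red AND round"] = extension["red"] & extension["round"]
    extension["red OR round"] = extension["red"] | extension["round"]

    assert extension["red AND round"] == frozenset({"i2"})
    assert extension["red OR round"] == frozenset({"i1", "i2", "i3"})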
My main finding to date is that a sound and complete rule base---
one in which the rules are actually valid for all the data and which
has all the rules---has the semantics of a continuous function between
the right topological spaces. This requires some explaining, not only
the "all the rules" and "right topological spaces" business, but also
the statement about continuous functions. For most readers, continuity means
continuous functions on the real or complex numbers, or on vector spaces
over them. But those are a special case: the topologies and continuous
functions I work with also involve spaces normally represented by discrete-
valued variables. Continuity is really the mathematical way of saying
"similar things map to similar things". My first publication on this has
some details (a more extensive treatment is to appear; a small sketch also
follows the reference):
Healy, M. J. (1997)
"Continuous Functions and Neural Network Semantics",
Proceedings of the Second World Congress of Nonlinear Analysts (WCNA96),
Athens. In Nonlinear Analysis, vol. 30, no. 3, 1335-1341.
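Here is the sketch, the standard criterion on finite spaces (the data are
hypothetical): a function is continuous iff the preimage of every open set
is open, and this makes just as much sense for discrete-valued spaces as
for the reals:

    # Continuity on finite spaces: the preimage of every open set of B
    # must be an open set of A.  Hypothetical two-point example.
    def preimage(f, subset):
        return frozenset(a for a, b in f.items() if b in subset)

    def is_continuous(f, opens_A, opens_B):
        return all(preimage(f, U) in opens_A for U in opens_B)

    opens_A = {frozenset(), frozenset({"x1"}), frozenset({"x1", "x2"})}
    opens_B = {frozenset(), frozenset({"y1"}), frozenset({"y1", "y2"})}
    f = {"x1": "y1", "x2": "y2"}
    print(is_continuous(f, opens_A, opens_B))  # True: similar maps to similar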
In geometric logic, a continuous function is two functions---a mapping from
the instances (worlds, states, models) of theory A to the instances of theory
B, and an associated mapping from the formulas of theory B to those of theory
A. Without going into too much detail, the topological connection is that a
set of things that satisfy a formula (instances of the formula) form an open
set in a particular topological space. In the applications we often deal
with, the training examples for a neural network are instances of a theory
of the domain of application. A formula in the theory expresses a property
of or a relation between instances. The instances are called "points" of
the space, and the corresponding open set contains the points. Finite
conjunctions of formulas correspond to the finite intersections of open
sets, and we allow arbitrary disjunctions, corresponding to the unions
(arbitrary disjunctions are appropriate for observations). There is a
little more to it, because instead of the usual set unions we use unions
over directed sets of subsets. A valid and complete rule base can be
refined to have the form of the formula mapping half of a continuous
function from space A (theory A and its instances, with the induced
topology) to space B (as a special case, the two spaces can be the same,
or can have the same points). Correspondingly, the open set for the
antecedent of each refined rule is the inverse image under the point
mapping of the open set for its consequent. The refinement is obtained
by forming the disjunction of all antecedents with the same consequent.
The point mapping of the continuous function expresses the fact that
every instance of the antecedent of a rule must map to an instance of
the consequent; this is the sense in which a rule expresses truth-preservation.
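The refinement and the inverse-image condition can be sketched directly in
the finite case (all data hypothetical): antecedent open sets sharing a
consequent are disjoined, and the result must equal the preimage of the
consequent's open set under the point mapping:

    # Hypothetical sketch: a refined rule base is sound and complete when
    # each refined antecedent equals the inverse image of its consequent.
    from collections import defaultdict

    def refine(rules):
        """Union all antecedent open sets with the same consequent."""
        refined = defaultdict(frozenset)
        for antecedent_open, consequent in rules:
            refined[consequent] |= antecedent_open
        return dict(refined)

    def preimage(point_map, subset):
        return frozenset(a for a, b in point_map.items() if b in subset)

    # Two rules share the consequent "positive"; their antecedents disjoin.
    rules = [(frozenset({"a1"}), "positive"), (frozenset({"a2"}), "positive")]
    point_map = {"a1": "b1", "a2": "b1", "a3": "b2"}
    opens_B = {"positive": frozenset({"b1"})}

    assert refine(rules)["positive"] == preimage(point_map, opens_B["positive"])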
This mathematical model relates directly to the work being done in rule
extraction, even with the many different approaches and neural network
models in use. Furthermore, I think it supports intuition, but I'd like
you to be the judge. One thing I'd like to add is that the topological
model is consistent with probabilistic modeling and fuzzy logic. The
focus of this model really is upon semantics (or semiotics, if this is
regarded as a model of sign-meaning relationships; I am mostly interested
in the semantics).
Finally, I'd like to comment upon an important issue that has appeared in
this thread---how important is the input space topology (metric, structure,
theory, ... )? I apologize if I've misinterpreted any of what's been said,
but here's my two cents.
I don't think there is always a single "right" topological space. The
form of the data and how you handle it depend on what assumptions were
made in working up the data for presentation as training (or testing)
examples for the neural network. Formalizing, I would say that the
assumptions yield a theory about the domain of inputs, and this in turn
yields a topology. The topology does not have to be induced by a metric,
not unless you make the assumption that distances between data points (in
the metric sense) are valid. For example, if you have applied a Euclidean
clustering algorithm, you have implicitly made the assumption that the
Euclidean metric is the semantics of the application items that are being
encoded as data items. What you get will be partly a result of that
assumption.
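To make the implicit assumption concrete, here is a hypothetical sketch of
a single nearest-centroid step, the core of a Euclidean clustering
algorithm; the classes it returns are entirely a consequence of trusting
the Euclidean metric:

    # One nearest-centroid assignment step (hypothetical data).  Every
    # item is classified purely by Euclidean distance, so the classes
    # inherit whatever validity the metric has for the application.
    import math

    def assign(points, centroids):
        return {p: min(centroids, key=lambda c: math.dist(p, c))
                for p in points}

    clusters = assign([(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
                      [(0.0, 0.0), (5.0, 5.0)])
    print(clusters)   # the two nearby points share a centroid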
But what you get also depends upon the assumptions underlying the algorithm.
If the algorithm is really coming up with anything new, it will impose a new
topology upon the data. For example, a Euclidean clustering algorithm
doesn't return all the open balls of the Euclidean-induced topology---it
returns a finite set of class representations. However, you'd like the
final result to have some connection with your original interpretation of
the data, since after all that was your way of seeing the application.
So, it would be nice to have continuity, meaning that every instance of
the input domain theory maps to an instance of the output theory in a
manner consistent with the formulas (open sets) in both theories
(topologies). An advantage of the continuous function model here is that
it tells me what I need to do: Modify the topologies (hence the theories)
so that the inverse of an open set is open. Of course, that's only a
mathematical abstraction, and the question remains: so what do you do?
Well, I don't think you want to discard the input topology outright, for
the reason I gave: It is the theory that gave you your data. But you can
modify it if need be. If you assumed a metric and your final classification
result (assuming you were doing classification or clustering) has an output
domain consisting of metric-induced open sets, you need do nothing. You can
get more information by going to a more sophisticated pair of spaces by an
embedding, but at least your algorithm gave you classes that projected back
into the input topology, so you're OK there. However, for many data and
machine models, the input space (or the output space or both) won't accept
the projections gracefully, so you need to do something. One thing you can
do is suppose you have the wrong learning algorithm and try to find one that
will automatically yield continuity without changing the input space.
Another thing you can do is suppose that the algorithm is telling you
something about the input data space, and modify the topology as needed
to accept the new open sets (extend the sub-base of the topology). See
how your application looks now!
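In the finite case that sub-base extension can be sketched directly
(hypothetical data): add the preimages of the algorithm's new open sets to
the sub-base and regenerate the topology, which makes the map continuous
by construction:

    # Regenerate a finite topology from an extended sub-base.  This
    # brute-force version is exponential; it is a sketch, not a tool.
    from itertools import combinations

    def topology_from_subbase(points, subbase):
        base = {frozenset(points)}                    # empty intersection
        for r in range(1, len(subbase) + 1):
            for combo in combinations(subbase, r):    # finite intersections
                base.add(frozenset.intersection(*combo))
        opens = {frozenset()}
        for r in range(1, len(base) + 1):
            for combo in combinations(base, r):       # (finite) unions
                opens.add(frozenset.union(*combo))
        return opens

    points = {"x1", "x2", "x3"}
    subbase = [frozenset({"x1", "x2"})]
    new_preimage = frozenset({"x2", "x3"})   # preimage of a new class
    opens = topology_from_subbase(points, subbase + [new_preimage])
    assert new_preimage in opens             # open by construction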
How you proceed from here depends upon what kinds of properties you want to
study. What I'm proposing is that the topological model is good as a guide
for further work because of its mathematical precision in semantic modeling.
Regards,
Mike Healy
--
===========================================================================
                                      e_A
Michael J. Healy                FA ----------> GA
(425)865-3123                    |              |
FAX(425)865-2964              Ff |              | Gf
c/o The Boeing Company           |              |
PO Box 3707 MS 7L-66            \|/            \|/
Seattle, WA 98124-2207           '              '
USA                             FB ----------> GB
-or for priority mail-                e_B      "I'm a natural man."
2760 160th Ave SE MS 7L-66
Bellevue, WA 98008
USA
michael.j.healy at boeing.com -or- mjhealy at u.washington.edu
===========================================================================