Connectionist symbolic processing
Michael J. Healy 425-865-3123
michael.j.healy at boeing.com
Mon Aug 24 14:25:57 EDT 1998
I've been doing research in connectionist symbol processing for some time,
so I'd like to contribute something to the discussion. I'll try to keep
it brief and just say what I'm ready to say. I am not prepared to address
Michael Arbib's question about real brain function at this time, although
it's possible to make a connection.
First, here are some references to the literature of rule extraction with
neural networks, which I have been following. The list omits a lot of
good work, but is meant to be representative:
Andrews, R., Diederich, J. & Tickle, A. B. (1995)
"Survey and critique of techniques for extracting rules
from trained artificial neural networks", Knowledge-Based Systems,
vol. 8, no. 6, 373-389.
Craven, M. W. & Shavlik, J. W. (1993)
"Learning Symbolic Rules using Artificial Neural Networks",
Proceedings of the 10th International Machine Learning Conference,
Amherst, MA, 73-80. San Mateo, CA: Morgan Kaufmann.
Healy, M. J. & Caudell, T. P. (1997)
"Acquiring Rule Sets as a Product of Learning in a Logical Neural
Architecture", IEEE Transactions on Neural Networks,
vol. 8, no. 3, 461-475.
Kasabov, N. K. (1996)
"Adaptable neuro production systems",
Neurocomputing, vol. 13, 95-117.
Setiono, R. (1997)
"Extracting Rules from Neural Networks by Pruning and Hidden-Unit
Splitting", Neural Computation, vol. 9, no. 1, 205-225.
Sima, J. (1995)
"Neural Expert Systems", Neural Networks, vol. 8, 261-271.
Most of the work is empirical, but it is accompanied by analyses of the
practical aspects of extracting knowledge from data and of incorporating
pre-existing knowledge along with the extracted knowledge. The supposed
knowledge here is mostly in the form of if-then rules which, to a greater
or lesser extent, represent propositional statements (a toy sketch follows
the reference below). There is also some recent work on mathematically
formalizing connectionist symbolic computations, for example:
Pinkas, G. (1995)
"Reasoning, nonmonotonicity and learning in connectionist networks
that capture propositional knowledge",
Artificial Intelligence, vol. 77, 203-247.
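Here is the toy sketch, in Python, of the kind of propositional if-then
rule these extraction methods produce. The rule and all names are
hypothetical, mine for illustration, not taken from any of the papers above:

    # A toy extracted rule: IF high_activation AND NOT inhibited
    # THEN class = "positive".  Entirely hypothetical.
    rule = {
        "antecedent": [("high_activation", True), ("inhibited", False)],
        "consequent": "class=positive",
    }

    def fires(rule, instance):
        """True when every antecedent literal holds in the instance."""
        return all(instance.get(atom, False) == sign
                   for atom, sign in rule["antecedent"])

    print(fires(rule, {"high_activation": True, "inhibited": False}))  # True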
I've been developing a formal semantic model for neural networks---
a mathematical model of concept representation in connectionist
memories and of learning by connectionist systems. I've found that such
a model requires an explicit semantics, one in which the "universe"
of things the concepts are about receives as much attention as the
concepts themselves. I think this is essential
for resolving the ambiguities that crop up in discussions about symbolic
processing and neural networks. For example, it allows me to make some
statements about issues brought up in the discussion of connectionist symbol
processing. Whether you agree with me or not, I'd certainly be interested
in further discussion.
I've been concentrating on geometric logic and its model theory
(different sense of the word "model"), mostly (so far) in the form of
point-set topology. The set-theoretic form is the simple version of
the semantics of geometric logic. It's really a categorical logic, so
the full semantic model requires category theory. Geometric logic is
very strict in what it takes to assert a statement. It is meant to
represent observational statements, ones whose positive instances can
be observed. Topology is commonly studied in its point-set version,
but the categorical form is better for formal semantics. Having said
that, I'll stick with sets in the following. Also, I'll refer to the
models of a theory as its instances.
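As a minimal set-level sketch of this semantics (the instance and formula
names are hypothetical), think of each formula as the open set of instances
that satisfy it; finite conjunction is then intersection and disjunction
is union:

    # Minimal sketch: a formula's extension is the open set of
    # instances (points) satisfying it.
    points = frozenset({"i1", "i2", "i3", "i4"})   # instances of the theory

    extension = {
        "red":   frozenset({"i1", "i2"}),
        "round": frozenset({"i2", "i3"}),
    }
    extension["red AND round"] = extension["red"] & extension["round"]
    extension["red OR round"] = extension["red"] | extension["round"]

    assert extension["red AND round"] == frozenset({"i2"})
    assert extension["red OR round"] == frozenset({"i1", "i2", "i3"})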
My main finding to date is that a sound and complete rule base---
one in which the rules are actually valid for all the data and which
has all the rules---has the semantics of a continuous function between
the right topological spaces. This requires some explaining, not only
the "all the rules" and "right topological spaces" business, but also
the statement about continuous functions. For most readers, continuity means
continuous functions on the real or complex numbers, or on vector spaces
over them. But those are a special case: the topologies and continuous
functions I work with also involve spaces normally represented by discrete-
valued variables. Continuity is really the mathematical way of saying
"similar things map to similar things". My first publication on this has
some details (a more extensive treatment is to appear; a small sketch also
follows the reference):
Healy, M. J. (1997)
"Continuous Functions and Neural Network Semantics",
Proceedings of the Second World Congress of Nonlinear Analysts (WCNA96),
Athens. In Nonlinear Analysis, vol. 30, no. 3, 1335-1341.
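Here is the sketch, the standard criterion on finite spaces (the data are
hypothetical): a function is continuous iff the preimage of every open set
is open, and this makes just as much sense for discrete-valued spaces as
for the reals:

    # Continuity on finite spaces: the preimage of every open set of B
    # must be an open set of A.  Hypothetical two-point example.
    def preimage(f, subset):
        return frozenset(a for a, b in f.items() if b in subset)

    def is_continuous(f, opens_A, opens_B):
        return all(preimage(f, U) in opens_A for U in opens_B)

    opens_A = {frozenset(), frozenset({"x1"}), frozenset({"x1", "x2"})}
    opens_B = {frozenset(), frozenset({"y1"}), frozenset({"y1", "y2"})}
    f = {"x1": "y1", "x2": "y2"}
    print(is_continuous(f, opens_A, opens_B))  # True: similar maps to similar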
In geometric logic, a continuous function is two functions---a mapping from
the instances (worlds, states, models) of theory A to the instances of theory
B, and an associated mapping from the formulas of theory B to those of theory
A. Without going into too much detail, the topological connection is that a
set of things that satisfy a formula (instances of the formula) form an open
set in a particular topological space. In the applications we often deal
with, the training examples for a neural network are instances of a theory
of the domain of application. A formula in the theory expresses a property
of or a relation between instances. The instances are called "points" of
the space, and the corresponding open set contains the points. Finite
conjunctions of formulas correspond to the finite intersections of open
sets, and we allow arbitrary disjunctions, corresponding to the unions
(arbitrary disjunctions are appropriate for observations). There is a
little more to it, because instead of the usual set unions we use unions
over directed sets of subsets. A valid and complete rule base can be
refined to have the form of the formula mapping half of a continuous
function from space A (theory A and its instances, with the induced
topology) to space B (as a special case, the two spaces can be the same,
or can have the same points). Correspondingly, the open set for the
antecedent of each refined rule is the inverse image under the point
mapping of the open set for its consequent. The refinement is obtained
by forming the disjunction of all antecedents with the same consequent.
The point mapping of the continuous function expresses the fact that
every instance of the antecedent of a rule must map to an instance of
the consequent; this is the sense in which a rule expresses truth-preservation.
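The refinement and the inverse-image condition can be sketched directly in
the finite case (all data hypothetical): antecedent open sets sharing a
consequent are disjoined, and the result must equal the preimage of the
consequent's open set under the point mapping:

    # Hypothetical sketch: a refined rule base is sound and complete when
    # each refined antecedent equals the inverse image of its consequent.
    from collections import defaultdict

    def refine(rules):
        """Union all antecedent open sets with the same consequent."""
        refined = defaultdict(frozenset)
        for antecedent_open, consequent in rules:
            refined[consequent] |= antecedent_open
        return dict(refined)

    def preimage(point_map, subset):
        return frozenset(a for a, b in point_map.items() if b in subset)

    # Two rules share the consequent "positive"; their antecedents disjoin.
    rules = [(frozenset({"a1"}), "positive"), (frozenset({"a2"}), "positive")]
    point_map = {"a1": "b1", "a2": "b1", "a3": "b2"}
    opens_B = {"positive": frozenset({"b1"})}

    assert refine(rules)["positive"] == preimage(point_map, opens_B["positive"])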
This mathematical model relates directly to the work being done in rule
extraction, even with the many different approaches and neural network
models in use. Furthermore, I think it supports intuition, but I'd like
you to be the judge. One thing I'd like to add is that the topological
model is consistent with probabilistic modeling and fuzzy logic. The
focus of this model really is upon semantics (or semiotics, if this is
regarded as a model of sign-meaning relationships; I am mostly interested
in the semantics).
Finally, I'd like to comment upon an important issue that has appeared in
this thread---how important is the input space topology (metric, structure,
theory, ... )? I apologize if I've misinterpreted any of what's been said,
but here's my two cents.
I don't think there is always a single "right" topological space. The
form of the data and how you handle it depend on what assumptions were
made in working up the data for presentation as training (or testing)
examples for the neural network. Formalizing, I would say that the
assumptions yield a theory about the domain of inputs, and this in turn
yields a topology. The topology does not have to be induced by a metric,
not unless you make the assumption that distances between data points (in
the metric sense) are valid. For example, if you have applied a Euclidean
clustering algorithm, you have implicitly made the assumption that the
Euclidean metric is the semantics of the application items that are being
encoded as data items. What you get will be partly a result of that
assumption.
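To make the implicit assumption concrete, here is a hypothetical sketch of
a single nearest-centroid step, the core of a Euclidean clustering
algorithm; the classes it returns are entirely a consequence of trusting
the Euclidean metric:

    # One nearest-centroid assignment step (hypothetical data).  Every
    # item is classified purely by Euclidean distance, so the classes
    # inherit whatever validity the metric has for the application.
    import math

    def assign(points, centroids):
        return {p: min(centroids, key=lambda c: math.dist(p, c))
                for p in points}

    clusters = assign([(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
                      [(0.0, 0.0), (5.0, 5.0)])
    print(clusters)   # the two nearby points share a centroid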
But what you get also depends upon the assumptions underlying the algorithm.
If the algorithm is really coming up with anything new, it will impose a new
topology upon the data. For example, a Euclidean clustering algorithm
doesn't return all the open balls of the Euclidean-induced topology---it
returns a finite set of class representations. However, you'd like the
final result to have some connection with your original interpretation of
the data, since after all that was your way of seeing the application.
So, it would be nice to have continuity, meaning that every instance of
the input domain theory maps to an instance of the output theory in a
manner consistent with the formulas (open sets) in both theories
(topologies). An advantage of the continuous function model here is that
it tells me what I need to do: Modify the topologies (hence the theories)
so that the inverse of an open set is open. Of course, that's only a
mathematical abstraction, and the question remains: so what do you do?
Well, I don't think you want to discard the input topology outright, for
the reason I gave: It is the theory that gave you your data. But you can
modify it if need be. If you assumed a metric and your final classification
result (assuming you were doing classification or clustering) has an output
domain consisting of metric-induced open sets, you need do nothing. You can
get more information by going to a more sophisticated pair of spaces by an
embedding, but at least your algorithm gave you classes that projected back
into the input topology, so you're OK there. However, for many data and
machine models, the input space (or the output space or both) won't accept
the projections gracefully, so you need to do something. One thing you can
do is suppose you have the wrong learning algorithm and try to find one that
will automatically yield continuity without changing the input space.
Another thing you can do is suppose that the algorithm is telling you
something about the input data space, and modify the topology as needed
to accept the new open sets (extend the sub-base of the topology). See
how your application looks now!
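In the finite case that sub-base extension can be sketched directly
(hypothetical data): add the preimages of the algorithm's new open sets to
the sub-base and regenerate the topology, which makes the map continuous
by construction:

    # Regenerate a finite topology from an extended sub-base.  This
    # brute-force version is exponential; it is a sketch, not a tool.
    from itertools import combinations

    def topology_from_subbase(points, subbase):
        base = {frozenset(points)}                    # empty intersection
        for r in range(1, len(subbase) + 1):
            for combo in combinations(subbase, r):    # finite intersections
                base.add(frozenset.intersection(*combo))
        opens = {frozenset()}
        for r in range(1, len(base) + 1):
            for combo in combinations(base, r):       # (finite) unions
                opens.add(frozenset.union(*combo))
        return opens

    points = {"x1", "x2", "x3"}
    subbase = [frozenset({"x1", "x2"})]
    new_preimage = frozenset({"x2", "x3"})   # preimage of a new class
    opens = topology_from_subbase(points, subbase + [new_preimage])
    assert new_preimage in opens             # open by construction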
How you proceed from here depends upon what kinds of properties you want to
study. What I'm proposing is that the topological model is good as a guide
for further work because of its mathematical precision in semantic modeling.
Regards,
Mike Healy
--
===========================================================================
                                      e_A
Michael J. Healy                FA ----------> GA
(425)865-3123                    |              |
FAX(425)865-2964              Ff |              | Gf
c/o The Boeing Company           |              |
PO Box 3707 MS 7L-66            \|/            \|/
Seattle, WA 98124-2207           '              '
USA                             FB ----------> GB
-or for priority mail-                e_B      "I'm a natural man."
2760 160th Ave SE MS 7L-66
Bellevue, WA 98008
USA
michael.j.healy at boeing.com -or- mjhealy at u.washington.edu
===========================================================================