Rule Extraction From ANNs - AISB96 Workshop
Robert Andrews
robert at fit.qut.edu.au
Fri Oct 20 01:55:04 EDT 1995
=============================================================
FIRST CALL FOR PAPERS
AISB-96 WORKSHOP
Society for the Study of Artificial Intelligence
and Simulation of Behaviour (SSAISB)
University of Sussex,
Brighton, England
April 2, 1996
--------------------------------------------
RULE-EXTRACTION FROM TRAINED NEURAL NETWORKS
--------------------------------------------
Robert Andrews
Neurocomputing Research Centre
Queensland University of Technology
Brisbane 4001 Queensland, Australia
Phone: +61 7 864-1656
Fax: +61 7 864-1969
E-mail: robert at fit.qut.edu.au
Joachim Diederich
Neurocomputing Research Centre
Queensland University of Technology
Brisbane 4001 Queensland, Australia
Phone: +61 7 864-2143
Fax: +61 7 864-1801
E-mail: joachim at fit.qut.edu.au
Lee Giles
NEC Research Institute
4 Independence Way
Princeton, NJ 08540
The objective of the workshop is to provide a discussion
platform for researchers interested in Artificial Neural
Networks (ANNs), Artificial Intelligence (AI) and Cognitive
Science. The workshop should be of considerable interest to
computer scientists and engineers as well as to cognitive
scientists and people interested in ANN applications which
require a justification of a classification or inference.
INTRODUCTION
It is becoming increasingly apparent that without some form
of explanation capability, the full potential of trained
Artificial Neural Networks may not be realised. The problem
is the inherent inability of an ANN to explain, in a
comprehensible form, the process by which a given decision
or output has been reached.
For Artificial Neural Networks to gain an even wider degree
of user acceptance and to enhance their overall utility as
learning and generalisation tools, it is highly desirable if
not essential that an `explanation' capability becomes an
integral part of the functionality of a trained ANN. Such a
requirement is mandatory if, for example, the ANN is to be
used in what are termed `safety critical' applications
such as airlines and power stations. In these cases it is
imperative that a system user be able to validate the output
of the Artificial Neural Network under all possible input
conditions. Further, the system user should be provided with
the capability to determine the set of conditions under
which an output unit within an ANN is active and when it is
not, thereby providing some degree of transparency of the
ANN solution.
Craven & Shavlik (1994) define the task of rule extraction
from neural networks as follows: "Given a trained neural
network and the examples used to train it, produce a concise
and accurate symbolic description of the network." The
following discussion of the importance of rule-extraction
algorithms is based on this definition.
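The task in this definition can be illustrated with a deliberately tiny sketch (not from the workshop text; the weights and threshold are invented): a single perceptron whose trained parameters happen to encode logical AND, for which the "concise and accurate symbolic description" is simply the rule y = x1 AND x2.

```python
# Hypothetical trained unit: the weights and threshold are invented for
# illustration.  The rule-extraction task is to recover the symbolic
# description "y = x1 AND x2" from these numeric parameters alone.

def perceptron(x1, x2, w1=0.6, w2=0.6, threshold=1.0):
    """Binary threshold unit with assumed weights."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

# Verify that the candidate rule matches the network on every input.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron(x1, x2) == (1 if (x1 and x2) else 0)

print("Extracted rule 'y = x1 AND x2' matches the network on all inputs")
```

For a network this small the check is exhaustive; for realistic networks the rest of this proposal describes why and how the extraction problem becomes substantial.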
THE IMPORTANCE OF RULE-EXTRACTION ALGORITHMS
Since rule extraction from trained Artificial Neural
Networks comes at a cost in terms of resources and
additional effort, an early imperative in any discussion is
to delineate the reasons why rule extraction is an
important, if not mandatory, extension of conventional ANN
techniques. The merits of including rule extraction
techniques as an adjunct to conventional Artificial Neural
Network techniques include:
Data exploration and the induction of scientific theories
Over time neural networks have proven to be extremely
powerful tools for data exploration with the capability to
discover previously unknown dependencies and relationships
in data sets. As Craven and Shavlik (1994) observe, `a
(learning) system may discover salient features in the input
data whose importance was not previously recognised.'
However, even if a trained Artificial Neural Network has
learned interesting and possibly non-linear relationships,
these relationships are encoded incomprehensibly as weight
vectors within the trained ANN and hence cannot easily serve
the generation of scientific theories. Rule-extraction
algorithms significantly enhance the capabilities of ANNs to
explore data to the benefit of the user.
Provision of a `user explanation' capability
Experience has shown that an explanation capability is
considered to be one of the most important functions
provided by symbolic AI systems. In particular, the salutary
lesson from the introduction and operation of Knowledge
Based systems is that the ability to generate even limited
explanations (in terms of being meaningful and coherent) is
absolutely crucial for the user-acceptance of such systems.
In contrast to symbolic AI systems, Artificial Neural
Networks have no explicit declarative knowledge
representation. Therefore they have considerable difficulty
in generating the required explanation structures. It is
becoming increasingly apparent that the absence of an
`explanation' capability in ANN systems limits the
realisation of the full potential of such systems and it is
this precise deficiency that the rule extraction process
seeks to redress.
Improving the generalisation of ANN solutions
Where a limited or unrepresentative data set from the
problem domain has been used in the ANN training process, it
is difficult to determine when generalisation can fail even
with evaluation methods such as cross-validation. By being
able to express the knowledge embedded within the trained
Artificial Neural Network as a set of symbolic rules, the
rule-extraction process may provide an experienced system
user with the capability to anticipate or predict a set of
circumstances under which generalisation failure can occur.
Alternatively the system user may be able to use the
extracted rules to identify regions in input space which are
not represented sufficiently in the existing ANN training
set data and to supplement the data set accordingly.
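The last point can be sketched concretely. In the fragment below (all rule bounds, data values, and the support threshold are invented for illustration), each extracted rule is treated as a conjunction of interval conditions on the inputs, and rules with few supporting training examples flag regions of input space where the data set should be supplemented.

```python
# Hypothetical extracted rules: each maps input names to (low, high)
# interval conditions.  Values are invented for illustration.
rules = {
    "rule_1": {"x1": (0.0, 0.5), "x2": (0.0, 0.5)},
    "rule_2": {"x1": (0.5, 1.0), "x2": (0.5, 1.0)},
}

# A toy training set; note that no example falls inside rule_2's region.
training_set = [
    {"x1": 0.1, "x2": 0.2},
    {"x1": 0.3, "x2": 0.4},
    {"x1": 0.2, "x2": 0.1},
]

def covered_by(example, conditions):
    """True if the example satisfies every interval condition of a rule."""
    return all(lo <= example[f] <= hi for f, (lo, hi) in conditions.items())

for name, conds in rules.items():
    support = sum(covered_by(e, conds) for e in training_set)
    if support < 2:  # illustrative support threshold
        print(f"{name} has only {support} supporting examples - "
              "consider collecting more data in this region")
```

The design choice here is that rule support is measured against the original training data, so a rule region the network commits to but the data barely samples is exactly where generalisation failure is most plausible.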
A CLASSIFICATION SCHEME FOR RULE EXTRACTION ALGORITHMS
The method of classification proposed here is in terms of:
(a) the expressive power of the extracted rules; (b) the
`translucency' of the view taken within the rule extraction
technique of the underlying Artificial Neural Network units;
(c) the extent to which the underlying ANN incorporates
specialised training regimes; (d) the `quality' of the
extracted rules; and (e) the algorithmic `complexity' of the
rule extraction/rule refinement technique.
The `translucency' dimension of classification is of
particular interest. It is designed to reveal the
relationship between the extracted rules and the internal
architecture of the trained ANN. It comprises two basic
categories of rule extraction techniques viz
`decompositional' and `pedagogical' and a third - labelled
as `eclectic' - which combines elements of the two basic
categories.
The distinguishing characteristic of the `decompositional'
approach is that the focus is on extracting rules at the
level of individual (hidden and output) units within the
trained Artificial Neural Network. Hence the `view' of the
underlying trained Artificial Neural Network is one of
`transparency'. The label `pedagogical' is given to those
rule extraction techniques which treat the trained ANN as a
`black box', ie the view of the underlying trained
Artificial Neural Network is `opaque'. The core idea
in the `pedagogical' approach is to `view rule extraction as
a learning task where the target concept is the function
computed by the network and the input features are simply
the network's input features'. Hence the `pedagogical'
techniques aim to extract rules that map inputs directly
into outputs. Where such techniques are used in conjunction
with a symbolic learning algorithm, the basic motif is to
use the trained Artificial Neural Network to generate
examples for the learning algorithm.
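A minimal sketch of the pedagogical approach, in the spirit of Craven and Shavlik's sampling-and-queries formulation, is shown below. The "network" here is a stand-in function (invented for illustration); a genuinely trained ANN would take its place. The trained model is queried as a black box over the input space and the resulting labelled examples are read off as a DNF rule, one conjunct per positive example.

```python
from itertools import product

def black_box_net(x):
    # Stand-in for a trained ANN queried as a black box: this invented
    # "network" fires when at least two of three binary inputs are on.
    return 1 if sum(x) >= 2 else 0

# Generate labelled examples by querying the black box (the "teacher").
examples = [(x, black_box_net(x)) for x in product((0, 1), repeat=3)]

# Read off a DNF rule mapping inputs directly to the output:
# one conjunct per positive example.
conjuncts = []
for x, y in examples:
    if y == 1:
        lits = [f"x{i+1}" if v else f"NOT x{i+1}" for i, v in enumerate(x)]
        conjuncts.append("(" + " AND ".join(lits) + ")")
rule = " OR ".join(conjuncts)
print("y = 1 IF", rule)
```

Exhaustive enumeration is feasible only for tiny input spaces; practical pedagogical techniques replace it with sampling strategies or a symbolic learner such as a decision-tree inducer trained on the queried examples, and the raw DNF would normally be simplified rather than left as one conjunct per example.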
As indicated above, the proposed third category in this
classification scheme - the `eclectic' group - comprises
composites which incorporate elements of both the
`decompositional' and `pedagogical' (or `black-box') rule
extraction techniques. Membership in this category is assigned to
techniques which utilise knowledge about the internal
architecture and/or weight vectors in the trained Artificial
Neural Network to complement a symbolic learning algorithm.
An ancillary problem to that of rule extraction from trained
ANNs is that of using the ANN for the `refinement' of
existing rules within symbolic knowledge bases. The goal in
rule refinement is to use a combination of ANN learning and
rule extraction techniques to produce a `better' (ie a
`refined') set of symbolic rules which can then be applied
back in the original problem domain. In the rule refinement
process, the initial rule base (ie what may be termed `prior
knowledge') is inserted into an ANN by programming some of
the weights. The rule refinement process then proceeds in
the same way as normal rule extraction viz (1) train the
network on the available data set(s); and (2) extract (in
this case the `refined') rules - with the proviso that the
rule refinement process may involve a number of iterations
of the training phase rather than a single pass.
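The insertion step that initialises the refinement loop can be sketched as follows, in the style of knowledge-based networks such as those of Towell and Shavlik. The particular weight value and bias formula are illustrative assumptions, not a prescribed scheme: a prior rule "IF a AND b THEN c" is programmed into a unit by giving each antecedent a common weight and setting the bias so the unit fires only when all antecedents hold.

```python
w = 4.0                   # assumed weight for each positive antecedent
antecedents = ["a", "b"]  # conditions of the prior rule "IF a AND b THEN c"
# Bias chosen so the unit fires only when every antecedent is satisfied.
bias = -(len(antecedents) * w - w / 2)

def unit(inputs):
    """Threshold unit initialised from the symbolic rule."""
    total = sum(w * inputs[name] for name in antecedents) + bias
    return 1 if total > 0 else 0

# Before any training, the programmed unit reproduces the inserted rule:
assert unit({"a": 1, "b": 1}) == 1
assert unit({"a": 1, "b": 0}) == 0
# Training on data would then adjust w and bias, after which the
# `refined' rules are extracted as in ordinary rule extraction.
```

Because the inserted rule only initialises the weights rather than fixing them, training is free to weaken or strengthen individual antecedents, which is precisely what allows the extracted rules to be a `refined' version of the prior knowledge.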
DISCUSSION POINTS FOR WORKSHOP PARTICIPANTS
1. Decompositional vs. learning approaches to rule-
extraction from ANNs - What are the advantages and
disadvantages w.r.t. performance, solution time,
computational complexity, problem domain, etc.? Are
decompositional approaches always dependent on a certain ANN
architecture?
2. Rule-extraction from trained neural networks vs. symbolic
induction. What are the relative strengths and weaknesses?
3. What are the most important criteria for rule quality?
4. What are the most suitable representation languages for
extracted rules? How does the extraction problem vary
across different languages?
5. What is the relationship between rule-initialisation
(insertion) and rule-extraction? For instance, are these
equivalent or complementary processes? How important is
rule-refinement by neural networks?
6. Rule-extraction from trained neural networks and
computational learning theory. Is generating a minimal
rule-set which mimics an ANN a hard problem?
7. Does rule-initialisation result in faster learning and
improved generalisation?
8. To what extent are existing extraction algorithms limited
in their applicability? How can these limitations be
addressed?
9. Are there any interesting rule-extraction success
stories? That is, problem domains in which the application
of rule-extraction methods has resulted in an interesting or
significant advance.
ACKNOWLEDGEMENT
Many thanks to Mark Craven, and Alan Tickle
for comments on earlier versions of this proposal.
RELEVANT PUBLICATIONS
Andrews, R Diederich, J and Tickle, A.B.: A survey and
critique of techniques for extracting rules from trained
artificial neural networks. To appear: Knowledge-Based
Systems, 1995 (ftp:fit.qut.edu.au//pub/NRC/ps/QUTNRC-95-01-
02.ps.Z)
Andrews, R and Geva, S: `Rule extraction from a constrained
error back propagation MLP' Proc. 5th Australian Conference
on Neural Networks Brisbane Queensland (1994) pp 9-12
Andrews, R and Geva, S `Inserting and extracting knowledge
from constrained error back propagation networks' Proc. 6th
Australian Conference on Neural Networks Sydney NSW (1995)
Craven, M W and Shavlik, J W `Using sampling and queries to
extract rules from trained neural networks' Machine
Learning: Proceedings of the Eleventh International
Conference (San Francisco CA) (1994) (in print)
Diederich, J `Explanation and artificial neural networks'
International Journal of Man-Machine Studies Vol 37 (1992)
pp 335-357
Fu, L M `Neural networks in computer intelligence' McGraw
Hill (New York) (1994)
Fu, L M `Rule generation from neural networks' IEEE
Transactions on Systems, Man, and Cybernetics Vol 28 No 8
(1994) pp 1114-1124
Gallant, S `Connectionist expert systems' Communications of
the ACM Vol 31 No 2 (February 1988) pp 152-169
Giles, C L and Omlin C W `Rule refinement with recurrent
neural networks' Proc. of the IEEE International Conference
on Neural Networks (San Francisco CA) (March 1993) pp
801-806
Giles, C L and Omlin C W `Extraction, insertion, and
refinement of symbolic rules in dynamically driven recurrent
networks' Connection Science Vol 5 Nos 3 and 4 (1993) pp
307-328
Giles, C L, Miller, C B, Chen, D, Chen, H, Sun, G Z and Lee,
Y C `Learning and extracting finite state automata with
second-order recurrent neural networks' Neural Computation
Vol 4 (1992) pp 393-405
Hayward, R.; Pop, E.; Diederich, J.: Extracting Rules for
Grammar Recognition from Cascade-2 Networks. Proceedings,
IJCAI-95 Workshop on Machine Learning and Natural Language
Processing.
McMillan, C, Mozer, M C and Smolensky, P `The connectionist
scientist game: rule extraction and refinement in a neural
network' Proc. of the Thirteenth Annual Conference of the
Cognitive Science Society (Hillsdale NJ) 1991
Omlin, C W, Giles, C L and Miller, C B `Heuristics for the
extraction of rules from discrete time recurrent neural
networks' Proc. of the International Joint Conference on
Neural Networks (IJCNN'92) (Baltimore MD) Vol 1 (1992) pp 33
Pop, E, Hayward, R, and Diederich, J `RULENEG: extracting
rules from a trained ANN by stepwise negation' QUT NRC
(December 1994)
Sestito, S and Dillon, T `Automated knowledge acquisition of
rules with continuously valued attributes' Proc. 12th
International Conference on Expert Systems and their
Applications (AVIGNON'92) (Avignon France) (May 1992) pp
645-656.
Sestito, S and Dillon, T `Automated knowledge acquisition'
Prentice Hall (Australia) (1994)
Thrun, S B `Extracting Provably Correct Rules From
Artificial Neural Networks' Technical Report IAI-TR-93-5
Institut für Informatik III, Universität Bonn (1994)
Tickle, A B, Orlowski, M, and Diederich, J `DEDEC: decision
detection by rule extraction from neural networks' QUT NRC
(September 1994)
Towell, G and Shavlik, J `The extraction of refined rules
from knowledge-based neural networks' Machine Learning Vol
13 (1993) pp 71-101
Tresp, V, Hollatz, J and Ahmad, S `Network Structuring and
Training Using Rule-based Knowledge' Advances in Neural
Information Processing Systems Vol 5 (1993) pp 871-878
SUBMISSION OF WORKSHOP EXTENDED ABSTRACTS/PAPERS
Authors are invited to submit 3 copies of either an extended
abstract or full paper relating to one of the topic areas
listed above. Papers should be written in English in single
column format and should be limited to no more than eight
(8) sides of A4 paper including figures and references.
Centered at the top of the first page should be the
complete title, author name(s), affiliation(s), and mailing
and email address(es), followed by blank space, the
abstract (15-20 lines),
and text. Please include the following information in an
accompanying cover letter:
Full title of the paper, presenting author's name, address,
telephone and fax numbers, and the author's e-mail address.
The submission deadline is January 15th, 1996, with
notification to authors by 31st January, 1996.
For further information, inquiries, and paper submissions
please contact:
Robert Andrews
Queensland University of Technology
GPO Box 2434 Brisbane Q. 4001. Australia.
phone +61 7 864-1656
fax +61 7 864-1969
email robert at fit.qut.edu.au
More information about the AISB-96 workshop series is
available from:
ftp: ftp.cogs.susx.ac.uk
pub/aisb/aisb96
WWW: (http://www.cogs.susx.ac.uk/aisb/aisb96)
WORKSHOP PARTICIPATION CHARGES
The workshop fees are listed below. Note that these fees
include lunch. Student charges are shown in brackets.
                      AISB          NON-AISB
                      MEMBERS       MEMBERS
1 Day Workshop        65 (45)       80
Late Registration     85 (60)       100
PROGRAM COMMITTEE MEMBERS
R. Andrews, Queensland University of Technology
A. Tickle, Queensland University of Technology
S. Sestito, DSTO, Australia
J. Shavlik, University of Wisconsin