Rule Extraction From ANNs - AISB96 Workshop
Robert Andrews
robert at fit.qut.edu.au
Fri Oct 20 01:55:04 EDT 1995
=============================================================
FIRST CALL FOR PAPERS
AISB-96 WORKSHOP
Society for the Study of Artificial Intelligence
and Simulation of Behaviour (SSAISB)
University of Sussex,
Brighton, England
April 2, 1996
--------------------------------------------
RULE-EXTRACTION FROM TRAINED NEURAL NETWORKS
--------------------------------------------
Robert Andrews
Neurocomputing Research Centre
Queensland University of Technology
Brisbane 4001 Queensland, Australia
Phone: +61 7 864-1656
Fax: +61 7 864-1969
E-mail: robert at fit.qut.edu.au
Joachim Diederich
Neurocomputing Research Centre
Queensland University of Technology
Brisbane 4001 Queensland, Australia
Phone: +61 7 864-2143
Fax: +61 7 864-1801
E-mail: joachim at fit.qut.edu.au
Lee Giles
NEC Research Institute
4 Independence Way
Princeton, NJ 08540
The objective of the workshop is to provide a discussion
platform for researchers interested in Artificial Neural
Networks (ANNs), Artificial Intelligence (AI) and Cognitive
Science. The workshop should be of considerable interest to
computer scientists and engineers as well as to cognitive
scientists and people interested in ANN applications which
require a justification of a classification or inference.
INTRODUCTION
It is becoming increasingly apparent that without some form
of explanation capability, the full potential of trained
Artificial Neural Networks may not be realised. The problem
is the inherent inability of an ANN to explain, in a
comprehensible form, the process by which a given decision
or output has been reached.
For Artificial Neural Networks to gain an even wider degree
of user acceptance and to enhance their overall utility as
learning and generalisation tools, it is highly desirable if
not essential that an `explanation' capability becomes an
integral part of the functionality of a trained ANN. Such a
requirement is mandatory if, for example, the ANN is to be
used in what are termed `safety critical' applications
such as airlines and power stations. In these cases it is
imperative that a system user be able to validate the output
of the Artificial Neural Network under all possible input
conditions. Further, the system user should be provided with
the capability to determine the set of conditions under
which an output unit within an ANN is active and when it is
not, thereby providing some degree of transparency of the
ANN solution.
Craven & Shavlik (1994) define the task of rule extraction
from neural networks as follows: "Given a trained neural
network and the examples used to train it, produce a concise
and accurate symbolic description of the network." The
following discussion of the importance of rule-extraction
algorithms is based on this definition.
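The task in this definition can be illustrated with a deliberately tiny sketch (not from the workshop text; the weights and threshold are invented): a single perceptron whose trained parameters happen to encode logical AND, for which the "concise and accurate symbolic description" is simply the rule y = x1 AND x2.

```python
# Hypothetical trained unit: the weights and threshold are invented for
# illustration.  The rule-extraction task is to recover the symbolic
# description "y = x1 AND x2" from these numeric parameters alone.

def perceptron(x1, x2, w1=0.6, w2=0.6, threshold=1.0):
    """Binary threshold unit with assumed weights."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

# Verify that the candidate rule matches the network on every input.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron(x1, x2) == (1 if (x1 and x2) else 0)

print("Extracted rule 'y = x1 AND x2' matches the network on all inputs")
```

For a network this small the check is exhaustive; for realistic networks the rest of this proposal describes why and how the extraction problem becomes substantial.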
THE IMPORTANCE OF RULE-EXTRACTION ALGORITHMS
Since rule extraction from trained Artificial Neural
Networks comes at a cost in terms of resources and
additional effort, an early imperative in any discussion is
to delineate the reasons why rule extraction is an
important, if not mandatory, extension of conventional ANN
techniques. The merits of including rule extraction
techniques as an adjunct to conventional Artificial Neural
Network techniques include:
Data exploration and the induction of scientific theories
Over time neural networks have proven to be extremely
powerful tools for data exploration with the capability to
discover previously unknown dependencies and relationships
in data sets. As Craven and Shavlik (1994) observe, `a
(learning) system may discover salient features in the input
data whose importance was not previously recognised.'
However, even if a trained Artificial Neural Network has
learned interesting and possibly non-linear relationships,
these relationships are encoded incomprehensibly as weight
vectors within the trained ANN and hence cannot easily serve
the generation of scientific theories. Rule-extraction
algorithms significantly enhance the capabilities of ANNs to
explore data to the benefit of the user.
Provision of a `user explanation' capability
Experience has shown that an explanation capability is
considered to be one of the most important functions
provided by symbolic AI systems. In particular, the salutary
lesson from the introduction and operation of Knowledge
Based systems is that the ability to generate even limited
explanations (in terms of being meaningful and coherent) is
absolutely crucial for the user-acceptance of such systems.
In contrast to symbolic AI systems, Artificial Neural
Networks have no explicit declarative knowledge
representation. Therefore they have considerable difficulty
in generating the required explanation structures. It is
becoming increasingly apparent that the absence of an
`explanation' capability in ANN systems limits the
realisation of the full potential of such systems and it is
this precise deficiency that the rule extraction process
seeks to redress.
Improving the generalisation of ANN solutions
Where a limited or unrepresentative data set from the
problem domain has been used in the ANN training process, it
is difficult to determine when generalisation can fail even
with evaluation methods such as cross-validation. By being
able to express the knowledge embedded within the trained
Artificial Neural Network as a set of symbolic rules, the
rule-extraction process may provide an experienced system
user with the capability to anticipate or predict a set of
circumstances under which generalisation failure can occur.
Alternatively the system user may be able to use the
extracted rules to identify regions in input space which are
not represented sufficiently in the existing ANN training
set data and to supplement the data set accordingly.
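The last point can be sketched concretely. In the fragment below (all rule bounds, data values, and the support threshold are invented for illustration), each extracted rule is treated as a conjunction of interval conditions on the inputs, and rules with few supporting training examples flag regions of input space where the data set should be supplemented.

```python
# Hypothetical extracted rules: each maps input names to (low, high)
# interval conditions.  Values are invented for illustration.
rules = {
    "rule_1": {"x1": (0.0, 0.5), "x2": (0.0, 0.5)},
    "rule_2": {"x1": (0.5, 1.0), "x2": (0.5, 1.0)},
}

# A toy training set; note that no example falls inside rule_2's region.
training_set = [
    {"x1": 0.1, "x2": 0.2},
    {"x1": 0.3, "x2": 0.4},
    {"x1": 0.2, "x2": 0.1},
]

def covered_by(example, conditions):
    """True if the example satisfies every interval condition of a rule."""
    return all(lo <= example[f] <= hi for f, (lo, hi) in conditions.items())

for name, conds in rules.items():
    support = sum(covered_by(e, conds) for e in training_set)
    if support < 2:  # illustrative support threshold
        print(f"{name} has only {support} supporting examples - "
              "consider collecting more data in this region")
```

The design choice here is that rule support is measured against the original training data, so a rule region the network commits to but the data barely samples is exactly where generalisation failure is most plausible.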
A CLASSIFICATION SCHEME FOR RULE EXTRACTION ALGORITHMS
The method of classification proposed here is in terms of:
(a) the expressive power of the extracted rules; (b) the
`translucency' of the view taken within the rule extraction
technique of the underlying Artificial Neural Network units;
(c) the extent to which the underlying ANN incorporates
specialised training regimes; (d) the `quality' of the
extracted rules; and (e) the algorithmic `complexity' of the
rule extraction/rule refinement technique.
The `translucency' dimension of classification is of
particular interest. It is designed to reveal the
relationship between the extracted rules and the internal
architecture of the trained ANN. It comprises two basic
categories of rule extraction techniques viz
`decompositional' and `pedagogical' and a third - labelled
as `eclectic' - which combines elements of the two basic
categories.
The distinguishing characteristic of the `decompositional'
approach is that the focus is on extracting rules at the
level of individual (hidden and output) units within the
trained Artificial Neural Network. Hence the `view' of the
underlying trained Artificial Neural Network is one of
`transparency'. The label `pedagogical' is given to those
rule extraction techniques which treat the trained ANN as a
`black box', ie the view of the underlying trained
Artificial Neural Network is `opaque'. The core idea
in the `pedagogical' approach is to `view rule extraction as
a learning task where the target concept is the function
computed by the network and the input features are simply
the network's input features'. Hence the `pedagogical'
techniques aim to extract rules that map inputs directly
into outputs. Where such techniques are used in conjunction
with a symbolic learning algorithm, the basic motif is to
use the trained Artificial Neural Network to generate
examples for the learning algorithm.
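A minimal sketch of the pedagogical approach, in the spirit of Craven and Shavlik's sampling-and-queries formulation, is shown below. The "network" here is a stand-in function (invented for illustration); a genuinely trained ANN would take its place. The trained model is queried as a black box over the input space and the resulting labelled examples are read off as a DNF rule, one conjunct per positive example.

```python
from itertools import product

def black_box_net(x):
    # Stand-in for a trained ANN queried as a black box: this invented
    # "network" fires when at least two of three binary inputs are on.
    return 1 if sum(x) >= 2 else 0

# Generate labelled examples by querying the black box (the "teacher").
examples = [(x, black_box_net(x)) for x in product((0, 1), repeat=3)]

# Read off a DNF rule mapping inputs directly to the output:
# one conjunct per positive example.
conjuncts = []
for x, y in examples:
    if y == 1:
        lits = [f"x{i+1}" if v else f"NOT x{i+1}" for i, v in enumerate(x)]
        conjuncts.append("(" + " AND ".join(lits) + ")")
rule = " OR ".join(conjuncts)
print("y = 1 IF", rule)
```

Exhaustive enumeration is feasible only for tiny input spaces; practical pedagogical techniques replace it with sampling strategies or a symbolic learner such as a decision-tree inducer trained on the queried examples, and the raw DNF would normally be simplified rather than left as one conjunct per example.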
As indicated above, the proposed third category in this
classification scheme - the `eclectic' group - comprises
composites which incorporate elements of both the
`decompositional' and `pedagogical' (or `black-box') rule
extraction techniques. Membership in this category is assigned to
techniques which utilise knowledge about the internal
architecture and/or weight vectors in the trained Artificial
Neural Network to complement a symbolic learning algorithm.
An ancillary problem to that of rule extraction from trained
ANNs is that of using the ANN for the `refinement' of
existing rules within symbolic knowledge bases. The goal in
rule refinement is to use a combination of ANN learning and
rule extraction techniques to produce a `better' (ie a
`refined') set of symbolic rules which can then be applied
back in the original problem domain. In the rule refinement
process, the initial rule base (ie what may be termed `prior
knowledge') is inserted into an ANN by programming some of
the weights. The rule refinement process then proceeds in
the same way as normal rule extraction viz (1) train the
network on the available data set(s); and (2) extract (in
this case the `refined') rules - with the proviso that the
rule refinement process may involve a number of iterations
of the training phase rather than a single pass.
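The insertion step that initialises the refinement loop can be sketched as follows, in the style of knowledge-based networks such as those of Towell and Shavlik. The particular weight value and bias formula are illustrative assumptions, not a prescribed scheme: a prior rule "IF a AND b THEN c" is programmed into a unit by giving each antecedent a common weight and setting the bias so the unit fires only when all antecedents hold.

```python
w = 4.0                   # assumed weight for each positive antecedent
antecedents = ["a", "b"]  # conditions of the prior rule "IF a AND b THEN c"
# Bias chosen so the unit fires only when every antecedent is satisfied.
bias = -(len(antecedents) * w - w / 2)

def unit(inputs):
    """Threshold unit initialised from the symbolic rule."""
    total = sum(w * inputs[name] for name in antecedents) + bias
    return 1 if total > 0 else 0

# Before any training, the programmed unit reproduces the inserted rule:
assert unit({"a": 1, "b": 1}) == 1
assert unit({"a": 1, "b": 0}) == 0
# Training on data would then adjust w and bias, after which the
# `refined' rules are extracted as in ordinary rule extraction.
```

Because the inserted rule only initialises the weights rather than fixing them, training is free to weaken or strengthen individual antecedents, which is precisely what allows the extracted rules to be a `refined' version of the prior knowledge.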
DISCUSSION POINTS FOR WORKSHOP PARTICIPANTS
1. Decompositional vs. learning approaches to rule-
extraction from ANNs - What are the advantages and
disadvantages w.r.t. performance, solution time,
computational complexity, problem domain, etc.? Are
decompositional approaches always dependent on a certain ANN
architecture?
2. Rule-extraction from trained neural networks vs. symbolic
induction. What are the relative strengths and weaknesses?
3. What are the most important criteria for rule quality?
4. What are the most suitable representation languages for
extracted rules? How does the extraction problem vary
across different languages?
5. What is the relationship between rule-initialisation
(insertion) and rule-extraction? For instance, are these
equivalent or complementary processes? How important is
rule-refinement by neural networks?
6. Rule-extraction from trained neural networks and
computational learning theory. Is generating a minimal
rule-set which mimics an ANN a hard problem?
7. Does rule-initialisation result in faster learning and
improved generalisation?
8. To what extent are existing extraction algorithms limited
in their applicability? How can these limitations be
addressed?
9. Are there any interesting rule-extraction success
stories? That is, problem domains in which the application
of rule-extraction methods has resulted in an interesting or
significant advance.
ACKNOWLEDGEMENT
Many thanks to Mark Craven, and Alan Tickle
for comments on earlier versions of this proposal.
RELEVANT PUBLICATIONS
Andrews, R Diederich, J and Tickle, A.B.: A survey and
critique of techniques for extracting rules from trained
artificial neural networks. To appear: Knowledge-Based
Systems, 1995 (ftp:fit.qut.edu.au//pub/NRC/ps/QUTNRC-95-01-
02.ps.Z)
Andrews, R and Geva, S: `Rule extraction from a constrained
error back propagation MLP' Proc. 5th Australian Conference
on Neural Networks Brisbane Queensland (1994) pp 9-12
Andrews, R and Geva, S `Inserting and extracting knowledge
from constrained error back propagation networks' Proc. 6th
Australian Conference on Neural Networks Sydney NSW (1995)
Craven, M W and Shavlik, J W `Using sampling and queries to
extract rules from trained neural networks' Machine
Learning: Proceedings of the Eleventh International
Conference (San Francisco CA) (1994) (in print)
Diederich, J `Explanation and artificial neural networks'
International Journal of Man-Machine Studies Vol 37 (1992)
pp 335-357
Fu, L M `Neural networks in computer intelligence' McGraw
Hill (New York) (1994)
Fu, L M `Rule generation from neural networks' IEEE
Transactions on Systems, Man, and Cybernetics Vol 28 No 8
(1994) pp 1114-1124
Gallant, S `Connectionist expert systems' Communications of
the ACM Vol 31 No 2 (February 1988) pp 152-169
Giles, C L and Omlin C W `Rule refinement with recurrent
neural networks' Proc. of the IEEE International Conference
on Neural Networks (San Francisco CA) (March 1993) pp
801-806
Giles, C L and Omlin C W `Extraction, insertion, and
refinement of symbolic rules in dynamically driven recurrent
networks' Connection Science Vol 5 Nos 3 and 4 (1993) pp
307-328
Giles, C L, Miller, C B, Chen, D, Chen, H, Sun, G Z and Lee,
Y C `Learning and extracting finite state automata with
second-order recurrent neural networks' Neural Computation
Vol 4 (1992) pp 393-405
Hayward, R.; Pop, E.; Diederich, J.: Extracting Rules for
Grammar Recognition from Cascade-2 Networks. Proceedings,
IJCAI-95 Workshop on Machine Learning and Natural Language
Processing.
McMillan, C, Mozer, M C and Smolensky, P `The connectionist
scientist game: rule extraction and refinement in a neural
network' Proc. of the Thirteenth Annual Conference of the
Cognitive Science Society (Hillsdale NJ) 1991
Omlin, C W, Giles, C L and Miller, C B `Heuristics for the
extraction of rules from discrete time recurrent neural
networks' Proc. of the International Joint Conference on
Neural Networks (IJCNN'92) (Baltimore MD) Vol 1 (1992) pp 33
Pop, E, Hayward, R, and Diederich, J `RULENEG: extracting
rules from a trained ANN by stepwise negation' QUT NRC
(December 1994)
Sestito, S and Dillon, T `Automated knowledge acquisition of
rules with continuously valued attributes' Proc. 12th
International Conference on Expert Systems and their
Applications (AVIGNON'92) (Avignon France) (May 1992) pp
645-656.
Sestito, S and Dillon, T `Automated knowledge acquisition'
Prentice Hall (Australia) (1994)
Thrun, S B `Extracting Provably Correct Rules From
Artificial Neural Networks' Technical Report IAI-TR-93-5
Institut für Informatik III, Universität Bonn (1994)
Tickle, A B, Orlowski, M, and Diederich, J `DEDEC: decision
detection by rule extraction from neural networks' QUT NRC
(September 1994)
Towell, G and Shavlik, J `The extraction of refined rules
from knowledge-based neural networks' Machine Learning Vol
13 (1993) pp 71-101
Tresp, V, Hollatz, J and Ahmad, S `Network Structuring and
Training Using Rule-based Knowledge' Advances in Neural
Information Processing Systems Vol 5 (1993) pp 871-878
SUBMISSION OF WORKSHOP EXTENDED ABSTRACTS/PAPERS
Authors are invited to submit 3 copies of either an extended
abstract or full paper relating to one of the topic areas
listed above. Papers should be written in English in single
column format and should be limited to no more than eight
(8) sides of A4 paper including figures and references.
Centered at the top of the first page should be the
complete title, author name(s), affiliation(s), and mailing
and email address(es), followed by blank space, the
abstract (15-20 lines),
and text. Please include the following information in an
accompanying cover letter:
Full title of the paper, presenting author's name, address,
telephone and fax numbers, and the author's e-mail address.
The submission deadline is January 15th, 1996, with
notification to authors by 31st January, 1996.
For further information, inquiries, and paper submissions
please contact:
Robert Andrews
Queensland University of Technology
GPO Box 2434 Brisbane Q. 4001. Australia.
phone +61 7 864-1656
fax +61 7 864-1969
email robert at fit.qut.edu.au
More information about the AISB-96 workshop series is
available from:
ftp: ftp.cogs.susx.ac.uk
pub/aisb/aisb96
WWW: (http://www.cogs.susx.ac.uk/aisb/aisb96)
WORKSHOP PARTICIPATION CHARGES
The workshop fees are listed below. Note that these fees
include lunch. Student charges are shown in brackets.
                      AISB          NON-AISB
                      MEMBERS       MEMBERS
1 Day Workshop        65 (45)       80
Late Registration     85 (60)       100
PROGRAM COMMITTEE MEMBERS
R. Andrews, Queensland University of Technology
A. Tickle, Queensland University of Technology
S. Sestito, DSTO, Australia
J. Shavlik, University of Wisconsin