International Comparison of Learning Algorithms: MONK
Sebastian.Thrun@B.GP.CS.CMU.EDU
Tue Jun 6 06:52:25 EDT 2006
Dear Connectionists:
This is an announcement of a forthcoming Technical Report. Over the last
few months, we ran a first worldwide comparison of several major learning
algorithms on three simple classification problems. Two connectionist
learning algorithms were included, namely plain Backpropagation and
Cascade Correlation. Although (a) the problems were taken from a domain
that favored (some of) the symbolic algorithms, and (b) the comparison is
considerably less biased than usual, since each tester knew well the
method they tested, the connectionist techniques performed surprisingly
well. The final report will be available shortly after the NIPS
conference; anyone interested in this comparison who attends the
conference is welcome to contact me at NIPS. I will bring a few
preprints.
Sebastian Thrun
thrun at cs.cmu.edu
----------------------------------------------------------------------
----------------------------------------------------------------------
The MONK's Problems
A Performance Comparison of Different Learning Algorithms
S. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong,
S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller,
I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz,
Y. Reich, H. Vafaie, W. Van de Velde, W. Wenzel, J. Wnek, and J. Zhang
This report summarizes a comparison of different learning techniques that was
performed at the 2nd European Summer School on Machine Learning, held in
Belgium in the summer of 1991. A variety of symbolic and non-symbolic learning
techniques - namely AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant
Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, ID3, AQR, CN2, CLASSWEB,
PRISM, Backpropagation, and Cascade Correlation - are compared on three
classification problems, the MONK's problems.
The MONK's problems are derived from a domain in which each training example is
represented by six discrete-valued attributes. Each problem involves learning
a binary function defined over this domain, from a sample of training examples
of this function. Experiments were performed with and without noise in the
training examples.
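For concreteness, the domain can be sketched in a few lines of Python. The attribute names, value ranges, and the MONK-1 target concept below follow the standard formulation of these problems (they are not spelled out in this announcement, so treat them as an assumption of this sketch):

```python
from itertools import product

# Attribute names and value ranges as in the standard MONK formulation
# (an assumption of this sketch; the announcement does not list them).
ATTRIBUTES = {
    "head_shape":   ["round", "square", "octagon"],
    "body_shape":   ["round", "square", "octagon"],
    "is_smiling":   ["yes", "no"],
    "holding":      ["sword", "balloon", "flag"],
    "jacket_color": ["red", "yellow", "green", "blue"],
    "has_tie":      ["yes", "no"],
}

def monk1_target(example):
    """MONK-1 concept: (head_shape = body_shape) or (jacket_color = red)."""
    return (example["head_shape"] == example["body_shape"]
            or example["jacket_color"] == "red")

# Enumerate the full domain; a training sample is drawn from these examples.
domain = [dict(zip(ATTRIBUTES, values))
          for values in product(*ATTRIBUTES.values())]

positives = sum(monk1_target(e) for e in domain)
print(len(domain), positives)  # 432 216
```

Under this formulation the domain contains 3*3*2*3*4*2 = 432 examples, and the MONK-1 concept splits it exactly in half.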
One significant characteristic of this comparison is that it was performed by a
collection of researchers, each of whom was an advocate of the technique they
tested (often they were the creators of the various methods). In this sense,
the results are less biased than in comparisons performed by a single person
advocating a specific learning method, and more accurately reflect the
generalization behavior of the learning techniques as applied by knowledgeable
users.
----------------------------------------------------------------------
================================
RESULTS - A SHORT OVERVIEW
================================
Problem:                    MONK-1    MONK-2    MONK-3 (noisy)

AQ17-DCI                    100.0%    100.0%     94.2%
AQ17-HCI                    100.0%     93.1%    100.0%
AQ17-FCLS                      ---     92.6%     97.2%
AQ14-NT                        ---       ---   100.0%
AQ15-GA                     100.0%     86.8%    100.0%
  (by J. Bala, E. Bloedorn, K. De Jong, K. Kaufman,
   R.S. Michalski, P. Pachowicz, H. Vafaie,
   J. Wnek, and J. Zhang)

Assistant Professional      100.0%     81.25%   100.0%
  (by B. Cestnik, I. Kononenko, and I. Bratko)

mFOIL                       100.0%     69.2%    100.0%
  (by S. Dzeroski)

ID5R                         81.7%     61.8%      ---
IDL                          97.2%     66.2%      ---
ID5R-hat                     90.3%     65.7%      ---
TDIDT                        75.7%     66.7%      ---
  (by W. Van de Velde)

ID3                          98.6%     67.9%     94.4%
ID3, no windowing            83.2%     69.1%     95.6%
ID5R                         79.7%     69.2%     95.2%
AQR                          95.9%     79.7%     87.0%
CN2                         100.0%     69.0%     89.1%
CLASSWEB 0.10                71.8%     64.8%     80.8%
CLASSWEB 0.15                65.7%     61.6%     85.4%
CLASSWEB 0.20                63.0%     57.2%     75.2%
  (by J. Kreuziger, R. Hamann, and W. Wenzel)

PRISM                        86.3%     72.7%     90.3%
  (by S. Keller)

Backpropagation             100.0%    100.0%     93.1%
  (by S. Thrun)

Cascade Correlation         100.0%    100.0%     97.2%
  (by S.E. Fahlman)

("---" marks entries for which no result was reported.)
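For readers who want to manipulate these numbers, here is a minimal Python sketch over a transcribed subset of the overview table (only algorithms with results reported on all three problems are included; the accuracies are copied verbatim from above):

```python
# Test-set accuracies (percent) transcribed from the overview table above;
# entries absent from the report are left out rather than guessed.
results = {
    "AQ17-DCI":               {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 94.2},
    "AQ17-HCI":               {"MONK-1": 100.0, "MONK-2": 93.1,   "MONK-3": 100.0},
    "Assistant Professional": {"MONK-1": 100.0, "MONK-2": 81.25,  "MONK-3": 100.0},
    "mFOIL":                  {"MONK-1": 100.0, "MONK-2": 69.2,   "MONK-3": 100.0},
    "ID3":                    {"MONK-1": 98.6,  "MONK-2": 67.9,   "MONK-3": 94.4},
    "CN2":                    {"MONK-1": 100.0, "MONK-2": 69.0,   "MONK-3": 89.1},
    "PRISM":                  {"MONK-1": 86.3,  "MONK-2": 72.7,   "MONK-3": 90.3},
    "Backpropagation":        {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 93.1},
    "Cascade Correlation":    {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 97.2},
}

# Report the best accuracy and the algorithms that achieved it per problem.
for problem in ("MONK-1", "MONK-2", "MONK-3"):
    top = max(v[problem] for v in results.values())
    winners = sorted(a for a, v in results.items() if v[problem] == top)
    print(f"{problem}: {top}% by {', '.join(winners)}")
```

Among this subset, for example, only AQ17-DCI, Backpropagation, and Cascade Correlation reach 100% on MONK-2.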
----------------------------------------------------------------------
----------------------------------------------------------------------
================================
TABLE OF CONTENTS
================================
1 The MONK's Comparison Of Learning Algorithms
-- Introduction and Survey 1
1.1 The problem 2
1.2 Visualization 2
2 Applying Various AQ Programs to the MONK's Problems:
Results and Brief Description of the Methods 7
2.1 Introduction 8
2.2 Results for the 1st problem (M1) 9
2.2.1 Rules obtained by AQ17-DCI 9
2.2.2 Rules obtained by AQ17-HCI 10
2.3 Results for the 2nd problem (M2) 11
2.3.1 Rules obtained by AQ17-DCI 11
2.3.2 Rules obtained by AQ17-HCI 11
2.3.3 Rules obtained by AQ17-FCLS 13
2.4 Results for the 3rd problem (M3) 15
2.4.1 Rules obtained by AQ17-HCI 15
2.4.2 Rules obtained by AQ14-NT 16
2.4.3 Rules obtained by AQ17-FCLS 16
2.4.4 Rules obtained by AQ15-GA 17
2.5 A Brief Description of the Programs and Algorithms 18
2.5.1 AQ17-DCI (Data-driven constructive induction) 18
2.5.2 AQ17-FCLS (Flexible concept learning) 19
2.5.3 AQ17-HCI (Hypothesis-driven constructive induction) 19
2.5.4 AQ14-NT (noise-tolerant learning from engineering data) 20
2.5.5 AQ15-GA (AQ15 with attribute selection by a
genetic algorithm) 20
2.5.6 The AQ Algorithm that underlies the programs 21
3 The Assistant Professional Inductive Learning System:
MONK's Problems 23
3.1 Introduction 24
3.2 Experimental results 24
3.3 Discussion 25
3.4 Literature 25
3.5 Resulting Decision Trees 26
4 mFOIL on the MONK's Problems 29
4.1 Description 30
4.2 Set 1 31
4.3 Set 2 31
4.4 Set 3 32
5 Comparison of Decision Tree-Based Learning Algorithms
on the MONK's Problems 33
5.1 IDL: A Brief Introduction 34
5.1.1 Introduction 34
5.1.2 Related Work 35
5.1.3 Conclusion 36
5.2 Experimental Results 40
5.2.1 ID5R on test set 1 43
5.2.2 IDL on test set 1 43
5.2.3 ID5R-HAT on test set 1 44
5.2.4 TDIDT on test set 1 44
5.2.5 ID5R on test set 2 45
5.2.6 IDL on test set 2 46
5.2.7 TDIDT on test set 2 48
5.2.8 TDIDT on test set 1 49
5.2.9 ID5R-HAT on test set 2 50
5.3 Classification diagrams 52
5.4 Learning curves 56
6 Comparison of Inductive Learning Programs 59
6.1 Introduction 60
6.2 Short description of the algorithms 60
6.2.1 ID3 60
6.2.2 ID5R 61
6.2.3 AQR 61
6.2.4 CN2 62
6.2.5 CLASSWEB 62
6.3 Results 63
6.3.1 Training Time 63
6.3.2 Classifier Results 64
6.4 Conclusion 68
6.5 Classification diagrams 69
7 Documentation of Prism -- an Inductive Learning
Algorithm 81
7.1 Short Description 82
7.2 Introduction 82
7.3 PRISM: Entropy versus Information Gain 82
7.3.1 Maximizing the information gain 82
7.3.2 Trimming the tree 82
7.4 The Basic Algorithm 83
7.5 The Use of Heuristics 84
7.6 General Considerations and a Comparison with ID3 84
7.7 Implementation 84
7.8 Results on Running PRISM on the MONK's Test Sets 85
7.8.1 Test Set 1 -- Rules 86
7.8.2 Test Set 2 -- Rules 87
7.8.3 Test Set 3 -- Rules 90
7.9 Classification diagrams 92
8 Backpropagation on the MONK's problems 95
8.1 Introduction 96
8.2 Classification diagrams 97
8.3 Resulting weight matrices 99
9 The Cascade-Correlation Learning Algorithm on the
MONK's Problems 101
9.1 The Cascade-Correlation algorithm 102
9.2 Results 103
9.3 Classification diagrams 106