International Comparison of Learning Algorithms: MONK
Sebastian.Thrun@B.GP.CS.CMU.EDU
Tue Jun 6 06:52:25 EDT 2006
Dear Connectionists:
This is an announcement of a forthcoming Technical Report. Over the last
few months, we ran a first worldwide comparison of several major learning
algorithms on three simple classification problems. Two connectionist
learning algorithms were included, namely plain Backpropagation and
Cascade Correlation. Although (a) the problems were taken from a domain
that favored (some of) the symbolic algorithms, and (b) the comparison is
considerably less biased than usual, since each tester knew well the
method they tested, the connectionist techniques performed surprisingly
well. The final report will be available shortly after the NIPS
conference; anyone interested in this comparison who attends the
conference is welcome to contact me at NIPS. I will bring a few
preprints.
Sebastian Thrun
thrun at cs.cmu.edu
----------------------------------------------------------------------
----------------------------------------------------------------------
The MONK's Problems
A Performance Comparison of Different Learning Algorithms
S. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong,
S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller,
I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz,
Y. Reich, H. Vafaie, W. Van de Velde, W. Wenzel, J. Wnek, and J. Zhang
This report summarizes a comparison of different learning techniques that was
performed at the 2nd European Summer School on Machine Learning, held in
Belgium in the summer of 1991. A variety of symbolic and non-symbolic learning
techniques - namely AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant
Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, ID3, AQR, CN2, CLASSWEB,
PRISM, Backpropagation, and Cascade Correlation - are compared on three
classification problems, the MONK's problems.
The MONK's problems are derived from a domain in which each training example is
represented by six discrete-valued attributes. Each problem involves learning
a binary function defined over this domain, from a sample of training examples
of this function. Experiments were performed with and without noise in the
training examples.
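For concreteness, the domain can be sketched in a few lines of Python. The attribute names, value ranges, and the MONK-1 target concept below follow the standard formulation of these problems (they are not spelled out in this announcement, so treat them as an assumption of this sketch):

```python
from itertools import product

# Attribute names and value ranges as in the standard MONK formulation
# (an assumption of this sketch; the announcement does not list them).
ATTRIBUTES = {
    "head_shape":   ["round", "square", "octagon"],
    "body_shape":   ["round", "square", "octagon"],
    "is_smiling":   ["yes", "no"],
    "holding":      ["sword", "balloon", "flag"],
    "jacket_color": ["red", "yellow", "green", "blue"],
    "has_tie":      ["yes", "no"],
}

def monk1_target(example):
    """MONK-1 concept: (head_shape = body_shape) or (jacket_color = red)."""
    return (example["head_shape"] == example["body_shape"]
            or example["jacket_color"] == "red")

# Enumerate the full domain; a training sample is drawn from these examples.
domain = [dict(zip(ATTRIBUTES, values))
          for values in product(*ATTRIBUTES.values())]

positives = sum(monk1_target(e) for e in domain)
print(len(domain), positives)  # 432 216
```

Under this formulation the domain contains 3*3*2*3*4*2 = 432 examples, and the MONK-1 concept splits it exactly in half.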
One significant characteristic of this comparison is that it was performed by a
collection of researchers, each of whom was an advocate of the technique they
tested (often they were the creators of the various methods). In this sense,
the results are less biased than in comparisons performed by a single person
advocating a specific learning method, and more accurately reflect the
generalization behavior of the learning techniques as applied by knowledgeable
users.
----------------------------------------------------------------------
================================
RESULTS - A SHORT OVERVIEW
================================
Problem:                    MONK-1    MONK-2    MONK-3 (noisy)

AQ17-DCI                    100.0%    100.0%     94.2%
AQ17-HCI                    100.0%     93.1%    100.0%
AQ17-FCLS                      ---     92.6%     97.2%
AQ14-NT                        ---       ---   100.0%
AQ15-GA                     100.0%     86.8%    100.0%
  (by J. Bala, E. Bloedorn, K. De Jong, K. Kaufman,
   R.S. Michalski, P. Pachowicz, H. Vafaie,
   J. Wnek, and J. Zhang)

Assistant Professional      100.0%     81.25%   100.0%
  (by B. Cestnik, I. Kononenko, and I. Bratko)

mFOIL                       100.0%     69.2%    100.0%
  (by S. Dzeroski)

ID5R                         81.7%     61.8%      ---
IDL                          97.2%     66.2%      ---
ID5R-hat                     90.3%     65.7%      ---
TDIDT                        75.7%     66.7%      ---
  (by W. Van de Velde)

ID3                          98.6%     67.9%     94.4%
ID3, no windowing            83.2%     69.1%     95.6%
ID5R                         79.7%     69.2%     95.2%
AQR                          95.9%     79.7%     87.0%
CN2                         100.0%     69.0%     89.1%
CLASSWEB 0.10                71.8%     64.8%     80.8%
CLASSWEB 0.15                65.7%     61.6%     85.4%
CLASSWEB 0.20                63.0%     57.2%     75.2%
  (by J. Kreuziger, R. Hamann, and W. Wenzel)

PRISM                        86.3%     72.7%     90.3%
  (by S. Keller)

Backpropagation             100.0%    100.0%     93.1%
  (by S. Thrun)

Cascade Correlation         100.0%    100.0%     97.2%
  (by S.E. Fahlman)

("---" marks entries for which no result was reported.)
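For readers who want to manipulate these numbers, here is a minimal Python sketch over a transcribed subset of the overview table (only algorithms with results reported on all three problems are included; the accuracies are copied verbatim from above):

```python
# Test-set accuracies (percent) transcribed from the overview table above;
# entries absent from the report are left out rather than guessed.
results = {
    "AQ17-DCI":               {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 94.2},
    "AQ17-HCI":               {"MONK-1": 100.0, "MONK-2": 93.1,   "MONK-3": 100.0},
    "Assistant Professional": {"MONK-1": 100.0, "MONK-2": 81.25,  "MONK-3": 100.0},
    "mFOIL":                  {"MONK-1": 100.0, "MONK-2": 69.2,   "MONK-3": 100.0},
    "ID3":                    {"MONK-1": 98.6,  "MONK-2": 67.9,   "MONK-3": 94.4},
    "CN2":                    {"MONK-1": 100.0, "MONK-2": 69.0,   "MONK-3": 89.1},
    "PRISM":                  {"MONK-1": 86.3,  "MONK-2": 72.7,   "MONK-3": 90.3},
    "Backpropagation":        {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 93.1},
    "Cascade Correlation":    {"MONK-1": 100.0, "MONK-2": 100.0,  "MONK-3": 97.2},
}

# Report the best accuracy and the algorithms that achieved it per problem.
for problem in ("MONK-1", "MONK-2", "MONK-3"):
    top = max(v[problem] for v in results.values())
    winners = sorted(a for a, v in results.items() if v[problem] == top)
    print(f"{problem}: {top}% by {', '.join(winners)}")
```

Among this subset, for example, only AQ17-DCI, Backpropagation, and Cascade Correlation reach 100% on MONK-2.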
----------------------------------------------------------------------
----------------------------------------------------------------------
================================
TABLE OF CONTENTS
================================
1 The MONK's Comparison Of Learning Algorithms
-- Introduction and Survey 1
1.1 The problem 2
1.2 Visualization 2
2 Applying Various AQ Programs to the MONK's Problems:
Results and Brief Description of the Methods 7
2.1 Introduction 8
2.2 Results for the 1st problem (M1) 9
2.2.1 Rules obtained by AQ17-DCI 9
2.2.2 Rules obtained by AQ17-HCI 10
2.3 Results for the 2nd problem (M2) 11
2.3.1 Rules obtained by AQ17-DCI 11
2.3.2 Rules obtained by AQ17-HCI 11
2.3.3 Rules obtained by AQ17-FCLS 13
2.4 Results for the 3rd problem (M3) 15
2.4.1 Rules obtained by AQ17-HCI 15
2.4.2 Rules obtained by AQ14-NT 16
2.4.3 Rules obtained by AQ17-FCLS 16
2.4.4 Rules obtained by AQ15-GA 17
2.5 A Brief Description of the Programs and Algorithms 18
2.5.1 AQ17-DCI (Data-driven constructive induction) 18
2.5.2 AQ17-FCLS (Flexible concept learning) 19
2.5.3 AQ17-HCI (Hypothesis-driven constructive induction) 19
2.5.4 AQ14-NT (noise-tolerant learning from engineering data) 20
2.5.5 AQ15-GA (AQ15 with attribute selection by a
genetic algorithm) 20
2.5.6 The AQ Algorithm that underlies the programs 21
3 The Assistant Professional Inductive Learning System:
MONK's Problems 23
3.1 Introduction 24
3.2 Experimental results 24
3.3 Discussion 25
3.4 Literature 25
3.5 Resulting Decision Trees 26
4 mFOIL on the MONK's Problems 29
4.1 Description 30
4.2 Set 1 31
4.3 Set 2 31
4.4 Set 3 32
5 Comparison of Decision Tree-Based Learning Algorithms
on the MONK's Problems 33
5.1 IDL: A Brief Introduction 34
5.1.1 Introduction 34
5.1.2 Related Work 35
5.1.3 Conclusion 36
5.2 Experimental Results 40
5.2.1 ID5R on test set 1 43
5.2.2 IDL on test set 1 43
5.2.3 ID5R-HAT on test set 1 44
5.2.4 TDIDT on test set 1 44
5.2.5 ID5R on test set 2 45
5.2.6 IDL on test set 2 46
5.2.7 TDIDT on test set 2 48
5.2.8 TDIDT on test set 1 49
5.2.9 ID5R-HAT on test set 2 50
5.3 Classification diagrams 52
5.4 Learning curves 56
6 Comparison of Inductive Learning Programs 59
6.1 Introduction 60
6.2 Short description of the algorithms 60
6.2.1 ID3 60
6.2.2 ID5R 61
6.2.3 AQR 61
6.2.4 CN2 62
6.2.5 CLASSWEB 62
6.3 Results 63
6.3.1 Training Time 63
6.3.2 Classifier Results 64
6.4 Conclusion 68
6.5 Classification diagrams 69
7 Documentation of Prism -- an Inductive Learning
Algorithm 81
7.1 Short Description 82
7.2 Introduction 82
7.3 PRISM: Entropy versus Information Gain 82
7.3.1 Maximizing the information gain 82
7.3.2 Trimming the tree 82
7.4 The Basic Algorithm 83
7.5 The Use of Heuristics 84
7.6 General Considerations and a Comparison with ID3 84
7.7 Implementation 84
7.8 Results on Running PRISM on the MONK's Test Sets 85
7.8.1 Test Set 1 -- Rules 86
7.8.2 Test Set 2 -- Rules 87
7.8.3 Test Set 3 -- Rules 90
7.9 Classification diagrams 92
8 Backpropagation on the MONK's problems 95
8.1 Introduction 96
8.2 Classification diagrams 97
8.3 Resulting weight matrices 99
9 The Cascade-Correlation Learning Algorithm on the
MONK's Problems 101
9.1 The Cascade-Correlation algorithm 102
9.2 Results 103
9.3 Classification diagrams 106