Four Papers

Wed Feb 1 13:38:57 EST 1995

The following four papers are available for ftp access (abstracts are
included below):

Zhang, W., Dietterich, T. G., (submitted). A Reinforcement
Learning Approach to Job-shop Scheduling.
ftp://ftp.cs.orst.edu/users/t/tgd/papers/tr-jss.ps.gz

Dietterich, T. G., Flann, N. S., (submitted). Explanation-based
Learning and Reinforcement Learning: A Unified View. 
ftp://ftp.cs.orst.edu/users/t/tgd/papers/ml95-ebrl.ps.gz

Dietterich, T. G., Kong, E. B., (submitted). Machine Learning
Bias, Statistical Bias, and Statistical Variance of Decision Tree
Algorithms. 
ftp://ftp.cs.orst.edu/users/t/tgd/papers/ml95-bias.ps.gz

Kong, E. B., Dietterich, T. G., (submitted). Error-Correcting
Output Coding Corrects Bias and Variance.
ftp://ftp.cs.orst.edu/users/t/tgd/papers/ml95-why.ps.gz

These and other titles are available through my WWW homepage (see URL
at end of message).

----------------------------------------------------------------------
       A Reinforcement Learning Approach to Job-shop Scheduling

                              Wei Zhang
                         Thomas G. Dietterich
                     Computer Science Department
                       Oregon State University
                     Corvallis, Oregon 97331-3202

Abstract: We apply reinforcement learning methods to learn
domain-specific heuristics for job shop scheduling.  We employ
repair-based scheduling using a problem space that starts with a
critical-path schedule and incrementally repairs constraint violations
with the goal of finding a short conflict-free schedule.  The states
in this problem space are represented by a set of features.  The
temporal difference algorithm $TD(\lambda)$ is applied to train a
neural network to learn a heuristic evaluation function over states.
This evaluation function is used by a one-step lookahead search
procedure to find good solutions to new scheduling problems.  We
evaluate this approach on synthetic problems and on problems from a
NASA space shuttle payload processing task.  The evaluation function
is trained on problems involving a small number of jobs and then
tested on larger problems.  The TD scheduler performs better than the
best known existing algorithm for this task---Zweben's iterative
repair method based on simulated annealing.  The results suggest that
reinforcement learning algorithms can provide a new method for
constructing high-performance scheduling systems for important
industrial applications.
----------------------------------------------------------------------
        Explanation-Based Learning and Reinforcement Learning:
                            A Unified View

                         Thomas G. Dietterich
                    Department of Computer Science
                       Oregon State University
                         Corvallis, OR 97331

                          Nicholas S. Flann
                    Department of Computer Science
                        Utah State University
                         Logan, UT 84322-4205

Abstract: In speedup-learning problems, where full descriptions of
operators are always known, both explanation-based learning (EBL) and
reinforcement learning (RL) can be applied.  This paper shows that
both methods involve fundamentally the same process of propagating
information backward from the goal toward the starting state.  RL
performs this propagation on a state-by-state basis, while EBL
computes the weakest preconditions of operators, and hence, performs
this propagation on a region-by-region basis.  Based on the
observation that RL is a form of asynchronous dynamic programming,
this paper shows how to develop a dynamic programming version of EBL,
which we call Explanation-Based Reinforcement Learning (EBRL).  The
paper compares batch and online versions of EBRL to batch and online
versions of RL and to standard EBL.  The results show that EBRL
combines the strengths of EBL (fast learning and the ability to scale
to large state spaces) with the strengths of RL (learning of optimal
policies).  Results are shown in chess endgames and in synthetic maze
tasks.
----------------------------------------------------------------------
       Machine Learning Bias, Statistical Bias, and Statistical
                 Variance of Decision Tree Algorithms

                         Thomas G. Dietterich
                             Eun Bae Kong
                    Department of Computer Science
                          303 Dearborn Hall
                       Oregon State University
                       Corvallis, OR 97331-3202

Abstract: The term ``bias'' is widely used---and with different
meanings---in the fields of machine learning and statistics.  This
paper clarifies the uses of this term and shows how to measure and
visualize the statistical bias and variance of learning algorithms.
Statistical bias and variance can be applied to diagnose problems with
machine learning bias, and the paper shows four examples of this.
Finally, the paper discusses methods of reducing bias and variance.
Methods based on voting can reduce variance, and the paper compares
Breiman's bagging method and our own tree randomization method for
voting decision trees.  Both methods uniformly improve performance on
data sets from the Irvine repository.  Tree randomization yields
perfect performance on the Letter Recognition task.  A weighted
nearest neighbor algorithm based on the infinite bootstrap is also
introduced.  In general, decision tree algorithms have
moderate-to-high variance, so an important implication of this work is
that variance---rather than appropriate or inappropriate machine
learning bias---is an important cause of poor performance for decision
tree algorithms.
----------------------------------------------------------------------
      Error-Correcting Output Coding Corrects Bias and Variance

                             Eun Bae Kong
                         Thomas G. Dietterich
                    Department of Computer Science
                          303 Dearborn Hall
                       Oregon State University
                       Corvallis, OR 97331-3202

Abstract: Previous research has shown that a technique called
error-correcting output coding (ECOC) can dramatically improve the
classification accuracy of supervised learning algorithms that learn
to classify data points into one of k>>2 classes.  This paper
presents an investigation of why the ECOC technique works,
particularly when employed with decision-tree learning algorithms.  It
shows that the ECOC method---like any form of voting or
committee---can reduce the variance of the learning algorithm.
Furthermore---unlike methods that simply combine multiple runs of the
same learning algorithm---ECOC can correct for errors caused by the
bias of the learning algorithm.  Experiments show that this bias
correction ability relies on the non-local behavior of C4.5.
----------------------------------------------------------------------

Thomas G. Dietterich              Voice: 503-737-5559
Department of Computer Science    FAX:   503-737-3014
Dearborn Hall, 303                URL:   http://www.cs.orst.edu/~tgd
Oregon State University
Corvallis, OR 97331-3102