batch-mode parallel implementations

Tom Dietterich tgd at guard.berkeley.edu
Sat Oct 19 17:09:06 EDT 1991


There has been a fair amount of work in decision-tree learning on the
issue of breaking large training sets into smaller batches.  In 1980,
Quinlan introduced a method called "windowing" in which a small sample
(or window) of the training data is initially drawn at random.  The
algorithm is trained on this window and then tested on the remaining
data excluded from the window.  Some fraction of the misclassified
examples (possibly all of them) is then added to the window, and the
train-and-test cycle repeats until the tree built from the window
correctly classifies the rest of the data (a sketch of the loop follows).
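For concreteness, here is a minimal sketch of that loop in Python,
using scikit-learn's DecisionTreeClassifier as the base learner.  The
initial window size, the growth fraction, and the function name are my
own illustrative choices, not Quinlan's:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def windowing(X, y, init_size=500, add_frac=1.0, max_rounds=20):
        """Train on a small window, then grow it with misclassified examples."""
        rng = np.random.default_rng(0)
        n = len(y)
        in_window = np.zeros(n, dtype=bool)
        in_window[rng.choice(n, size=min(init_size, n), replace=False)] = True

        for _ in range(max_rounds):
            # Train on the current window only.
            tree = DecisionTreeClassifier().fit(X[in_window], y[in_window])
            rest = ~in_window
            # Test on the data excluded from the window; collect the
            # (global) indices of the misclassified examples.
            wrong = np.flatnonzero(rest)[tree.predict(X[rest]) != y[rest]]
            if len(wrong) == 0:
                return tree          # the window suffices: stop
            # Add some fraction of the misclassified examples to the window.
            k = max(1, int(add_frac * len(wrong)))
            in_window[rng.choice(wrong, size=k, replace=False)] = True
        return DecisionTreeClassifier().fit(X[in_window], y[in_window])

On noise-free data this loop typically halts with a small window; with
noisy labels the misclassified set never empties, so in_window keeps
growing toward the whole training set, matching the experience
described below.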

Generally speaking, in noise-free domains, windowing works quite well.
A very high-performing decision tree can be learned with a relatively
small window.  However, for noisy data, the general experience has
been that the window eventually grows to include the entire training
set: the mislabeled examples can never all be classified correctly, so
they keep getting added.
Jason Catlett (Sydney U) recently completed his dissertation testing
windowing and various related tricks on datasets of roughly 100K
examples (straight classification problems).  I recommend his papers
and thesis.

His main conclusion is that if you want high performance, you need to
look at all of the data.

--Tom

