Technical Report Available.
Jerome H. Friedman
jhf at stat.Stanford.EDU
Fri Oct 17 16:38:10 EDT 1997
*** Technical Report Available ***
Bump Hunting in High-Dimensional Data
Jerome H. Friedman
Stanford University
Nicholas I. Fisher
CMIS - CSIRO, Sydney
ABSTRACT
Many data analytic questions can be formulated as (noisy) optimization
problems. They explicitly or implicitly involve finding simultaneous
combinations of values for a set of ("input") variables that imply
unusually large (or small) values of another designated ("output")
variable. Specifically, one seeks a set of subregions of the input
variable space within which the value of the output variable is
considerably larger (or smaller) than its average value over the
entire input domain. In addition, it is usually desired that these
regions be describable in an interpretable form involving simple
statements ("rules") concerning the input values. This paper describes
a new procedure directed towards this goal based on the notion of
"patient" rule induction. This patient strategy is contrasted with the
greedy ones used by most rule induction methods, and with the semi-greedy
ones used by some partitioning tree techniques such as CART. Applications
involving scientific and commercial databases are presented.
Keywords: noisy function optimization, classification, association,
rule induction, data mining.
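
To make the idea of "patient" rule induction concrete, here is a minimal
sketch of one way such a strategy could look. It is an illustration under
stated assumptions, not the procedure from the report: the function name
patient_peel, the parameters alpha and min_support, and the stopping rule
are all hypothetical. Starting from a box covering all the data, each step
trims only a small fraction (alpha) of the points from one face of the box,
choosing the face whose removal most increases the mean of the output
variable inside the box. A greedy method would instead commit to a large
cut at each step.

```python
def patient_peel(X, y, alpha=0.1, min_support=0.05):
    """Shrink a box around the data by small peels that raise the mean of y.

    Illustrative sketch only; names and stopping rule are assumptions.
    X is a list of rows (lists of floats), y a list of floats.
    """
    n = len(y)
    dims = len(X[0])
    min_count = max(1, int(min_support * n))  # smallest box size allowed
    inside = list(range(n))                   # indices of points in the box
    current_mean = sum(y) / n

    while True:
        k = max(1, int(alpha * len(inside)))  # points to trim per peel (approx.)
        best = None
        for j in range(dims):
            vals = sorted(X[i][j] for i in inside)
            low_cut, high_cut = vals[k - 1], vals[len(vals) - k]
            candidates = (
                [i for i in inside if X[i][j] > low_cut],   # peel the low face
                [i for i in inside if X[i][j] < high_cut],  # peel the high face
            )
            for kept in candidates:
                if len(kept) < min_count:
                    continue
                mean = sum(y[i] for i in kept) / len(kept)
                # Accept only peels that strictly improve the box mean.
                if mean > current_mean and (best is None or mean > best[0]):
                    best = (mean, kept)
        if best is None:
            break  # no peel improves the mean, or the box is too small
        current_mean, inside = best

    box = [
        (min(X[i][j] for i in inside), max(X[i][j] for i in inside))
        for j in range(dims)
    ]
    return {"box": box, "mean": current_mean, "support": len(inside)}


# Synthetic example: y is large only in the upper corner of the unit square.
grid = [i / 19 for i in range(20)]
X = [[a, b] for a in grid for b in grid]
y = [1.0 if a > 0.6 and b > 0.6 else 0.0 for a, b in X]
result = patient_peel(X, y)
print(result)
```

The returned box is a pair of interval bounds per input variable, which
reads directly as a rule of the form "lo_1 <= x_1 <= hi_1 and
lo_2 <= x_2 <= hi_2". This sketch stops at the first point where no peel
improves the mean; a fuller treatment would likely also consider how to
pick the final box size and how to find further regions.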
Available by ftp from:
"ftp://stat.stanford.edu/pub/friedman/prim.ps.Z"
Note: This PostScript file does not display properly in some older
versions of ghostview. It seems to print correctly on nearly all
PostScript printers.