"malicious" training patterns

Tom Dietterich tgd at turing.CS.ORST.EDU
Tue Mar 13 12:34:57 EST 1990


   Date: Tue, 13 Mar 90 11:05:27 EST
   From: Gerald Tesauro <TESAURO at IBM.COM>

   The notion that points close to the decision surface are "malicious"
   comes as a surprise to me. From the point of view of extracting good
   generalizations from a limited number of training patterns, such
   "borderline" training patterns may in fact be the best possible
   patterns to use. A benevolent teacher might very well explicitly
   design a bunch of border patterns to clearly mark the boundaries
   between different conceptual classes.

   --Gerry Tesauro

This is true for cases where the decision surface is parallel to an
axis--in that case, the teacher can give two examples differing in
only one feature-value.  But in general, the more closely positive and
negative examples crowd together, the harder it is to resolve and
separate them, especially in noisy circumstances.
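To make the axis-parallel case concrete, here is a small sketch (the
numbers are my own illustration, not from the thread): if the true
boundary is a threshold on a single feature, one near-boundary pair
differing only in that feature pins the threshold down almost exactly.

```python
# Hypothetical axis-parallel concept: class + iff x1 < 5.
# The teacher supplies two border patterns differing only in x1:
pos_example = (4.9, 3.0)   # labeled +
neg_example = (5.1, 3.0)   # labeled -, same x2, only x1 differs

# Any threshold on x1 consistent with both examples must lie in
# the narrow gap between them, so the learner is forced close to
# the true boundary.
lo, hi = pos_example[0], neg_example[0]
threshold = (lo + hi) / 2
print(round(threshold, 2))
```

The closer the teacher places the pair to the boundary, the narrower
the interval of consistent thresholds becomes.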

An average-case analysis must always define "average", which is what
Baum has nicely done.  Do readers have examples of domains that are
linearly separable but hard to separate?

In my experience, the problem is that simple linear models, while they
may give reasonable fits to training data, tend to underfit and hence
give poor predictive performance.  For example, the following points
are (just barely) linearly separable, but a multilayer perceptron or
a quadratic model would give better predictive performance:


    + + +        - - -
      + + +        - - -
        + + +        - - -
          + + +        - - -
            + + +        - - -
          + + +        - - -
        + + +        - - -
      + + +        - - -
    + + +        - - -

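The figure can be sketched numerically.  Below, the coordinates are my
own reconstruction of the two chevrons (hypothetical, not from the
original post): a vertical separating line squeaks through with a
small margin, while a separator built on the nonlinear feature
x + |y| (a piecewise-linear stand-in for a quadratic term) straightens
the chevrons and gets a much larger margin.

```python
# '+' and '-' points form two parallel right-pointing chevrons,
# mimicking the figure above.  Coordinates are illustrative only.
pos = [(k - abs(y), y) for y in range(-4, 5) for k in (11, 13, 15)]
neg = [(k - abs(y), y) for y in range(-4, 5) for k in (20, 22, 24)]

# Best vertical line x = c: its margin is half the horizontal gap
# between the rightmost '+' and the leftmost '-'.
linear_margin = (min(x for x, y in neg) - max(x for x, y in pos)) / 2

# In the feature z = x + |y| the chevrons become vertical lines,
# so a threshold on z separates them with a comfortable margin.
z_pos = [x + abs(y) for x, y in pos]
z_neg = [x + abs(y) for x, y in neg]
nonlinear_margin = (min(z_neg) - max(z_pos)) / 2

print(linear_margin, nonlinear_margin)
```

With these coordinates the vertical line separates the classes with a
margin of only 0.5, while the nonlinear feature yields a margin of
2.5, illustrating why a quadratic model (or a multilayer perceptron)
should predict better here despite the data being linearly separable.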

As the number of training examples increases, there is statistical
support for hypotheses more complex than simple linear separators.  

Tom Dietterich

