Refs requested: Linear Threshold Decisionmaking

Jordan B Pollack pollack at cis.ohio-state.edu
Wed Apr 8 16:43:38 EDT 1992


Below is a one-page abstract describing results on when simple linear
threshold algorithms (m-out-of-n functions) are effective given noisy
inputs and any number of outputs.  I have not been able to uncover
similar results for these algorithms, or for linear threshold
algorithms in general.  What I have found are psychological studies
that use linear regression to find optimized decision models:

	Dawes, R. M. and Corrigan, B. (1974). Linear models in
	decision making. Psychological Bulletin, 81(2):95-106. 

	Elstein, A. S., Shulman, L. S., and Sprafka, S. A. (1978).
	Medical Problem Solving. Harvard University Press, Cambridge,
	Massachusetts.

	Meehl, P. E. (1954). Clinical versus Statistical Predictions:
	A Theoretical Analysis and Review of the Evidence. University
	of Minnesota Press, Minneapolis.

and some explanations of why they work:

	Wainer, H. (1976). Estimating coefficients in linear models:
	It don't make no nevermind. Psychological Bulletin,
	83(2):213-217.

Of course, the statistical literature is filled with work on linear
discriminant functions and linear regression.  A superficial search
did not yield any results in the same spirit as mine.

I should also mention that I know of Nick Littlestone's algorithm for
learning m-out-of-n functions when the sample is linearly separable.
I don't know of any more recent results for noisy inputs.

	Littlestone, N. (1988). Learning quickly when irrelevant
	attributes abound: A new linear-threshold algorithm. Machine
	Learning, 2(4):285-318.

If you know of related work, please send mail to
byland at cis.ohio-state.edu.
						Thanks, Tom

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       The Effectiveness of Simple Linear Threshold Algorithms

			     Tom Bylander
	   Laboratory for Artificial Intelligence Research
	    Department of Computer and Information Science
		      The Ohio State University
			Columbus, Ohio 43210
		      byland at cis.ohio-state.edu

Consider the following simple linear threshold algorithm (SLT), an
m-out-of-n function, for determining whether to believe a given
"output" proposition (hereafter called an "output") from a given set
of "input" propositions (hereafter called "inputs"):

	Collect all the inputs relevant to the output.
	Count the number of inputs in favor of the output.
	Subtract the number of inputs against the output.
	Believe the output if the result exceeds some threshold.
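
As a concrete illustration, here is a minimal sketch of SLT in
Python.  The +1/-1 coding of the inputs and all names are my own
illustrative choices, not part of the result:

    def slt_decide(inputs, threshold=0):
        # inputs: votes coded +1 (in favor) or -1 (against).
        # Believe the output when the vote margin exceeds the threshold.
        margin = sum(inputs)  # count in favor minus count against
        return margin > threshold

    # Five inputs in favor, two against: margin 3 exceeds threshold 2.
    print(slt_decide([+1, +1, +1, +1, +1, -1, -1], threshold=2))  # True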

SLT is an effective algorithm if two conditions are met.  First,
sufficiently many inputs are available.  Second, the inputs are
independent evidence for the output.  For example, if x_1 and x_2 are
two inputs and y is the output, then it should be the case that:

		P(x_1 & x_2 | y) = P(x_1 | y) * P(x_2 | y)

I shall call this property "conditional independence."
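
A quick Monte Carlo check of this property, under an illustrative
model in which each input agrees with y except with independent noise
probability r:

    import random

    random.seed(0)
    r = 0.3           # illustrative noise rate
    trials = 100000   # samples, all conditioned on y = 1
    pairs = [(random.random() > r, random.random() > r)
             for _ in range(trials)]

    p_x1 = sum(x1 for x1, _ in pairs) / trials
    p_x2 = sum(x2 for _, x2 in pairs) / trials
    p_both = sum(x1 and x2 for x1, x2 in pairs) / trials
    print(p_both, p_x1 * p_x2)  # both near 0.7 * 0.7 = 0.49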

Consider the case where there are n outputs, each output is either 0
or 1, and each output is associated with a set of conditionally
independent inputs, each 0 (unfavorable) or 1 (favorable).  Let t be a
number between 0 and 0.5 such that for any output y, the difference
between the average value of its inputs when y=1 and the average when
y=0 is greater than 2t.  Based on inequalities from Hoeffding (1963),
(ln n)/t^2 inputs per output are sufficient to ensure that the
probability that SLT makes a wrong conclusion (i.e., that some
assignment to some output is in error) is less than 1/n.  This result
does not depend on the prior probabilities of the outputs.
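
To make the bound concrete, here is a small sketch that computes the
sufficient number of inputs; the values of n and t are illustrative:

    import math

    def sufficient_inputs(n_outputs, t):
        # (ln n)/t^2 inputs per output suffice for error probability < 1/n.
        return math.ceil(math.log(n_outputs) / t**2)

    # e.g., n = 100 outputs and t = 0.2 (class averages differ by > 0.4):
    print(sufficient_inputs(100, 0.2))  # ceil(ln 100 / 0.04) = 116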

How does this compare to optimal?  Suppose that each input for an
output agrees with the output except with noise probability r.  Based
on the central limit theorem, r(1-r)(ln n)/t^2 inputs per output are
insufficient to achieve 1/n error even for the optimal algorithm,
where t = 0.5 - r.  This result assumes that each assignment to the
outputs is equally likely a priori.  I speculate that if H is
the entropy of the outputs, then ln H should be substituted for ln n
in the above bound.
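
A sketch comparing the two bounds for illustrative values of n and r,
assuming t = 0.5 - r as above:

    import math

    def bounds(n_outputs, r):
        t = 0.5 - r
        ln_n = math.log(n_outputs)
        slt_sufficient = ln_n / t**2               # enough for SLT
        optimal_lower = r * (1 - r) * ln_n / t**2  # too few even for optimal
        return slt_sufficient, optimal_lower

    # e.g., n = 100, r = 0.3: roughly 115 inputs suffice for SLT, while
    # roughly 24 are too few even for the optimal algorithm, a gap of
    # only a constant factor 1/(r(1-r)).
    print(bounds(100, 0.3))  # (~115.1, ~24.2)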

Thus, SLT is nearly optimal, asymptotically speaking, for noisy inputs
and any number of outputs.

Conditional independence is the most problematic assumption in the
analysis.  Perfect conditional independence is unlikely in reality;
however, Hoeffding's inequalities for m-dependent random variables
strongly suggest that SLT should work well as long as the same
evidence is not counted too many times.  Finding inputs that are
sufficiently conditionally independent is the central learning problem
for SLT.


Hoeffding, W. (1963). Probability inequalities for sums of bounded
random variables. Journal of the American Statistical Association,
58(301):13-30.




