MLP classifiers == Bayes

John.Hampshire@SPEECH2.CS.CMU.EDU
Sun Sep 30 20:28:16 EDT 1990


EQUIVALENCE PROOFS FOR MULTI-LAYER PERCEPTRON CLASSIFIERS AND
          THE BAYESIAN DISCRIMINANT FUNCTION

   John B. Hampshire II       and     Barak A. Pearlmutter
                Carnegie Mellon University

             --------------------------------

  We show the conditions necessary for an MLP classifier to
yield (optimal) Bayesian classification performance.

Background:
==========

  Back in 1973, Duda and Hart showed that a simple perceptron
trained with the Mean-Squared Error (MSE) objective function
would minimize the squared approximation error to the
Bayesian discriminant function.  If the two-class random vector (RV)
being classified were linearly separable, then the MSE-trained
perceptron would produce outputs that converge to the
a posteriori probabilities of the RV, given an asymptotically
large set of statistically independent training samples of the RV.
Since then, a number of connectionists have re-stated this
proof in various forms for MLP classifiers.
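  For intuition, the heart of that classical argument fits in one
line.  Sketched here in LaTeX notation (a standard decomposition,
not reproduced from the paper): with 1-of-K target coding we have
E[y_k | x] = P(C_k | x), so the expected squared error of output
F_k decomposes as

\[
  E\left[ (F_k(x) - y_k)^2 \right]
    = E\left[ (F_k(x) - P(C_k \mid x))^2 \right]
    + E\left[ \mathrm{Var}(y_k \mid x) \right] .
\]

The second term does not depend on the network, so minimizing MSE
over a sufficiently flexible family of functions drives F_k(x)
toward the posterior P(C_k | x).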


What's new:
==========

  We show (in painful mathematical detail) that the proof
holds not just for MSE-trained MLPs but also for MLPs
trained with either of two broad classes of objective
functions.  The number of classes associated with
the input RV is arbitrary, as are the dimensionality of the RV
and the specific parameterization of the MLP.  Again, we
state the conditions necessary for Bayesian equivalence
to hold.

  The first class of "reasonable error measures" yields
Bayesian performance by producing MLP outputs that
converge to the a posteriori probabilities of the RV.  MSE and
a number of information-theoretic learning rules leading
to the Cross Entropy objective function are familiar
examples of reasonable error measures.
The second class of objective functions, known as
Classification Figures of Merit (CFM), yields
(theoretically limited) Bayesian performance by
producing MLP outputs that reflect the identity
of the largest a posteriori probability of the input RV.
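
  As a concrete sanity check of the first result, here is a minimal
numerical sketch in Python (illustrative only, not code from the
paper; the distributions, learning rate, and sample size are
arbitrary choices).  A single sigmoid unit trained by gradient
descent on the cross-entropy objective, with class-conditional
densities N(-1,1) and N(+1,1) and equal priors, should produce an
output approaching the analytic posterior P(C1|x) = sigmoid(2x):

import numpy as np

rng = np.random.default_rng(0)
n = 20000
y = rng.integers(0, 2, size=n)                # class labels, equal priors
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)  # draws from N(-1,1) / N(+1,1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):               # full-batch gradient descent
    p = sigmoid(w * x + b)          # current output (posterior estimate)
    w -= lr * np.mean((p - y) * x)  # gradient of cross entropy w.r.t. w
    b -= lr * np.mean(p - y)        # gradient of cross entropy w.r.t. b

for xt in (-2.0, 0.0, 1.0):         # learned output vs. true posterior
    print(xt, sigmoid(w * xt + b), sigmoid(2.0 * xt))

The learned weights approach w = 2, b = 0, the parameters of the
true posterior.  A CFM-trained network, by contrast, need only
preserve the identity of the largest a posteriori probability; its
outputs are not posterior estimates, so no analogous
value-convergence check applies.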


How to get a copy:
=================

  To appear in the "Proceedings of the 1990 Connectionist
Models Summer School," Touretzky, Elman, Sejnowski, and Hinton,
eds., San Mateo, CA:  Morgan Kaufmann, 1990.  This text will
be available at NIPS in late November.

  If you can't wait, pre-prints may be obtained from the OSU
connectionist literature database using the following procedure:

% ftp cheops.cis.ohio-state.edu  (or, ftp 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get hampshire.bayes90.ps.Z
261245 bytes sent in 9.9 seconds (26 Kbytes/s)
ftp> quit
% uncompress hampshire.bayes90.ps.Z
% lpr hampshire.bayes90.ps

