multiple models, hybrid estimation

Graham Lamont Graham.Lamont at newcastle.ac.uk
Wed Aug 4 12:41:36 EDT 1993


When I emailed Wray Buntine about his original posting on the subject of
multiple models, I quipped:

`Shhh.... don't tell everyone, they'll all want one!' (a multiple model)

Little did I know every man and his dog appears to have one already :)

The recent postings, and especially Michael Perrone's contribution(s), have
persuaded me to sketch the extent of my work in this area and to donate a
FREE piece of Mathematica code.

 
I mention Michael's work because it follows the same basic approach of
general least squares as mine, and I agree with many of the points that he
raises in his general discussion of hybrid estimation, such as the need for
a completely general method, the utility of a closed form solution, and his
novel description of distinct local minima in functional space as opposed
to parameter space.

However... he says that for his method (GEM):


 >> 7) The *optimal* parameters of the ensemble estimator are given in closed
 >> form.


I present a method in the same general spirit as Michael's that is
slightly more optimal and more general (and I am not claiming even this is
the best!). It is based on unconstrained least squares on the estimator
population "design matrix" via SVD.

1 Generality: The technique utilises singular value decomposition (SVD),
and hence avoids the problem of collinearity between estimators that can
(and often does) occur in a population of estimators, as mentioned by
Michael. SVD happily copes with highly collinear or even duplicate
estimators in the design matrix, without any preprocessing or thresholding
(see the short sketch following point 2 below).

2 Optimality: The technique places no constraint on the values of the
weights (MP [1] constrains them to sum to 1, and in the results he presents
all weights also satisfy 0<w<1 due to the simplification made). Since the
constrained optimum is a feasible point of the unconstrained problem, the
*unconstrained* minimisation is ipso facto at least as optimal. The
inclusion of a bias weight (see recipe) can improve matters further.

The resulting positive and negative weightings of estimators can be
interpreted loosely as competition and co-operation between estimators, and
near-zero weightings as redundant (non-distinct) estimators.

I do not, however, claim that the technique is completely *optimal*, since
it is not clear how much of the improvement from combining estimators is
due simply to the extra degrees of freedom. (The same is true of Michael's
technique.)
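
To illustrate the collinearity point, here is a minimal Mathematica sketch
(the toy matrix below is mine, purely for illustration): when one estimator
column exactly duplicates another, SingularValues simply returns fewer
singular values, discarding the degenerate direction instead of failing.

(* toy 4 x 3 design matrix: columns 1 and 2 are identical "estimators" *)
A = Transpose[{{1., 2., 3., 4.}, {1., 2., 3., 4.}, {1., 1., 0., 0.}}];
{u, w, v} = SingularValues[A];
Length[w]    (* 2, not 3: the collinear direction has been dropped *)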

This type of technique, however, is completely *general* and, as alluded to
by Michael, all manner of estimators can be added to the population. This
includes networks, KNN, Parzen regressors, information trees, first-principles
models, expert systems, even the original raw or preprocessed data... in
fact anything you have kicking about to make the population more
information rich.
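
As a concrete, purely hypothetical Mathematica sketch of assembling such a
mixed population (the names xs, ys, lin, cub are mine, not part of the
recipe), one might fill the columns of the design matrix with a linear fit,
a cubic fit and the raw input itself:

xs = Table[x, {x, 0., 1., 0.05}];                        (* toy inputs *)
ys = Sin[2 Pi xs] + 0.1 Table[Random[] - 0.5, {Length[xs]}];  (* toy targets *)
lin = Fit[Transpose[{xs, ys}], {1, x}, x];               (* estimator 1: linear fit *)
cub = Fit[Transpose[{xs, ys}], {1, x, x^2, x^3}, x];     (* estimator 2: cubic fit *)
A = Transpose[{(lin /. x -> #)& /@ xs,
               (cub /. x -> #)& /@ xs,
               xs}];                                     (* estimator 3: the raw input *)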
---------------------------------------------------- 
Here is the recipe:

1. Given a design matrix A built from m estimators f_i, i=1,...,m, and n
samples (x_j, y_j), j=1,...,n, where x_j is a vector input and y_j a scalar
target:

A_ij = f_j(x_i)   i=1,...,n   j=1,...,m

i.e. one row per sample, one column per estimator, so A is n x m.


2. Then the general unconstrained least-squares minimisation of

|A.w - y|^2

with respect to w, the weight vector for the m estimators, is given by

3.
             U_i.y
   w = Sum_i ----- V_i
              s_i

where A = U.S.V^T is the singular value decomposition (SVD) of the design
matrix A, s_i are the singular values and U_i, V_i the corresponding left
and right singular vectors of A.

4. The method is easily extended to include a bias weight by simply adding
an (m+1)-th "estimator", which is just a column of 1s.
---------------------------------------------------- 
Here is the FREE code:

Hybrid[A_,b_]:=
Block[{u,w,v,a},
(* SingularValues gives {u,w,v} with A == Transpose[u].DiagonalMatrix[w].v *)
{u,w,v}=SingularValues[N[A]];
(* unconstrained least-squares weights for the n x m design matrix A *)
a=((u.b)/w).v;
(* return the m weights and the residual of the fit on the training data *)
{a,(A.a-b).(A.a-b)}]

The routine accepts an n x m design matrix A and a vector b of n targets,
and returns a vector of m weights together with the value of the
minimisation (which is of course always an underestimate of the *true* MSE,
since it is effectively the training error). To use it with a bias weight,
simply do something like:

Hybrid[Append[#, 1]& /@ A, y]     (* appends a column of 1s to A *)
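
For completeness, here is a hypothetical end-to-end sketch (toy data and
names of my own choosing) of calling Hybrid and then forming the combined
estimate from the returned weights:

xs = {0., 0.25, 0.5, 0.75, 1.};                    (* toy inputs *)
y  = xs^2;                                         (* toy targets *)
A  = Transpose[{xs, Sqrt[xs]}];                    (* two crude estimators: x and Sqrt[x] *)
Ab = Append[#, 1]& /@ A;                           (* add the bias column *)
{wts, res} = Hybrid[Ab, y];                        (* weights (incl. bias) and training residual *)
Ab . wts                                           (* the combined estimate on the training inputs *)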
--------------------------------------------------------------
Of course, like any fitting procedure, by increasing the degrees of freedom
you can overfit. It is recommended that your favourite resampling plan is
used to guard against this, for example as sketched below. Like everything
else: GIGO.
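
For instance, a crude holdout split (again purely illustrative, with toy
data and names of my own; substitute your own resampling scheme) would be:

A = Table[{Random[], Random[], Random[]}, {20}];   (* toy 20 x 3 design matrix *)
y = Table[Random[], {20}];                         (* toy targets *)
{wtr, rtr} = Hybrid[Take[A, 10], Take[y, 10]];     (* fit the weights on the first half *)
err = Drop[A, 10] . wtr - Drop[y, 10];
err . err                                          (* held-out residual, to compare with rtr *)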

1. @inproceedings{PerroneCooper93CAIP,
   AUTHOR    = {Michael P. Perrone and Leon N. Cooper},
   TITLE     = {When Networks Disagree: Ensemble Method for Neural Networks},
   BOOKTITLE = {Neural Networks for Speech and Image Processing},
   YEAR      = {1993},
   PUBLISHER = {Chapman-Hall},
   EDITOR    = {R. J. Mammone},
   ADDRESS   = {London},
   NOTE      = {To appear}
}


Cheers Graham


Graham Lamont
Department of Chemical and Process Engineering
Merz Court
University of Newcastle
Newcastle-upon-Tyne
NE1 7RU
UK

Phone: 91-2226000 x7241
Fax: 91-2611182
Email: graham.lamont at ncl.ac.uk



