Gaussian statistical models, Hilbert spaces
Grace Wahba
wahba at stat.wisc.edu
Tue Sep 1 15:00:30 EDT 1998
Readers of
...............
http://www.santafe.edu/~zhuh/draft/edmc.ps.gz
Error Decomposition and Model Complexity
Huaiyu Zhu
Bayesian information geometry provides a general error decomposition
theorem for arbitrary statistical models and a family of information
deviations that include Kullback-Leibler information as a special case.
When applied to Gaussian measures it takes the classical Hilbert space
(Sobolev space) theories for estimation (regression, filtering,
approximation, smoothing) as a special case. When the statistical and
computational models are properly distinguished, the dilemmas of
over-fitting and ``curse of dimensionality'' disappear, and the optimal
model order, disregarding computing cost, is always infinity.
.............
will no doubt be interested in the long history of the relationship
between reproducing kernel Hilbert spaces (rkhs), Gaussian measures
and regularization; see
1962 Proceedings of the Symposium on Time Series Analysis,
edited by Murray Rosenblatt, Wiley, esp. the paper by Parzen
1962 J. Hajek, On linear statistical problems in stochastic processes,
Czech. Math. J., v. 87.
1971 Kimeldorf and Wahba, Some results on Tchebycheffian spline functions,
J. Math. Anal. Applic., v. 33.
1990 G. Wahba, Spline Models for Observational Data, SIAM
1997 F. Girosi, An equivalence between sparse approximation and
support vector machines, to appear Neural Comp
1997 G. Wahba, Support vector machines, reproducing kernel Hilbert
spaces and the randomized GACV, to appear, Schoelkopf, Burges
and Smola, eds, forthcoming book on Support Vector Machines, MIT Press
1981 C. Micchelli and G. Wahba, Design problems for optimal
surface interpolation, Approximation Theory and Applications,
Z. Ziegler, ed., Academic Press. Also numerous works by
L. Plaskota and others on optimal bases. The first k eigenfunctions
of the reproducing kernel are well known to have certain
optimal properties under restricted circumstances,
see e.g. Ch. 12 of Spline Models and the references there, but
if there are n observations, then the Bayes estimates
are found in an at most n-dimensional subspace of the rkhs
associated with the prior (KW 1971); a small numerical sketch of
this point follows below. B. Silverman 1982,
`On the estimation of a probability density function by
the maximum penalized likelihood method', Ann. Statist.,
will also be of interest: convergence rates are related
to the rate of decay of the eigenvalues of the reproducing
kernel.
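
As a minimal numerical sketch of the KW 1971 point, assuming a Gaussian
(RBF) kernel, simulated one-dimensional data, and an illustrative smoothing
parameter lam (all of these choices are assumptions made only for
illustration, not taken from the papers above): with n observations the
penalized/Bayes estimate has the form f(t) = sum_i c_i K(t, x_i), so it
lies in the at most n-dimensional span of the kernel sections at the
data points, and the eigenvalue printout hints at the decay that drives
the convergence rates just mentioned.

import numpy as np

def rbf_kernel(x, y, length_scale=0.3):
    # Gaussian (RBF) reproducing kernel on the line; the length scale is
    # an illustrative choice.
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / length_scale) ** 2)

rng = np.random.default_rng(0)
n = 30
x = np.sort(rng.uniform(0.0, 1.0, n))                     # design points
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)  # noisy observations

lam = 1e-2                                       # smoothing parameter (assumed)
K = rbf_kernel(x, x)                             # n x n Gram matrix
c = np.linalg.solve(K + n * lam * np.eye(n), y)  # representer coefficients

# The estimate f(t) = sum_i c_i K(t, x_i) lives in the (at most)
# n-dimensional subspace of the rkhs spanned by the kernel sections K(., x_i).
t = np.linspace(0.0, 1.0, 200)
f_hat = rbf_kernel(t, x) @ c
print("estimate evaluated at", len(t), "points, built from", n, "kernel sections")

# Eigenvalues of the Gram matrix decay rapidly for a smooth kernel;
# this decay is what governs the convergence rates mentioned above.
eigvals = np.linalg.eigvalsh(K)[::-1]
print("largest 5 eigenvalues: ", np.round(eigvals[:5], 4))
print("smallest 5 eigenvalues:", np.round(eigvals[-5:], 10))

Solving the n x n linear system (K + n*lam*I)c = y is exactly the
finite-dimensional computation the n-dimensional-subspace result licenses,
even though the prior lives in an infinite-dimensional rkhs.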
Grace Wahba http://www.stat.wisc.edu/~wahba/