Geometry and Statistics in NN Learning Theory

Sumio Watanabe swatanab at pi.titech.ac.jp
Wed May 23 00:26:12 EDT 2001


Dear Connectionists,

We are very glad to inform you that we have a special session,
"Geometry and Statistics in Neural Network Learning Theory"

http://watanabe-www.pi.titech.ac.jp/~swatanab/kes2001.html

in the International Conference KES'2001, which will be held
in Osaka and Nara, Japan, 6th - 8th September, 2001.

http://www.bton.ac.uk/kes/kes2001/

In our session, we study the statistical problem caused by
non-identifiability of layered learning machines.

Information :
* Date: September, 8th (Saturday), 2001, 14:40-16:45.
* Place: Nara New Public Hall, Nara City, Japan.
* Schedule: The time for each presentation is 25 minutes.
* (Remark) Before this session, Professor Amari will give
   an invited talk, 13:40-14:40.

**********

The authors and papers:

You can get these papers from the site,
http://watanabe-www.pi.titech.ac.jp/~swatanab/kes2001.html

(1) S. Amari, T.Ozeki, and H.Park (RIKEN BSI)
"Singularities in Learning Models:Gaussian Random Field Approach."

(2) K. Fukumizu  (ISM)
"Asymptotic Theory of Locally Conic Models and its Application to
Multilayer Neural Networks."

(3) K.Hagiwara  (Mie Univ.)
"On the training error and generalization error of neural network
regression without identifiablity."

(4) T. Hayasaka, M.Kitahara, K.Hagiwara, N.Toda, and S.Usui  (TUT)
"On the Asymptotic Distribution of the Least Squares Estimators
for Non-identifiable Models."

(5) S. Watanabe  (TIT)
"Bayes and Gibbs Estimations, Empirical Processes, and Resolution
of Singularities."

**********

A Short Introduction:

[Why Non-identifiability ?]
A parametric model in statistics is called identifiable if the mapping from
the parameter to the probability distribution is one-to-one.
Many learning machines used in information processing, such as
artificial neural networks, normal mixtures, and Boltzmann machines,
are not identifiable. We do not yet have a mathematical and statistical
foundation on which such models can be studied.
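As a small illustration (ours, not part of the announcement), the simplest sources of non-identifiability in a three-layer network f(x) = a1*tanh(b1*x) + a2*tanh(b2*x) are permutation of the hidden units and joint sign flips of (a_i, b_i), which tanh, being odd, leaves invariant:

```python
# A minimal numerical sketch of non-identifiability in the network
# f(x) = a1*tanh(b1*x) + a2*tanh(b2*x).
import math

def f(x, a1, b1, a2, b2):
    return a1 * math.tanh(b1 * x) + a2 * math.tanh(b2 * x)

# Three distinct parameter vectors that define the SAME function:
theta1 = (0.5, 2.0, -1.0, 0.3)
theta2 = (-1.0, 0.3, 0.5, 2.0)    # hidden units permuted
theta3 = (-0.5, -2.0, -1.0, 0.3)  # sign flip: a*tanh(b*x) = (-a)*tanh(-b*x)

for x in [-1.0, 0.0, 0.7, 3.0]:
    y1, y2, y3 = f(x, *theta1), f(x, *theta2), f(x, *theta3)
    assert abs(y1 - y2) < 1e-12 and abs(y1 - y3) < 1e-12
```

Because the map from parameters to functions collapses whole orbits of parameters to one distribution, the parameter of the best-fitting function cannot be uniquely recovered from data.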

[Singularities and Asymptotics ]
If a non-identifiable model is redundant compared with the true
distribution, then the set of true parameters is an analytic set with complex
singularities, and the rank of the Fisher information matrix depends on the
parameter. The behaviors of the training and generalization errors of
layered learning machines are quite different from those of regular
statistical models. It should be emphasized that we cannot apply the
standard asymptotic methods constructed by Fisher, Cramer, and Rao
to these models. Nor can we use AIC, MDL, or BIC for statistical
model selection in the design of artificial neural networks.
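To make the rank degeneracy concrete, here is a numerical sketch (our own illustration, not taken from the session papers) for the toy regression model y = a*tanh(b*x) + N(0,1): on the set of true parameters {a*b = 0} of the zero function, the Fisher information matrix loses rank.

```python
# Hedged sketch: Monte Carlo estimate of the Fisher information matrix
# of the model y = a*tanh(b*x) + N(0,1), with inputs x ~ N(0,1).
# For unit Gaussian noise the Fisher matrix equals E_x[grad_f grad_f^T],
# where grad_f = (df/da, df/db).
import math, random

def fisher(a, b, n=20000, seed=0):
    rng = random.Random(seed)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = math.tanh(b * x)
        g = (t, a * x * (1.0 - t * t))  # (df/da, df/db)
        for i in range(2):
            for j in range(2):
                m[i][j] += g[i] * g[j] / n
    return m

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# At a generic point the matrix is positive definite, but on the true
# parameter set {a = 0} (or {b = 0}) one gradient component vanishes
# identically, so the rank drops to at most 1 and the determinant is 0.
```

For example, det2(fisher(1.0, 1.0)) is bounded away from zero, while det2(fisher(0.0, 1.0)) is exactly zero; since regular asymptotics divide by this determinant, the classical theory does not apply at such singular points.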

[Geometry and Statistics ]
The purpose of this special session is to study and discuss the geometrical
and statistical methodology by which non-identifiable learning machines
can be analyzed. Note that conic singularities are obtained by blowing-downs,
and normal crossing singularities by blowing-ups. These algebraic-geometrical
methods lead us to statistical concepts such as the order statistic
and the empirical process, and open a new perspective in geometry and
statistics.

[Results which will be reported]
(1) Professor Amari et al. clarify the generalization and training errors of
learning models with conic singularities under both the maximum likelihood
method and the Bayesian method, using the Gaussian random field approach.
(2) Dr. Fukumizu proves that a three layered neural network can be
understood as a locally conic model, and that the asymptotic likelihood
ratio is in proportion to (log n), where n is the number of training
samples.
(3) Dr. Hagiwara shows that the training and generalization errors of
radial basis functions with Gaussian units are in proportion to (log n),
based on the assumption that the inputs are fixed.
(4) Dr. Hayasaka et al. claim that the training error of a three-layer
perceptron is closely related to the expected value of the order
statistic.
(5) Lastly, Dr. Watanabe studies the Bayes and Gibbs estimations for
statistical models with normal crossing singularities, and shows that the
general case reduces to this case by the resolution of singularities theorem.

We expect that mathematicians, statisticians, information scientists,
and theoretical physicists will be interested in this topic.

**********

Thank you very much for your interest in our special session.
For questions or comments, please send an e-mail to

Dr. Sumio Watanabe,

P&I Lab., Tokyo Institute of Technology.
E-mail: swatanab at pi.titech.ac.jp
http://watanabe-www.pi.titech.ac.jp/~swatanab/index.html
[Postal Mail] 4259 Nagatsuta, Midori-ku, Yokohama, 226-8503 Japan.















