Papers: NN Learning Theory and Algebraic Geometry

Sumio Watanabe swatanab at pi.titech.ac.jp
Wed Jul 25 22:07:38 EDT 2001


Dear Connectionists,

The following papers are available.

http://watanabe-www.pi.titech.ac.jp/~swatanab/index.html

I would like to announce that the reason why hierarchical
structure is important in practical learning machines is now
being clarified. Also, please visit the page of our special session:

http://watanabe-www.pi.titech.ac.jp/~swatanab/kes2001.html

Comments and remarks are welcome.

                        Thank you.

                        Sumio Watanabe
                        P&I Lab.
                        Tokyo Institute of Technology
                        swatanab at pi.titech.ac.jp


*****

(1) S. Watanabe, "Learning efficiency of redundant neural networks
in Bayesian estimation," to appear in IEEE Trans. on NN.

The generalization error of a three-layer neural network in a redundant
state is clarified. The method in this paper is not algebraic but
completely analytic. It is shown that the stochastic complexity of the
three-layer perceptron can be calculated by expanding the determinant of
the singular information matrix. It is also shown that, as the learner
becomes more redundant relative to the true distribution, the increase
of the stochastic complexity becomes smaller. Non-identifiable models
are compared with regular statistical models from the viewpoint of
statistical model selection, and it is shown that Bayesian estimation is
appropriate for layered learning machines in almost redundant states.
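A minimal sketch of the quantities involved, in notation assumed here
rather than quoted from the paper: the stochastic complexity is the
minus log marginal likelihood,

    F(n) = - \log \int \prod_{i=1}^{n} p(X_i | w) \, \varphi(w) \, dw ,

and for a singular (non-identifiable) model its expectation behaves
asymptotically as

    E[F(n)] = n S_n + \lambda \log n - (m - 1) \log\log n + O(1) ,

with generalization error G(n) \simeq \lambda / n. For a regular model
\lambda = d/2, where d is the number of parameters, whereas for a
redundant layered machine \lambda < d/2, which is the sense in which the
increase of the stochastic complexity becomes smaller.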


(2) S. Watanabe, "Algebraic geometrical methods for
hierarchical learning machines," to appear in Neural Networks.

This paper establishes algebraic geometrical methods in neural network
learning theory. The learning curve of a non-identifiable model is
determined by the largest pole of the zeta function of the Kullback
information, and this pole can be found by resolution of singularities.
The blow-up technique from algebraic geometry is applied to the
multi-layer perceptron, and its learning efficiency is obtained
systematically. Even when the true distribution is not contained in the
parametric model, singularities in the parameter space make the learning
curve smaller than all the learning curves of the smaller models
contained in the machine.
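A brief sketch of the zeta-function connection, again in notation
assumed here: with K(w) the Kullback information between the true
distribution and the model p(x|w), and \varphi(w) the prior, define

    \zeta(z) = \int K(w)^{z} \, \varphi(w) \, dw .

If the largest pole of \zeta(z) is at z = -\lambda with multiplicity m,
then \lambda and m are the constants in the learning curve above.
Resolution of singularities (Hironaka's theorem) rewrites K(w) in normal
crossing form in local coordinates, from which the poles, and hence the
learning efficiency of the multi-layer perceptron, can be computed
systematically.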

*** Please compare these two papers. *** End





