Neural network capabilities and alternatives to BP

Hideki KAWAHARA kawahara at av-convex.ntt.jp
Thu Sep 29 09:47:53 EDT 1988


Dear colleagues:

First of all, I must apologize that my previous mails had a somewhat
rude tone and unintended negative effects. I would like to correct
this by making my points clear and by supplying usable and traceable
information.

I suggested too many things with too little evidence.
The points I want to make are as follows.

(1) Neural network capabilities and learning algorithms are
different problems. Separating them clarifies the characteristics
of each.

(2) Theoretically, feed-forward networks with one hidden layer can
approximate an arbitrary continuous mapping from the n-dimensional
hypercube to m-dimensional space. However, networks designed
according to procedures suggested by the theory (such as the
Irie-Miyake construction) will suffer from so-called "combinatorial
explosion" problems, because the complexity of the network is
proportional to the degrees of freedom of the input space.

  The Irie-Miyake proof is based on the multi-dimensional Fourier
  transform. An interesting demonstration of neural network
  capabilities can be implemented using CT (Computerized Tomography)
  procedures. (Irie once said that his inspiration came from his
  knowledge of CT.)
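
To illustrate the first half of (2), here is a minimal numerical
sketch (my own illustration, not the Irie-Miyake construction; the
unit count, random-weight scales and the target function are
arbitrary choices). A single hidden layer of sigmoid units with
fixed random weights, followed by a linear output layer fitted by
least squares, approximates a continuous map on the 1-D "hypercube":

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  rng = np.random.default_rng(0)

  # Target: a continuous mapping on [0, 1]
  x = np.linspace(0.0, 1.0, 200)[:, None]
  y = np.sin(2.0 * np.pi * x)

  # One hidden layer of H sigmoid units; only the output layer is fitted
  H = 50
  W = rng.normal(scale=10.0, size=(1, H))
  b = rng.normal(scale=10.0, size=H)
  Phi = sigmoid(x @ W + b)
  v, *_ = np.linalg.lstsq(Phi, y, rcond=None)

  print("max approximation error:", np.max(np.abs(Phi @ v - y)))

A grid-like construction of the same flavour in n dimensions would
need on the order of K**n units, which is the "combinatorial
explosion" referred to above.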

(3) In pattern processing applications, there is a useful class of
neural network architectures, including RBF (radial basis function)
networks. They are not likely to suffer from "combinatorial
explosion" problems, because the network complexity in this case is
mainly bounded by the number of clusters in the input space. In
other words, the degrees of freedom are usually proportional to the
number of clusters. (Thank you for providing useful information on
RBF and PGU. Hanson's article and Niranjan's article supplied
additional information.)
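
As a minimal sketch of such an architecture (my own illustration;
the cluster-finding step is a crude k-means, and the data, sizes and
widths are arbitrary), the hidden layer below consists of Gaussian
radial basis functions placed on the cluster centres, and only the
linear output layer is fitted:

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy data: two clusters in the input space, with binary labels
  X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                 rng.normal(2.0, 0.3, (50, 2))])
  y = np.concatenate([np.zeros(50), np.ones(50)])

  # A crude k-means finds the cluster centres (the reference points)
  K = 2
  centres = X[rng.choice(len(X), K, replace=False)]
  for _ in range(20):
      assign = np.linalg.norm(X[:, None] - centres[None], axis=2).argmin(axis=1)
      centres = np.array([X[assign == k].mean(axis=0) for k in range(K)])

  # Gaussian RBF hidden layer on the centres, linear output by least squares
  width = 0.5
  Phi = np.exp(-np.linalg.norm(X[:, None] - centres[None], axis=2) ** 2
               / (2.0 * width ** 2))
  w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
  print("training accuracy:", ((Phi @ w > 0.5) == y).mean())

The number of hidden units here is tied to the number of clusters,
not to the dimensionality of the input space.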

(4) There are simple transformations for converting feed-forward
networks into networks belonging to the class mentioned in (3). The
PGU introduced by Hanson and Burr is one such extension. However,
there are at least two cases where linear graded units can form
radial basis functions.

Case (1):
If the input vectors are distributed only on the surface of a
hypersphere, the output of a linear graded unit will be an RBF.
(On the sphere, the inner product with the weight vector depends on
the input only through its distance from the weight vector.)
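
A minimal numerical check of Case (1) (my own illustration; the
dimension and the random weight vector are arbitrary):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  rng = np.random.default_rng(0)

  n = 5
  w = rng.normal(size=n)                         # weight vector of the unit
  x = rng.normal(size=(1000, n))
  x /= np.linalg.norm(x, axis=1, keepdims=True)  # inputs on the unit sphere

  dist = np.linalg.norm(x - w, axis=1)
  out = sigmoid(x @ w)

  # On the sphere, w.x = (1 + |w|^2 - |x - w|^2) / 2, so the output is
  # a monotone function of the distance to w alone, i.e. a radial basis.
  recon = sigmoid((1.0 + w @ w - dist ** 2) / 2.0)
  print("max discrepancy:", np.max(np.abs(out - recon)))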

Case (2):
If the input vectors are the autocorrelation coefficients of input
signals, and if the weight vector of a linear graded unit is
calculated from the maximum-likelihood spectral parameters of a
reference spectrum, the output of the unit will also be an RBF.
(See the Itakura reference below.)
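
The key algebraic fact behind Case (2) can be checked numerically.
In the sketch below (my own illustration; the "reference
coefficients" are random stand-ins rather than actual
maximum-likelihood spectral parameters), the quadratic form a'Ra
that appears in the Itakura measure is rewritten as a plain dot
product between the input's autocorrelation vector and a weight
vector derived from the reference coefficients alone, which is
exactly what a linear graded unit computes:

  import numpy as np

  rng = np.random.default_rng(0)

  def autocorr(x, p):
      # Autocorrelation coefficients r(0) .. r(p) of a signal x
      return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])

  p = 8
  x = rng.normal(size=256)                                    # input signal
  a = np.concatenate(([1.0], rng.normal(scale=0.1, size=p)))  # stand-in for
                                                              # (1, a1 ... ap)

  # Quadratic form a' R a with the Toeplitz autocorrelation matrix of x
  r = autocorr(x, p)
  R = np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])
  direct = a @ R @ a

  # The same value as a dot product between the input's autocorrelation
  # vector and a weight vector computed from the reference coefficients
  w = np.array([(1 if k == 0 else 2) * np.dot(a[:p + 1 - k], a[k:])
                for k in range(p + 1)])
  print("quadratic form:", direct, " dot product:", w @ r)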

(5) These transformations and various neural network learning
algorithms can be combined. For example, the self-organizing feature
map can be used to prepare the reference points of an RBF network,
and a BP-based procedure can be used for fine tuning.
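
A minimal sketch of such a combination (my own illustration; the
data, map size, widths and learning rates are arbitrary, and only
the linear output layer is tuned by gradient descent here, standing
in for a full BP pass):

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy input distribution: noisy points on a ring, with an arbitrary target
  theta = rng.uniform(0.0, 2.0 * np.pi, 500)
  X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(500, 2))
  y = (np.sin(3.0 * theta) > 0).astype(float)

  # A 1-D self-organizing feature map places the RBF reference points
  M = 10
  centres = rng.normal(scale=0.1, size=(M, 2))
  for t in range(2000):
      x = X[rng.integers(len(X))]
      win = np.argmin(np.linalg.norm(centres - x, axis=1))
      lr = 0.5 * (1.0 - t / 2000.0)
      sig = max(1e-3, 2.0 * (1.0 - t / 2000.0))
      h = np.exp(-((np.arange(M) - win) ** 2) / (2.0 * sig ** 2))
      centres += lr * h[:, None] * (x - centres)

  # RBF hidden layer on the SOM centres; the output layer is then
  # fine tuned by gradient descent on the squared error
  width = 0.4
  Phi = np.exp(-np.linalg.norm(X[:, None] - centres[None], axis=2) ** 2
               / (2.0 * width ** 2))
  w = np.zeros(M)
  for _ in range(2000):
      w -= 0.1 * Phi.T @ (Phi @ w - y) / len(X)
  print("mean squared error:", np.mean((Phi @ w - y) ** 2))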

(6) The procedures in (3) and (4) suggest a prototype-based
perception model, because the hidden units in this case correspond
to reference vectors in the input space. This is a local
representation. Even if we choose RBFs with a broader radius, it
resembles coarse coding at best. This contrasts with our experience
with BP, where distributed internal representations usually emerge.
This is an interesting point to discuss.

(7) My point of view:
I agree with Hanson's view that neural networks are not mere
derivatives of statistical methods. I believe that neural networks
are fruitful sources of important algorithms that have not yet been
discovered. This does not mean that neural networks simply implement
those algorithms; it means that we can extract them if we carefully
investigate network functions using appropriate formalisms and
abstractions.

I hope this mail clarifies my points, adds to our knowledge of
neural network characteristics, and stimulates productive
discussion.

Hideki Kawahara
NTT Basic Research Laboratories.

Reference: Itakura, F.: "Minimum Prediction Residual Principle
Applied to Speech Recognition," IEEE Trans. Acoustics, Speech, and
Signal Processing, ASSP-23, pp. 67-72, Feb. 1975.
(This is the original paper. The Itakura measure may be found in
many textbooks on speech processing.)


