Radial basis functions etc.

M. Niranjan niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK
Wed Nov 23 07:42:17 EST 1988


Some recent comments on RBFs that you might find interesting!

niranjan

============================== ONE =====================================
Date:        Wed, 16 Nov 88 09:41:13 +0200
From: Dario Ringach <dario at earn.techunix>
To: ajr <ajr%uk.ac.cambridge.engineering.digsys at uk.ac.ucl.cs.nss>
Subject:     Comments on TR.25
 
Thanks a lot for the papers!
 
I'd like to share a few thoughts on TR.25 "Generalising the Nodes of
the Error Propagation Network".  What are the advantages of choosing
radial basis functions (or Gaussian nodes) in *general* discrimination
tasks?  It seems clear to me that the results presented in Table 1
are due to the fact that the spectral distribution of steady state
vowels can be closely represented by normal/radial distributions.  If
I have no a-priori information about the distribution of the classes
then how can I know which kind of nodes will perform better?  I think
that in this case the best we can do is to look at the combinatorial
problem of how many partitions of the n-dimensional Euclidean space
can be obtained using N (proposed shape) boundaries.  This is closely
related to obtaining the Vapnik-Chervonenkis Dimension of the boundary
class.  In the case of n-dimensional hyperplanes and hyperspheres,
both have VC-dimension n+1, so I think there is really no difference
in using hyperplanes or hyperspheres in *general* discrimination
problems.  Don't you agree?
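 
As a minimal sketch of what that n+1 figure means in two dimensions (numpy
plus a perceptron-style separability test; purely illustrative, not from any
of the papers): three non-collinear points can be given every one of the 2^3
labellings by a line, i.e. lines shatter them.
 
import itertools
import numpy as np

# Three points in general position in R^2; VC-dimension n+1 = 3 for lines
# means every one of the 2^3 labellings should be linearly separable.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def linearly_separable(X, y, epochs=1000, lr=0.1):
    """Perceptron with bias; it converges whenever the labelling is separable."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, t in zip(Xb, y):
            if t * np.dot(w, x) <= 0:              # misclassified point
                w += lr * t * x
                errors += 1
        if errors == 0:
            return True
    return False

shattered = all(linearly_separable(points, np.array(lab))
                for lab in itertools.product([-1, 1], repeat=3))
print("three points shattered by lines in R^2:", shattered)   # expect True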
 
Thanks again for the papers!
Dario
 
============================== TWO =====================================

From: M. Niranjan <niranjan>
Date: Mon, 21 Nov 88 14:03:16 GMT
To: dario at bitnet.techunix
Subject: RBF etc
Cc: ajr, dr, fallside, idb, jpmg, lwc, mdp, mwhc, niranjan, psts, rwp, tpm,
        visakan, vsa10


With RBFs of the Gaussian type, the class-conditional density function
is approximated by a mixture of Gaussians, but the parameters of the
mixture are estimated to maximise discrimination rather than to model
the individual probability densities.
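
A rough numerical sketch of that distinction (toy data and numpy only; the
node count, learning rate and data below are illustrative assumptions, not
anything from the report): the centres, widths and output weights of the
Gaussian nodes are all moved by gradient descent on a cross-entropy error,
i.e. fitted to discriminate, not to fit each class density.

import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data in two dimensions.
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),
               rng.normal([2, 2], 0.5, (50, 2))])
y = np.hstack([np.zeros(50), np.ones(50)])

K = 4                                            # number of Gaussian (RBF) nodes
centres = X[rng.choice(len(X), K, replace=False)].copy()
log_width = np.zeros(K)                          # log of per-node widths
w = rng.normal(0, 0.1, K + 1)                    # output weights plus bias

def rbf(X, centres, log_width):
    """Gaussian node activations, one column per node."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * np.exp(log_width) ** 2))

lr = 0.05
for step in range(2000):
    Phi = rbf(X, centres, log_width)             # (N, K) activations
    z = Phi @ w[:K] + w[K]
    p = 1.0 / (1.0 + np.exp(-z))                 # estimated posterior P(class 1 | x)
    err = p - y                                  # derivative of cross-entropy w.r.t. z

    # Gradient descent on the cross-entropy: all parameters are adjusted to
    # improve discrimination, not to model the class densities themselves.
    grad_w = np.concatenate([Phi.T @ err, [err.sum()]]) / len(X)
    dPhi = np.outer(err, w[:K]) * Phi            # error back-propagated through each Gaussian
    widths2 = np.exp(log_width) ** 2
    diff = X[:, None, :] - centres[None, :, :]
    grad_c = (dPhi[:, :, None] * diff / widths2[None, :, None]).sum(0) / len(X)
    grad_lw = (dPhi * (diff ** 2).sum(-1) / widths2[None, :]).sum(0) / len(X)

    w -= lr * grad_w
    centres -= lr * grad_c
    log_width -= lr * grad_lw

print("training accuracy:", ((p > 0.5) == y).mean())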

> If I have no a-priori information about the distribution of the classes
> then how can I know which kind of nodes will perform better?

There is no way other than by a set of experiments. In small-scale
problems, we can probably plot cross-sections of the feature space, or
even projections of it onto a linear discriminant plane, and get some
rough idea.
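
For instance, a quick-and-dirty projection of two-class data onto Fisher's
linear discriminant direction (numpy only; the data below are made up for
illustration):

import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 1.0, (100, 2))           # class 0 samples
X1 = rng.normal([3, 1], 1.0, (100, 2))           # class 1 samples

m0, m1 = X0.mean(0), X1.mean(0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)   # within-class scatter
w = np.linalg.solve(Sw, m1 - m0)                 # Fisher discriminant direction
w /= np.linalg.norm(w)

p0, p1 = X0 @ w, X1 @ w                          # one-dimensional projections
print("projected class means:", p0.mean(), p1.mean())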

> problem of how many partitions of the n-dimensional Euclidean space
> can be obtained using N (proposed shape) boundaries.

It is not a question of how many different partitions; I think our problem
in pattern classification is dealing with the breakpoints of the class
boundary. It is this capability that gives MLPs (and RBFs) their power. In a
two-class problem, we still partition the input space into two regions using
N boundary segments (or splines), with N-1 break-points.

What I like about RBFs is that they admit a probabilistic interpretation.
With standard MLPs this is not very obvious, and what happens is more like
functional interpolation.
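
A toy illustration of that probabilistic reading (one isotropic Gaussian
node per class with Bayes' rule on top; all numbers below are made up):

import numpy as np

def gauss(x, mu, var):
    """Isotropic two-dimensional Gaussian density."""
    d = x - mu
    return np.exp(-0.5 * d @ d / var) / (2.0 * np.pi * var)

x = np.array([1.0, 0.5])                          # point to be classified
nodes = {"class A": (np.array([0.0, 0.0]), 1.0),  # (centre, variance) per class
         "class B": (np.array([2.0, 2.0]), 1.0)}
priors = {"class A": 0.5, "class B": 0.5}

joint = {c: priors[c] * gauss(x, mu, v) for c, (mu, v) in nodes.items()}
total = sum(joint.values())
posteriors = {c: p / total for c, p in joint.items()}
print(posteriors)                                 # posterior probabilities summing to 1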

> both have VC-dimension n+1, so I think there is really no difference
 
I don't know what the VC-dimension is. Any reference, please?

Best wishes
niranjan

============================ THREE =======================================
 
Date:        Tue, 22 Nov 88 08:22:53 +0200
From: Dario Ringach <dario at EARN.TECHUNIX>
To: M. Niranjan <niranjan at UK.AC.CAM.ENG.DSL>
Subject:     Re: RBF etc

Thanks for your Re!
 
[some stuff deleted]
 
> > I think that in this case the best we can do is to look at the combinatorial
> > problem of how many partitions of the n-dimensional Euclidean space
> > can be obtained using N (proposed shape) boundaries.
>
> It is not a question of how many different partitions; I think our problem
> in pattern classification is dealing with the breakpoints of the class
> boundary. It is this capability that gives MLPs (and RBFs) their power. In a
> two-class problem, we still partition the input space into two regions using
> N boundary segments (or splines), with N-1 break-points.
 
Sure, I agree.  But if you address the question of how many hidden units
of a given type you need to classify an input vector into one of
N distinct classes, and take that as a rough measure of the complexity of
the boundary class proposed for the units, then the problem does seem to be
one of partitioning the input space.  Note that here I am not concerned with
the nature of the class shapes in real-world problems; in that case I must
agree with you that the breakpoints of the class boundary become
of real importance.
 
[...]
>
> I don't know what the VC-dimension is. Any reference, please?
>
 
An earlier draft is "Classifying Learnable Geometric Concepts with the
Vapnik-Chervonenkis Dimension" by D. Haussler et al., FOCS '86,
pp. 273-282.  But if you don't know what Valiant's learnability model
is, take a look at "A Theory of the Learnable" by L. Valiant, CACM 27(11),
1984, pp. 1134-1142.  The original article by Vapnik and Chervonenkis is
"On the Uniform Convergence of Relative Frequencies of Events to their
Probabilities", Theory of Probability and its Applications, 16(2), 1971,
pp. 264-280.  More up-to-date papers dealing with the VC-dimension can be
found in the Proceedings of the first Workshop on Computational Learning
Theory, COLT '88, held at MIT last June.
 
--Dario.

=========================== THE END =====================================


