Radial basis functions etc.
M. Niranjan
niranjan%digsys.engineering.cambridge.ac.uk at NSS.Cs.Ucl.AC.UK
Wed Nov 23 07:42:17 EST 1988
Some recent comments on RBFs that you might find interesting!
niranjan
============================== ONE =====================================
Date: Wed, 16 Nov 88 09:41:13 +0200
From: Dario Ringach <dario at earn.techunix>
To: ajr <ajr%uk.ac.cambridge.engineering.digsys at uk.ac.ucl.cs.nss>
Subject: Comments on TR.25
Thanks a lot for the papers!
I'd like to share a few thoughts on TR.25 "Generalising the Nodes of
the Error Propagation Network". What are the advantages of choosing
radial basis functions (or Gaussian nodes) in *general* discrimination
tasks? It seems clear to me that the results presented in Table 1
are due to the fact that the spectral distribution of steady-state
vowels can be closely represented by normal/radial distributions. If
I have no a-priori information about the distribution of the classes
then how can I know which kind of nodes will perform better? I think
that in this case the best we can do is to look at the combinatorial
problem of how many partitions of the n-dimensional Euclidean space
can be obtained using N (proposed shape) boundaries. This is closely
related to obtaining the Vapnik-Chervonenkis Dimension of the boundary
class. For hyperplanes and hyperspheres in n-dimensional space,
both have VC-dimension n+1, so I think there is really no difference
in using hyperplanes or hyperspheres in *general* discrimination
problems. Don't you agree?
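
For readers meeting the term for the first time, here is a compact sketch
of the standard definition behind this claim (standard material, not taken
from TR.25 or from this thread): a class $\mathcal{C}$ of subsets of
$\mathbb{R}^n$ shatters a finite set $S$ if every labelling of $S$ is
realised by some member of $\mathcal{C}$, and

\[
  \mathrm{VCdim}(\mathcal{C}) \;=\; \max \bigl\{\, |S| \;:\; \{\, c \cap S : c \in \mathcal{C} \,\} = 2^{S} \,\bigr\}.
\]

In the plane ($n = 2$), half-planes realise all $2^3 = 8$ labellings of
three affinely independent points, but by Radon's theorem no set of four
points can be shattered, giving VC-dimension $3 = n + 1$; discs give the
same count.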
Thanks again for the papers!
Dario
============================== TWO =====================================
From: M. Niranjan <niranjan>
Date: Mon, 21 Nov 88 14:03:16 GMT
To: dario at bitnet.techunix
Subject: RBF etc
Cc: ajr, dr, fallside, idb, jpmg, lwc, mdp, mwhc, niranjan, psts, rwp, tpm,
visakan, vsa10
With RBFs of the Gaussian type, the class-conditional density function
is approximated by a mixture of Gaussians. But the parameters of the
mixture are estimated to maximise discrimination rather than to model
the individual probability densities.
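
A minimal sketch of that idea in present-day NumPy (the centres, width,
toy data, and training loop are all invented for illustration; this is
not the TR.25 setup): the Gaussian activations form the mixture, and the
output weights are fitted against a discriminative (cross-entropy) error.

# Sketch: Gaussian-RBF features with a logistic output, fitted
# discriminatively rather than by modelling each class density.
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(X, centres, width):
    # Gaussian kernel activations: one column per basis function.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

# Two toy classes (stand-ins for the vowel data discussed above).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]

centres = X[rng.choice(len(X), 8, replace=False)]  # e.g. k-means in practice
Phi = rbf_features(X, centres, width=1.0)

# Discriminative fit of the output layer by gradient descent.
w, b = np.zeros(Phi.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Phi @ w + b)))  # posterior for class 1
    g = p - y                                 # gradient of cross-entropy
    w -= 0.1 * Phi.T @ g / len(y)
    b -= 0.1 * g.mean()

print("training accuracy:", float(((p > 0.5) == y).mean()))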
> If I have no a-priori information about the distribution of the classes
> then how can I know which kind of nodes will perform better?
There is no way to tell other than by a set of experiments. In
small-scale problems, we can probably plot cross-sections of the feature
space, or even projections of it onto a linear discriminant plane, and
get some rough idea.
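
A sketch of that kind of inspection (two synthetic classes; Fisher's
linear discriminant stands in for "a linear discriminant plane", and the
data and dimensions are invented):

# Sketch: project two-class data onto Fisher's linear discriminant
# and compare the 1-D summaries to eyeball class separability.
import numpy as np

def fisher_direction(X0, X1):
    # Direction maximising between-class over within-class scatter.
    Sw = np.cov(X0.T) + np.cov(X1.T)
    w = np.linalg.solve(Sw, X1.mean(0) - X0.mean(0))
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, (100, 4))
X1 = rng.normal(1.0, 1.0, (100, 4))

w = fisher_direction(X0, X1)
p0, p1 = X0 @ w, X1 @ w  # histograms of these give the "rough idea"
print(f"projected means: {p0.mean():.2f} vs {p1.mean():.2f}")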
> problem of how many partitions of the n-dimensional Euclidean space
> can be obtained using N (proposed shape) boundaries.
It is not a question of how many different partitions; I think our
problem in pattern classification is dealing with the break-points of
the class boundary. It is this capability that gives MLPs (and RBFs)
their power. In a two-class problem, we still partition the input space
into two using N boundary segments (or splines), with N-1 break-points.
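
A toy illustration of that point (a sketch; the two half-planes below
are invented): two threshold hidden units give N = 2 boundary segments
meeting at N-1 = 1 break-point, while the space is still split into
only two classes.

# Sketch: a two-class boundary made of N = 2 linear segments with
# N - 1 = 1 break-point, realised by AND-ing two threshold units.
import numpy as np

def classify(x):
    h1 = x @ np.array([1.0, 1.0]) > 1.0    # half-plane 1: x1 + x2 > 1
    h2 = x @ np.array([1.0, -1.0]) > -1.0  # half-plane 2: x1 - x2 > -1
    return h1 & h2                         # class 1 = inside the wedge

pts = np.array([[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
print(classify(pts))  # [ True False False ]: the boundary bends once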
What I like about RBFs is that they admit a probabilistic
interpretation. With standard MLPs this is not very obvious; what
happens is more like a functional interpolation.
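
A sketch of that interpretation (the mixture parameters below are
invented): fit one Gaussian mixture per class and read off class
posteriors by Bayes' rule, which is exactly the reading that a
Gaussian-RBF network invites.

# Sketch: per-class Gaussian mixtures give class-conditional
# densities; Bayes' rule turns them into posteriors.
import numpy as np

def gauss(x, mu, s):
    # Isotropic Gaussian density in len(mu) dimensions.
    d = len(mu)
    q = ((x - mu) ** 2).sum() / (2.0 * s * s)
    return np.exp(-q) / (2.0 * np.pi * s * s) ** (d / 2.0)

mix0 = [(np.array([0.0, 0.0]), 1.0), (np.array([0.0, 2.0]), 1.0)]
mix1 = [(np.array([3.0, 1.0]), 1.0)]

def posterior(x, prior1=0.5):
    p0 = np.mean([gauss(x, m, s) for m, s in mix0])
    p1 = np.mean([gauss(x, m, s) for m, s in mix1])
    return prior1 * p1 / (prior1 * p1 + (1.0 - prior1) * p0)

print(posterior(np.array([2.5, 1.0])))  # near 1: close to class-1 centre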
> both have VC-dimension n+1, so I think there is really no difference
I don't know what the VC-dimension is. Any references, please?
Best wishes
niranjan
============================ THREE =======================================
Date: Tue, 22 Nov 88 08:22:53 +0200
From: Dario Ringach <dario at EARN.TECHUNIX>
To: M. Niranjan <niranjan at UK.AC.CAM.ENG.DSL>
Subject: Re: RBF etc
Thanks for your reply!
[some stuff deleted]
> > I think that in this case the best we can do is to look at the
> > combinatorial problem of how many partitions of the n-dimensional
> > Euclidean space can be obtained using N (proposed shape) boundaries.
>
> It is not a question of how many different partitions; I think our
> problem in pattern classification is dealing with the break-points of
> the class boundary. It is this capability that gives MLPs (and RBFs)
> their power. In a two-class problem, we still partition the input
> space into two using N boundary segments (or splines), with N-1
> break-points.
Sure, I agree. But if you ask how many hidden units of a given type you
need to classify the input vector into one of N distinct classes, and
take that as a rough measure of the complexity of the boundary class
proposed for the units, then the problem does seem to be one of
partitioning the input space. Note that here I am not concerned with
the nature of the class shapes; for real-world problems I must agree
with you that the break-points of the class boundary become really
important.
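
For hyperplane boundaries, that partition-counting question has a
classical closed form (a standard result, noted here as an aside, not
something from the thread): N hyperplanes in general position cut R^n
into the sum of C(N, k) for k = 0..n cells.

# Sketch: maximum number of cells N hyperplanes create in R^n.
from math import comb

def regions(N, n):
    return sum(comb(N, k) for k in range(n + 1))

print(regions(3, 2))  # 3 lines in the plane: 7 regions
print(regions(4, 3))  # 4 planes in 3-space: 15 regions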
[...]
>
> I don't know what the VC-dimension is. Any references, please?
>
An early treatment is "Classifying Learnable Geometric Concepts with
the Vapnik-Chervonenkis Dimension" by A. Blumer, A. Ehrenfeucht,
D. Haussler, and M. Warmuth, STOC '86, pp. 273-282. If you don't know
Valiant's learnability model, take a look at "A Theory of the
Learnable" by L. Valiant, CACM 27(11), 1984, pp. 1134-1142. The
original article by Vapnik and Chervonenkis is "On the Uniform
Convergence of Relative Frequencies of Events to their Probabilities",
Th. Prob. and its Appl., 16(2), 1971, pp. 264-280. More up-to-date
papers dealing with the VC-dimension can be found in the Proc. of the
First Workshop on Computational Learning Theory, COLT '88, held at MIT
last June.
--Dario.
=========================== THE END =====================================