Technical Report Series in Neural and Computational Learning

john@dcs.rhbnc.ac.uk
Sat Dec 10 07:18:37 EST 1994


The European Community ESPRIT Working Group in Neural and Computational 
       Learning Theory (NeuroCOLT): two new reports available

----------------------------------------
NeuroCOLT Technical Report NC-TR-94-011:
----------------------------------------
Valid Generalisation from Approximate Interpolation

by  Martin Anthony, Department of Mathematics, The London School of Economics
    Peter Bartlett, Research School of Information Sciences and  Engineering,
       Australian National University
    Yuval Ishai, Department of Computer Science, Technion
    John Shawe-Taylor, Department of Computer Science, Royal Holloway, 
       University of London

Abstract:
Let $\H$ and $\C$ be  sets of functions from domain $X$ to the reals. We say
that $\H$ validly generalises $\C$ from approximate interpolation if and only if
for each $\eta>0$ and $\epsilon, \delta \in (0,1)$ there is a number
$m_0(\eta,\epsilon, \delta)$ such that for any function $t \in \C$ and any
probability distribution $P$ on $X$,  if $m \ge m_0$ then with $P^m$-probability
at least $1-\delta$, a sample $\vx =(x_1, x_2, \dots, x_m) \in X^m$ satisfies
$$\forall h \in \H, \,
|h(x_i) - t(x_i)|< \eta, \,(1 \le i \le m) \Longrightarrow
 P(\{x: |h(x) -t(x)| \ge \eta\}) < \epsilon.$$
We find conditions that are necessary and sufficient for $\H$ to validly
generalise $\C$ from approximate interpolation, and we obtain bounds on the
sample length $m_0(\eta, \epsilon, \delta)$ in terms of various parameters
describing the expressive power of $\H$.
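
The following short Python sketch (not part of the report) illustrates the
definition above on a toy example: the hypotheses are threshold functions
h_a(x) = 1[x >= a] on [0,1], the target is t(x) = 1[x >= 0.5], P is uniform,
and eta = 0.5, so that approximate interpolation on the sample amounts to
exact agreement there.  The sample length, the grid standing in for H, and
all other numerical choices are illustrative assumptions, not figures from
the report.

import numpy as np

rng = np.random.default_rng(0)

eta, eps, delta = 0.5, 0.05, 0.05
m = 200                                    # candidate sample length
trials = 500                               # samples drawn from P^m
thresholds = np.linspace(0.0, 1.0, 201)    # finite grid standing in for H

def h(a, x):                               # hypothesis h_a(x) = 1[x >= a]
    return (x >= a).astype(float)

def t(x):                                  # target t(x) = 1[x >= 0.5]
    return (x >= 0.5).astype(float)

failures = 0
for _ in range(trials):
    x = rng.uniform(0.0, 1.0, size=m)      # a sample drawn from P = U[0,1]
    for a in thresholds:
        interpolates = np.all(np.abs(h(a, x) - t(x)) < eta)
        true_error = abs(a - 0.5)          # P(|h_a - t| >= eta) under uniform P
        if interpolates and true_error >= eps:
            failures += 1                  # the implication in the definition fails
            break

print(f"empirical failure rate {failures / trials:.4f} (target delta = {delta})")

For sufficiently large m the empirical failure rate stays below delta, which
is what valid generalisation from approximate interpolation requires of this
class.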

----------------------------------------
NeuroCOLT Technical Report NC-TR-94-022:
----------------------------------------
Sample Sizes for Sigmoidal Neural Networks

by  John Shawe-Taylor, Department of Computer Science, Royal Holloway, 
       University of London

Abstract:
This paper applies the theory of Probably Approximately Correct (PAC)
learning to feedforward neural networks with sigmoidal activation
functions.  Despite the best known upper bound on the VC dimension of
such networks being $O((WN)^2)$, for $W$ parameters and $N$
computational nodes, it is shown that the asymptotic bound on the sample
size required for learning with increasing accuracy $1 - \epsilon$ and
decreasing probability of failure $\delta$ is
$$O((1/\epsilon)(W\log(1/\epsilon) + (WN)^2 + \log(1/\delta))).$$ 
For practical values of $\epsilon$ and $\delta$, the sample size given by this
formula is a factor of $2\log(2e/\epsilon)$ smaller than a naive
use of the VC dimension result would give. Similar results are obtained
for learning where the hypothesis is only guaranteed to correctly
classify a given proportion of the training sample. The results are
formulated in general terms and show that for many learning classes
defined by smooth functions thresholded at the output, the sample size
for a class with VC-dimension $d$ and $\ell$ parameters is
$O((1/\epsilon)(\ell\log(1/\epsilon) + o(\log(1/\epsilon))d +
\log(1/\delta)))$.  
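
As a rough illustration (not from the report), the Python sketch below
evaluates the leading terms of the bound above with all constants dropped,
alongside a naive bound of the form (1/eps)(d log(1/eps) + log(1/delta)) with
d = (WN)^2, and prints the factor 2 log(2e/eps) quoted above.  The network
size and accuracy values are illustrative assumptions only.

import math

def report_bound(W, N, eps, delta):
    # leading terms of O((1/eps)(W log(1/eps) + (WN)^2 + log(1/delta)))
    return (1.0 / eps) * (W * math.log(1.0 / eps) + (W * N) ** 2
                          + math.log(1.0 / delta))

def naive_vc_bound(W, N, eps, delta):
    # naive use of a VC dimension of order d = (WN)^2
    d = (W * N) ** 2
    return (1.0 / eps) * (d * math.log(1.0 / eps) + math.log(1.0 / delta))

W, N, eps, delta = 100, 10, 0.01, 0.01     # illustrative network size and accuracy
print("report-style bound:", f"{report_bound(W, N, eps, delta):.3e}")
print("naive VC bound    :", f"{naive_vc_bound(W, N, eps, delta):.3e}")
print("factor 2*log(2e/eps) quoted in the abstract:",
      f"{2 * math.log(2 * math.e / eps):.2f}")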

-----------------------
The report NC-TR-94-011 can be accessed and printed as follows:

% ftp cscx.cs.rhbnc.ac.uk  (134.219.200.45)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-94-011.ps.Z
ftp> bye
% zcat nc-tr-94-011.ps.Z | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the PostScript files have also been provided for
anyone who does not have an uncompress facility.

A full list of the currently available Technical Reports in the Series is
held in the file `abstracts' in the same directory.

Best wishes
John Shawe-Taylor




