Technical Report Series in Neural and Computational Learning

John Shawe-Taylor john at dcs.rhbnc.ac.uk
Fri Feb 2 11:45:57 EST 1996


The European Community ESPRIT Working Group in Neural and Computational 
Learning Theory (NeuroCOLT) has produced a set of new Technical Reports
available from the remote ftp site described below. They cover topics in
real valued complexity theory, computational learning theory, and analysis
of the computational power of continuous neural networks.  Abstracts are
included for the titles.

*** Please note that the location of the files was changed at the beginning of
*** the year, so any copies you have of the previous instructions should be
*** discarded. The new location and instructions are given at the end of the list.


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-030:
----------------------------------------
Exponentially many local minima for single neurons
by  Peter Auer, University of California, Santa Cruz, USA,
    Mark Herbster, University of California, Santa Cruz, USA,
    Manfred K. Warmuth, University of California, Santa Cruz, USA

Abstract:
We show that, for a single neuron with the logistic function as its
transfer function, the number of local minima of the error function
based on the square loss can grow exponentially in the input dimension.
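
For concreteness, the error function in question has the standard form (our
notation, not taken verbatim from the report): for training examples
$(x_i, y_i)$, $i = 1, \dots, m$,

\[
  \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
  E(w) = \sum_{i=1}^{m} \bigl( \sigma(w \cdot x_i) - y_i \bigr)^2 ,
\]

and the report shows that $E$ can have a number of local minima that is
exponential in the dimension of $w$.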

----------------------------------------
NeuroCOLT Technical Report NC-TR-96-031:
----------------------------------------
An Efficient Implementation of Sigmoidal Neural Nets in Temporal
    Coding with Noisy Spiking Neurons
by  Wolfgang Maass, Institute for Theoretical Computer Science,
        Technische Universitaet Graz, Austria

Abstract:
We show that networks of rather realistic models for biological neurons
can in principle simulate arbitrary feedforward sigmoidal neural nets
in a way that has not previously been considered.  This new approach
is based on temporal coding by single spikes (or, alternatively, by the
timing of synchronous firing in pools of neurons), rather than on the
traditional interpretation of analog variables in terms of firing
rates. The resulting new simulation is substantially faster and
apparently more consistent with experimental results about fast
information processing in cortical neural systems.
As a consequence we can show that networks of noisy spiking neurons are
"universal approximators" in the sense that they can approximate, with
regard to temporal coding, {\it any} given continuous function of
several variables.
Our new proposal for the possible organization of computations in
biological neural systems has some interesting consequences for the
type of learning rules that would be needed to explain the
self-organization of such neural circuits.
Finally, our fast and noise-robust implementation of sigmoidal neural
nets via temporal coding points to possible new ways of implementing
sigmoidal neural nets with pulse stream VLSI.
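
To give a feel for temporal coding (a toy sketch under assumed conventions,
not the construction in the report): an analog value can be represented by
how early a neuron fires within a fixed coding window.

# Toy illustration (not the report's construction): an analog value x in [0, 1]
# is encoded by how early a single spike occurs within a coding window.
T_WINDOW = 10.0  # hypothetical coding window, e.g. in milliseconds

def encode_as_spike_time(x):
    """Larger values fire earlier within the window."""
    return T_WINDOW * (1.0 - x)

def decode_from_spike_time(t):
    return 1.0 - t / T_WINDOW

x = 0.73
t = encode_as_spike_time(x)
print(t, decode_from_spike_time(t))   # prints the spike time and the recovered value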


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-032:
----------------------------------------
A Framework for Structural Risk Minimisation
by  John Shawe-Taylor, Royal Holloway, University of London, UK
    Peter Bartlett, RSISE, Australian National University, Australia
    Robert Williamson, Australian National University, Australia
    Martin Anthony, London School of Economics, UK

Abstract:
The paper introduces a framework for studying structural risk
minimisation. The model views structural risk minimisation in a PAC
context. It then generalises to the case when the hierarchy of classes
is chosen in response to the data, hence explaining the impressive
performance of the maximal margin hyperplane algorithm of Vapnik.
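
Schematically (our own formulation, not notation taken from the report),
structural risk minimisation works over a nested hierarchy of hypothesis
classes $H_1 \subseteq H_2 \subseteq \dots$ and returns

\[
  \hat{h} \;=\; \mathop{\mathrm{argmin}}_{d \ge 1}\;
     \min_{h \in H_d}
     \Bigl[\, \widehat{\mathrm{err}}_m(h) \;+\; \mathrm{pen}(d, m, \delta) \,\Bigr],
\]

where $\widehat{\mathrm{err}}_m(h)$ is the empirical error of $h$ on the $m$
training examples and $\mathrm{pen}(d, m, \delta)$ is a confidence-$\delta$
complexity penalty that grows with the capacity of $H_d$.  The report's
generalisation allows the hierarchy itself to be chosen in response to the
data.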


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-033:
----------------------------------------
Learning to Compress Ergodic Sources
by  Jonathan Baxter, Royal Holloway, University of London and London School
    of Economics, UK
    John Shawe-Taylor, Royal Holloway, University of London, UK

Abstract: 
We present an adaptive coding technique which is shown to achieve the optimal
coding in the limit as the size of the text grows, while the data structures
associated with the code grow only linearly with the text. The approach
relies on Huffman codes which are generated relative to the context in which
a particular character occurs. The Huffman codes themselves are inferred from
the data that has already been seen. A key part of the paper involves
showing that the loss per character incurred by the learning process tends
to zero as the size of the text tends to infinity.
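
As a rough illustration of the idea (the context length, the fallback for
unseen characters and the code-rebuilding schedule below are our assumptions,
not the report's exact scheme), one can condition character counts on the
preceding character and rebuild a Huffman code from the counts seen so far:

import heapq
from collections import defaultdict, Counter

def huffman_code(counts):
    """Build a Huffman code (symbol -> bitstring) from symbol counts."""
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

text = "abracadabra"
counts_by_context = defaultdict(Counter)   # counts conditioned on previous character
prev = None
encoded = []
for ch in text:
    seen = counts_by_context[prev]
    code = huffman_code(seen) if seen else {}
    encoded.append(code.get(ch, ch))   # fall back to the raw symbol if unseen in this context
    seen[ch] += 1                      # update counts after coding (adaptive)
    prev = ch
print(encoded)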


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-034:
----------------------------------------
Theory and Applications of Agnostic PAC-Learning with Small Decision Trees
by  Peter Auer, University of California, Santa Cruz, USA,
    Mark Herbster, University of California, Santa Cruz, USA,
    Robert C. Holte, University of Ottawa, Canada,
    Wolfgang Maass, Technische Universitaet Graz, Austria

Abstract:
We exhibit a theoretically founded algorithm T2 for agnostic
PAC-learning of decision trees of at most 2 levels, whose computation
time is almost linear in the size of the training set. We evaluate the
performance of this learning algorithm T2 on 15 common
``real-world'' datasets, and show that for most of these datasets
T2 provides simple decision trees with little or no loss in
predictive power (compared with C4.5).  In fact, for datasets with
continuous attributes its error rate tends to be lower than that of
C4.5.  To the best of our knowledge this is the first time that a
PAC-learning algorithm has been shown to be applicable to ``real-world''
classification problems.
Since one can {\em prove} that T2 is an agnostic PAC-learning
algorithm, T2 is {\em guaranteed} to produce close to optimal
2-level decision trees from sufficiently large training sets for {\em
any} distribution of data. In this regard T2 differs strongly
from all other learning algorithms considered in applied
machine learning, for which no {\em guarantee} can be given about their
performance on {\em new} datasets.
We also demonstrate that T2 can be used as a diagnostic tool for
investigating the expressive limits of 2-level decision trees.
Finally, T2, in combination with new bounds on the VC-dimension of
decision trees of bounded depth that we derive, provides for the first
time the tools necessary for comparing learning curves of decision
trees on ``real-world'' datasets with the theoretical estimates of
PAC-learning theory.
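
T2 itself is specified in the report; as a rough, modern stand-in for
experimenting with 2-level decision trees one might use an off-the-shelf
learner restricted to depth 2 (the use of scikit-learn here is purely
illustrative and is not the T2 algorithm):

# Not the T2 algorithm: a depth-2 decision tree fitted with scikit-learn,
# as a stand-in for exploring what 2-level trees can express.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2)      # a "2-level" decision tree
scores = cross_val_score(tree, X, y, cv=5)
print("mean cross-validated accuracy of a depth-2 tree:", scores.mean())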


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-035:
----------------------------------------
A recurrent network that performs a context-sensitive prediction task
by  Mark Steijvers, Indiana University,
    Peter Gr\"{u}nwald, CWI, Dept. of Algorithmics, The Netherlands

Abstract:
We address the problem of processing a context-sensitive language with
a recurrent neural network (RN). So far, the language processing
capabilities of RNs have only been investigated for regular and
context-free languages. We present an extremely simple RN with only one
parameter for its two hidden nodes that can perform a prediction task
on sequences of symbols from the language $\{ (ba^{k})^n \mid k \geq 0,
n > 0 \}$, a language that is context-sensitive but not context-free.
The input to the RN consists of any string of the language, one symbol
at a time. The network should then, at all times, predict the symbol
that should follow. This means that the network must be able to count
the number of $a$'s in the first subsequence and to retain this number
for future use. Our network can solve the task for $k=1$ up to $k=120$.
The network represents the count of $a$'s in the subsequence by having
different limit cycles for every different number of $a$'s counted. The
limit cycles are related in such a way that the representations of
network states in which an $a$ should be predicted are linearly
separable from those in which a $b$ should be predicted. Our work shows
that connectionism in general can handle more complex formal languages
than was previously known.
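
The prediction task itself is easy to state programmatically (the conventions
below are our assumptions, e.g. we simply take the next symbol of the string
as the target at each position):

def string_and_targets(k, n):
    """Return a string from { (b a^k)^n : k >= 0, n > 0 } and, for each
    position, the symbol that should follow it (the prediction target)."""
    s = ("b" + "a" * k) * n
    return s, s[1:]

s, targets = string_and_targets(k=3, n=2)
print(s)        # baaabaaa
print(targets)  # aaabaaa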


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-036:
----------------------------------------
Tight worst-case loss bounds for predicting with expert advice
by  David Haussler, University of California, Santa Cruz, USA,
    Jyrki Kivinen, University of Helsinki, Finland,
    Manfred K. Warmuth,  University of California, Santa Cruz, USA

Abstract:
We consider on-line algorithms for predicting binary or
continuous-valued outcomes, when the algorithm has available the
predictions made by $N$ experts.  For a sequence of trials, we compute
total losses for both the algorithm and the experts under a loss
function.  At the end of the trial sequence, we compare the total loss
of the algorithm to the total loss of the best expert, i.e., the expert
with the least loss on the particular trial sequence.  For a large
class of loss functions, with binary outcomes the total loss of the
algorithm proposed by Vovk exceeds the total loss of the best expert at
most by the amount $c\ln N$, where $c$ is a constant determined by the
loss function.  This upper bound does not depend on any assumptions on
how the experts' predictions or the outcomes are generated, and the
trial sequence can be arbitrarily long.  We give a straightforward
method for finding the correct value $c$ and show by a lower bound that
for this value of $c$, the upper bound is asymptotically tight.  The
lower bound is based on a probabilistic adversary argument.  The class
of loss functions for which the $c\ln N$ upper bound holds includes the
square loss, the logarithmic loss, and the Hellinger loss.  We also
consider another class of loss functions, including the absolute loss,
for which we have an $\Omega\left(\sqrt{\ell\log N}\right)$ lower
bound, where $\ell$ is the number of trials.  We show that for the
square and logarithmic loss functions, Vovk's algorithm achieves the
same worst-case upper bounds with continuous-valued outcomes as with
binary outcomes.  For the absolute loss, we show how bounds earlier
achieved for binary outcomes can be achieved with continuous-valued
outcomes using a slightly more complicated algorithm.
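
The flavour of such expert-advice algorithms can be conveyed with a generic
exponential-weights forecaster (a simplified sketch with an arbitrary learning
rate and a weighted-average prediction, not the exact aggregating algorithm
analysed in the report):

import numpy as np

def exp_weights_total_loss(expert_preds, outcomes, eta=2.0):
    """expert_preds: (T, N) array of predictions in [0, 1];
    outcomes: (T,) array of outcomes in {0, 1}.
    Returns the forecaster's cumulative square loss."""
    T, N = expert_preds.shape
    log_w = np.zeros(N)                        # log-weights, start uniform
    total_loss = 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = float(w @ expert_preds[t]) / float(w.sum())       # weighted-average prediction
        total_loss += (p - outcomes[t]) ** 2                   # square loss
        log_w -= eta * (expert_preds[t] - outcomes[t]) ** 2    # penalise each expert's loss
    return total_loss

rng = np.random.default_rng(0)
preds = rng.random((200, 5))
outcomes = (preds[:, 0] > 0.5).astype(float)   # expert 0 tracks the outcomes best
print(exp_weights_total_loss(preds, outcomes))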


----------------------------------------
NeuroCOLT Technical Report NC-TR-96-037:
----------------------------------------
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
by  Jyrki Kivinen, University of Helsinki, Finland,
    Manfred K. Warmuth,  University of California, Santa Cruz, USA

Abstract:
We consider two algorithms for on-line prediction based on a linear
model.  The algorithms are the well-known gradient descent (GD)
algorithm and a new algorithm, which we call EG$^{\pm}$.  They both
maintain a weight vector using simple updates.  For the GD
algorithm, the update is based on subtracting the gradient of the
squared error made on a prediction.  The EG$^{\pm}$ algorithm uses the
components of the gradient in the exponents of factors that are used in
updating the weight vector multiplicatively.  We present worst-case
loss bounds for EG$^{\pm}$ and compare them to previously known bounds for
the GD algorithm.  The bounds suggest that the losses of the
algorithms are in general incomparable, but EG$^{\pm}$ has a much smaller
loss if only a few components of the input are relevant for the
predictions.  We have performed experiments which show that our
worst-case upper bounds are quite tight even on simple artificial
data.
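
For illustration, the two update rules can be sketched as follows on a single
example $(x, y)$ with square loss on the prediction $w \cdot x$ (the exact
EG$^{\pm}$ algorithm keeps separate positive and negative weight vectors;
shown here is the simpler EG update with positive, normalised weights):

import numpy as np

def gd_update(w, x, y, eta):
    grad = 2.0 * (w @ x - y) * x         # gradient of the squared error (w.x - y)^2
    return w - eta * grad                # GD: subtract (a multiple of) the gradient

def eg_update(w, x, y, eta):
    grad = 2.0 * (w @ x - y) * x
    w_new = w * np.exp(-eta * grad)      # EG: multiply by exponentiated gradient components
    return w_new / w_new.sum()           # keep the weights positive and normalised

w = np.full(4, 0.25)
x = np.array([1.0, 0.0, 0.0, 0.0])       # only the first input component is relevant
y = 1.0
print(gd_update(w.copy(), x, y, eta=0.1))
print(eg_update(w.copy(), x, y, eta=0.1))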


--------------------------------------------------------------------

***************** ACCESS INSTRUCTIONS ******************

The report NC-TR-96-001 can be accessed and printed as follows:

% ftp ftp.dcs.rhbnc.ac.uk  (134.219.96.1)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-96-001.ps.Z
ftp> bye
% zcat nc-tr-96-001.ps.Z | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the postscript files are also available for
anyone without an uncompress facility.

In some cases there are two files available, for example,
nc-tr-96-002-title.ps.Z
nc-tr-96-002-body.ps.Z
The first contains the title page while the second contains the body 
of the report. The single command,
ftp> mget nc-tr-96-002*
will prompt you for the files you require.

A full list of the currently available Technical Reports in the 
Series is held in a file `abstracts' in the same directory.

The files may also be accessed via WWW starting from the NeuroCOLT 
homepage (note that this is undergoing some corrections and may be 
temporarily inaccessible):

http://www.dcs.rhbnc.ac.uk/neural/neurocolt.html


Best wishes
John Shawe-Taylor



