Technical Report Series in Neural and Computational Learning
John Shawe-Taylor
john at dcs.rhbnc.ac.uk
Tue Jan 23 10:35:55 EST 1996
The European Community ESPRIT Working Group in Neural and Computational
Learning Theory (NeuroCOLT) has produced a set of new Technical Reports
available from the remote ftp site described below. They cover topics in
real valued complexity theory, computational learning theory, and analysis
of the computational power of continuous neural networks. Abstracts are
included for most of the titles.
*** Please note that the location of the files has been changed so that
*** any copies you have of the previous instructions should be discarded.
*** The new location and instructions are given at the end of the list.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-001:
----------------------------------------
On digital nondeterminism
by Felipe Cucker, Universitat Pompeu Fabra, Spain
Martin Matamala, Universidad de Chile, Chile
No abstract available.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-002:
----------------------------------------
Complexity and Real Computation: A Manifesto
by Lenore Blum, International Computer Science Institute, Berkeley, USA
Felipe Cucker, Universitat Pompeu Fabra, Spain
Mike Shub, IBM T.J. Watson Research Center, New York, USA
Steve Smale, University of California, USA
Abstract: Finding a natural meeting ground between the highly
developed complexity theory of computer science -- with its historical
roots in logic and the discrete mathematics of the integers -- and the
traditional domain of real computation, the more eclectic, less foundational
field of numerical analysis -- with its rich history and longstanding
traditions in the continuous mathematics of analysis -- presents a
compelling challenge. Here we illustrate the issues and pose our
perspective toward resolution.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-003:
----------------------------------------
Models for Parallel Computation with Real Numbers
by F. Cucker, Universitat Pompeu Fabra, Spain
J.L. Montana, Universidad de Cantabria, Spain
L.M. Pardo, Universidad de Cantabria, Spain
Abstract:
This paper deals with two models for parallel computations over the
reals. On the one hand, we consider a generalization of the real Turing
machine obtained by assembling a polynomial number of such machines that
work together in polylogarithmic time (more or less like a PRAM in the
Boolean setting) and, on the other hand, a model consisting of families
of algebraic circuits generated in some uniform way. We show that the
classes defined by these two models are related by a chain of
inclusions and that some of these inclusions are strict.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-004:
----------------------------------------
Nash Trees and Nash Complexity
by Felipe Cucker, Universitat Pompeu Fabra, Spain
Thomas Lickteig, Universit\"at Bonn, Germany
Abstract:
Computational problems from numerical analysis, such as the Cholesky
decomposition of a positive definite matrix or the unitary transformation
of a complex matrix into upper triangular form (for instance by the
Householder algorithm), require algorithms that also use
``non-arithmetical'' operations
such as square roots. The aim of this paper is twofold:
1. Generalizing the notions of arithmetical semi-algebraic decision trees
and computation trees (that is, with outputs) we suggest a definition of
Nash trees and Nash straight line programs (SLPs), necessary to formalize
and analyse numerical analysis algorithms and their complexity as mentioned
above. These trees and SLPs have a Nash operational signature $N^R$ over
a real closed field $R$. Based on the sheaf of abstract Nash functions over
the real spectrum of a ring as introduced by M.-F. Roy, we propose a
category nash_R of partial (homogeneous) N^R-algebras in which these Nash
operations make sense in a natural way.
2. Using this framework, in particular the execution of $N^R$-SLPs in
appropriate $N^R$-algebras, we extend the degree-gradient lower bound to
Nash decision complexity of the membership problem of co-one-dimensional
semi-algebraic subsets of open semi-algebraic subsets.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-005:
----------------------------------------
On the computational power and super-Turing capabilities of dynamical systems
by Olivier Bournez, Department LIP, ENS-Lyon, France
Michel Cosnard, Department LIP, ENS-Lyon, France
Abstract:
We explore the simulation and computational capabilities of dynamical
systems. We first introduce and compare several notions of simulation
between discrete systems. We give a general framework that allows
dynamical systems to be considered as computational machines. We
introduce a new discrete model of computation: the analog automaton
model. We determine the computational power of this model and prove
that it does have super-Turing capabilities. We then prove that many
very simple dynamical systems from the literature are actually able to
simulate analog automata. From this result we deduce that many
dynamical systems have intrinsically super-Turing capabilities.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-006:
----------------------------------------
Finite Sample Size Results for Robust Model Selection; Application to
Neural Networks
by Joel Ratsaby, Technion, Israel
Ronny Meir, Technion, Israel
Abstract:
The problem of model selection in the face of finite sample size is
considered within the framework of statistical decision theory.
Focusing on the special case of regression, we introduce a model
selection criterion which is shown to be robust in the sense that, with
high confidence, even for a finite sample size it selects the best
model. Our derivation is based on uniform convergence methods,
augmented by results from the theory of function approximation, which
permit us to make definite probabilistic statements about the finite
sample behavior. These results stand in contrast to classical
approaches, which can only guarantee the asymptotic optimality of the
choice. The criterion is demonstrated for the problem of model
selection in feedforward neural networks.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-007:
----------------------------------------
On the structure of $\npoly{C}$
by Gregorio Malajovich,
Klaus Meer, RWTH Aachen, Germany
Abstract:
This paper deals with complexity classes $\poly{C}$ and $\npoly{C}$, as
they were introduced over the complex numbers by Blum, Shub and Smale.
Under the assumption $\poly{C} \ne \npoly{C}$ the existence of
non-complete problems in $\npoly{C}$, not belonging to $\poly{C}$, is
established.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-008:
----------------------------------------
Dynamic Recurrent Neural Networks: a Dynamical Analysis
by Jean-Philippe Draye, Davor Pavisic, Facult\'{e} Polytechnique de Mons,
Belgium,
Guy Cheron, Ga\"{e}tan Libert, University of Brussels, Belgium
Abstract:
In this paper, we explore the dynamical features of a neural network
model which presents two types of adaptive parameters: the classical
weights between the units and the time constants associated with each
artificial neuron. The purpose of this study is to provide a strong
theoretical basis for modeling and simulating dynamic recurrent neural
networks. In order to achieve this, we study the effect of the
statistical distribution of the weights and of the time constants on
the network dynamics and we make a statistical analysis of the neural
transformation. We examine the network power spectra (to draw some
conclusions about the frequency behaviour of the network) and we
compute the stability regions to explore the stability of the model.
We show that the network is sensitive to the variations of the mean
values of the weights and the time constants (because of the temporal
aspects of the learned tasks). Nevertheless, our results highlight the
improvements in the network dynamics due to the introduction of
adaptive time constants and indicate that dynamic recurrent neural
networks can bring new powerful features to the field of neural
computing.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-009:
----------------------------------------
Scale-sensitive Dimensions, Uniform Convergence, and Learnability
by Noga Alon, Tel Aviv University (ISRAEL),
Shai Ben-David, Technion, (ISRAEL),
Nicol\`o Cesa-Bianchi, DSI, Universit\`a di Milano,
David Haussler, UC Santa Cruz, (USA)
Abstract:
Learnability in Valiant's PAC learning model has been shown to be
strongly related to the existence of uniform laws of large numbers.
These laws define a distribution-free convergence property of means to
expectations uniformly over classes of random variables. Classes of
real-valued functions enjoying such a property are also known as
uniform Glivenko-Cantelli classes. In this paper we prove, through a
generalization of Sauer's lemma that may be interesting in its own
right, a new characterization of uniform Glivenko-Cantelli classes.
Our characterization yields Dudley, Gin\'e, and Zinn's previous
characterization as a corollary. Furthermore, it is the first based on
a simple combinatorial quantity generalizing the Vapnik-Chervonenkis
dimension. We apply this result to obtain the weakest combinatorial
condition known to imply PAC learnability in the statistical regression
(or ``agnostic'') framework. Furthermore, we show a characterization
of learnability in the probabilistic concept model, solving an open
problem posed by Kearns and Schapire. These results show that the
accuracy parameter plays a crucial role in determining the effective
complexity of the learner's hypothesis class.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-010:
----------------------------------------
On-line Prediction and Conversion Strategies
by Nicol\`o Cesa-Bianchi, DSI, Universit\`a di Milano,
Yoav Freund, AT\&T Bell Laboratories,
David P.\ Helmbold, University of California, Santa Cruz,
Manfred K.\ Warmuth, University of California, Santa Cruz
Abstract:
We study the problem of deterministically predicting boolean values by
combining the boolean predictions of several experts. Previous on-line
algorithms for this problem predict with the weighted majority of the
experts' predictions. These algorithms give each expert an exponential
weight $\beta^m$ where $\beta$ is a constant in $[0,1)$ and $m$ is the
number of mistakes made by the expert in the past. We show that it is
better to use sums of binomials as weights. In particular, we present
a deterministic algorithm using binomial weights that has a better
worst case mistake bound than the best deterministic algorithm using
exponential weights. The binomial weights naturally arise from a
version space argument. We also show how both exponential and binomial
weighting schemes can be used to make prediction algorithms robust
against noise.
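As a rough illustration of the exponential weighting scheme referred to
above (and not of the report's binomial-weights algorithm, which is not
reproduced here), the following Python sketch predicts with a weighted
majority of boolean expert predictions, giving each expert the weight
$\beta^m$ after $m$ past mistakes; the constant beta=0.5 is an arbitrary
choice made only for the example.

    # Illustrative sketch only: deterministic weighted majority over
    # boolean expert predictions with exponential weights beta**m,
    # where m counts an expert's past mistakes.
    def weighted_majority(expert_predictions, outcomes, beta=0.5):
        # expert_predictions[t][i]: boolean prediction of expert i at
        # trial t; outcomes[t]: the true boolean value at trial t.
        n = len(expert_predictions[0])
        mistakes = [0] * n                    # m_i, mistakes of expert i so far
        my_mistakes = 0
        for preds, y in zip(expert_predictions, outcomes):
            w = [beta ** m for m in mistakes]   # exponential weight beta**m_i
            vote_1 = sum(wi for wi, p in zip(w, preds) if p)
            vote_0 = sum(wi for wi, p in zip(w, preds) if not p)
            guess = vote_1 >= vote_0            # predict with the weighted majority
            if guess != y:
                my_mistakes += 1
            for i, p in enumerate(preds):       # charge each wrong expert a mistake
                if p != y:
                    mistakes[i] += 1
        return my_mistakes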
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-011:
----------------------------------------
Worst-case Quadratic Loss Bounds for Prediction Using Linear Functions
and Gradient Descent
by Nicol\`o Cesa-Bianchi, DSI, Universit\`a di Milano,
Philip M. Long, Duke University,
Manfred K. Warmuth, UC Santa Cruz
Abstract:
In this paper we study the performance of gradient descent when applied
to the problem of on-line linear prediction in arbitrary inner product
spaces. We show worst-case bounds on the sum of the squared prediction
errors under various assumptions concerning the amount of {\it a
priori} information about the sequence to predict. The algorithms we
use are variants and extensions of on-line gradient descent. Whereas
our algorithms always predict using linear functions as hypotheses,
none of our results requires the data to be linearly related. In fact,
the bounds proved on the total prediction loss are typically expressed
as a function of the total loss of the best fixed linear predictor with
bounded norm. All the upper bounds are tight to within constants.
Matching lower bounds are provided in some cases. Finally, we apply
our results to the problem of on-line prediction for classes of smooth
functions.
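For readers unfamiliar with the underlying update, the following Python
sketch shows generic on-line gradient descent for linear prediction under
squared loss; it only illustrates the basic update on which the report's
variants build, and the fixed learning rate eta is an arbitrary choice for
the example, not the tuning analysed in the report.

    # Illustrative sketch only: on-line gradient descent with linear
    # hypotheses and squared prediction loss.
    def online_gradient_descent(examples, dim, eta=0.1):
        # examples: iterable of (x, y) pairs, x a length-dim list, y a float
        w = [0.0] * dim
        total_loss = 0.0
        for x, y in examples:
            y_hat = sum(wi * xi for wi, xi in zip(w, x))   # linear prediction
            total_loss += (y_hat - y) ** 2                 # squared prediction error
            # gradient of the squared loss w.r.t. w is 2*(y_hat - y)*x
            w = [wi - eta * 2.0 * (y_hat - y) * xi for wi, xi in zip(w, x)]
        return w, total_loss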
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-012:
----------------------------------------
Using Bayesian Methods for Avoiding Overfitting and for Ranking Networks
in Multilayer Perceptrons Learning
by Michel de Bollivier, EC Joint Research Centre, Italy,
Domenico Perrotta, EC Joint Research Centre and Ecole Normale
Sup\'{e}rieure de Lyon, France
Abstract:
This work is an experimental attempt to determine whether the Bayesian
paradigm could improve Multi-Layer Perceptron (MLP) learning methods.
In particular, we experiment here with the paradigm developed by
D. MacKay (1992). The paper points out the main or critical points of
MacKay's work and introduces very practical points of Bayesian MLPs,
having in mind future applications. Then, Bayesian MLPs are used on
three public classification databases and compared to other methods.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-013:
----------------------------------------
Lower Bounds for the Computational Power of Networks of Spiking Neurons
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
We investigate the computational power of a formal model for networks
of spiking neurons. It is shown that simple operations on
phase-differences between spike-trains provide a very powerful
computational tool that can in principle be used to carry out highly
complex computations on a small network of spiking neurons. We
construct networks of spiking neurons that simulate arbitrary threshold
circuits, Turing machines, and a certain type of random access machine
with real-valued inputs. We also show that relatively weak basic
assumptions about the response- and threshold-functions of the spiking
neurons are sufficient in order to employ them for such computations.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-014:
----------------------------------------
Analog Computations on Networks of Spiking Neurons
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
We characterize the class of functions with real-valued input and
output which can be computed by networks of spiking neurons with
piecewise linear response- and threshold-functions and unlimited timing
precision. We show that this class coincides with the class of
functions computable by recurrent analog neural nets with piecewise
linear activation functions, and with the class of functions computable
on a certain type of random access machine (N-RAM) which we introduce
in this article. This result is proven via constructive real-time
simulations. Hence it provides in particular a convenient method for
constructing networks of spiking neurons that compute a given
real-valued function $f$: it now suffices to write a program for
computing $f$ on an N-RAM; that program can be ``automatically''
transformed into an equivalent network of spiking neurons (by our
simulation result).
Finally, one learns from the results of this paper that certain very
simple piecewise linear response- and threshold-functions for spiking
neurons are {\it universal}, in the sense that neurons with these
particular response- and threshold-functions can simulate networks of
spiking neurons with {\it arbitrary} piecewise linear response- and
threshold-functions. The results of this paper also show that
certain very simple piecewise linear activation functions are in a
corresponding sense universal for recurrent analog neural nets.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-015:
----------------------------------------
Vapnik-Chervonenkis Dimension of Neural Nets
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
We will survey in this article the most important known bounds for the
VC-dimension of neural nets that consist of linear threshold gates
(section 2) and for the case of neural nets with real-valued activation
functions (section 3). In section 4 we discuss a generalization of the
VC-dimension for neural nets with non-boolean network-output. With
regard to a discussion of the VC-dimension of models for networks of
{\it spiking neurons} we refer to Maass (1994).
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-016:
----------------------------------------
On the Computational Power of Noisy Spiking Neurons
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
This article provides some first results about the computational power
of neural networks that are based on a neuron model which is
acceptable to many neurobiologists as being reasonably realistic for a
biological neuron.
Biological neurons communicate via spike-trains, i.e. via sequences of
stereotyped pulses (``spikes'') that encode information in their
time-differences (``temporal coding''). In addition it is well known
that biological neurons are quite ``noisy'', i.e. the precise times
when they ``fire'' (and thereby issue a spike) depend not only on the
incoming spike-trains, but also on various types of ``noise''.
It has remained unknown whether one can in principle carry out reliable
digital computations with noisy spiking neurons. This article presents
rigorous constructions for simulating in real-time arbitrary given
boolean circuits and finite automata with arbitrarily high reliability
by networks of noisy spiking neurons.
In addition we show that with the help of ``shunting inhibition'' such
networks can simulate in real-time any McCulloch-Pitts neuron (or
``threshold gate''), and therefore any multilayer perceptron (or
``threshold circuit'') in a reliable manner. These constructions
provide a possible explanation for the fact that biological neural
systems can carry out quite complex computations within 100 msec.
It turns out that the assumptions that these constructions require about
the shape of the EPSPs and the behaviour of the noise are surprisingly
weak.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-017:
----------------------------------------
Die Komplexit\"at des Rechnens und Lernens mit neuronalen Netzen --
Ein Kurzf\"uhrer
by Michael Schmitt, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
This is a very short guide to the basic concepts of the theory of
computing and learning with neural networks with emphasis on
computational complexity. Fundamental results on circuit complexity of
neural networks and PAC-learning are mentioned but no proofs are given.
A list of references to the most important and most recent books in the
field is included. The report was written in German on the occasion of
giving a course at the Autumn School in Connectionism and Neural
Networks ``HeKoNN 95'' in M\"unster.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-018:
----------------------------------------
Tracking the best disjunction
by Peter Auer, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Manfred Warmuth, University of California at Santa Cruz, USA
Abstract:
Littlestone developed a simple deterministic on-line learning algorithm
for learning $k$-literal disjunctions. This algorithm (called Winnow)
keeps one weight for each of the $n$ variables and does multiplicative
updates to its weights. We develop a randomized version of Winnow and
prove bounds for an adaptation of the algorithm for the case when the
disjunction may change over time. In this case a possible target {\em
disjunction schedule} $\Tau$ is a sequence of disjunctions (one per
trial) and the {\em shift size} is the total number of literals that
are added/removed from the disjunctions as one progresses through the
sequence.
We develop an algorithm that predicts nearly as well as the best
disjunction schedule for an arbitrary sequence of examples. This
algorithm, which allows us to track the predictions of the best
disjunction, is hardly more complex than the original version. However,
the amortized analysis needed for obtaining worst-case mistake bounds
requires new techniques. In some cases our lower bounds show that the
upper bounds of our algorithm have the right constant in front of the
leading term in the mistake bound and almost the right constant in
front of the second leading term. By combining the tracking capability
with existing applications of Winnow we are able to enhance these
applications to the shifting case as well.
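As a rough illustration, the following Python sketch shows Littlestone's
basic Winnow update for monotone disjunctions, with one weight per
variable and multiplicative updates on mistakes; the randomized and
shifting variants analysed in this report are not reproduced, and the
promotion factor alpha=2 is an arbitrary choice for the example.

    # Illustrative sketch only: basic Winnow for monotone disjunctions
    # over n boolean variables, one weight per variable.
    def winnow(examples, n, alpha=2.0):
        # examples: iterable of (x, y) with x a 0/1 list of length n, y in {0,1}
        w = [1.0] * n                  # all weights start at 1
        threshold = float(n)
        mistakes = 0
        for x, y in examples:
            guess = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0
            if guess != y:
                mistakes += 1
                if y == 1:             # false negative: promote active variables
                    w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
                else:                  # false positive: demote active variables
                    w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
        return w, mistakes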
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-019:
----------------------------------------
Learning Nested Differences in the Presence of Malicious Noise
by Peter Auer, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
We investigate the learnability of nested differences of
intersection-closed classes in the presence of malicious noise.
Examples of intersection-closed classes include axis-parallel
rectangles, monomials, linear sub-spaces, and so forth. We present an
on-line algorithm whose mistake bound is optimal in the sense that
there are concept classes for which each learning algorithm (using
nested differences as hypotheses) can be forced to make at least that
many mistakes. We also present an algorithm for learning in the PAC
model with malicious noise. Surprisingly enough, the noise rate
tolerable by these algorithms does not depend on the complexity of the
target class but depends only on the complexity of the underlying
intersection-closed class.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-020:
----------------------------------------
Characterizing the Learnability of Kolmogorov Easy Circuit Expressions
by Jos\'e L. Balc\'azar, Universitat Polit\'ecnica de Catalunya, Spain
Harry Buhrman, Centrum voor Wiskunde en Informatica, the Netherlands
Abstract:
We show that Kolmogorov easy circuit expressions can be learned with
membership queries in polynomial time if and only if every NE-predicate
is E-solvable. Moreover we show that the previously known algorithm,
that uses an oracle in NP, is optimal in some relativized world.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-021:
----------------------------------------
T2 - Computing optimal 2-level decision tree
by Peter Auer, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
*** Note: This is a C program available in tarred (compressed) format.
Description:
This is a short description of the T2 program discussed in
P. Auer, R.C. Holte, and W. Maass. Theory and applications of
agnostic PAC-learning with small decision trees.
In Proc. 7th Int. Machine Learning Conf., Tahoe City (USA), 1995.
Please see the paper for a description of the algorithm and a
discussion of the results.
(There is a typo in the paper in Table 2: The Sky2 value for HE is
89.0% instead of 91.0%.)
T2 calculates optimal decision trees up to depth 2. T2 accepts exactly
the same input as C4.5, consisting of a name-file, a data-file, and an
optional test-file. The output of T2 is a decision tree similar to
the decision trees of C4.5, but there are some differences.
T2 uses two kinds of decision nodes: (1) discrete splits on a
discrete attribute where the node has as many branches as there are
possible attribute values, and (2) interval splits of continuous
attributes. A node which performs an interval split divides the real
line into intervals and has as many branches as there are
intervals. The number of intervals is restricted to be (a) at most
MAXINTERVALS if all the branches of the decision node lead to leaves,
and to be (b) at most 2 otherwise. MAXINTERVALS can be set by the user.
The attribute value ``unknown'' is treated as a special attribute
value. Each decision node (discrete or continuous) has an additional
branch which takes care of unknown attribute values.
T2 builds the decision tree satisfying the above constraints and
minimizing the number of misclassifications of cases in the data-file.
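As a rough illustration of the node types described above (and not of the
T2 C program itself), the following Python sketch shows how a depth-2 tree
built from discrete splits, interval splits and an extra ``unknown''
branch could classify a single case; the dictionary layout and the helper
name classify are invented for this example, and the search that T2
performs over all such trees to minimize misclassifications is not shown.

    # Illustrative sketch only: classifying one case with a tree whose
    # internal nodes are either discrete splits (one branch per attribute
    # value) or interval splits (sorted cut points on a continuous
    # attribute), each with an extra branch for the value "unknown".
    def classify(node, case):
        if not isinstance(node, dict):           # a leaf: the predicted class
            return node
        value = case.get(node["attribute"], "unknown")
        if value == "unknown":
            return classify(node["unknown"], case)
        if node["kind"] == "discrete":
            return classify(node["branches"][value], case)
        # interval split: "cuts" is a sorted list of thresholds and
        # "branches" holds len(cuts) + 1 subtrees, one per interval
        for cut, subtree in zip(node["cuts"], node["branches"]):
            if value < cut:
                return classify(subtree, case)
        return classify(node["branches"][-1], case)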
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-022:
----------------------------------------
Efficient Learning with Virtual Threshold Gates
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Manfred Warmuth, University of California, Santa Cruz, USA
Abstract:
We reduce learning simple geometric concept classes to learning
disjunctions over exponentially many variables. We then apply an
on-line algorithm called Winnow whose number of prediction mistakes
grows only logarithmically with the number of variables. The
hypotheses of Winnow are linear threshold functions with one weight per
variable. We find ways to keep the exponentially many weights of
Winnow implicitly so that the time for the algorithm to compute a
prediction and update its ``virtual'' weights is polynomial.
Our method can be used to learn $d$-dimensional axis-parallel boxes
when $d$ is variable, and unions of $d$-dimensional axis-parallel boxes
when $d$ is constant. The worst-case number of mistakes of our
algorithms for the above classes is optimal to within a constant
factor, and our algorithms inherit the noise robustness of Winnow.
We think that other on-line algorithms with multiplicative weight
updates whose loss bounds grow logarithmically with the dimension are
amenable to our methods.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-023:
----------------------------------------
On learnability and predicate logic (Extended Abstract)
by Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Gy. Tur\'{a}n, University of Illinois at Chicago, USA
No abstract available.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-024:
----------------------------------------
Lower Bounds on Identification Criteria for Perceptron-like Learning Rules
by Michael Schmitt, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
The topic of this paper is the computational complexity of identifying
neural weights using Perceptron-like learning rules. By Perceptron-like
rules we understand instructions to modify weight vectors by adding or
subtracting constant values after occurrence of an error. By
computational complexity we mean worst-case bounds on the number of
correction steps. The training examples are taken from Boolean
functions computable by McCulloch-Pitts neurons. Exact identification
by the Perceptron rule is known to take exponential time in the worst
case. Therefore, we define identification criteria that do not require
that the learning process exactly identifies the function being
learned: PAC identification, order identification, and sign
identification. Our results show that Perceptron-like learning rules
cannot satisfy any of these criteria when the number of correction
steps is to be bounded by a polynomial. This indicates that even by
considerably lowering one's demands on the learning process one cannot
prevent Perceptron rules from being computationally infeasible.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-025:
----------------------------------------
On Methods to Keep Learning Away from Intractability (Extended abstract)
by Michael Schmitt, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Abstract:
We investigate the complexity of learning from restricted sets of
training examples. With the intention to make learning easier we
introduce two types of restrictions that describe the permitted
training examples. The strength of the restrictions can be tuned by
choosing specific parameters. We ask how strictly their values must be
limited to turn NP-complete learning problems into polynomial-time
solvable ones. Results are presented for Perceptrons with binary and
arbitrary weights. We show that there exist bounds for the parameters
that sharply separate efficiently solvable from intractable learning
problems.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-026:
----------------------------------------
Accuracy of techniques for the logical analysis of data
by Martin Anthony, London School of Economics, UK
Abstract:
We analyse the generalisation accuracy of standard techniques for the
`logical analysis of data', within a probabilistic framework.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-027:
----------------------------------------
Interpolation and Learning in Artificial Neural Networks
by Martin Anthony, London School of Economics, UK
No abstract available.
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-028:
----------------------------------------
Threshold Functions, Decision Lists, and the Representation of Boolean
Functions
by Martin Anthony, London School of Economics, UK
Abstract:
We describe a geometrically-motivated technique for data
classification. Given a finite set of points in Euclidean space, each
classified according to some target classification, we use a hyperplane
to separate off a set of points all having the same classification;
these points are then deleted from the database and the procedure is
iterated until no points remain. We explain how such an iterative
`chopping procedure' leads to a type of decision list classification of
the data points and to a classification of the data by means of a
linear threshold artificial neural network with one hidden layer. In
the case where the data points are all the $2^n$ vertices of
the Boolean hypercube, the technique produces a neural network
representation of Boolean functions differing from the obvious one
based on a function's disjunctive normal form formula.
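As a rough illustration of the iterative `chopping procedure' described
above, the following Python sketch shows how repeated chopping yields a
decision list; the key step of actually finding a hyperplane that
separates off a set of points of one classification is left as an assumed
helper (find_homogeneous_halfspace), since that is where the substance of
the report lies.

    # Illustrative sketch only: repeatedly separate off a homogeneously
    # labelled set of points with a halfspace, record the test, delete
    # the separated points, and iterate until no points remain.
    def chop_to_decision_list(points, labels, find_homogeneous_halfspace):
        # find_homogeneous_halfspace: assumed helper returning (w, b, label)
        # such that every remaining point x with w.x >= b has that label,
        # and at least one remaining point satisfies the test.
        decision_list = []
        remaining = list(zip(points, labels))
        while remaining:
            w, b, label = find_homogeneous_halfspace(remaining)
            decision_list.append((w, b, label))   # test: w.x >= b -> label
            remaining = [(x, y) for x, y in remaining
                         if sum(wi * xi for wi, xi in zip(w, x)) < b]
        return decision_list   # classify by firing the first satisfied test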
----------------------------------------
NeuroCOLT Technical Report NC-TR-96-029:
----------------------------------------
Learning of Depth Two Neural Nets with Constant Fan-in at the Hidden Nodes
by Peter Auer, University of California, Santa Cruz, USA,
Stephen Kwek, University of Illinois, USA,
Wolfgang Maass, Institute for Theoretical Computer Science,
Technische Universitaet Graz, Austria
Manfred K. Warmuth, University of California, Santa Cruz, USA
Abstract:
We present algorithms for learning depth two neural networks where the
hidden nodes are threshold gates with constant fan-in. The transfer
function of the output node might be more general: in addition to the
threshold function we have results for the logistic and the linear
transfer function at the output node.
We give batch and on-line learning algorithms for these classes of
neural networks and prove bounds on the performance of our algorithms.
The batch algorithms work for real valued inputs whereas the on-line
algorithms require that the inputs are discretized. The hypotheses of
our algorithms are essentially also neural networks of depth two.
However, their number of hidden nodes might be much larger than the
number of hidden nodes of the neural network that has to be learned.
Our algorithms can handle a large number of hidden nodes since they
rely on multiplicative weight updates at the output node, and the
performance of these algorithms scales only logarithmically with the
number of hidden nodes used.
--------------------------------------------------------------------
***************** ACCESS INSTRUCTIONS ******************
The Report NC-TR-96-001 can be accessed and printed as follows:
% ftp ftp.dcs.rhbnc.ac.uk (134.219.96.1)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-96-001.ps.Z
ftp> bye
% zcat nc-tr-96-001.ps.Z | lpr -l
Similarly for the other technical reports.
Uncompressed versions of the postscript files have also been
left for anyone not having an uncompress facility.
In some cases there are two files available, for example,
nc-tr-96-002-title.ps.Z
nc-tr-96-002-body.ps.Z
The first contains the title page while the second contains the body
of the report. The single command,
ftp> mget nc-tr-96-002*
will prompt you for the files you require.
A full list of the currently available Technical Reports in the
Series is held in a file `abstracts' in the same directory.
The files may also be accessed via WWW starting from the NeuroCOLT
homepage (note that this is undergoing some corrections and may be
temporarily inaccessible):
http://www.dcs.rhbnc.ac.uk/neural/neurocolt.html
Best wishes
John Shawe-Taylor