Technical Report Series in Neural and Computational Learning

John Shawe-Taylor john at dcs.rhbnc.ac.uk
Wed May 10 04:00:52 EDT 1995


The European Community ESPRIT Working Group in Neural and Computational 
       Learning Theory (NeuroCOLT): several new reports available

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-011:
----------------------------------------
Classification by Polynomial Surfaces
by Martin Anthony, London School of Economics and Political Science

Abstract:
Linear threshold functions (for real and Boolean inputs) have received much
attention, for they are the component parts of many artificial neural
networks.  Linear threshold functions are exactly those functions such that
the positive and negative examples are separated by a hyperplane.  One
extension of this notion is to allow separators to be surfaces whose
equations are polynomials of  at most a given degree (linear separation
being the degree-$1$ case).  We investigate the representational and
expressive power of polynomial separators. Restricting to the Boolean
domain, by using an upper bound on the number of functions defined on
$\{0,1\}^n$ by polynomial separators having at most a given degree, we show,
as conjectured by Wang and Williams, that for almost every Boolean function,
one needs a polynomial surface of degree at least $\left\lfloor
n/2\right\rfloor$ in order to separate the negative examples from
the positive examples. Further, we show that, for odd $n$, at most half of
all Boolean functions are realizable by a separating surface of degree 
$\left\lfloor n/2\right\rfloor$. We then compute the Vapnik-Chervonenkis
dimension of the class of functions realized by polynomial separating
surfaces of at most a given degree, both for the case of Boolean inputs and
real inputs. In the case of linear separators, the VC dimensions coincide
for these two cases, but for surfaces of higher degree, there is a strict
divergence.  We then use these results on the VC dimension to quantify the
sample size required for valid generalization in Valiant's probably
approximately correct framework.
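
A concrete illustration of the notion (a standard textbook example, not
taken from the report): on $\{0,1\}^2$ the parity (XOR) function is not
realized by any linear separator, but the degree-$2$ polynomial

\[ p(x_1,x_2) = x_1 + x_2 - 2x_1x_2 - \tfrac{1}{2} \]

satisfies $p(0,1)=p(1,0)=\tfrac{1}{2}>0$ and $p(0,0)=p(1,1)=-\tfrac{1}{2}<0$,
so the surface $p=0$ separates the positive from the negative examples.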

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-013:
----------------------------------------
Learnability of Kolmogorov-Easy Circuit Expressions Via Queries
by Jos\'e L. Balc\'azar, Universitat Polit\'ecnica de Catalunya
   Harry Buhrman, CWI, Amsterdam
   Montserrat Hermo, Universidad del Pa\'\i s Vasco

Abstract:
Circuit expressions were introduced to provide a natural link between
Computational Learning and certain aspects of Structural Complexity. Upper and
lower bounds on the learnability of circuit expressions are known. We study here
the case in which the circuit expressions are of low (time-bounded) Kolmogorov
complexity. We show that these are polynomial-time learnable from membership
queries in the presence of an NP oracle.  We also exactly characterize the sets
that have such circuit expressions, and precisely identify the subclass whose
circuit expressions can be learned from membership queries alone. The extension
of the results to various Kolmogorov complexity bounds is discussed.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-017:
----------------------------------------
Identification of the Human Arm Kinetics  using Dynamic Recurrent Neural 
Networks
by Jean-Philippe DRAYE,  Facult\'{e} Polytechnique de Mons,
   Guy CHERON, University of Brussels,
   Marc BOURGEOIS, University of Brussels,
   Davor PAVISIC, Facult\'{e} Polytechnique de Mons,
   Ga\"{e}tan LIBERT, Facult\'{e} Polytechnique de Mons

Abstract:
Artificial neural networks offer an exciting alternative for modeling and
identifying complex non-linear systems. This paper investigates the
identification of discrete-time non-linear systems using dynamic recurrent
neural networks.  We use this kind of network to efficiently identify the
complex temporal relationship between the patterns of muscle activation
represented by the electromyography signal (EMG) and their mechanical actions
in three-dimensional space.  The results show that dynamic neural networks
provide a successful platform for biomechanical modeling and simulation
including complex temporal relationships.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-019:
----------------------------------------
An Algebraic Characterization of Tractable Constraints
by Peter Jeavons, Royal Holloway, University of London

Abstract:
Many combinatorial search problems may be expressed as `constraint
satisfaction problems', and this class of problems is known to be
NP-complete. In this paper we investigate what restrictions must be
imposed on the allowed constraints in order to ensure tractability.  We
describe a simple algebraic closure condition, and show that this is both
necessary and sufficient to ensure tractability in Boolean valued
problems. We also demonstrate that this condition is necessary for
problems with arbitrary finite domains.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-020:
----------------------------------------
An incremental neural classifier on a MIMD computer
by  Arnulfo Azcarraga, LIFIA - IMAG - INPG, France
    H\'el\`ene Paugam-Moisy and Didier Puzenat, LIP - URA 1398 du CNRS,
    ENS Lyon, France

Abstract:
MIMD computers are among the best parallel architectures available. They
are easily scalable with numerous processors and have potentially huge
computing power. One area of application for such computers is the field
of neural networks. This article presents a study, and two parallel
implementations, of a specific neural incremental classifier of visual
patterns. This neural network is incremental in that network units are
created whenever the classifier is unable to recognize a pattern
correctly. The dynamic nature of the model renders the parallel
algorithms rather complex.
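
The following is a generic sketch in Python of the incremental idea (a
nearest-prototype toy of our own, not the specific classifier or parallel
algorithms studied in the report): a new unit is created whenever the
current network fails to label a training pattern correctly.

# Generic incremental nearest-prototype classifier (illustrative only).
def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

class IncrementalClassifier:
    def __init__(self):
        self.units = []                      # list of (prototype, label) pairs

    def predict(self, x):
        if not self.units:
            return None
        return min(self.units, key=lambda u: dist2(u[0], x))[1]

    def train(self, patterns):
        for x, label in patterns:
            if self.predict(x) != label:     # misclassified: create a new unit
                self.units.append((x, label))

clf = IncrementalClassifier()
clf.train([((0.0, 0.0), 'A'), ((1.0, 1.0), 'B'), ((0.9, 1.1), 'B')])
print(clf.predict((0.1, 0.2)), len(clf.units))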

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-021:
----------------------------------------
Model Selection for Neural Networks: Comparing MDL and NIC
by Guido te Brake, Utrecht University,
   Joost N. Kok, Utrecht University,
   Paul M.B. Vit\'anyi, CWI, Amsterdam

Abstract:
We compare the MDL and NIC methods for determining the correct size of a
feedforward neural network.  The NIC method has to be adapted for this
kind of network.  We include an experiment based on a small standard problem.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-023:
----------------------------------------
PAC Learning and Artificial Neural Networks
by Martin Anthony and Norman Biggs, London School of Economics and Political
   Science, University of London

Abstract:
In this article, we discuss the `probably approximately correct' (PAC)
learning paradigm as it applies to artificial neural networks.  The PAC
learning model is a probabilistic framework for the study of learning and
generalization. It is useful not only for neural classification problems, but
also for learning problems more often associated with mainstream artificial
intelligence, such as the inference of Boolean functions.  In PAC theory,
the notion of successful learning is formally defined using probability
theory. Very roughly speaking, if a large enough sample of randomly drawn
training examples is presented, then it should be likely that, after
learning, the neural network will classify most other randomly drawn examples
correctly. The PAC model formalises the terms `likely' and `most'. 
Furthermore, the learning algorithm must be expected to act quickly, since
otherwise it may be of little use in practice.

There are thus two main emphases in PAC learning theory. First, there is the
issue of how many training examples should be presented. Secondly, there is
the question of whether learning can be achieved using a fast algorithm.
These are known, respectively, as the {\it sample complexity} and {\it
computational complexity} problems. This article provides a brief
introduction to these. We highlight the importance of the Vapnik-Chervonenkis
dimension, a combinatorial parameter which measures the `expressive power' of
a neural network, and describe how this parameter quantifies fairly precisely
the sample complexity of PAC learning.  In discussing the computational
complexity of PAC learning, we shall present a result which illustrates that
in some cases the problem of PAC learning is inherently intractable.
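
For orientation, a typical sample-complexity bound of the kind referred to
above (quoted from the standard PAC literature rather than from this
article): if the hypothesis class has Vapnik-Chervonenkis dimension $d$,
then a sample of size

\[ m = O\!\left(\frac{1}{\epsilon}\left(d\log\frac{1}{\epsilon} +
\log\frac{1}{\delta}\right)\right) \]

suffices for a consistent learner to output, with probability at least
$1-\delta$, a hypothesis whose error is at most $\epsilon$.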

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-024:
----------------------------------------
Graphs and Artificial Neural Networks
by Martin Anthony, London School of Economics and Political Science, 
   University of London

Abstract:
`Artificial neural networks' are machines (or models of computation) based
loosely on the ways in which the brain is believed to work. In this chapter,
we discuss some links between graph theory and artificial neural networks. We
describe how some combinatorial optimisation tasks may be approached by using
a type of artificial neural network known as a Boltzmann machine. We then
focus on `learning' in feedforward artificial neural networks, explaining how
the graph structure of a network and the hardness of graph-colouring quantify
the complexity of learning.


----------------------------------------
NeuroCOLT Technical Report NC-TR-95-025:
----------------------------------------
The Vapnik-Chervonenkis Dimension of a Random Graph
by Martin Anthony, Graham Brightwell, London School of Economics and 
   Political Science, University of London
   Colin Cooper, University of North London

Abstract:
In this paper we investigate a parameter defined for any graph, known as
the {\it Vapnik-Chervonenkis dimension} (or VC dimension).  For any vertex
$x$ of a graph $G$, the closed neighbourhood $N(x)$ of $x$ is the set of
all vertices of $G$ adjacent to $x$, together with $x$. We say that a set
$D$ of vertices of $G$ is {\it shattered} if every subset $R$ of $D$ can
be realised as  $R=D \cap N(x)$ for some vertex $x$ of $G$.  The
Vapnik-Chervonenkis dimension of $G$ is defined to be the largest
cardinality of a shattered set of vertices.  This parameter can be used to
provide bounds on the complexity of a learning problem on graphs.  Our
main result gives, for each positive integer $d$, the exact threshold
function for a random graph $G(n,p)$ to have VC~dimension $d$.  
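
The definition above translates directly into a brute-force computation.
A small Python sketch (illustrative only, unrelated to the report's
probabilistic analysis):

from itertools import combinations

def closed_neighbourhood(adj, x):            # N(x) in the sense above
    return adj[x] | {x}

def is_shattered(adj, D):
    D = frozenset(D)
    traces = {frozenset(D & closed_neighbourhood(adj, x)) for x in adj}
    return len(traces) == 2 ** len(D)

def vc_dimension(adj):
    vertices, best = list(adj), 0
    for d in range(1, len(vertices) + 1):
        if any(is_shattered(adj, D) for D in combinations(vertices, d)):
            best = d                         # some d-set is shattered
        else:
            break                            # no larger set can be shattered
    return best

# Example: the 4-cycle a-b-c-d has VC dimension 1.
adj = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
print(vc_dimension(adj))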

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-026:
----------------------------------------
Probabilistic Decision Trees and Multilayered Perceptrons
by Pascal Bigot and Michel Cosnard, LIP, ENS, Lyon, France

Abstract:
We propose a new algorithm to compute a multilayered perceptron for
classification problems, based on the design of a binary decision tree.
We show how to modify this algorithm to use ternary logic, introducing
a `Don't Know' class. This modification could be applied to any heuristic
based on the recursive construction of a decision tree.  Another way of
dealing with uncertainty for improving generalization performance is to
construct probabilistic decision trees.  We explain how to modify the
preceding heuristics for constructing such trees and associating
probabilistic multilayered perceptrons.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-027:
----------------------------------------
A characterization of the existence of energies for neural networks
by Michel Cosnard, LIP, ENS, Lyon, France
   Eric Goles, Universidad de Chile, Santiago, Chile

Abstract:
In this paper we give, under an appropriate theoretical framework, a
characterization of the neural networks which admit an energy.  We prove
that a neural network admits an energy if and only if the weight matrix
satisfies two conditions: the diagonal elements are non-negative and the
associated incidence graph does not admit non-quasi-symmetric circuits.
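
For orientation, the classical symmetric special case (standard material,
not the report's more general construction): for threshold units with
symmetric weights $w_{ij}=w_{ji}$, non-negative diagonal and thresholds
$\theta_i$, the quantity

\[ E(x) = -\tfrac{1}{2}\sum_{i,j} w_{ij} x_i x_j + \sum_i \theta_i x_i \]

does not increase under sequential updates of the units, and so plays the
role of an energy in the sense discussed above.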

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-028:
----------------------------------------
Improvement of Gradient Descent based Algorithms Training Multilayer
Perceptrons with an Evolutionary Initialization
by C\'edric G\'egout, \'Ecole Normale Sup\'erieure de Lyon, Lyon

Abstract:
Gradient descent algorithms reducing the mean square error computed on a
training set are widely used for training real valued feedforward
networks, because of their easy implementation and their efficacy.  But
in some cases they become trapped in a local optimum and are unable to
find a good network. To escape these cases, one could usually only
restart the gradient descent or construct an initialization point with
unreliable, training-set-dependent heuristics.  This paper presents a new
method to find a good initialization point. An evolutionary algorithm
provides an individual whose phenotype is a neural network: the one from
which gradient descent proceeds most quickly, efficiently and robustly.
The genotypes are real-valued vectors containing the network parameters,
so we use
special genetic operators. Simulation results show that this
initialization reduces the neural network training time, the training
complexity and improves the robustness of gradient descent based
algorithms.
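
A minimal sketch in Python of the general idea (the toy problem, network
size and hyperparameters are illustrative choices of ours, not the
paper's): an evolutionary loop scores each candidate genotype by the
error reached after a short gradient descent run, and full training then
starts from the winner.

import math, random

X = [(0, 0), (0, 1), (1, 0), (1, 1)]         # toy task: XOR
Y = [0, 1, 1, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x):                           # 2-2-1 multilayer perceptron
    h1 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])

def mse(w):
    return sum((forward(w, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

def descend(w, steps, lr=1.0, eps=1e-5):     # gradient descent (finite diff.)
    for _ in range(steps):
        base = mse(w)
        grad = [(mse(w[:i] + [w[i] + eps] + w[i + 1:]) - base) / eps
                for i in range(len(w))]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def fitness(w):                              # quality of a short descent run
    return mse(descend(w, steps=20))

rng = random.Random(0)
population = [[rng.uniform(-2, 2) for _ in range(9)] for _ in range(20)]
for _ in range(5):                           # a few generations
    population.sort(key=fitness)
    parents = population[:5]                 # selection
    children = [[wi + rng.gauss(0, 0.3) for wi in rng.choice(parents)]
                for _ in range(15)]          # Gaussian mutation
    population = parents + children

best = min(population, key=fitness)
weights = descend(best, steps=500)           # full training from the winner
print(round(mse(weights), 4))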

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-029:
----------------------------------------
The Curse of Dimensionality and the Perceptron Algorithm
by Jyrki Kivinen, University of Helsinki
   Manfred K.~Warmuth, University of California, Santa Cruz

Abstract:
We give an adversary strategy that forces the Perceptron algorithm to
make $(N-k+1)/2$ mistakes when learning $k$-literal disjunctions over $N$
variables.  Experimentally we see that even for simple random data, the
number of mistakes made by the Perceptron algorithm grows almost linearly
with $N$, even if the number $k$ of relevant variables remains a small
constant.  Thus, the Perceptron algorithm suffers from the curse of
dimensionality even when the target is extremely simple and almost all of
the dimensions are irrelevant.  In contrast, Littlestone's algorithm
Winnow makes at most $O(k\log N)$ mistakes for the same problem.  Both
algorithms use linear
threshold functions as their hypotheses. However, Winnow does
multiplicative updates to its weight vector instead of the additive
updates of the Perceptron algorithm.
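
The contrast can be reproduced in a few lines of Python (an illustrative
experiment of our own, not the one reported in the paper): both learners
run online on random Boolean examples of a $k$-literal monotone
disjunction and count their mistakes.

import random

def run(N=100, k=3, T=2000, seed=0):
    rng = random.Random(seed)
    w_p, b_p = [0.0] * N, 0.0                # Perceptron: additive updates
    w_w = [1.0] * N                          # Winnow: multiplicative updates
    mistakes_p = mistakes_w = 0
    for _ in range(T):
        x = [rng.randint(0, 1) for _ in range(N)]
        y = 1 if any(x[:k]) else 0           # disjunction of first k variables
        # Perceptron with threshold 0
        yhat = 1 if sum(wi * xi for wi, xi in zip(w_p, x)) + b_p > 0 else 0
        if yhat != y:
            mistakes_p += 1
            s = 1 if y == 1 else -1
            w_p = [wi + s * xi for wi, xi in zip(w_p, x)]
            b_p += s
        # Winnow with threshold N
        yhat = 1 if sum(wi * xi for wi, xi in zip(w_w, x)) >= N else 0
        if yhat != y:
            mistakes_w += 1
            if y == 1:                       # promotion on a false negative
                w_w = [wi * 2 if xi else wi for wi, xi in zip(w_w, x)]
            else:                            # elimination on a false positive
                w_w = [0.0 if xi else wi for wi, xi in zip(w_w, x)]
    return mistakes_p, mistakes_w

print(run(N=100), run(N=400))                # Perceptron mistakes grow with N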

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-030:
----------------------------------------
Identifying Regular Languages over Partially-Commutative Monoids
by Claudio Ferretti and Giancarlo Mauri, Universit\`a di Milano, Italy

Abstract:
We define a new technique useful in identifying a subclass of regular
languages defined on a free partially commutative monoid (regular trace
languages), using equivalence and membership queries.  Our algorithm
extends an algorithm defined by Dana Angluin in 1987 to learn DFA's.  The
words of a trace language can be seen as equivalence classes of strings.
We show how to extract, from a given equivalence class, a string of an
unknown underlying regular language. These strings can drive the
original learning algorithm, which identifies a regular string language
that also defines the target trace language.  In this way the algorithm
also applies to classes of unrecognizable regular trace languages and, as
a corollary, to a class of unrecognizable string languages.  We also
discuss bounds on the number of examples needed to identify the target
language and on the time required to process them.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-031:
----------------------------------------
A Comparative Study For Forecasting Intra-daily Exchange Rate Data
by Sabine P Toulson, London School of Economics, University of London

Abstract:
For the last few years neural nets have been applied to economic and
financial forecasting, where they have proved increasingly successful.
This paper compares the performance of a two-hidden-layer
multi-layer perceptron (MLP) with conventional statistical techniques. 
The statistical techniques used here consist of a structural model (SM)
and the stochastic volatility model (SV).  After reviewing each of the
three models a comparison between the MLP and the SM is made
investigating the predictive power of both models for a one-step ahead
forecast of the Dollar-Deutschmark exchange rate. Reasons are given for
why the MLP is expected to perform better than a conventional model in
this case.  A further study gives results on the performance of an MLP
and an SV model in predicting the volatility of the Dollar-Deutschmark
exchange rate and a combination of both models is proposed to decrease
the forecasting error.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-032:
----------------------------------------
Characterizations of Learnability for Classes of 
    $\{ 0,...,n \}$-valued Functions
by Shai Ben-David, Technion, Israel
   Nicol\`o Cesa-Bianchi, Universit\`a di Milano, Italy
   David Haussler, University of California at Santa Cruz, USA
   Philip M. Long, Duke University, USA

Abstract:
We investigate the PAC learnability of classes of $\{0,\ldots,n\}$-valued functions
($n < \infty$).  For $n=1$ it is known that the finiteness of the
Vapnik-Chervonenkis dimension is necessary and sufficient for learning. 
For $n > 1$ several generalizations of the VC-dimension, each yielding a
distinct characterization of learnability, have been proposed by a number
of researchers.  In this paper we present a general scheme for extending
the VC-dimension to the case $n > 1$.  Our scheme defines a wide variety
of notions of dimension in which all these variants of the VC-dimension,
previously introduced in the context of learning, appear as special
cases. Our main result is a simple condition characterizing the set of
notions of dimension whose finiteness is necessary and sufficient for
learning. This provides a variety of new tools for determining the
learnability of a class of multi-valued functions. Our characterization
is also shown to hold in the ``robust'' variant of the PAC model and for any
``reasonable'' loss function.
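
One example of the kind of generalization referred to above, stated here
from the standard literature rather than from the report, is the Natarajan
dimension: a set $S$ is shattered by a class $F$ of $\{0,\ldots,n\}$-valued
functions if there are maps $g_1,g_2\colon S\to\{0,\ldots,n\}$ with
$g_1(x)\neq g_2(x)$ for all $x\in S$ such that, for every $T\subseteq S$,
some $f\in F$ satisfies $f(x)=g_1(x)$ on $T$ and $f(x)=g_2(x)$ on
$S\setminus T$; the dimension is the largest cardinality of a shattered
set, and for $n=1$ it reduces to the VC dimension.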

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-033:
----------------------------------------
Constructing Computationally Efficient Bayesian Models via Unsupervised 
Clustering
by Petri Myllym\"aki and Henry Tirri, University of Helsinki, Finland

Abstract:
Given a set of samples of an unknown probability distribution, we study
the problem of constructing a good approximate Bayesian network model of
the probability distribution in question.  This task can be viewed as a
search problem, where the goal is to find a maximal probability network
model, given the data.  In this work, we do not make an attempt to learn
arbitrarily complex multi-connected Bayesian network structures, since
the resulting models can be unsuitable for practical purposes due to the
exponential amount of time required for the reasoning task. Instead, we
restrict ourselves to a special class of simple tree-structured Bayesian
networks called Bayesian prototype trees, for which a polynomial time
algorithm for Bayesian reasoning exists. We show how the probability of a
given Bayesian prototype tree model can be evaluated, given the data, and
how this evaluation criterion can be used in a stochastic simulated
annealing algorithm for searching the model space. The simulated annealing
algorithm provably finds the maximal probability model, provided that a
sufficient amount of time is used.  

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-034:
----------------------------------------
Mapping Bayesian Networks to Boltzmann Machines
by Petri Myllym\"aki, University of Helsinki, Finland

Abstract:
We study the task of finding a maximum a posteriori (MAP) instantiation
of Bayesian network variables, given a partial value assignment as an
initial constraint. This problem is known to be NP-hard, so we
concentrate on a stochastic approximation algorithm, simulated annealing.
This stochastic algorithm can be realized as a sequential process on the
set of Bayesian network variables, where only one variable is allowed to
change at a time.  Consequently, the method can become impractically slow
as the number of variables increases.  We present a method for mapping a
given Bayesian network to a massively parallel Boltzmann machine neural
network architecture, in the sense that instead of using the normal
sequential simulated annealing algorithm, we can use a massively parallel
stochastic process on the Boltzmann machine architecture. The neural
network updating process provably converges to a state which solves a
given MAP task.  
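
A generic sketch in Python of the sequential baseline described above (the
score function is a hypothetical stand-in of ours, not an actual Bayesian
network posterior): one variable is flipped at a time, and worse states are
accepted with a probability that shrinks as the temperature falls.

import math, random

def score(state):
    # Hypothetical unnormalised log-posterior: rewards agreement between
    # neighbouring variables and fixed evidence on variable 0.
    s = 2.0 if state[0] == 1 else 0.0
    return s + sum(1.0 for i in range(len(state) - 1)
                   if state[i] == state[i + 1])

def anneal(n_vars=8, steps=5000, t0=2.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_vars)]
    t = t0
    for _ in range(steps):
        i = rng.randrange(n_vars)            # change one variable at a time
        proposal = list(state)
        proposal[i] = 1 - proposal[i]
        delta = score(proposal) - score(state)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            state = proposal                 # accept the move
        t *= cooling                         # cool down slowly
    return state

print(anneal())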

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-035:
----------------------------------------
A MINIMAL LENGTH ENCODING SYSTEM
by Tony Bellotti, London Electricity plc, UK
   Alex Gammerman, Royal Holloway, University of London, UK

Abstract:
Emily is a project to develop a computer system that can organise
symbolic knowledge given in a high-level relational language, based on
the principle of minimal length encoding (MLE). The purpose of developing
this system is to test the hypothesis that minimal length encoding can be
used as a general method for induction.  A prototype version, Emily2, has
already been implemented. It is the purpose of this paper to describe
this system, to present some of our results and to indicate future
developments.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-036:
----------------------------------------
Techniques in Neural Learning
by Pascal Koiran, DIMACS, Rutgers University
   John Shawe-Taylor, Royal Holloway, University of London

Abstract:
This paper takes ideas developed in a theoretical framework by Maass and
adapts them for a practical learning algorithm for feedforward sigmoid
neural networks.  A number of different techniques are presented which
are based loosely around the common theme of taking advantage of the
linearity of the net input to a neuron, or in other words the fact that
there is only a single non-linearity per neuron. Some experimental
results are included, though many of the ideas are as yet untested. The
paper can therefore be viewed as a tool box offering a selection of
possible techniques for incorporation in practical, heuristic learning
algorithms for multi-layer perceptrons.  

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-037:
----------------------------------------
$\P\neq \NP$ over the non-standard reals implies $\P\neq \NP$ over $\R$
by Christian Michaux, University of Mons-Hainaut, Belgium

Abstract:
Blum, Shub and Smale showed the existence of an $\NP$-complete problem
over the real closed fields in the framework of their theory of
computation over the reals. This makes it possible to ask the $\P\neq \NP$
question over real closed fields. Here we show that $\P\neq\NP$ over a
real closed extension of the reals implies $\P\neq \NP$ over the reals.
We also discuss the converse. This leads us to define some subclasses of
$\P/$poly. Finally we show that the transfer result about $\P\neq \NP$ is
part of a much more abstract result.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-038:
----------------------------------------
Computing with Truly Asynchronous Threshold Logic Networks
by Pekka Orponen, Technical University of Graz, Austria

Abstract:
We present simulation mechanisms by which any network of threshold logic
units with either symmetric or asymmetric interunit connections (i.e., a
symmetric or asymmetric ``Hopfield net'') can be simulated on a network
of the same type, but without any a priori constraints on the order of
updates of the units.  Together with earlier constructions, the results
show that the truly asynchronous network model is computationally
equivalent to the seemingly more powerful models with either ordered
sequential or fully parallel updates.  

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-040:
----------------------------------------
Descriptive Complexity Theory over the Real Numbers
by Erich Gr\"adel and Klaus Meer, RWTH Aachen, Germany

Abstract:
We present a logical approach to complexity over the real numbers with
respect to the model of Blum, Shub and Smale.
The logics under consideration are interpreted over a special class of
two-sorted structures, called {\em $\R$-structures}: They consist of a
finite structure together with the ordered field of reals and a finite
set of functions from the finite structure into $\R$. They are a special
case of the {\em metafinite structures} introduced recently by Gr\"adel
and Gurevich.  We argue that $\R$-structures provide the right class of
structures to develop a descriptive complexity theory over $\R$. We
substantiate this claim by a number of results that relate logical
definability on $\R$-structures with complexity of computations of
BSS-machines.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-042:
----------------------------------------
Knowledge Extraction From Neural Networks : A Survey
by  R. Baron, ENS-Lyon CNRS, France

Abstract:
Artificial neural networks may learn to solve arbitrarily complex
problems, but the knowledge they acquire is hard to make explicit. Neural
networks therefore appear as ``black boxes'' whose decisions cannot be
explained. In this survey, different techniques for knowledge extraction
from neural networks are presented.  Early work showed the value of
studying internal representations, but these studies were domain
specific. Authors have therefore tried to extract knowledge in a more
general form, such as the rules of an expert system. In a more restricted
setting, it is also possible to extract automata, intended to recognize a
formal language, from neural networks. Finally, numerical information may
be obtained in process modelling, and this may be of interest in
industrial applications.

-----------------------
The Report NC-TR-95-011 can be accessed and printed as follows 

% ftp cscx.cs.rhbnc.ac.uk  (134.219.200.45)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-95-011.ps.Z
ftp> bye
% zcat nc-tr-95-011.ps.Z | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the postscript files have also been
left for anyone not having an uncompress facility.

A full list of the currently available Technical Reports in the 
Series is held in a file `abstracts' in the same directory.

The files may also be accessed via WWW starting from the NeuroCOLT homepage:

http://www.dcs.rhbnc.ac.uk/neural/neurocolt.html


Best wishes
John Shawe-Taylor



