Technical Report Series in Neural and Computational Learning

John Shawe-Taylor john at dcs.rhbnc.ac.uk
Wed Apr 2 04:56:14 EST 1997


The European Community ESPRIT Working Group in Neural and Computational 
Learning Theory (NeuroCOLT) has produced a set of new Technical Reports
available from the remote ftp site described below. They cover topics in
real valued complexity theory, computational learning theory, and analysis
of the computational power of continuous neural networks.  Abstracts are
included for the titles.




----------------------------------------
NeuroCOLT Technical Report NC-TR-97-015:
----------------------------------------
Exact Learning of subclasses of  CDNF formulas with membership queries
by  Carlos Domingo, Universitat Polit\`ecnica de Catalunya, Spain

Abstract:
We consider the exact learnability of subclasses of Boolean formulas
from membership queries alone. We show how to combine known learning
algorithms that use membership and equivalence queries to obtain new
learning results only with memberships. In particular we show the exact
learnability of read-$k$ monotone CDNF formulas, Sat-$k$ ${\cal O}(\log
n)$-CDNF, and ${\cal O}(\sqrt{\log n})\mbox{-size CDNF}$ from
membership queries only.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-016:
----------------------------------------
Decision Trees have Approximate Fingerprints
by  Victor Lavin, Universitat Polit\`ecnica de Catalunya, Spain
    Vijay Raghavan, Vanderbilt University, USA

Abstract:
We prove that decision trees exhibit the ``approximate fingerprint''
property, and therefore are not polynomially learnable using only
equivalence queries. A slight modification of the proof extends this
result to several other representation classes of Boolean concepts
which have been studied in computational learning theory.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-017:
----------------------------------------
Learning Monotone Term Decision Lists
by  David Guijarro, Victor Lavin, Universitat Polit\`ecnica de Catalunya, Spain
    Vijay Raghavan, Vanderbilt University, USA

Abstract:
We study the learnability of monotone term decision lists in the exact
model of equivalence and membership queries. We show that, for any
constant $k \ge 0$, $k$-term monotone decision lists are exactly and
properly learnable with $n^{O(k)}$ membership queries in $O(n^{k^3})$
time.  We also show that $n^{\Omega(k)}$ membership queries are necessary
for exact learning. In contrast, neither $k$-term monotone decision lists
($k \ge 2$) nor general monotone decision lists are learnable with
equivalence queries alone.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-018:
----------------------------------------
Learning nearly monotone $k$-term DNF
by  Jorge Castro, David Guijarro, Victor Lavin, 
    Universitat Polit\`ecnica de Catalunya, Spain

Abstract:
This note studies the learnability of the class $k$-term DNF with a
bounded number of negations per term.  We study the case of learning
with membership queries alone, and give tight upper and lower bounds on
the number of negations that makes the learning task feasible.  We also
prove a negative result for equivalence queries.  Finally, we show that
a slight modification of our algorithm makes the considered class
learnable in the Simple PAC model as well, extending the result of Li
and Vit\'anyi for monotone $k$-term DNF.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-019:
----------------------------------------
$\delta$-uniform BSS Machines
by  Paolo Boldi, Sebastiano Vigna, Universit\`a degli Studi di Milano, Italy

Abstract:
A $\delta$-uniform BSS machine is almost like a standard BSS machine,
but the negativity test is replaced by a ``smaller than $-\delta$''
test, where the threshold $\delta\in(0,1)$ is not known: in this way we
represent the impossibility of performing exact equality tests. We
prove that, for any real closed Archimedean field $R$, the
$\delta$-uniform semi-decidable sets are exactly the interiors of BSS
semi-decidable sets. Then, we show that the sets semi-decidable by
Turing machines are the sets semi-decidable by $\delta$-uniform
machines with coefficients in $Q$ or $T$, the field of Turing
computable numbers.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-020:
----------------------------------------
The Computational Power of Spiking Neurons Depends on the Shape of the
Postsynaptic Potentials
by  Wolfgang Maass, Berthold Ruf, Technische Universitaet Graz, Austria

Abstract:
Recently, researchers have begun to investigate the computational power of
spiking neurons (also called ``integrate and fire neurons''). These are
neuron models that are substantially more realistic from the biological
point of view than the ones which are traditionally employed in
artificial neural nets.  It has turned out that the computational power
of networks of spiking neurons is quite large.  In particular they have
the ability to communicate and manipulate analog variables in
spatio-temporal coding, i.e.~encoded in the time points when specific
neurons ``fire'' (and thus send a ``spike'' to other neurons).
These results have motivated the question of which details of
the firing mechanism of spiking neurons are essential for their
computational power, and which details are ``accidental'' aspects of
their realization in biological ``wetware''.  Obviously this question
becomes important if one wants to capture some of the advantages of
computing and learning with spatio-temporal coding in a new generation
of artificial neural nets, such as for example pulse stream VLSI.
The firing mechanism of spiking neurons is defined in terms of their
postsynaptic potentials or ``response functions'', which describe the
change in their electric membrane potential as a result of the firing
of another neuron.  We consider in this article the case where the
response functions of spiking neurons are assumed to be of the
mathematically most elementary type: they are assumed to be
step-functions (i.e. piecewise constant functions). This happens to be
the functional form which has so far been adopted most frequently in
pulse stream VLSI as the form of potential changes (``pulses'') that
mimic the role of postsynaptic potentials in biological neural
systems.  We prove the rather surprising result that in models without
noise the computational power of networks of spiking neurons with
arbitrary piecewise constant response functions is strictly weaker than
that of networks where the response functions of neurons also contain
short segments where they increase or decrease linearly (which is in
fact biologically more realistic).  More precisely, we show, for
example, that addition of analog numbers is impossible
for a network of spiking neurons with piecewise constant response
functions (with any bounded number of computation steps, i.e. spikes),
whereas addition of analog numbers is easy if the response functions
have linearly increasing segments.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-021:
----------------------------------------
On the Effect of Analog Noise in Discrete-Time Analog Computations
by  Wolfgang Maass, Technische Universitaet Graz, Austria
    Pekka Orponen, University of Jyv\"askyl\"a, Finland

Abstract:
We introduce a model for analog noise in analog computation with
discrete time that is flexible enough to cover the most important
concrete cases, such as noisy analog neural nets and networks of
spiking neurons.  We show that the presence of arbitrarily small
amounts of analog noise reduces the power of analog computational
models to that of finite automata, and we also prove a new type of
upper bound for the VC-dimension of computational models with analog
noise.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-022:
----------------------------------------
Networks of Spiking Neurons Can Emulate Arbitrary Hopfield Nets in
Temporal Coding
by  Wolfgang Maass and Thomas Natschl\"ager, 
    Technische Universitaet Graz, Austria

Abstract:
A theoretical model for analog computation in networks of spiking
neurons with temporal coding is introduced and tested through
simulations in GENESIS. It turns out that the use of multiple synapses
yields very noise robust mechanisms for analog computations via the
timing of single spikes in networks of detailed compartmental neuron
models.
One arrives in this way at a method for emulating arbitrary Hopfield
nets with spiking neurons in temporal coding, yielding new models for
associative recall of spatio-temporal firing patterns. We also show
that it suffices to store these patterns in the efficacies of
\emph{excitatory} synapses.
A corresponding \emph{layered} architecture yields a refinement of the
synfire-chain model that can assume a fairly large set of different
stable firing patterns for different inputs.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-023:
----------------------------------------
The Perceptron algorithm vs. Winnow:  linear vs. logarithmic mistake
bounds when few input variables are relevant
by  Jyrki Kivinen, University of Helsinki, Finland
    Manfred Warmuth, University of California, Santa Cruz, USA
    Peter Auer, Technische Universitaet Graz, Austria

Abstract:
We give an adversary strategy that forces the Perceptron algorithm to
make $\Omega(k N)$ mistakes in learning monotone disjunctions over $N$
variables with at most $k$ literals.  In contrast, Littlestone's
algorithm Winnow makes at most $O(k\log N)$ mistakes for the same
problem.  Both algorithms use thresholded linear functions as their
hypotheses.  However, Winnow does multiplicative updates to its weight
vector instead of the additive updates of the Perceptron algorithm.
The Perceptron algorithm is an example of an {\em additive\/}
algorithm; such algorithms have the property that their weight vector
is always the sum of a fixed initial weight vector and some linear
combination of already seen
instances.  We show that an adversary can force any additive algorithm
to make $(N+k-1)/2$ mistakes in learning a monotone disjunction of at
most $k$ literals.  Simple experiments show that for $k\ll N$, Winnow
also clearly outperforms the Perceptron algorithm on nonadversarial
random data.
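
To make the contrast concrete, here is a minimal sketch (ours, not the
report's) of the two update rules on Boolean instances $x \in \{0,1\}^N$;
the threshold and promotion factor below are common textbook choices and
purely illustrative.

def perceptron_update(w, theta, x, y):
    # Additive rule: on a mistake, add (positive example) or subtract
    # (negative example) the instance vector from the weight vector.
    yhat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
    if yhat != y:
        sign = 1 if y == 1 else -1
        w = [wi + sign * xi for wi, xi in zip(w, x)]
    return w, yhat

def winnow_update(w, theta, x, y, alpha=2.0):
    # Multiplicative rule (Winnow2): promote the weights of active
    # attributes on a false negative, demote them on a false positive.
    yhat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
    if yhat == 0 and y == 1:
        w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
    elif yhat == 1 and y == 0:
        w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, yhat

# Typical initialisation: Winnow starts from w = [1.0] * N with theta = N;
# the Perceptron starts from w = [0.0] * N with a fixed threshold.  Both
# predict with a thresholded linear function; only the update rule differs.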


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-024:
----------------------------------------
Approximating Hyper-Rectangles: Learning and Pseudo-random Sets
by  Peter Auer, Technische Universitaet Graz, Austria
    Philip Long, National University of Singapore, Singapore
    Aravind Srinivasan, National University of Singapore, Singapore

Abstract:
The PAC learning of rectangles has been studied because they have been
found experimentally to yield excellent hypotheses for several applied
learning problems. Also, pseudorandom sets for rectangles have been
actively studied recently because (i) they are a subproblem common to
the derandomization of depth-2 (DNF) circuits and of Randomized
Logspace, and (ii) they approximate the distribution of $n$
independent multivalued random variables.  We present improved upper
bounds for a class of such problems of ``approximating''
high-dimensional rectangles that arise in PAC learning and
pseudorandomness.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-025:
----------------------------------------
On Learning from Multi-Instance Examples:  Empirical Evaluation of a
Theoretical Approach
by  Peter Auer, Technische Universitaet Graz, Austria

Abstract:
We describe a practical algorithm for learning axis-parallel
high-dimensional boxes from multi-instance examples.  The first
solution to this practical learning problem arising in drug design was
given by Dietterich, Lathrop, and Lozano-Perez. A theoretical analysis
was performed by Auer, Long, Srinivasan, and Tan.
In this work we derive, from theoretical considerations, a competitive
algorithm that is completely different from the approach taken by
Dietterich et al.  Our algorithm uses only simple statistics of the
training data for learning and avoids potentially hard computational
problems that Dietterich et al. solved by heuristics.  In empirical
experiments our algorithm performs quite well, although it does not
reach the performance of the fine-tuned algorithm of Dietterich et al.
We conjecture that our approach can also be fruitfully applied to other
learning problems where certain statistical assumptions are satisfied.



----------------------------------------
NeuroCOLT Technical Report NC-TR-97-026:
----------------------------------------
Computing Functions with Spiking Neurons in Temporal Coding
by  Berthold Ruf, Technische Universitaet Graz, Austria

Abstract:
For fast neural computations within the brain it is very likely that
the timing of single firing events is relevant. Recently Maass has
shown that under certain weak assumptions functions can be computed in
temporal coding by leaky integrate-and-fire neurons. Here we
demonstrate with the help of computer simulations using GENESIS that
biologically more realistic neurons can compute linear functions in a
natural and straightforward way based on the basic principles of the
construction given by Maass. One only has to assume that a neuron
receives all its inputs in a time interval of approximately the length
of the rising segment of its excitatory postsynaptic potentials. We
also show that, under certain assumptions, there exists within this
construction a type of activation function computed by such neurons,
which allows the fast computation of arbitrary continuous bounded
functions.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-027:
----------------------------------------
Hebbian Learning in Networks of Spiking Neurons Using Temporal Coding
by  Berthold Ruf, Michael Schmitt, Technische Universitaet Graz, Austria

Abstract:
Computational tasks in biological systems that require short response
times can be implemented in a straightforward way by networks of
spiking neurons that encode analogue values in temporal coding.  We
investigate the question of how spiking neurons can learn on the basis of
differences between firing times. In particular, we provide learning
rules of the Hebbian type in terms of single spiking events of the pre-
and postsynaptic neuron and show that the weights approach some value
given by the difference between pre- and postsynaptic firing times with
arbitrarily high precision. Our learning rules give rise to a
straightforward possibility for realizing very fast pattern analysis
tasks with spiking neurons.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-028:
----------------------------------------
Overview of Learning Systems produced by NeuroCOLT Partners
by NeuroCOLT Partners

Abstract:
This NeuroCOLT Technical Report documents a number of systems that
have been produced within the NeuroCOLT partnership.  It includes only
a summary of each system, together with pointers to where the system is
located and where more information about its performance and design can
be found.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-029:
----------------------------------------
On Bayesian Case Matching
by Petri Kontkanen, Petri Myllym\"aki, Tom Silander and Henry Tirri, 
   University of Helsinki, Finland

Abstract:
In this paper we present a new probabilistic formalization of the
case-based reasoning paradigm.  In contrast to earlier Bayesian
approaches, the new formalization does not need a transformation step
between the original case space and the distribution space.  We
concentrate on applying this Bayesian framework to the case matching
problem, and propose a probabilistic scoring metric for this task.  In
the experimental part of the paper, the Bayesian case matching score is
evaluated empirically by using publicly available real-world case
bases.  The results show that, when presented with cases from which
some of the feature values have been removed, a relatively small number
of remaining values is sufficient for retrieving the original case from
the case base with the proposed measure.  The experiments also show
that the approach is computationally very efficient.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-030:
----------------------------------------
Batch Classifications with Discrete Finite Mixtures
by Petri Kontkanen, Petri Myllym\"aki, Tom Silander and Henry Tirri,
   University of Helsinki, Finland

Abstract:
In this paper we study batch classification problems where multiple 
predictions can be made simultaneously, instead of performing the
classifications independently one at a time.  For the predictions we
use the model family of discrete finite mixtures, where, by introducing
a hidden latent variable, we implicitly assume missing data that has to
be estimated in order to be able to construct models from sample data.
The main contribution of this paper is to demonstrate how the standard
EM algorithm can be modified for estimating both the missing latent
variable data, and the batch classification data at the same time, thus
allowing us to use the same algorithm both for constructing the models
from training data and for making predictions.  In our framework the
amount of data available for making predictions is greater than with
the traditional approach, as the algorithm can also exploit the
information available in the query vectors.  In the empirical part of
the paper, the results obtained by the batch classification approach
are compared to those obtained by standard (independent) predictions by
using public domain classification data sets.
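
As a rough illustration of the idea (ours, not the paper's): the hidden
mixture component of every vector and the missing class value of every
query vector are both treated as missing data, so a single EM loop fits
the mixture to the training and query vectors together, and the batch
predictions fall out of the final parameters.  The sketch below restricts
attention to binary attributes and a binary class purely for brevity.

import random

def em_batch(train, queries, K, n_attrs, iters=50):
    # train:   list of (attrs, cls) with attrs a tuple of 0/1 values, cls in {0, 1}
    # queries: list of attrs tuples whose class value is missing
    pi = [1.0 / K] * K                                        # mixing weights
    theta = [[random.uniform(0.25, 0.75) for _ in range(n_attrs)] for _ in range(K)]
    phi = [random.uniform(0.25, 0.75) for _ in range(K)]      # P(class=1 | component)

    def joint(k, attrs, cls=None):
        # Unnormalised P(component k, attrs[, cls]) under the current parameters.
        p = pi[k]
        for j, a in enumerate(attrs):
            p *= theta[k][j] if a else (1.0 - theta[k][j])
        if cls is not None:
            p *= phi[k] if cls else (1.0 - phi[k])
        return p

    for _ in range(iters):
        # E-step: responsibilities over components (training data) and over
        # (component, class) pairs (query data, whose class is unobserved).
        r_train = []
        for attrs, cls in train:
            w = [joint(k, attrs, cls) for k in range(K)]
            s = sum(w)
            r_train.append([wk / s for wk in w])
        r_query = []
        for attrs in queries:
            w = [[joint(k, attrs, c) for c in (0, 1)] for k in range(K)]
            s = sum(sum(row) for row in w)
            r_query.append([[wkc / s for wkc in row] for row in w])
        # M-step: re-estimate parameters from the expected counts gathered
        # from the training *and* the query vectors.
        for k in range(K):
            nk = sum(r[k] for r in r_train) + sum(sum(r[k]) for r in r_query)
            pi[k] = nk / (len(train) + len(queries))
            for j in range(n_attrs):
                num = sum(r[k] for (a, _), r in zip(train, r_train) if a[j]) + \
                      sum(sum(r[k]) for a, r in zip(queries, r_query) if a[j])
                theta[k][j] = (num + 1.0) / (nk + 2.0)        # Laplace smoothing
            num_c = sum(r[k] for (_, c), r in zip(train, r_train) if c) + \
                    sum(r[k][1] for r in r_query)
            phi[k] = (num_c + 1.0) / (nk + 2.0)

    # Batch predictions: posterior probability of class 1 for every query vector.
    return [sum(joint(k, attrs, 1) for k in range(K)) /
            sum(joint(k, attrs, c) for k in range(K) for c in (0, 1))
            for attrs in queries]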


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-031:
----------------------------------------
Bayes Optimal Lazy Learning
by Petri Kontkanen, Petri Myllym\"aki, Tom Silander and Henry Tirri,
   University of Helsinki, Finland

Abstract:
In this paper we present a new probabilistic formalization of the lazy
learning approach. In our Bayesian framework, moving from the
construction of an explicit hypothesis to a lazy learning approach,
where predictions are made by combining the training data at query
time, is equivalent to integrating out all the model parameters.  Hence
in Bayesian Lazy Learning the predictions are made by using all the
(infinitely many) models.  We present the formalization of this general
framework, and illustrate its use in practice in the case of the Naive
Bayes classifier model family.  The Bayesian lazy learning approach is
validated empirically with public domain data sets and the results are
compared to the performance of the traditional, single model Naive
Bayes.  The general framework described in this paper can be applied
with any formal model family, and to any discrete prediction task where
the number of simultaneously predicted attributes is small, which
includes for example all classification tasks prevalent in the machine
learning literature.
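
For the Naive Bayes model family this marginalization has a simple closed
form.  The sketch below is ours, not the paper's; it assumes discrete
attributes with symmetric Dirichlet priors (hyperparameter alpha), under
which the integrated-out prediction reduces to smoothed counts computed
from the training data at query time.

from collections import Counter

def lazy_predict(train, query, classes, attr_values, alpha=1.0):
    # train:       list of (attrs tuple, class label)
    # query:       attrs tuple to be classified
    # attr_values: list giving, for each attribute, its set of possible values
    n = len(train)
    class_counts = Counter(c for _, c in train)
    scores = {}
    for c in classes:
        # Marginalised class term: (N_c + alpha) / (N + |classes| * alpha)
        score = (class_counts[c] + alpha) / (n + len(classes) * alpha)
        for i, v in enumerate(query):
            match = sum(1 for a, cc in train if cc == c and a[i] == v)
            # Marginalised conditional term: (N_{c,v} + alpha) / (N_c + |values_i| * alpha)
            score *= (match + alpha) / (class_counts[c] + len(attr_values[i]) * alpha)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}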


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-032:
----------------------------------------
On Predictive Distributions and Bayesian Networks
by Petri Kontkanen, Petri Myllym\"aki, Tom Silander and Henry Tirri,
   University of Helsinki, Finland

Abstract:
In this paper we are interested in discrete prediction problems for a
decision-theoretic setting, where the task is to compute the predictive
distribution for a finite set of possible alternatives.  This question
is first addressed in a general framework, where we consider a set of
probability distributions defined by some parametric model class.  The
standard Bayesian approach is to compute the posterior probability for
the model parameters, given a prior distribution and sample data, and
fix the parameters to the instantiation with the {\em maximum a
posteriori} probability.  A more accurate predictive distribution can
be obtained by computing the {\em evidence}, i.e., the integral over
all the individual parameter instantiations. As an alternative to these
two approaches, we demonstrate how to use Rissanen's new definition of 
{\em stochastic complexity} for determining predictive distributions.
We then describe how these predictive inference methods can be realized
in the case of Bayesian networks.  In particular, we demonstrate the
use of Jeffreys' prior as the prior distribution for computing the
evidence predictive distribution. It can be shown that the evidence
predictive distribution with Jeffreys' prior approaches the new
stochastic complexity predictive distribution in the limit, as the
amount of sample data increases. For computational reasons, in the
experimental part of the paper the three predictive distributions are
compared by using the simple, tree-structured Naive Bayes model.  The
experimentation with several public domain classification datasets
suggests that the evidence approach produces the most accurate
predictions in the log-score sense, especially with small training
sets.
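
For orientation (this paraphrase is ours, not the report's): writing
$\theta$ for the model parameters and $D$ for the sample data, the first
approach predicts with $P(x\mid\hat{\theta})$, where
$\hat{\theta}=\arg\max_{\theta}P(\theta\mid D)$, whereas the evidence
approach predicts with
$P(x\mid D)=\int P(x\mid\theta)\,P(\theta\mid D)\,d\theta$, i.e., by
integrating over all individual parameter instantiations.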


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-033:
----------------------------------------
Partial Occam's Razor and its Applications
by Carlos Domingo, Tatsuie Tsukiji and Osamu Watanabe,
   Tokyo Institute of Technology, Japan

Abstract:
We introduce the notion of ``partial Occam algorithm''.  A partial
Occam algorithm produces a succinct hypothesis that is partially
consistent with given examples, where the proportion of consistent
examples is a bit more than half.  By using this new notion, we propose
one approach for obtaining a PAC learning algorithm.  First, as shown
in this paper, a partial Occam algorithm is equivalent to a weak PAC
learning algorithm.  Then, by using the boosting techniques of Schapire or
Freund, we can obtain an ordinary PAC learning algorithm from this weak
PAC learning algorithm.  We demonstrate with some examples that some
improvement is possible by this approach, in particular in the
hypothesis size.  First, we obtain a (non-proper) PAC learning
algorithm for $k$-DNF, which has a sample complexity similar to that of
Littlestone's Winnow, but produces a hypothesis of size polynomial in
$d$ and $\log k$ for a $k$-DNF target with $n$ variables and $d$ terms
(cf.\ the hypothesis size of Winnow, which is ${\cal O}(n^k)$).  Next we
show that 1-decision lists of length $d$ with $n$ variables are
(non-proper) PAC learnable by using ${\cal O}\left(\frac{1}{\epsilon}
\left(\log \frac{1}{\delta}+16^d\log n\,(d+\log \log n)^2\right)\right)$
examples within polynomial time w.r.t.\ $n$, $2^d$, $1/\epsilon$, and
$\log 1/\delta$.
Again, we obtain a sample complexity similar to Winnow for the same
problem but with a much smaller hypothesis size.  We also show that our
algorithms are robust against random classification noise.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-034:
----------------------------------------
Algorithms for Learning Finite Automata from Queries: A Unified View
by Jos\'e Balc\'azar, Josep D\'iaz, Ricard Gavald\`a,
   Universitat Polit\`ecnica de Catalunya, Spain
   Osamu Watanabe, Tokyo Institute of Technology, Japan

Abstract:
In this survey we compare several known variants of the algorithm for
learning deterministic finite automata via membership and equivalence
queries. We believe that our presentation makes it easier to understand
what is going on and what the differences between the various
algorithms mean. We also include a comparative analysis of the
algorithms, review some known lower bounds, prove a new one, and
discuss the question of parallelizing this sort of algorithm.


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-035:
----------------------------------------
Using Fewer Examples to Simulate Equivalence Queries
by Ricard Gavald\`a, Universitat Polit\`ecnica de Catalunya, Spain

Abstract:
It is well known that an algorithm that learns exactly using
Equivalence queries can be transformed into a PAC algorithm that asks
for random labelled examples.  The first transformation due to Angluin
(1988) uses a number of examples quadratic in the number of queries.
Later, Littlestone (1989) and Schuurmans and Greiner (1995) gave
transformations using linearly many examples. We present here another
analysis of Littlestone's transformation that is simpler and gives
better leading constants.  Our constants are still worse than
Schuurmans and Greiner's, but while ours is a worst-case bound on the
number of examples to achieve PAC learning, theirs is only an expected
one.
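
For reference, here is a minimal sketch (ours, not the report's) of how a
single Equivalence query is answered with random labelled examples.  The
sample-size schedule shown is the simple Angluin-style one; the report
analyses Littlestone's more example-efficient transformation.

import math

def simulate_eq(hypothesis, draw_example, epsilon, delta, query_index):
    # hypothesis:   callable x -> {0, 1}, the learner's current hypothesis
    # draw_example: callable returning a random labelled example (x, label)
    # Returns a counterexample (x, label), or None if no disagreement was
    # found (the learner then halts and outputs the current hypothesis).
    m = math.ceil((1.0 / epsilon) *
                  (math.log(1.0 / delta) + query_index * math.log(2.0)))
    for _ in range(m):
        x, label = draw_example()
        if hypothesis(x) != label:
            return (x, label)      # pass the counterexample back to the exact learner
    return None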


----------------------------------------
NeuroCOLT Technical Report NC-TR-97-036:
----------------------------------------
A Dichotomy Theorem for Learning Quantified Boolean Formulas
by Victor Dalmau, Universitat Polit\`ecnica de Catalunya, Spain

Abstract:
We consider the following classes of quantified Boolean formulas. Fix a
finite set of basic Boolean functions. Take conjunctions of these basic
functions applied to variables and constants in an arbitrary way.
Finally, existentially or universally quantify some of the variables. We
prove the following {\em dichotomy theorem}: For any set of basic Boolean
functions, the resulting set of formulas is either polynomially
learnable from equivalence queries alone or else it is not
PAC-predictable even with membership queries under cryptographic
assumptions. Furthermore we identify precisely which sets of basic
functions are in which of the two cases.



----------------------------------------
NeuroCOLT Technical Report NC-TR-97-037:
----------------------------------------
Discontinuities in Recurrent Neural Networks
by Ricard Gavald\`a, Universitat Polit\`ecnica de Catalunya, Spain
   Hava Siegelmann, Technion, Israel

Abstract:
This paper studies the computational power of various discontinuous
real computational models that are based on the classical analog
recurrent neural network (ARNN).  This ARNN consists of a finite number
of neurons; each neuron computes a polynomial net-function followed by a
sigmoid-like continuous activation-function.
The authors introduce ``arithmetic networks'' as ARNN augmented with a
few simple discontinuous (e.g., threshold) neurons.  They argue that
even with weights restricted to polynomial-time computable reals,
arithmetic networks are able to compute arbitrarily complex recursive
functions. A proof is provided to show that arithmetic networks are
computationally equivalent to networks composed of neurons that
compute divisions and polynomial net-functions inside sigmoid-like
continuous activation functions.  Further, the authors prove that these
arithmetic networks are equivalent to the Blum-Shub-Smale (BSS) model,
when the latter is restricted to a bounded number of registers.
With regard to implementation on digital computers, the authors
demonstrate that arithmetic networks with rational weights require
exponential precision; but even with very simple real weights,
arithmetic networks are not subject to precision bounds. As such, they
cannot be approximated on digital machines. This is in contrast with
the ARNN, which is known to demand only precision that is linear in the
computation time.
When arithmetic networks are augmented with complex periodic
discontinuous neurons (e.g., sine, tangent, fractional part), the resulting
networks are computationally equivalent to a massively parallel
machine.  Thus, this highly discontinuous network can solve the
presumably intractable class of PSPACE-complete problems in polynomial
time.

--------------------------------------------------------------------

***************** ACCESS INSTRUCTIONS ******************

The Report NC-TR-97-001 can be accessed and printed as follows 

% ftp ftp.dcs.rhbnc.ac.uk  (134.219.96.1)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-97-001.ps.Z
ftp> bye
% zcat nc-tr-97-001.ps.Z | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the postscript files have also been
left for anyone not having an uncompress facility. 

In some cases there are two files available, for example,
nc-tr-97-002-title.ps.Z
nc-tr-97-002-body.ps.Z
The first contains the title page while the second contains the body 
of the report. The single command,
ftp> mget nc-tr-97-002*
will prompt you for the files you require.

A full list of the currently available Technical Reports in the 
Series is held in a file `abstracts' in the same directory.

The files may also be accessed via WWW starting from the NeuroCOLT 
homepage:

http://www.dcs.rhbnc.ac.uk/research/compint/neurocolt

or directly to the archive:
ftp://ftp.dcs.rhbnc.ac.uk/pub/neurocolt/tech_reports


Best wishes
John Shawe-Taylor



