Technical Report Series in Neural and Computational Learning

John Shawe-Taylor <john@dcs.rhbnc.ac.uk>
Sat Mar 25 11:56:08 EST 1995


The European Community ESPRIT Working Group in Neural and Computational 
       Learning Theory (NeuroCOLT): several new reports available

----------------------------------------
NeuroCOLT Technical Report NC-TR-94-018:
----------------------------------------
On the Complexity of Function Learning
by Peter Auer, Technische Universitaet Graz,
   Philip M. Long, Duke University,
   Wolfgang Maass, Technische Universitaet Graz,
   Gerhard J. Woeginger, Technische Universitaet Graz

Abstract:
The majority of results in computational learning theory are concerned
with concept learning, i.e., with the special case of function learning
for classes of functions with range $\{ 0,1 \}$. Much less is known
about the theory of learning functions with a larger range such as $N$ or
$R$. In particular, relatively few results exist about the general
structure of common models for function learning, and only a few
nontrivial function classes are known for which positive learning
results have been exhibited in any of these models.

We introduce in this paper the notion of a binary branching adversary
tree for function learning, which allows us to give a somewhat
surprising equivalent characterization of the optimal cost of
learning a class of real-valued functions (in terms of a max-min
definition which does not involve any ``learning'' model).

Another general structural result of this paper relates the cost for
learning a union of function classes to the learning costs for the
individual function classes.

Furthermore, we exhibit an efficient learning algorithm for learning
convex piecewise linear functions from $R^d$ into $R$.  Previously, the
class of linear functions from $R^d$ into $R$ was the only class of
functions with multi-dimensional domain that was known to be learnable
within the rigorous framework of a formal model for on-line learning.

Finally, we give a sufficient condition for an arbitrary class $\F$ of
functions from $R$ into $R$ that allows us to learn the class of all
functions that can be written as the pointwise maximum of $k$ functions
from $\F$.  This allows us to exhibit a number of further nontrivial
classes of functions from $R$ into $R$ for which there exist efficient
learning algorithms.
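
As a concrete illustration of the hypothesis class in the third result
above (this is not the report's learning algorithm, and the helper name
is ours), a convex piecewise linear function on $R^d$ is the pointwise
maximum of finitely many affine pieces; a minimal Python sketch of
evaluating such a function:

def eval_convex_pwl(pieces, x):
    """Evaluate max_i (<w_i, x> + b_i); pieces is a list of (w, b) pairs."""
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in pieces)

# |x1| + |x2| written as the maximum of four affine pieces
pieces = [([1, 1], 0), ([1, -1], 0), ([-1, 1], 0), ([-1, -1], 0)]
print(eval_convex_pwl(pieces, [0.5, -2.0]))   # prints 2.5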

----------------------------------------
NeuroCOLT Technical Report NC-TR-94-019:
----------------------------------------
Neural Nets with Superlinear VC-Dimension
by Wolfgang Maass, Institute for Theoretical Computer Science,
   Technische Universitaet Graz, Klosterwiesgasse 32/2, A-8010 Graz,
   Austria

Abstract:
It has been known for quite a while that the Vapnik-Chervonenkis
dimension (VC-dimension) of a feedforward neural net with linear
threshold gates is at most $O(w \cdot \log w)$, where $w$ is the total
number of weights in the neural net. We show in this paper that this
bound is in fact asymptotically optimal.  More precisely, we exhibit
for any depth $d\geq 3$ a large class of feedforward neural nets of
depth $d$ with $w$ weights that have VC-dimension $\Omega(w\cdot \log
w)$. This lower bound holds even if the inputs are restricted to
boolean values.  The proof of this result relies on a new method that
allows us to encode more ``program-bits'' in the weights of a neural
net than previously thought possible.

----------------------------------------
NeuroCOLT Technical Report NC-TR-94-020:
----------------------------------------
Efficient Agnostic PAC-Learning with Simple Hypotheses
by Wolfgang Maass, Institute for Theoretical Computer Science,
   Technische Universitaet Graz, Klosterwiesgasse 32/2, A-8010 Graz,
   Austria

Abstract:
We exhibit efficient algorithms for agnostic PAC-learning with
rectangles, unions of two rectangles, and unions of $k$ intervals as
hypotheses. These hypothesis classes are of some interest from the
point of view of applied machine learning, because empirical studies
show that hypotheses of this simple type (in just one or two of the
attributes) provide good prediction rules for various real-world
classification problems. In addition, optimal hypotheses of this type
may provide valuable heuristic insight into the structure of a
real-world classification problem.

The algorithms that are introduced in this paper make it feasible to
compute optimal hypotheses of this type for a training set of several
hundred examples. We also exhibit an approximation algorithm that can
compute near-optimal hypotheses for much larger datasets.
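
To make the union-of-$k$-intervals hypothesis class concrete, here is a
small Python sketch (not taken from the report; the function name and the
simple $O(mk)$ dynamic program are illustrative assumptions) that computes
the minimum number of misclassified training examples achievable by a
union of at most $k$ intervals on the line:

def min_errors_k_intervals(xs, ys, k):
    """Fewest errors of any hypothesis that is a union of at most k intervals."""
    pts = sorted(zip(xs, ys))
    INF = float("inf")
    # dp[j][s]: fewest errors so far using j intervals, with s = 1 iff the
    # most recent point is covered by an interval
    dp = [[INF, INF] for _ in range(k + 1)]
    dp[0][0] = 0
    for _, y in pts:
        new = [[INF, INF] for _ in range(k + 1)]
        for j in range(k + 1):
            cost0 = 1 if y == 1 else 0          # predict 0: error iff label is 1
            cost1 = 1 if y == 0 else 0          # predict 1: error iff label is 0
            new[j][0] = min(dp[j][0], dp[j][1]) + cost0
            extend = dp[j][1]                   # stay inside the current interval
            open_new = dp[j - 1][0] if j > 0 else INF
            new[j][1] = min(extend, open_new) + cost1
        dp = new
    return min(min(row) for row in dp)

# two clusters of positive points, with one stray negative in the first cluster
xs = [0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9]
ys = [1,   0,   1,   0,   1,   1,   1]
print(min_errors_k_intervals(xs, ys, 2))        # prints 1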

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-002:
----------------------------------------
Agnostic PAC-Learning of Functions on Analog Neural Nets
by Wolfgang Maass, Institute for Theoretical Computer Science,
   Technische Universitaet Graz, Klosterwiesgasse 32/2, A-8010 Graz,
   Austria

Abstract:
We consider learning on multi-layer neural nets with piecewise
polynomial activation functions and a fixed number $k$ of numerical
inputs. We exhibit arbitrarily large network architectures for which
efficient and provably successful learning algorithms exist in the
rather realistic refinement of Valiant's model for probably
approximately correct learning (``PAC-learning''), where no a priori
assumptions are required about the ``target function'' (agnostic
learning), arbitrary noise is permitted in the training sample, and the
target outputs as well as the network outputs may be arbitrary reals.
The number of computation steps of the learning algorithm LEARN that
we construct is bounded by a polynomial in the bit-length $n$ of the
fixed number of input variables, in the bound $s$ for the allowed
bit-length of weights, in $\frac{1}{\varepsilon}$, where $\varepsilon$
is an arbitrary given bound for the true error of the neural net
after training, and in $\frac{1}{\delta}$, where $\delta$ is an
arbitrary given bound for the probability that the learning algorithm
fails for a randomly drawn training sample.  However, the computation
time of LEARN is exponential in the number of weights of the considered
network architecture, and is therefore of interest only for neural nets
of small size.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-003:
----------------------------------------
Perspectives of Current Research about the Complexity of Learning on
Neural Nets
by Wolfgang Maass, Institute for Theoretical Computer Science,
   Technische Universitaet Graz, Klosterwiesgasse 32/2, A-8010 Graz,
   Austria

Abstract:
This paper discusses, within the framework of computational learning
theory, the current state of knowledge and some open problems
in three areas of research about learning on feedforward neural nets:
\begin{itemize}
\item[--]Neural nets that learn from mistakes
\item[--]Bounds for the Vapnik-Chervonenkis dimension of neural nets
\item[--]Agnostic PAC-learning of functions on neural nets.
\end{itemize}

All relevant definitions are given in this paper, and no previous
knowledge about computational learning theory or neural nets is
required.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-005:
----------------------------------------
Simulating Access to Hidden Information while Learning
by Peter Auer, Technische Universit\"{a}t Graz,
   Philip M. Long, Duke University

Abstract:
We introduce a new technique which enables a learner without access to
hidden information to learn nearly as well as a learner with access to
hidden information.  We apply our technique to solve an open problem of
Maass and Tur\'{a}n, showing that for any concept class $F$, the least
number of queries sufficient for learning $F$ by an algorithm which has
access only to arbitrary equivalence queries is at most a factor of
$1/\log_2 (4/3)$ (roughly $2.41$) more than the least number of queries sufficient for
learning $F$ by an algorithm which has access to both arbitrary
equivalence queries and membership queries.  Previously known results
imply that the $1/\log_2 (4/3)$ in our bound is best possible.  We
describe analogous results for two generalizations of this model to
function learning, and apply those results to bound the difficulty of
learning in the harder of these models in terms of the difficulty of
learning in the easier model.  We bound the difficulty of learning
unions of $k$ concepts from a class $F$ in terms of the difficulty of
learning $F$.  We bound the difficulty of learning in a noisy
environment for deterministic algorithms in terms of the difficulty of
learning in a noise-free environment.  We apply a variant of our
technique to develop an algorithm transformation that allows
probabilistic learning algorithms to nearly optimally cope with noise.
A second variant enables us to improve a general lower bound of
Tur\'{a}n for the PAC-learning model (with queries).  Finally, we show
that logarithmically many membership queries never help to obtain
computationally efficient learning algorithms.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-006:
----------------------------------------
A Stop Criterion for the Boltzmann Machine Learning Algorithm
by Berthold Ruf, Technical University Graz

Abstract:
Ackley, Hinton and Sejnowski introduced a very interesting and
versatile learning algorithm for the Boltzmann machine (BM). However, it
is difficult to decide when to stop the learning procedure. Experiments
have shown that the BM may destroy previously achieved results when the
learning process runs for too long.  This paper introduces a new
quantity, the conditional divergence, which measures the learning success
on the inputs of the data set. To demonstrate its use, some experiments
based on the Encoder Problem are presented.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-007:
----------------------------------------
VC-Dimensions for Graphs
by Evangelos Kranakis, Carleton University,
   Danny Krizanc, Carleton University,
   Berthold Ruf, Technical University Graz,
   Jorge Urrutia, University of Ottawa,
   Gerhard J. Woeginger, Technical University Graz

Abstract:
We study set systems over the vertex set (or edge set) of some graph
that are induced by special graph properties like clique,
connectedness, path, star, tree, etc.  We derive a variety of
combinatorial and computational results on the $\vc$
(Vapnik-Chervonenkis) dimension of these set systems.

For most of these set systems (e.g.\ for the systems induced by trees,
connected sets, or paths), computing the $\vc$-dimension is an
$\np$-hard problem.  Moreover, determining the $\vc$-dimension for set
systems induced by neighborhoods of single vertices is complete for the
class $\lognp$.  In contrast to these intractability results, we show
that the $\vc$-dimension for set systems induced by stars is computable
in polynomial time.  For set systems induced by paths or cycles, we
determine the extremal graphs $G$ with the minimum number of edges such
that $\vc_{{\cal P}}(G)\ge k$.  Finally, we show a close relation
between the $\vc$-dimension of set systems induced by connected sets of
vertices and the $\vc$ dimension of set systems induced by connected
sets of edges; the argument proceeds via the line graph of the
corresponding graph.
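
For readers new to these notions, the following brute-force Python sketch
(illustrative only; it is exponential and is not one of the algorithms in
the report) computes the VC-dimension of a small set system, here the
system of closed vertex neighbourhoods of a graph:

from itertools import combinations

def vc_dimension(universe, sets):
    """VC-dimension of a finite set system, by exhaustive search."""
    sets = [frozenset(s) for s in sets]
    best = 0
    for d in range(1, len(universe) + 1):
        if any(len({frozenset(cand) & s for s in sets}) == 2 ** d
               for cand in combinations(universe, d)):
            best = d          # some d-element set is shattered
        else:
            break             # shattering is monotone, so we can stop
    return best

# closed neighbourhoods N[v] in the 4-cycle 0-1-2-3-0
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
neighbourhoods = [{v} | adj[v] for v in adj]
print(vc_dimension(list(adj), neighbourhoods))   # prints 1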

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-008:
----------------------------------------
Computing the Maximum Bichromatic Discrepancy, with applications to 
Computer Graphics and Machine Learning
by David P. Dobkin, Princeton University,
   Dimitrios Gunopulos, Princeton University,
   Wolfgang Maass, Technische Universitaet Graz,

Abstract:
Computing the maximum bichromatic discrepancy is an interesting
theoretical problem with important applications in computational
learning theory, computational geometry and computer graphics.  In this
paper we give algorithms to compute the maximum bichromatic discrepancy
for simple geometric ranges, including rectangles and halfspaces.  In
addition, we give extensions to other discrepancy problems.
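
As a hedged illustration of the problem (the report's algorithms for
rectangles and halfspaces are considerably more involved), the
one-dimensional case with interval ranges reduces to a maximum-subarray
scan over $+1/-1$ weights:

def max_interval_discrepancy(red, blue):
    """Maximum over all intervals of |#red points inside - #blue points inside|."""
    weights = [w for _, w in sorted([(x, +1) for x in red] +
                                    [(x, -1) for x in blue])]

    def max_subarray(ws):                # Kadane's scan
        best = cur = 0
        for w in ws:
            cur = max(w, cur + w)
            best = max(best, cur)
        return best

    # negating the weights handles intervals where blue outnumbers red
    return max(max_subarray(weights), max_subarray([-w for w in weights]))

red  = [0.1, 0.2, 0.6, 0.7]
blue = [0.4, 0.5, 0.9]
print(max_interval_discrepancy(red, blue))   # prints 2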

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-009:
----------------------------------------
A Finite Automaton Learning System using Genetic Programming
by Herman Ehrenburg, CWI,
   Jeroen van Maanen, CWI

Abstract:
This report describes the Finite Automaton Learning System (FALS), an
evolutionary system that is designed to find small digital circuits that
duplicate the behaviour of a given finite automaton.  FALS was developed
with the aim of gaining better insight into learning systems. It is also
intended to become a general-purpose automatic programming system.

The system is based on the genetic programming approach of evolving
programs for tasks instead of programming them explicitly. A
representation of digital circuits suitable for genetic programming is
given, as well as an extended crossover operator that removes the need
to specify an upper bound on the number of states in advance.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-010:
----------------------------------------
On Specifying Boolean Functions by Labelled Examples
by Martin Anthony, London School of Economics,
   Graham Brightwell, London School of Economics,
   John Shawe-Taylor, Royal Holloway, University of London

Abstract:
We say a function $t$ in a set $H$ of $\{0,1\}$-valued functions
defined on a set $X$ is {\it specified} by $S \subseteq X$ if the only
function in $H$ which agrees with $t$ on $S$ is $t$ itself. The {\it
specification number} of $t$ is the least cardinality of such an $S$.
For a general finite class of functions, we show that the specification
number of any function in the class is at least equal to a parameter
from~\cite{RS} known as the testing dimension of the class.  We
investigate in some detail the specification numbers of functions  in
the set of linearly separable Boolean functions of $n$
variables---those functions $f$ such that $f^{-1}(\{0\})$ and
$f^{-1}(\{1\})$ can be separated by a hyperplane.  We present general
methods for finding upper bounds on these specification numbers and we
characterise those functions which have the largest specification number.
We obtain a general lower bound on the specification number and we show
that for all {\it nested} functions, this lower bound is attained. We
give a simple proof of the fact that for any linearly separable Boolean
function, there is exactly one set of examples of minimal cardinality
which specifies the function. We discuss those functions  which have
limited dependence, in the sense that some of the variables are
redundant (that is, there are irrelevant attributes), giving tight
upper and lower bounds on the specification numbers of such functions.
We then bound the average, or expected, number of examples needed to
specify a linearly separable Boolean function.  In the final section of
the paper, we address the complexity of computing specification numbers
and related parameters.
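
The definition of the specification number lends itself to a direct
(exponential-time) computation for very small classes; the following
Python sketch is purely illustrative, uses a toy class rather than the
linearly separable functions studied in the paper, and is not one of the
paper's methods:

from itertools import combinations

def specification_number(t, H, X):
    """Size of a smallest S subset of X on which t is the only consistent h in H."""
    for size in range(len(X) + 1):
        for S in combinations(X, size):
            consistent = [h for h in H if all(h[x] == t[x] for x in S)]
            if consistent == [t]:
                return size
    return None   # t is not uniquely determined within H

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
def table(f): return {x: f(x) for x in X}

# a small toy class: the constant-0 function, x1, x2, and x1 OR x2
H = [table(lambda x: 0), table(lambda x: x[0]),
     table(lambda x: x[1]), table(lambda x: x[0] | x[1])]
print(specification_number(H[1], H, X))   # the function x1 is specified by 2 examples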

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-012:
----------------------------------------
On the relations between discrete and continuous complexity theory
by Klaus Meer, RWTH Aachen

Abstract:
Relations between discrete and continuous complexity models are
considered.  The present paper is devoted to combining both models. In
particular, we analyze the 3-Satisfiability problem.  The existence of
fast decision procedures for this problem over the reals is examined,
based on certain conditions on the discrete setting.  Moreover, we study
the behaviour of exponential-time computations over the reals depending
on the real complexity of 3-Satisfiability. This is done using
tools from complexity theory over the integers.

----------------------------------------
NeuroCOLT Technical Report NC-TR-95-014:
----------------------------------------
Grundlagen der reellen Komplexit\"atstheorie
by Klaus Meer, RWTH Aachen

Abstract: (in English - text is in German)
Complexity theory deals with the question of classifying mathematical
problems according to the difficulty they pose for algorithmic
solution. This generally involves
\begin{itemize}
\item finding efficient solution-algorithms,
\item analyzing structural properties which make problems difficult to
solve and
\item comparing problems.
\end{itemize}
In contrast to the situation in classical complexity theory, the real
approach studies problems defined on continuous
structures.  The starting point for the present lecture notes is the
model of a real Turing machine as introduced in 1989 by Blum, Shub,
and Smale. We begin with a formal definition of notions like
computability, decidability and efficiency. This gives rise to the
complexity classes $P_{\R}$ and $NP_{\R}$. After analyzing basic
properties (reducibility, $NP_{\R}$-completeness, existence of complete
problems) we consider the decidability of problems in the class
$NP_{\R}$. To this end, results on quantifier elimination and on the
structure of semialgebraic sets are investigated. Finally, methods for
proving lower bounds are presented. For this purpose we show a real
version of Hilbert's Nullstellensatz.


Table of contents:
0. Introduction
1. The computational model of Blum, Shub, and Smale
2. Complexity theory for the BSS-model
3. Existential theory over the reals
4. Lower bounds
   References


-----------------------
The report NC-TR-94-018 can be accessed and printed as follows:

% ftp cscx.cs.rhbnc.ac.uk  (134.219.200.45)
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get nc-tr-94-018.ps.Z
ftp> bye
% zcat nc-tr-94-018.ps.Z | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the postscript files have also been
left for anyone not having an uncompress facility.

A full list of the currently available Technical Reports in the 
Series is held in a file `abstracts' in the same directory.

The files may also be accessed via WWW starting from the NeuroCOLT homepage:

http://www.dcs.rhbnc.ac.uk/neurocolt.html


Best wishes
John Shawe-Taylor



