Thesis on Query Learning available
P Sollich
pkso at castle.ed.ac.uk
Sat Dec 16 10:06:41 EST 1995
FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/Thesis/sollich.thesis.tar.Z
Dear fellow connectionists,
The following Ph.D. thesis is now available for copying from the
neuroprose archive:
ASKING INTELLIGENT QUESTIONS ---
THE STATISTICAL MECHANICS OF QUERY LEARNING
Peter Sollich
Department of Physics
University of Edinburgh, U.K.
Abstract:
This thesis analyses the capabilities and limitations of query learning
by using the tools of statistical mechanics to study learning in
feed-forward neural networks.
In supervised learning, one of the central questions is the issue of
generalization: Given a set of training examples in the form of
input-output pairs produced by an unknown {\em teacher} rule, how can
one generate a {\em student} which {\em generalizes}, i.e., which
correctly predicts the outputs corresponding to inputs not contained in
the training set? The traditional paradigm has been to study learning
from {\em random examples}, where training inputs are sampled randomly
from some given distribution. However, random examples contain
redundant information, and generalization performance can thus be
improved by {\em query learning}, where training inputs are chosen such
that each new training example will be maximally `useful' as measured by
a given {\em objective function}.
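For readers who prefer code to prose, the query-learning loop just described
can be sketched in a few lines of Python. The `objective' and `oracle'
callables are placeholders for the expected-utility calculation and the
teacher; all names here are illustrative, not taken from the thesis:

    import numpy as np

    def query_learning(pool, objective, oracle, n_queries):
        # Pool-based query learning. pool: array of candidate inputs,
        # shape (n_candidates, dim). At each step, choose the candidate
        # that maximizes the objective function, ask the teacher (oracle)
        # for its output, and add the input-output pair to the data.
        X, y = [], []
        for _ in range(n_queries):
            scores = [objective(x, X, y) for x in pool]
            best = int(np.argmax(scores))
            X.append(pool[best])
            y.append(oracle(pool[best]))
            pool = np.delete(pool, best, axis=0)
        return np.array(X), np.array(y)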
We examine two common kinds of queries, chosen to optimize two objective
functions: the generalization error and the entropy (or information),
respectively. Within an extended Bayesian framework, we use the
techniques of statistical mechanics to analyse the average case
generalization performance achieved by such queries in a range of
learning scenarios, in which the functional forms of student and teacher
are inspired by models of neural networks. In particular, we study how
the efficacy of query learning depends on the form of teacher and
student, on the training algorithm used to generate students, and on the
objective function used to select queries. The learning scenarios
considered are simple but sufficiently generic to allow general
conclusions to be drawn.
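In standard Bayesian notation, with p(w|D) the posterior over students w
given training data D, the two objectives can be written as follows. These
are the usual textbook forms, not quotations from the thesis:

    % entropy (information) objective: choose the query x that maximizes
    % the expected decrease in posterior entropy
    S(D) = -\int \! dw \, p(w|D) \ln p(w|D),
    \qquad
    x^* = \arg\max_x \Big\{ S(D) - E_{y|x,D}\big[ S(D \cup \{(x,y)\}) \big] \Big\}

    % generalization error objective: choose x to minimize the expected
    % generalization error \epsilon after retraining on the enlarged data set
    x^* = \arg\min_x \, E_{y|x,D}\big[ \epsilon(D \cup \{(x,y)\}) \big]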
We first study perfectly learnable problems, where the student can
reproduce the teacher exactly. From an analysis of two simple model
systems, the high-low game and the linear perceptron, we conclude that
query learning is much less effective for rules with continuous outputs
-- provided they are `invertible' in the sense that they can essentially
be learned from a finite number of training examples -- than for rules
with discrete outputs. Queries chosen to minimize the entropy generally
achieve generalization performance close to the theoretical optimum
afforded by minimum generalization error queries, but can perform worse
than random examples in scenarios where the training algorithm is
under-regularized, i.e., has too much `confidence' in corrupted training
data.
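The high-low game makes the discrete-output case concrete: the teacher holds
a hidden threshold x0, and each output merely says whether the queried input
lies above or below it. In the noise-free case, minimum entropy queries
reduce to bisection of the current version space. The following sketch
(parameters illustrative) compares this with random examples:

    import numpy as np

    rng = np.random.default_rng(1)
    x0 = rng.random()                    # hidden teacher threshold in (0, 1)

    def run(n, propose):
        # Track the version space: the interval of thresholds consistent
        # with all of the teacher's answers so far.
        lo, hi = 0.0, 1.0
        for _ in range(n):
            x = propose(lo, hi)
            if x > x0:                   # teacher answers `high'
                hi = min(hi, x)
            else:                        # teacher answers `low'
                lo = max(lo, x)
        return hi - lo                   # remaining uncertainty about x0

    n = 20
    w_query  = run(n, lambda lo, hi: 0.5 * (lo + hi))   # entropy queries: bisection
    w_random = run(n, lambda lo, hi: rng.random())      # random examples
    print(f"uncertainty after {n} examples: "
          f"queries {w_query:.1e}, random {w_random:.1e}")

The interval width, i.e. the remaining uncertainty about x0, halves with
every query but shrinks only like 1/n under random sampling, which is the
qualitative gap between discrete-output query learning and random examples
referred to above.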
For imperfectly learnable problems, we first consider linear students
learning from nonlinear perceptron teachers and show that in this case
the structure of the student space determines the efficacy of queries
chosen to minimize the entropy in {\em student} space. Minimum {\em
teacher} space queries, on the other hand, perform worse than random
examples due to lack of feedback about the progress of the student. For
students with discrete outputs, we find that in the absence of
information about the teacher space, query learning can lead to
self-confirming hypotheses far from the truth, misleading the student to
such an extent that it will not approximate the teacher optimally even
for an infinite number of training examples. We investigate how this
problem depends on the nature of the noise process corrupting the
training data, and demonstrate that it can be alleviated by combining
query learning with Bayesian techniques of model selection. Finally, we
assess which of our conclusions carry over to more realistic neural
networks, by calculating finite size corrections to the thermodynamic
limit results and by analysing query learning in a simple two-layer
neural network. The results suggest that the statistical mechanics
analysis is often relevant to real-world learning problems, and that the
potentially significant improvements in generalization performance
achieved by query learning can be made available, in a computationally
cheap manner, for realistic multi-layer neural networks.
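As a practical illustration of entropy-motivated queries in student space,
here is a small committee-based sketch for a perceptron student and teacher,
in the spirit of query-by-committee: committee disagreement serves as a
computationally cheap stand-in for the posterior entropy. Everything below
(dimensions, pool and committee sizes, the version-space sampler) is
illustrative rather than taken from the thesis:

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, POOL, COMMITTEE, QUERIES = 5, 300, 15, 12

    w_teacher = rng.standard_normal(DIM)      # hypothetical teacher perceptron

    def consistent_perceptron(X, y, w0, passes=200):
        # Run the perceptron algorithm from a random start until it
        # reproduces all training labels (the data are separable, since
        # the teacher is itself a perceptron), or a pass budget runs out.
        # Each run yields one approximate sample from version space.
        w = w0.copy()
        for _ in range(passes):
            mistakes = 0
            for x, t in zip(X, y):
                if np.sign(w @ x) != t:
                    w += t * x
                    mistakes += 1
            if mistakes == 0:
                break
        return w

    X = np.empty((0, DIM))
    y = np.empty(0)
    pool = rng.standard_normal((POOL, DIM))

    for step in range(QUERIES):
        committee = np.array([consistent_perceptron(X, y, rng.standard_normal(DIM))
                              for _ in range(COMMITTEE)])
        # Generalization error of the committee mean: angle to teacher / pi.
        w_hat = committee.mean(axis=0)
        eps = np.arccos(w_hat @ w_teacher /
                        (np.linalg.norm(w_hat) * np.linalg.norm(w_teacher))) / np.pi
        print(f"step {step:2d}: generalization error ~ {eps:.3f}")
        votes = np.sign(pool @ committee.T)   # committee outputs on the pool
        split = np.abs(votes.mean(axis=1))    # 0 = maximal disagreement
        best = int(np.argmin(split))          # entropy-style query
        X = np.vstack([X, pool[best]])
        y = np.append(y, np.sign(pool[best] @ w_teacher))
        pool = np.delete(pool, best, axis=0)

In this perfectly learnable, noise-free setting the committee homes in on
the teacher quickly; the failure modes discussed above (self-confirming
hypotheses, under-regularization) only appear once the student space no
longer contains the teacher or the training data are corrupted.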
Criticism, comments and suggestions are welcome.
Merry Christmas everyone!
Peter Sollich
--------------------------------------------------------------------------
Peter Sollich                          Department of Physics
e-mail: P.Sollich at ed.ac.uk          University of Edinburgh
phone:  +44 - (0)131 - 650 5236        Kings Buildings, Mayfield Road
                                       Edinburgh EH9 3JZ, U.K.
--------------------------------------------------------------------------
RETRIEVAL INSTRUCTIONS: Get `sollich.thesis.tar.Z' from the `Thesis'
subdirectory of the neuroprose archive. Uncompress, and unpack the
resulting tar file (on UNIX: uncompress sollich.thesis.tar.Z; tar xf
sollich.thesis.tar). This will yield the PostScript files listed below.
Contact me if there are any problems with retrieval and/or printing.
QUICK GUIDE for busy readers: For a first look, see sollich_title.ps (has
abstract and table of contents). File sollich_chapter1.ps contains a
general introduction to query learning and an overview of the
literature. Finally, for a summary of the main results and open
questions, see sollich_chapter9.ps.
LIST OF FILES:
------------------------------------------------------------------------------
Filename              Pages   Size in KB         Contents
                              (comp./uncomp.)
------------------------------------------------------------------------------
sollich_title.ps          8      37 /   75       Title, Declaration,
                                                 Acknowledgements, Publications,
                                                 Abstract, Table of contents
sollich_chapter1.ps       8      48 /   98       Introduction
sollich_chapter2.ps      10      48 /  101       A probabilistic framework for
                                                 query selection
sollich_chapter3.ps      21     128 /  376       Perfectly learnable problems:
                                                 Two simple examples
sollich_chapter4.ps      19     135 /  337       Imperfectly learnable problems:
                                                 Linear students
sollich_chapter5.ps      40     228 /  565       Query learning assuming the
                                                 inference model is correct
sollich_chapter6.ps      12     244 / 1050       Combining query learning and
                                                 model selection
sollich_chapter7.ps      20     217 /  558       Towards realistic neural
                                                 networks I: Finite size effects
sollich_chapter8.ps      24     136 /  299       Towards realistic neural
                                                 networks II: Multi-layer
                                                 networks
sollich_chapter9.ps       5      31 /   59       Summary and Outlook
sollich_bib.ps            8      37 /   68       Bibliography
------------------------------------------------------------------------------