Thesis on Query Learning available
P Sollich
pkso at castle.ed.ac.uk
Sat Dec 16 10:06:41 EST 1995
FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/Thesis/sollich.thesis.tar.Z
Dear fellow connectionists,
The following Ph.D. thesis is now available for copying from the
neuroprose archive:
ASKING INTELLIGENT QUESTIONS ---
THE STATISTICAL MECHANICS OF QUERY LEARNING
Peter Sollich
Department of Physics
University of Edinburgh, U.K.
Abstract:
This thesis analyses the capabilities and limitations of query learning
by using the tools of statistical mechanics to study learning in
feed-forward neural networks.
In supervised learning, one of the central questions is the issue of
generalization: Given a set of training examples in the form of
input-output pairs produced by an unknown {\em teacher} rule, how can
one generate a {\em student} which {\em generalizes}, i.e., which
correctly predicts the outputs corresponding to inputs not contained in
the training set? The traditional paradigm has been to study learning
from {\em random examples}, where training inputs are sampled randomly
from some given distribution. However, random examples contain
redundant information, and generalization performance can thus be
improved by {\em query learning}, where training inputs are chosen such
that each new training example will be maximally `useful' as measured by
a given {\em objective function}.
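For readers who prefer code to prose, the query-learning loop just described
can be sketched in a few lines of Python. The `objective' and `oracle'
callables are placeholders for the expected-utility calculation and the
teacher; all names here are illustrative, not taken from the thesis:

    import numpy as np

    def query_learning(pool, objective, oracle, n_queries):
        # Pool-based query learning. pool: array of candidate inputs,
        # shape (n_candidates, dim). At each step, choose the candidate
        # that maximizes the objective function, ask the teacher (oracle)
        # for its output, and add the input-output pair to the data.
        X, y = [], []
        for _ in range(n_queries):
            scores = [objective(x, X, y) for x in pool]
            best = int(np.argmax(scores))
            X.append(pool[best])
            y.append(oracle(pool[best]))
            pool = np.delete(pool, best, axis=0)
        return np.array(X), np.array(y)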
We examine two common kinds of queries, chosen to optimize two objective
functions: the generalization error and the entropy (or information),
respectively. Within an extended Bayesian framework, we use the
techniques of statistical mechanics to analyse the average case
generalization performance achieved by such queries in a range of
learning scenarios, in which the functional forms of student and teacher
are inspired by models of neural networks. In particular, we study how
the efficacy of query learning depends on the form of teacher and
student, on the training algorithm used to generate students, and on the
objective function used to select queries. The learning scenarios
considered are simple but sufficiently generic to allow general
conclusions to be drawn.
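In standard Bayesian notation, with p(w|D) the posterior over students w
given training data D, the two objectives can be written as follows. These
are the usual textbook forms, not quotations from the thesis:

    % entropy (information) objective: choose the query x that maximizes
    % the expected decrease in posterior entropy
    S(D) = -\int \! dw \, p(w|D) \ln p(w|D),
    \qquad
    x^* = \arg\max_x \Big\{ S(D) - E_{y|x,D}\big[ S(D \cup \{(x,y)\}) \big] \Big\}

    % generalization error objective: choose x to minimize the expected
    % generalization error \epsilon after retraining on the enlarged data set
    x^* = \arg\min_x \, E_{y|x,D}\big[ \epsilon(D \cup \{(x,y)\}) \big]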
We first study perfectly learnable problems, where the student can
reproduce the teacher exactly. From an analysis of two simple model
systems, the high-low game and the linear perceptron, we conclude that
query learning is much less effective for rules with continuous outputs
-- provided they are `invertible' in the sense that they can essentially
be learned from a finite number of training examples -- than for rules
with discrete outputs. Queries chosen to minimize the entropy generally
achieve generalization performance close to the theoretical optimum
afforded by minimum generalization error queries, but can perform worse
than random examples in scenarios where the training algorithm is
under-regularized, i.e., has too much `confidence' in corrupted training
data.
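The high-low game makes the discrete-output case concrete: the teacher holds
a hidden threshold x0, and each output merely says whether the queried input
lies above or below it. In the noise-free case, minimum entropy queries
reduce to bisection of the current version space. The following sketch
(parameters illustrative) compares this with random examples:

    import numpy as np

    rng = np.random.default_rng(1)
    x0 = rng.random()                    # hidden teacher threshold in (0, 1)

    def run(n, propose):
        # Track the version space: the interval of thresholds consistent
        # with all of the teacher's answers so far.
        lo, hi = 0.0, 1.0
        for _ in range(n):
            x = propose(lo, hi)
            if x > x0:                   # teacher answers `high'
                hi = min(hi, x)
            else:                        # teacher answers `low'
                lo = max(lo, x)
        return hi - lo                   # remaining uncertainty about x0

    n = 20
    w_query  = run(n, lambda lo, hi: 0.5 * (lo + hi))   # entropy queries: bisection
    w_random = run(n, lambda lo, hi: rng.random())      # random examples
    print(f"uncertainty after {n} examples: "
          f"queries {w_query:.1e}, random {w_random:.1e}")

The interval width, i.e. the remaining uncertainty about x0, halves with
every query but shrinks only like 1/n under random sampling, which is the
qualitative gap between discrete-output query learning and random examples
referred to above.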
For imperfectly learnable problems, we first consider linear students
learning from nonlinear perceptron teachers and show that in this case
the structure of the student space determines the efficacy of queries
chosen to minimize the entropy in {\em student} space. Minimum {\em
teacher} space queries, on the other hand, perform worse than random
examples due to lack of feedback about the progress of the student. For
students with discrete outputs, we find that in the absence of
information about the teacher space, query learning can lead to
self-confirming hypotheses far from the truth, misleading the student to
such an extent that it will not approximate the teacher optimally even
for an infinite number of training examples. We investigate how this
problem depends on the nature of the noise process corrupting the
training data, and demonstrate that it can be alleviated by combining
query learning with Bayesian techniques of model selection. Finally, we
assess which of our conclusions carry over to more realistic neural
networks, by calculating finite size corrections to the thermodynamic
limit results and by analysing query learning in a simple two-layer
neural network. The results suggest that the statistical mechanics
analysis is often relevant to real-world learning problems, and that the
potentially significant improvements in generalization performance
achieved by query learning can be made available, in a computationally
cheap manner, for realistic multi-layer neural networks.
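As a practical illustration of entropy-motivated queries in student space,
here is a small committee-based sketch for a perceptron student and teacher,
in the spirit of query-by-committee: committee disagreement serves as a
computationally cheap stand-in for the posterior entropy. Everything below
(dimensions, pool and committee sizes, the version-space sampler) is
illustrative rather than taken from the thesis:

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, POOL, COMMITTEE, QUERIES = 5, 300, 15, 12

    w_teacher = rng.standard_normal(DIM)      # hypothetical teacher perceptron

    def consistent_perceptron(X, y, w0, passes=200):
        # Run the perceptron algorithm from a random start until it
        # reproduces all training labels (the data are separable, since
        # the teacher is itself a perceptron), or a pass budget runs out.
        # Each run yields one approximate sample from version space.
        w = w0.copy()
        for _ in range(passes):
            mistakes = 0
            for x, t in zip(X, y):
                if np.sign(w @ x) != t:
                    w += t * x
                    mistakes += 1
            if mistakes == 0:
                break
        return w

    X = np.empty((0, DIM))
    y = np.empty(0)
    pool = rng.standard_normal((POOL, DIM))

    for step in range(QUERIES):
        committee = np.array([consistent_perceptron(X, y, rng.standard_normal(DIM))
                              for _ in range(COMMITTEE)])
        # Generalization error of the committee mean: angle to teacher / pi.
        w_hat = committee.mean(axis=0)
        eps = np.arccos(w_hat @ w_teacher /
                        (np.linalg.norm(w_hat) * np.linalg.norm(w_teacher))) / np.pi
        print(f"step {step:2d}: generalization error ~ {eps:.3f}")
        votes = np.sign(pool @ committee.T)   # committee outputs on the pool
        split = np.abs(votes.mean(axis=1))    # 0 = maximal disagreement
        best = int(np.argmin(split))          # entropy-style query
        X = np.vstack([X, pool[best]])
        y = np.append(y, np.sign(pool[best] @ w_teacher))
        pool = np.delete(pool, best, axis=0)

In this perfectly learnable, noise-free setting the committee homes in on
the teacher quickly; the failure modes discussed above (self-confirming
hypotheses, under-regularization) only appear once the student space no
longer contains the teacher or the training data are corrupted.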
Criticism, comments and suggestions are welcome.
Merry Christmas everyone!
Peter Sollich
--------------------------------------------------------------------------
Peter Sollich                          Department of Physics
e-mail: P.Sollich at ed.ac.uk          University of Edinburgh
phone:  +44 - (0)131 - 650 5236        Kings Buildings, Mayfield Road
                                       Edinburgh EH9 3JZ, U.K.
--------------------------------------------------------------------------
RETRIEVAL INSTRUCTIONS: Get `sollich.thesis.tar.Z' from the `Thesis'
subdirectory of the neuroprose archive. Uncompress, and unpack the
resulting tar file (on UNIX: uncompress sollich.thesis.tar.Z; tar xf
sollich.thesis.tar). This will yield the PostScript files listed below.
Contact me if there are any problems with retrieval and/or printing.
QUICK GUIDE for busy readers: For a first look, see sollich_title.ps (has
abstract and table of contents). File sollich_chapter1.ps contains a
general introduction to query learning and an overview of the
literature. Finally, for a summary of the main results and open
questions, see sollich_chapter9.ps.
LIST OF FILES:
------------------------------------------------------------------------------
Filename              Pages   Size in KB         Contents
                              (comp./uncomp.)
------------------------------------------------------------------------------
sollich_title.ps          8      37 /   75       Title, Declaration,
                                                 Acknowledgements, Publications,
                                                 Abstract, Table of contents
sollich_chapter1.ps       8      48 /   98       Introduction
sollich_chapter2.ps      10      48 /  101       A probabilistic framework for
                                                 query selection
sollich_chapter3.ps      21     128 /  376       Perfectly learnable problems:
                                                 Two simple examples
sollich_chapter4.ps      19     135 /  337       Imperfectly learnable problems:
                                                 Linear students
sollich_chapter5.ps      40     228 /  565       Query learning assuming the
                                                 inference model is correct
sollich_chapter6.ps      12     244 / 1050       Combining query learning and
                                                 model selection
sollich_chapter7.ps      20     217 /  558       Towards realistic neural
                                                 networks I: Finite size effects
sollich_chapter8.ps      24     136 /  299       Towards realistic neural
                                                 networks II: Multi-layer
                                                 networks
sollich_chapter9.ps       5      31 /   59       Summary and Outlook
sollich_bib.ps            8      37 /   68       Bibliography
------------------------------------------------------------------------------