TR Available: Learning Internal Representations

Jonathan Baxter jon@maths.flinders.edu.au
Fri Dec 23 06:58:43 EST 1994


FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/Thesis/baxter.thesis.ps.Z

--------------------------------------------------------

The following paper is now available:

Learning Internal Representations  [112 pages]
Jonathan Baxter

This is a preliminary draft of my PhD thesis; note that it is in the
Thesis subdirectory of the neuroprose archive. It is in the process
of being broken into several pieces for submission to Information and
Computation, Machine Learning, and next year's COLT.

I'm afraid I cannot offer hard copies.

ABSTRACT:
Most machine learning theory and practice is concerned with learning a
single task. In this thesis it is argued that in general a single task does
not contain enough information for a learner to generalise well, and that
good generalisation requires information about {\em many similar learning
tasks}. Such information forms a body of prior knowledge that can be used to
constrain the hypothesis space of the learner and so improve its
generalisation. Image recognition and speech recognition are typical
scenarios in which many similar tasks arise.

After proving that learning without prior information is impossible except in
the simplest of situations, the concept of the {\em environment} of a learner
is introduced: a probability measure over the set of learning problems the
learner might be expected to solve. It is shown how a sample of tasks from
such an environment can be used to learn a {\em representation}, a recoding of
the input space appropriate for the environment. Learning a representation can
equivalently be thought of as learning the appropriate features of the
environment.
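
Schematically (the notation here is illustrative, not necessarily that of
the thesis): if $Q$ is the environment, $\mathcal{G}$ a class of candidate
representations $g\colon X \to V$, and $\mathcal{H}$ a class of task-specific
maps $h\colon V \to Y$, then representation learning draws tasks
$f_1,\dots,f_n \sim Q$, a sample from each, and selects

\[
   \hat{g} \;=\; \arg\min_{g \in \mathcal{G}} \;
   \frac{1}{n} \sum_{i=1}^{n}
   \min_{h \in \mathcal{H}} \widehat{\mathrm{er}}_i(h \circ g),
\]

where $\widehat{\mathrm{er}}_i$ is the empirical error of the composite
hypothesis $h \circ g$ on the sample drawn for task $i$.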

Using Haussler's statistical decision theory framework for
machine learning, rigorous bounds are derived on the sample size
required to ensure good generalisation from a representation learning process.
These bounds show that under certain circumstances learning
a representation appropriate for $n$ tasks reduces the number of examples
required of each task by a factor of $n$. It is argued that environments
such as character recognition and speech recognition fall into the category
of learning problems for which such a reduction is possible.
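
The flavour of these bounds (again schematically, with $C_{\mathcal{G}}$ and
$C_{\mathcal{H}}$ standing for complexity terms of the representation class
and the task-specific class, not the exact quantities of the thesis) is

\[
   m \;=\; O\!\left( \frac{1}{\varepsilon^{2}}
   \left[ C_{\mathcal{H}} + \frac{C_{\mathcal{G}}}{n}
   + \log\frac{1}{\delta} \right] \right)
\]

examples per task: the cost of learning the shared representation is
amortised over the $n$ tasks, which is the source of the factor-of-$n$
reduction.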

Once a representation is learnt it can be used to learn {\em novel} tasks
from the same environment, with the result that far fewer examples are
required of the new tasks to ensure good generalisation. Rigorous bounds are
given on the number of tasks and the number of samples from each task required
to ensure that a representation will be a good one for learning novel tasks.
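
In the same schematic notation: once $\hat{g}$ is fixed, a novel task from
the environment is learnt by searching only over $h \in \mathcal{H}$, with
$h \circ \hat{g}$ as the hypothesis, so the sample complexity of the new
task is governed by $C_{\mathcal{H}}$ alone rather than by the full
composite class.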

All the results on representation learning are generalised to cover any form
of automated hypothesis space bias that utilises information from similar
learning problems. 

It is shown how gradient-descent-based procedures for training Artificial
Neural Networks can be generalised to cover representation learning. Two
experiments using the new procedure are performed, and both fully support
the theoretical results.
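
A minimal sketch of the idea in modern terms (an illustration under my own
assumptions, not the thesis's actual procedure or architecture): a
representation shared by all tasks, here a single tanh layer G, is trained
jointly with one linear readout per task by gradient descent on the total
squared error.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: n_tasks related tasks, all defined on the same input space.
    n_tasks, n_samples, d_in, d_rep = 3, 50, 10, 4
    X = rng.normal(size=(n_tasks, n_samples, d_in))

    # Synthetic targets generated from a common hidden representation,
    # so a shared recoding of the input genuinely exists.
    true_G = rng.normal(size=(d_in, d_rep))
    true_v = rng.normal(size=(n_tasks, d_rep, 1))
    Y = np.tanh(X @ true_G) @ true_v            # (n_tasks, n_samples, 1)

    # Model: shared representation weights G, task-specific readouts v.
    G = rng.normal(scale=0.1, size=(d_in, d_rep))
    v = rng.normal(scale=0.1, size=(n_tasks, d_rep, 1))
    lr = 0.05

    for step in range(5000):
        H = np.tanh(X @ G)                      # shared features, every task
        err = H @ v - Y                         # per-task prediction error
        # Gradients of the mean squared error over all tasks and samples.
        dv = np.einsum('tsd,tso->tdo', H, err) / n_samples
        dH = np.einsum('tso,tdo->tsd', err, v) * (1.0 - H**2)
        dG = np.einsum('tsi,tsd->id', X, dH) / (n_tasks * n_samples)
        v -= lr * dv
        G -= lr * dG

    print('final mean squared error:',
          float(np.mean((np.tanh(X @ G) @ v - Y)**2)))

A novel task from the same environment would then be learnt by holding G
fixed and fitting only a fresh readout v, in line with the transfer result
above.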

The concept of the environment of a learning process
is applied to the problem of {\em vector quantization} with the result that
a {\em canonical} distortion measure for the quantization process emerges.
This distortion measure is proved to be optimal if the task is to
approximate the functions in the environment.
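
The canonical measure has, schematically, the form

\[
   \rho(x, x') \;=\; \int \ell\bigl(f(x), f(x')\bigr)\, dQ(f),
\]

with $\ell$ a loss on output values: two inputs count as close precisely
when the functions in the environment, on average, treat them alike, so
quantizing with respect to $\rho$ preserves exactly the distinctions the
environment cares about.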

Finally, the results on vector quantization are reapplied to
representation learning to yield an improved error measure for learning in
classifier environments. An experiment is presented demonstrating the
improvement. 

-------------

Retrieval Instructions:

unix> ftp archive.cis.ohio-state.edu
Name: anonymous
Password: <your e-mail address>
ftp> cd pub/neuroprose/Thesis
ftp> binary
ftp> get baxter.thesis.ps.Z
ftp> quit
unix> uncompress baxter.thesis.ps.Z
unix> lpr -s baxter.thesis.ps
---------------

Jonathan Baxter
School of Information Science and Technology,
The Flinders University of South Australia.
jon@maths.flinders.edu.au