batch & on-line training

Ross Gayler ross at psych.psy.uq.oz.au
Sat Oct 19 19:50:16 EDT 1991


On the topic of batch versus on-line training, Kamil at apple.com writes:

>  ... there doesn't seem to be a
> compelling argument for preferring one or the other IN PRINCIPLE.

I would like to turn the dichotomy into a trichotomy and argue that there
is an 'in principle' reason for a preference.

I want to add one-shot learning, which I define (on the spur of the moment)
as successful learning from a single exposure to the input.

This phenomenon is known to occur in animals (e.g. in taste-aversion
conditioning) and in humans (e.g. recognition of an abstract painting seen
only once before).  One-shot learning becomes critical if you are trying to
perform 'cognitive' tasks: when you learn the route to a new office, you
don't need hundreds or thousands of exposures to get it right.

Obviously, one-shot learning can't be expected to happen in all circumstances:
you have to be working in a constrained problem domain that can support it,
and the learner has to have the background knowledge needed to assimilate
what is to be learned.  Most work done with backprop and its relatives starts
from something close to a tabula rasa, and all the time and effort goes into
creating the universe from the input data alone.

Of course, techniques for one-shot learning do exist: e.g. the simple delta
rule with a learning rate of 1.  The problem is that they fail on the problems
people regard as interesting, where the inputs are non-orthogonal and hidden
units are required.  The challenge is to find a one-shot learning algorithm
that works on interesting problems.  I believe this will require strong
constraints on both the architecture and the problem data.
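To make the delta-rule case concrete, here is a minimal sketch (in
present-day Python/NumPy; the patterns and targets are invented for
illustration) of a single linear unit trained by the delta rule with a
learning rate of 1.  With orthonormal inputs, one presentation of each
pattern suffices and later updates do not disturb earlier ones; with
correlated inputs, the same single pass fails:

    import numpy as np

    def delta_rule_one_pass(patterns, targets):
        # One presentation of each (x, t): w <- w + (t - w.x) * x
        # (learning rate of 1, single linear unit)
        w = np.zeros(patterns.shape[1])
        for x, t in zip(patterns, targets):
            w += (t - w @ x) * x
        return w

    targets = np.array([0.2, -1.0, 0.7])

    # Orthonormal inputs: each target is hit exactly after a single pass,
    # because the update for one pattern leaves the responses to the
    # orthogonal patterns untouched.
    ortho = np.eye(3)
    print(ortho @ delta_rule_one_pass(ortho, targets))  # [ 0.2 -1.   0.7]

    # Non-orthogonal (but still unit-length) inputs: each update disturbs
    # the responses to the other patterns, so one pass no longer suffices.
    skew = np.array([[1.0, 0.0, 0.0],
                     [0.8, 0.6, 0.0],
                     [0.5, 0.5, 0.5 ** 0.5]])
    print(skew @ delta_rule_one_pass(skew, targets))    # far from the targets

The orthogonal case works because, with a learning rate of 1, the update
makes the response to x exactly t whenever x.x = 1; correlated patterns
break this non-interference, which is exactly the failure noted above.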

I see the current heavy use of gradient-descent techniques as analogous to
the period in the history of AI when researchers looked for general problem
solving techniques that were universally applicable.  General techniques
worked on toy problems but rapidly bogged down on real problems.  In backprop,
we have a technique for learning arbitrary mappings, and we pay for it with
excruciatingly slow learning.

To summarise: IF you want to perform cognitive tasks THEN 'in principle'
one-shot learning is the only acceptable training regime (although slower
learning may be required to get the net to the point where it can learn in
one shot).  All you have to do is invent a good one-shot learning scheme :-).

Ross Gayler
ross at psych.psy.uq.oz.au
