Terminology (was: batch-mode parallel implementations)

prechelt at ira.uka.de
Thu Oct 24 11:16:36 EDT 1991


I have noticed a lot of inconsistent terminology concerning 
the frequency of weight updates in Backprop learning.

I would like to suggest meanings for certain terms, based
not on the democratic criterion of what is used most often,
but on consulting a dictionary:

There are three cases:

(a) update after only ONE single example has been seen
(b) update after ALL of the examples have been seen
(c) something in between
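
The three cases differ only in how many examples are seen between
two weight updates. A minimal sketch, assuming a hypothetical
one-weight linear model y = weight * x with squared error; the names
train_one_pass, update_every and rate are made up for illustration
and do not come from any particular Backprop implementation:

```python
def train_one_pass(examples, weight, update_every, rate=0.1):
    """Make one pass over `examples`, a list of (x, target) pairs.

    Gradients are accumulated and applied to `weight` after every
    `update_every` examples. Returns (new_weight, number_of_updates).
    """
    if update_every < 1:
        raise ValueError("update_every must be at least 1")
    grad_sum = 0.0
    seen = 0
    updates = 0
    for x, target in examples:
        error = weight * x - target
        grad_sum += error * x  # gradient of 0.5 * error**2 w.r.t. weight
        seen += 1
        if seen == update_every:
            weight -= rate * grad_sum
            grad_sum, seen = 0.0, 0
            updates += 1
    if seen:  # a leftover, incomplete group at the end of the pass
        weight -= rate * grad_sum
        updates += 1
    return weight, updates


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
_, updates_a = train_one_pass(examples, 0.0, update_every=1)              # case (a)
_, updates_b = train_one_pass(examples, 0.0, update_every=len(examples))  # case (b)
_, updates_c = train_one_pass(examples, 0.0, update_every=2)              # case (c)
```

With these four examples, case (a) performs four updates per pass,
case (b) one, and case (c) two.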


The terms used are epoch, block, batch, sample, continuous, on-line.

An EPOCH is (thus saith my dictionary) not only a section of 
time or history (an "era"), but also a turning point.
This should make EPOCH the preferred term for case (b),
because the end of the training set clearly is such a
turning point.

A BATCH is a set of some size, a pile of things or the like,
and the word carries an inherent need for information about
that size. Thus it is a good candidate for case (c), and there
should always be some indication of the size, either as an
absolute number, as a fraction of the training set size, or by
some qualitative criterion.
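
The first two ways of indicating a batch size can be sketched as
follows. This is only an illustration under assumed conventions:
the function name resolve_batch_size is hypothetical, an int is
read as an absolute number and a float as a fraction of the
training set size. A qualitative criterion is not covered.

```python
def resolve_batch_size(size, training_set_size):
    """Turn a batch-size indication into a number of examples.

    An int is taken as an absolute number of examples; a float in
    (0, 1] is taken as a fraction of the training set size
    (truncated to an integer, but never less than 1).
    """
    if isinstance(size, bool):
        raise TypeError("size must be an int or a float")
    if isinstance(size, int):
        if not 1 <= size <= training_set_size:
            raise ValueError("absolute size must be in 1..training_set_size")
        return size
    if isinstance(size, float):
        if not 0.0 < size <= 1.0:
            raise ValueError("fractional size must be in (0, 1]")
        return max(1, int(size * training_set_size))
    raise TypeError("size must be an int or a float")


absolute = resolve_batch_size(20, 100)      # "batches of 20 examples"
fractional = resolve_batch_size(0.25, 100)  # "batches of a quarter of the set"
```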

BLOCK could perhaps be an even better word for the same case,
at least for computer scientists: blocks are always groups of a
certain number of similar objects, and the word avoids the risk
of misunderstanding that stems from the term "batch processing"
from the early days of data processing, where everything was
executed completely before you received any results.
Unfortunately, because of its other connotations, confusion of 
BLOCK with EPOCH is nevertheless very likely.

A SAMPLE is a part picked from a whole, usually for test purposes.
Although it is not absolutely clear that a sample is just a
single object, to my ears the word tends to sound that way.
Thus it should indicate case (a).

CONTINUOUS is a bad term to use: the individual 
examples are not cut into parts, so BP is always discrete.

ON-LINE usually means something like "available without physical
action, merely by execution of software" and is of course
completely inappropriate for learning, except perhaps where there is 
an infinite training set constantly flowing through the machine.


SUMMARY:
--------

Let us use 'Epoch' for (b), 'Batch' for (c) and 'Sample' for (a).
Let us avoid 'continuous', 'on-line' and 'block' as much as possible.
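
Stated as a rule, the suggestion might look like the sketch
below. The function name update_term and its parameters are
hypothetical and serve only to illustrate the proposal:

```python
def update_term(examples_per_update, training_set_size):
    """Return the suggested term for a given weight-update frequency."""
    if not 1 <= examples_per_update <= training_set_size:
        raise ValueError("examples_per_update must be in 1..training_set_size")
    if examples_per_update == 1:
        return "sample"  # case (a): update after ONE example
    if examples_per_update == training_set_size:
        return "epoch"   # case (b): update after ALL examples
    return "batch"       # case (c): something in between


terms = [update_term(n, 1000) for n in (1, 50, 1000)]
```

Note that for a training set of size one, cases (a) and (b) coincide;
the sketch arbitrarily answers 'sample' there.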


I think that as scientists we should exercise some discipline in
the use of language, especially where confusion lies as close at
hand as it does in the area of learning systems...  :->

Please direct all comments and flames to me.

  Lutz


Lutz Prechelt   (email: prechelt at ira.uka.de)            | Whenever you 
Institut fuer Programmstrukturen und Datenorganisation  | complicate things,
Universitaet Karlsruhe;  D-7500 Karlsruhe 1;  Germany   | they get
(Voice: ++49/721/608-4317, FAX: ++49/721/697760)        | less simple.

