distributed vs. local encoding schemes

Karen Kukich kukich at flash.bellcore.com
Fri Jun 7 17:26:05 EDT 1991


I ran some back-prop spelling correction experiments a few years ago
in which one of the control variables was the use of distributed vs.
local encoding schemes for both input and output representations. 
Local encoding schemes were clear winners in both speed of learning
and performance (correction accuracy for novel misspellings).

To clarify, a local output scheme was simply a 1-of-n vector (n=200)
where each node represented one word in the lexicon;  a "semi-local"
input scheme was a 15*30=450-unit vector where each 30-unit block 
locally encoded one letter in a word of up to 15 characters.  This 
positionally-encoded input scheme was thus local w.r.t. individual
letters in a word but distributed w.r.t. the whole word.  (Incidentally,
the nets took slightly longer to learn to correct the shift-variant
insertion and deletion errors, but they eventually learned them as
well as the shift-invariant substitution and transposition errors.) 
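
For concreteness, here is a minimal sketch of these two local
encodings (Python/NumPy purely for illustration; the 26-letter
alphabet and the assignment of the 30 units within each block are
assumptions, not the original setup):

    import numpy as np

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def local_output(word_index, lexicon_size=200):
        # 1-of-n output: one unit per word in the lexicon.
        v = np.zeros(lexicon_size)
        v[word_index] = 1.0
        return v

    def positional_input(word, max_len=15, block=30):
        # 15 blocks of 30 units; block i locally encodes the letter
        # in position i.  (Units 0-25 of each block stand for a-z
        # here; what the remaining 4 units encoded is assumed.)
        v = np.zeros(max_len * block)
        for i, ch in enumerate(word.lower()[:max_len]):
            v[i * block + ALPHABET.index(ch)] = 1.0
        return v
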
The distributed encoding schemes were m-distance lexicodes, where
m is the minimum Hamming distance between codes.  Thus lexicode-1
is just a binary number code.  I tried lexicodes of m=1, 2, 3 and 4
for both output words and input letters.  Both speed of learning and
correction accuracy improved linearly with increasing m.  These
results were published in a paper that appeared in the proceedings
of the U.S. Post Office Advanced Technology Conference in May of
1988.  My only interpretation of the results is that local encoding
schemes simplify the learning task for nets; still, I'm convinced
that distributed schemes are essential for cognitive processes such
as semantic representation, at least, given the need for
multi-dimensional semantic access and association.
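
For anyone unfamiliar with lexicodes, the standard greedy
construction is easy to sketch (the parameter choices below are
illustrative): scan the integers in order and keep a codeword
whenever it lies at Hamming distance at least m from every codeword
kept so far.  For m=1 this keeps every integer, i.e. the plain
binary number code.

    def lexicode(n_bits, min_dist, n_codes):
        # Greedy lexicographic code: keep a candidate iff it is at
        # Hamming distance >= min_dist from all codewords so far.
        codes = []
        for cand in range(2 ** n_bits):
            if all(bin(cand ^ c).count("1") >= min_dist
                   for c in codes):
                codes.append(cand)
                if len(codes) == n_codes:
                    return codes
        raise ValueError("n_bits too small for requested code")

    # e.g. 200 word codes with pairwise distance >= 3:
    # codes = lexicode(n_bits=12, min_dist=3, n_codes=200)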

As an epilogue, a few later experiments left me with a small puzzle.
In the above experiments I had also found that performance improved
as the number of hidden nodes increased up to about n (=200) and
then leveled off.  I then tested the
local net with the 450-unit positionally-encoded input scheme and 
NO hidden nodes and found performance equal to or better than any net 
with a hidden layer and much faster learning.  But when I tried a 
shift-invariant input encoding scheme, in which misspellings were
encoded by a 420-unit vector representing letter bigrams and unigrams,
I found similarly good performance for nets with hidden layers but 
miserable performance for a net with no hidden layer.  Apparently,
the positionally-encoded input scheme yields a set of linearly
separable input classes but the shift-invariant scheme does not.
It's still not clear to me why this is so.
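
For reference, a net with no hidden layer can realize only linear
decision boundaries, so the contrast above turns entirely on the
geometry of the input encoding.  Here is a minimal sketch of one
such position-free encoding, a binary bag of unigrams and bigrams;
its 26 + 26*26 = 702 units are an approximation of, not identical
to, the 420-unit scheme above, whose exact feature inventory isn't
detailed here:

    import numpy as np

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def shift_invariant_input(word):
        # Binary bag of letter unigrams and bigrams, with no
        # positional information (hence shift-invariant).
        word = word.lower()
        uni = np.zeros(26)
        bi = np.zeros(26 * 26)
        for ch in word:
            uni[ALPHABET.index(ch)] = 1.0
        for a, b in zip(word, word[1:]):
            bi[ALPHABET.index(a) * 26 + ALPHABET.index(b)] = 1.0
        return np.concatenate([uni, bi])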

Karen Kukich
kukich at bellcore.com

