Identity Mappings

Geoffrey Hinton hinton at ai.toronto.edu
Fri Feb 10 22:49:34 EST 1989


The potential advantage of using "encoder" networks is that the code in the
middle can be developed without any supervision.

If the output and hidden units are non-linear, the codes do NOT just span the
same subspace as the principal components.  The difference between a linear
approach like principal components and a non-linear approach is especially
significant if there is more than one hidden layer.
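
For concreteness, here is a minimal sketch of such an encoder network in
present-day NumPy (the data, layer sizes, and learning rate are all made up
for illustration).  A narrow tanh hidden layer is trained by gradient descent
to reproduce the input at its output, so the code in the middle is learned
without any labels; with linear units in place of the tanh, the same
procedure would only recover the principal subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 8))      # toy input vectors, no labels

    n_in, n_hid = 8, 3                     # bottleneck narrower than the input
    W1 = 0.1 * rng.standard_normal((n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = 0.1 * rng.standard_normal((n_hid, n_in)); b2 = np.zeros(n_in)

    lr = 0.05
    for _ in range(2000):
        H = np.tanh(X @ W1 + b1)           # the non-linear "code"
        err = (H @ W2 + b2) - X            # target is the input itself
        dH = (err @ W2.T) * (1.0 - H**2)   # backprop through the tanh
        W2 -= lr * (H.T @ err) / len(X);  b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / len(X);   b1 -= lr * dH.mean(axis=0)

    code = np.tanh(X @ W1 + b1)            # unsupervised code for later stages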

If the codes from several encoder networks are then used as the input vector
for a "higher level" network, one can get a multilayer, modular, unsupervised
learning procedure that should scale up better to really large problems.
Ballard (AAAI proceedings, 1987) has investigated this approach for a simple
problem.  He also introduced the interesting idea that, as the learning
proceeds, the central code of each encoder module should give greater weight
to the error feedback coming from the higher level modules that use this code
as input, and less weight to the error feedback coming from the output of the
code's own module.
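
In the same illustrative spirit, here is a sketch of the modular arrangement
(again NumPy, with made-up module sizes).  It trains the modules greedily,
one level after another; it does NOT implement the weighted error-feedback
idea just described, which would require mixing gradients from the higher
level module back into the lower ones.

    import numpy as np

    rng = np.random.default_rng(1)

    def train_encoder(X, n_hid, lr=0.05, epochs=2000):
        """Train one encoder module on X; return a function that maps
        inputs to the module's central code."""
        n_in = X.shape[1]
        W1 = 0.1 * rng.standard_normal((n_in, n_hid)); b1 = np.zeros(n_hid)
        W2 = 0.1 * rng.standard_normal((n_hid, n_in)); b2 = np.zeros(n_in)
        for _ in range(epochs):
            H = np.tanh(X @ W1 + b1)
            err = (H @ W2 + b2) - X              # reconstruction error
            dH = (err @ W2.T) * (1.0 - H**2)
            W2 -= lr * (H.T @ err) / len(X);  b2 -= lr * err.mean(axis=0)
            W1 -= lr * (X.T @ dH) / len(X);   b1 -= lr * dH.mean(axis=0)
        return lambda Z: np.tanh(Z @ W1 + b1)

    # Two low-level modules, each encoding one half of the input.
    X = rng.standard_normal((200, 16))
    enc_a = train_encoder(X[:, :8], 3)
    enc_b = train_encoder(X[:, 8:], 3)

    # The higher level module never sees X, only the concatenated codes.
    codes = np.hstack([enc_a(X[:, :8]), enc_b(X[:, 8:])])
    enc_top = train_encoder(codes, 2)
    top_code = enc_top(codes)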

However, to the best of my knowledge, nobody has yet shown that it really
works well for a hard task.  One problem, pointed out by Steve Nowlan, is that
the codes formed in a bottleneck tend to "encrypt" the information in a
compact form that is not necessarily helpful for further processing.  It may
be worth exploring encoders in nets with many hidden layers that are given
inputs from real domains, but my own current view is that to achieve modular
unsupervised learning we probably need to optimize some other objective, one
that does not simply ensure good reconstruction of the input vector.

Geoff Hinton

