Encoding missing values

Scott E. Fahlman sef+ at cs.cmu.edu
Fri Feb 4 10:25:51 EST 1994


    There is at least one kind of network that has no problem (in
    principle) with missing inputs, namely a Boltzmann machine.
    You just refrain from clamping the input node whose value is
    missing, and treat it like an output node or hidden unit.
    
    This may seem to be irrelevant to anything other than Boltzmann
    machines, but I think it could be argued that nothing very much
    simpler is capable of dealing with the problem.  When you ask
    a network to handle missing inputs, you are in effect asking it
    to do pattern completion on the input layer, and for this a
    Boltzmann machine or some other sort of attractor network would
    seem to be required. 
    
Good point, but perhaps in need of clarification for some readers:

There are two ways of training a Boltzmann machine.  In one (the original
form), there is no distinction between input and output units.  During
training we alternate between an instruction phase, in which all of the
externally visible units are clamped to some pattern, and a normalization
phase, in which the whole network is allowed to run free.  The idea is to
modify the weights so that, when running free, the external units assume
the various pattern values in the training set in their proper frequencies.
If only some subset of the externally visible units are clamped to certain
values, the net will produce compatible completions in the other units,
again with frequencies that match this part of the training set.
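
For concreteness, here is a rough sketch (my own toy illustration, in
Python; the function names and the single-sample settling are my
simplifications) of that two-phase procedure.  It assumes binary
stochastic units and symmetric weights with zero diagonal; a serious
implementation would anneal the temperature and average over many
settlings rather than taking one sample per phase:

import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(s, W, clamped, T=1.0):
    # Resample each unclamped unit given the current state of the others.
    for i in np.flatnonzero(~clamped):
        p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s) / T))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

def correlations(W, clamped, s, steps=50):
    # Let the net settle, then return the pairwise products <s_i s_j>
    # (a single sample standing in for a proper equilibrium average).
    for _ in range(steps):
        s = gibbs_step(s, W, clamped)
    return np.outer(s, s)

def train_step(W, patterns, n_visible, lr=0.05):
    # Original Boltzmann learning: the instruction phase clamps ALL
    # visible units to a training pattern; the normalization phase runs
    # the whole net free.  Weights move by the difference between the
    # correlations measured in the two phases.
    n = W.shape[0]
    clamped_corr = np.zeros_like(W)
    free_corr = np.zeros_like(W)
    for p in patterns:
        s = rng.integers(0, 2, n).astype(float)
        s[:n_visible] = p
        vis = np.zeros(n, bool)
        vis[:n_visible] = True
        clamped_corr += correlations(W, vis, s.copy())
        free_corr += correlations(W, np.zeros(n, bool), s.copy())
    dW = lr * (clamped_corr - free_corr) / len(patterns)
    np.fill_diagonal(dW, 0.0)
    return W + dW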

A net trained in this way will (in principle -- it might take a *very* long
time for anything complicated) do what you suggest: Complete an "input"
pattern and produce a compatible output at the same time.  This works even
if the input is *totally* missing.
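
Completion with missing inputs is then just a question of which units you
clamp.  Using the gibbs_step routine from the sketch above (and again with
the caveat that a real system would average over many settlings), it might
look like this:

def complete(W, n_visible, known_idx, known_vals, steps=200):
    # Clamp only the visible units whose values we actually have; the
    # missing inputs (and any output units) are left free to settle.
    n = W.shape[0]
    s = rng.integers(0, 2, n).astype(float)
    clamped = np.zeros(n, bool)
    clamped[known_idx] = True
    s[known_idx] = known_vals
    for _ in range(steps):
        s = gibbs_step(s, W, clamped)
    return s[:n_visible]   # completed visible pattern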

I believe it was Geoff Hinton who realized that a Boltzmann machine could
be trained more efficiently if you do make a distinction between input and
output units, and don't waste any of the training effort learning to
reconstruct the input.  In this model, the instruction phase clamps both
input and output units to some pattern, while the normalization phase
clamps only the input units.  Since the input units are correct in both
cases, all of the network's learning power (such as it is) goes into
producing correct patterns on the output units.  A net trained in this way
will not do input-completion.
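
In terms of the earlier sketch, the only change is which units get clamped
in each phase -- the inputs are clamped in both, so their correlations
cancel out of the weight update and nothing is learned about reconstructing
them (this is my own rendering of the scheme, not anyone's actual code):

def train_step_io(W, inputs, targets, n_in, n_out, lr=0.05):
    # Input/output variant: the instruction phase clamps inputs AND
    # outputs; the normalization phase clamps only the inputs.
    n = W.shape[0]
    pos = np.zeros_like(W)
    neg = np.zeros_like(W)
    for x, y in zip(inputs, targets):
        s = rng.integers(0, 2, n).astype(float)
        s[:n_in] = x
        s[n_in:n_in + n_out] = y
        both = np.zeros(n, bool)
        both[:n_in + n_out] = True
        pos += correlations(W, both, s.copy())
        ins = np.zeros(n, bool)
        ins[:n_in] = True
        neg += correlations(W, ins, s.copy())
    dW = lr * (pos - neg) / len(inputs)
    np.fill_diagonal(dW, 0.0)
    return W + dW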

I bring this up because I think many people will only have seen the latter
kind of Boltzmann training, and will therefore misunderstand your
observation.

By the way, one alternative method I have seen proposed for reconstructing
missing input values is to first train an auto-encoder (with some degree of
bottleneck to get generalization) on the training set, and then feed the
output of this auto-encoder into the classification net.  The auto-encoder
should be able to replace any missing values with some degree of accuracy.
I haven't played with this myself, but it does sound plausible.  If anyone
can point to a good study of this method, please post it here or send me
E-mail.
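
To make the idea concrete (again, just my own sketch of what I understand
the proposal to be, not the particulars of any study I know of): train a
small bottleneck auto-encoder on complete training vectors, then at
classification time overwrite the missing entries of a vector with the
auto-encoder's reconstruction of them before handing the vector to the
classifier.  Something along these lines:

import numpy as np

def train_autoencoder(X, n_hidden, lr=0.01, epochs=500, seed=0):
    # One-hidden-layer auto-encoder (tanh bottleneck, linear output),
    # trained by plain gradient descent on squared reconstruction error.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.normal(0, 0.1, (n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)         # bottleneck code
        Y = H @ W2 + b2                  # reconstruction
        dY = (Y - X) / len(X)
        dH = (dY @ W2.T) * (1.0 - H**2)
        W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def fill_missing(x, missing, params, iters=10):
    # Overwrite the missing entries (True in `missing`) with the
    # auto-encoder's reconstruction; iterate so the filled-in values
    # feed back into the encoding.  Start from whatever guess x holds
    # (e.g. the training-set means).
    W1, b1, W2, b2 = params
    x = x.copy()
    for _ in range(iters):
        y = np.tanh(x @ W1 + b1) @ W2 + b2
        x[missing] = y[missing]
    return x                             # pass this to the classifier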

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  sef+ at cs.cmu.edu
Senior Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University		Latitude:  40:26:33 N
5000 Forbes Avenue			Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================

