Encoding missing values

Zoubin Ghahramani zoubin at psyche.mit.edu
Fri Feb 4 11:04:32 EST 1994


Dear Lutz, Thierry, Karun, and connectionists,

I have also been looking into the issue of encoding and learning from
missing values in a neural network. Handling missing values has been
addressed extensively in the statistics literature, for obvious
reasons. To learn despite missing values, the data must either be
filled in or the missing values integrated over. The basic question is
how to fill in the missing data, and statistics offers many methods
for doing so (mean imputation, regression imputation, Bayesian
methods, EM, etc.). For good reviews see Little and Rubin (1987) and
Little (1992).
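
For concreteness, here is a minimal sketch of mean imputation in
modern NumPy terms (the code and names are mine, not from any of the
papers cited): each missing entry is replaced by the mean of the
observed values in its column.

    import numpy as np

    def mean_impute(X):
        # Replace each NaN with the mean of the observed values
        # in its column.
        X = np.asarray(X, dtype=float).copy()
        col_means = np.nanmean(X, axis=0)   # per-column means, ignoring NaNs
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        return X

Mean imputation is simple but discards all uncertainty about the
missing value, which is exactly what the Bayesian and EM approaches
below try to retain.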

I do not in general recommend encoding "missing" as yet another value
to be learned over. Missing means something in a statistical sense:
the input could have taken any of its possible values, with some
probability distribution. You could, for example, augment the original
data by filling in different values for the missing data points
according to a prior distribution. Training would then assign
different weights to the artificially filled-in data points depending
on how well they predict the output (their posterior probability).
This is essentially the method proposed by Buntine and Weigend (1991).
Other approaches have been proposed by Tresp et al. (1993) and Ahmad
and Tresp (1993).
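
A hedged sketch of the augmentation step, again in NumPy (the standard
normal prior, the number of draws, and the function name are
illustrative assumptions of mine, not Buntine and Weigend's choices):

    import numpy as np

    rng = np.random.default_rng(0)

    def augment_case(x, n_draws=5):
        # Expand one incomplete input vector (NaNs mark missing entries)
        # into n_draws completed copies, each missing entry drawn from an
        # assumed standard normal prior. Training would then weight each
        # copy by how well it predicts the output, i.e. by its posterior
        # probability.
        missing = np.isnan(x)
        copies = np.tile(x, (n_draws, 1))
        copies[:, missing] = rng.normal(0.0, 1.0, size=(n_draws, missing.sum()))
        return copies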

I have just written a paper on the topic of learning from incomplete
data. In it I bring a statistical algorithm for learning from
incomplete data, the Expectation-Maximization (EM) algorithm, into the
framework of nonlinear function approximation and classification with
missing values. The approach iteratively fits the data with a mixture
model and uses that same mixture model to fill in any missing input or
output values at each step.
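
As a rough sketch of the fill-in step only (this is not the code from
the paper; the Gaussian mixture parameters are assumed to be given,
and all names here are mine), each missing input value can be replaced
by its conditional expectation under the current mixture model:

    import numpy as np
    from scipy.stats import multivariate_normal

    def e_step_fill(x, weights, means, covs):
        # Fill the NaN entries of one input vector x with their expected
        # values under a Gaussian mixture (weights, means, covs).
        m = np.isnan(x)     # missing coordinates
        o = ~m              # observed coordinates
        K = len(weights)
        resp = np.empty(K)
        cond = np.empty((K, m.sum()))
        for k in range(K):
            mu, S = means[k], covs[k]
            # responsibility of component k given the observed coordinates
            resp[k] = weights[k] * multivariate_normal.pdf(x[o], mu[o], S[np.ix_(o, o)])
            # Gaussian conditional mean: E[x_m | x_o, component k]
            cond[k] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(
                S[np.ix_(o, o)], x[o] - mu[o])
        resp /= resp.sum()
        filled = x.copy()
        filled[m] = resp @ cond     # responsibility-weighted conditional means
        return filled

The M-step then re-estimates the mixture parameters from the completed
data, and the two steps alternate until convergence.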

You can obtain the preprint by anonymous ftp:
	ftp psyche.mit.edu
	login: anonymous
	cd pub
	get zoubin.nips93.ps
To obtain code for the algorithm, please contact me directly.

Zoubin Ghahramani
zoubin at psyche.mit.edu

-----------------------------------------------------------------------
Ahmad, S. and Tresp, V. (1993) "Some Solutions to the Missing Feature
Problem in Vision." In Hanson, S.J., Cowan, J.D., and Giles, C.L.,
editors, Advances in Neural Information Processing Systems 5. Morgan
Kaufmann Publishers, San Mateo, CA.

Buntine, W.L. and Weigend, A.S. (1991) "Bayesian back-propagation."
Complex Systems 5(6):603-643.

Ghahramani, Z. and Jordan, M.I. (1994) "Supervised learning from
incomplete data via an EM approach." To appear in Cowan, J.D.,
Tesauro, G., and Alspector, J., editors, Advances in Neural
Information Processing Systems 6. Morgan Kaufmann Publishers, San
Francisco, CA.

Little, R.J.A. (1992) "Regression With Missing X's: A Review." Journal
of the American Statistical Association 87(420):1227-1237.

Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with
Missing Data. Wiley, New York.

Tresp, V., Hollatz, J., and Ahmad, S. (1993) "Network structuring and
training using rule-based knowledge." In Hanson, S.J., Cowan, J.D.,
and Giles, C.L., editors, Advances in Neural Information Processing
Systems 5. Morgan Kaufmann Publishers, San Mateo, CA.

