Encoding missing values
Wray Buntine
wray at ptolemy-ethernet.arc.nasa.gov
Fri Feb 4 15:19:44 EST 1994
Regarding the missing value question raised as follows
by Thierry Denoeux, Lutz Prechelt, and others ....
>>>>>>>>>>>>>>>
> So much for my considerations. Now to my questions.
>
> a) Can you think of other encoding methods that seem reasonable? Which?
>
> b) Do you have experience with some of these methods that is worth sharing?
>
> c) Have you compared any of the alternatives directly?
>
> Lutz
+
> I have not found a simple solution that is general. I think
> representation in general and missing information in particular
> are open problems within connectionist research. I am not sure we will
> have a magic bullet for all problems. The best approach is to come up
> with a specific solution for a given problem.
-> Karun
>>>>>>>>>>
This missing value problem is of course shared amongst all the
learning communities, artificial intelligence, statistics, pattern
recognition, etc., not just neural networks.
A classic study in this area, which includes most suggestions
I've read here so far, is
@inproceedings{quinlan:ml6,
  AUTHOR    = "J.R. Quinlan",
  TITLE     = "Unknown Attribute Values in Induction",
  YEAR      = 1989,
  BOOKTITLE = "Proceedings of the Sixth International
               Machine Learning Workshop",
  PUBLISHER = "Morgan Kaufmann",
  ADDRESS   = "Cornell, New York"}
The most frequently cited methods I've seen (they're so common
amongst the different communities that it's hard to assign credit):
  1) replace missings by some best guess
  2) fracture the example into a set of fractional examples,
     each with the missing value filled in somehow
  3) call the missing value another input value
3 is a good thing to do if the values are "informatively" missing,
i.e. if someone leaves the "telephone number" entry blank on a
questionnaire, then maybe they don't have a telephone. It's
probably not good otherwise, unless you have loads of data and
don't mind all the extra example types generated (as already
mentioned).
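
To make 3 concrete for real-valued inputs, here is a minimal
sketch in Python/NumPy (the function and variable names are mine,
purely illustrative): each input is paired with a 0/1 "was
missing" indicator, so the missingness itself becomes an input
the net can exploit when it is informative.

    import numpy as np

    def encode_with_missing_indicator(x):
        # Method 3 sketch: zero-fill absent entries and append a
        # 0/1 indicator per feature flagging which were missing.
        x = np.asarray(x, dtype=float)
        mask = np.isnan(x)                  # True where the value was missing
        filled = np.where(mask, 0.0, x)     # arbitrary fill value
        return np.concatenate([filled, mask.astype(float)])

    # e.g. a row where "telephone number" (feature 1) is blank:
    row = np.array([37.0, np.nan, 1.0])
    print(encode_with_missing_indicator(row))   # -> [37. 0. 1. 0. 1. 0.]

For discrete attributes the same idea is just an extra "missing"
category, which is where the extra example types come from.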
1 is a quick and dirty hack at 2. How good it is depends on
your application.
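
An equally quick sketch of 1 (again my own naming): fill each
missing entry with the per-column mean of the observed values. A
conditional model of the missing attribute given the others would
usually make a better guess.

    import numpy as np

    def impute_with_column_means(X):
        # Method 1 sketch: replace each missing entry by the mean
        # of the observed values in its column.
        X = np.asarray(X, dtype=float).copy()
        col_means = np.nanmean(X, axis=0)    # guesses from observed data
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        return X

    X = np.array([[1.0, np.nan],
                  [3.0, 4.0],
                  [np.nan, 8.0]])
    print(impute_with_column_means(X))
    # -> [[1. 6.]
    #     [3. 4.]
    #     [2. 8.]]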
2 is an approximation to the "correct" approach for handling
"non-informative" missing values according to the standard
"mixture model". The mathematics for this is general and applies
to virtually any learning algorithm: trees, feed-forward nets,
linear regression, whatever. We do it for feed-forward nets in
@article{buntine.weigend:bbp,
  AUTHOR  = "W.L. Buntine and A.S. Weigend",
  TITLE   = "Bayesian Back-Propagation",
  JOURNAL = "Complex Systems",
  VOLUME  = 5,
  NUMBER  = 1,
  PAGES   = "603--643",
  YEAR    = 1991}
and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation.
But no doubt someone published the general idea back in
the 50's.
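
For completeness, a small sketch of 2 for a discrete attribute
(names are mine; I use marginal value frequencies as the fill-in
distribution, whereas the mixture-model treatment would use the
model's current conditional probabilities and re-estimate, EM
style):

    from collections import Counter

    def fracture_example(example, attr, freq):
        # Method 2 sketch: split one example whose value for
        # `attr` is missing into fractional copies, one per
        # candidate value, weighted by that value's probability.
        total = sum(freq.values())
        pieces = []
        for value, count in freq.items():
            filled = dict(example)
            filled[attr] = value
            pieces.append((filled, count / total))
        return pieces

    # frequencies of 'color' among examples where it was observed
    freq = Counter({'red': 6, 'blue': 3, 'green': 1})
    for ex, w in fracture_example({'size': 2.0, 'color': None}, 'color', freq):
        print(ex, w)   # three copies with weights 0.6, 0.3, 0.1

The learning algorithm then has to accept fractionally weighted
examples, as Quinlan's trees do.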
I certainly wouldn't call missing values an open problem.
Rather, "efficient implementations of the standard approaches"
is, in some cases, an open problem.
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 269-2 fax: (415) 604 3594
Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov