Encoding missing values
Wray Buntine
wray at ptolemy-ethernet.arc.nasa.gov
Fri Feb 4 15:19:44 EST 1994
Regarding the missing value question raised as follows
by Thierry Denoeux, Lutz Prechelt, and others ....
>>>>>>>>>>>>>>>
> So much for my considerations. Now to my questions.
>
> a) Can you think of other encoding methods that seem reasonable? Which?
>
> b) Do you have experience with some of these methods that is worth sharing?
>
> c) Have you compared any of the alternatives directly?
>
> Lutz
+
> I have not found a simple solution that is general. I think
> representation in general and missing information in particular
> are open problems within connectionist research. I am not sure we will
> have a magic bullet for all problems. The best approach is to come up
> with a specific solution for a given problem.
-> Karun
>>>>>>>>>>
This missing value problem is of course shared amongst all the
learning communities, artificial intelligence, statistics, pattern
recognition, etc., not just neural networks.
A classic study in this area, which includes most suggestions
I've read here so far, is
@inproceedings{quinlan:ml6,
  AUTHOR    = "J.R. Quinlan",
  TITLE     = "Unknown Attribute Values in Induction",
  YEAR      = 1989,
  BOOKTITLE = "Proceedings of the Sixth International
               Machine Learning Workshop",
  PUBLISHER = "Morgan Kaufmann",
  ADDRESS   = "Cornell, New York"}
The most frequently cited methods I've seen (they're so common
amongst the different communities that it's hard to assign credit):
  1) replace missings by some best guess
  2) fracture the example into a set of fractional examples,
     each with the missing value filled in somehow
  3) call the missing value another input value
3 is a good thing to do if the values are "informatively" missing,
i.e. if someone leaves the "telephone number" entry blank on a
questionnaire, then maybe they don't have a telephone. It's
probably not good otherwise, unless you have loads of data and
don't mind all the extra example types generated (as already
mentioned).
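
To make 3 concrete for real-valued inputs, here is a minimal
sketch in Python/NumPy (the function and variable names are mine,
purely illustrative): each input is paired with a 0/1 "was
missing" indicator, so the missingness itself becomes an input
the net can exploit when it is informative.

    import numpy as np

    def encode_with_missing_indicator(x):
        # Method 3 sketch: zero-fill absent entries and append a
        # 0/1 indicator per feature flagging which were missing.
        x = np.asarray(x, dtype=float)
        mask = np.isnan(x)                  # True where the value was missing
        filled = np.where(mask, 0.0, x)     # arbitrary fill value
        return np.concatenate([filled, mask.astype(float)])

    # e.g. a row where "telephone number" (feature 1) is blank:
    row = np.array([37.0, np.nan, 1.0])
    print(encode_with_missing_indicator(row))   # -> [37. 0. 1. 0. 1. 0.]

For discrete attributes the same idea is just an extra "missing"
category, which is where the extra example types come from.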
1 is a quick and dirty hack at 2. How good it is depends on
your application.
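
An equally quick sketch of 1 (again my own naming): fill each
missing entry with the per-column mean of the observed values. A
conditional model of the missing attribute given the others would
usually make a better guess.

    import numpy as np

    def impute_with_column_means(X):
        # Method 1 sketch: replace each missing entry by the mean
        # of the observed values in its column.
        X = np.asarray(X, dtype=float).copy()
        col_means = np.nanmean(X, axis=0)    # guesses from observed data
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        return X

    X = np.array([[1.0, np.nan],
                  [3.0, 4.0],
                  [np.nan, 8.0]])
    print(impute_with_column_means(X))
    # -> [[1. 6.]
    #     [3. 4.]
    #     [2. 8.]]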
2 is an approximation to the "correct" approach for handling
"non-informative" missing values according to the standard
"mixture model". The mathematics for this is general and applies
to virtually any learning algorithm: trees, feed-forward nets,
linear regression, whatever. We do it for feed-forward nets in
@article{buntine.weigend:bbp,
  AUTHOR  = "W.L. Buntine and A.S. Weigend",
  TITLE   = "Bayesian Back-Propagation",
  JOURNAL = "Complex Systems",
  VOLUME  = 5,
  NUMBER  = 1,
  PAGES   = "603--643",
  YEAR    = 1991}
and see Tresp, Ahmad & Neuneier in NIPS'94 for an implementation.
But no doubt someone published the general idea back in
the 50's.
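
For completeness, a small sketch of 2 for a discrete attribute
(names are mine; I use marginal value frequencies as the fill-in
distribution, whereas the mixture-model treatment would use the
model's current conditional probabilities and re-estimate, EM
style):

    from collections import Counter

    def fracture_example(example, attr, freq):
        # Method 2 sketch: split one example whose value for
        # `attr` is missing into fractional copies, one per
        # candidate value, weighted by that value's probability.
        total = sum(freq.values())
        pieces = []
        for value, count in freq.items():
            filled = dict(example)
            filled[attr] = value
            pieces.append((filled, count / total))
        return pieces

    # frequencies of 'color' among examples where it was observed
    freq = Counter({'red': 6, 'blue': 3, 'green': 1})
    for ex, w in fracture_example({'size': 2.0, 'color': None}, 'color', freq):
        print(ex, w)   # three copies with weights 0.6, 0.3, 0.1

The learning algorithm then has to accept fractionally weighted
examples, as Quinlan's trees do.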
I certainly wouldn't call missing values an open problem.
Rather, "efficient implementations of the standard approaches"
is, in some cases, an open problem.
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 269-2 fax: (415) 604 3594
Moffett Field, CA, 94035-1000 email: wray at kronos.arc.nasa.gov