Paper: High Performance Named-Entity Extraction
sbaluja@lycos.com
Tue Jun 29 18:24:10 EDT 1999
Paper:
Applying Machine Learning for High Performance
Named-Entity Extraction
Authors:
Shumeet Baluja, Vibhu Mittal, Rahul Sukthankar
Available from:
http://www.cs.cmu.edu/~baluja
Abstract:
This paper describes a machine learning approach to building an
efficient, accurate and fast name-spotting system. Finding
names in free text is an important task in
real-world text-based applications. Most previous approaches
have been based on carefully hand-crafted modules encoding
linguistic knowledge specific to the language and document
genre. Such approaches have two drawbacks: they require
large amounts of time and linguistic expertise to develop,
and they are not easily portable to new languages and
genres. This paper describes an extensible system which
automatically combines weak evidence for name
extraction. This evidence is gathered from easily available
sources: part-of-speech tagging, dictionary lookups, and
textual information such as capitalization and
punctuation. Individually, each piece of evidence is
insufficient for robust name detection. However, the
combination of evidence, through standard machine learning
techniques, yields a system that achieves performance
equivalent to the best existing hand-crafted approaches.
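A minimal sketch of the idea of combining weak evidence, under stated
assumptions: the abstract does not say which learner or which exact
features the paper uses, so a plain perceptron stands in for the
"standard machine learning techniques", and a small function-word list
stands in for a real part-of-speech tagger. NAME_DICT, FUNCTION_WORDS,
the feature set, and the training sentences are all hypothetical toy
data, not taken from the paper.

```python
import string

# Hypothetical toy dictionary of known names (stand-in for dictionary lookups).
NAME_DICT = {"john", "mary", "rahul"}
# Hypothetical stand-in for a part-of-speech tagger: a list of function words.
FUNCTION_WORDS = {"the", "a", "in", "on", "of", "and", "was", "is"}


def features(tokens, i):
    """Weak per-token evidence; no single feature identifies a name reliably."""
    tok = tokens[i]
    prev = tokens[i - 1] if i > 0 else ""
    return [
        1.0,                                                 # bias term
        1.0 if tok[:1].isupper() else 0.0,                   # capitalized
        1.0 if i == 0 or prev in {".", "!", "?"} else 0.0,   # sentence-initial
        1.0 if tok.lower() in NAME_DICT else 0.0,            # dictionary hit
        1.0 if tok.lower() in FUNCTION_WORDS else 0.0,       # function word
        1.0 if tok and all(c in string.punctuation for c in tok) else 0.0,
    ]


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(sentences, max_epochs=50):
    """Learn one weight per feature; stop once an epoch makes no mistakes."""
    examples = [(features(toks, i), y)
                for toks, labels in sentences
                for i, y in enumerate(labels)]
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            pred = 1 if score(w, x) > 0 else 0
            if pred != y:
                mistakes += 1
                w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break
    return w


def spot_names(w, tokens):
    """Return the tokens the learned combination flags as names."""
    return [t for i, t in enumerate(tokens) if score(w, features(tokens, i)) > 0]


# Toy training data: 1 marks a token that is part of a name.
TRAIN = [
    (["The", "report", "cited", "Mary", "Smith", "."], [0, 0, 0, 1, 1, 0]),
    (["John", "was", "in", "Boston", "."], [1, 0, 0, 1, 0]),
    (["Yesterday", "the", "market", "fell", "."], [0, 0, 0, 0, 0]),
]

weights = train_perceptron(TRAIN)
found = spot_names(weights, ["Today", "Rahul", "visited", "Pittsburgh", "."])
print(found)
```

On this toy data the learned weights flag "Rahul" and "Pittsburgh" but
not the sentence-initial "Today": capitalization alone is not enough,
and only its combination with position and dictionary evidence
separates names from ordinary capitalized words.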
Contact:
sbaluja at lycos.com, mittal at jprc.com, rahuls at jprc.com
Questions and comments are welcome!