Paper: High Performance Named-Entity Extraction

Tue Jun 29 18:24:10 EDT 1999

Paper:
  Applying Machine Learning for High Performance
  Named-Entity Extraction

Authors:
  Shumeet Baluja, Vibhu Mittal, Rahul Sukthankar

Available from:
  http://www.cs.cmu.edu/~baluja

Abstract:
  This paper describes a machine learning approach to building
  an efficient, accurate, and fast name-spotting system. Finding
  names in free text is an important task in real-world
  text-based applications. Most previous approaches
  have been based on carefully hand-crafted modules encoding
  linguistic knowledge specific to the language and document
  genre. Such approaches have two drawbacks: they require
  large amounts of time and linguistic expertise to develop,
  and they are not easily portable to new languages and
  genres. This paper describes an extensible system that
  automatically combines weak evidence for name
  extraction. This evidence is gathered from easily available
  sources: part-of-speech tagging, dictionary lookups, and
  textual information such as capitalization and
  punctuation. Individually, each piece of evidence is
  insufficient for robust name detection. However, the
  combination of evidence, through standard machine learning
  techniques, yields a system that achieves performance
  equivalent to the best existing hand-crafted approaches.
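
  As a rough illustration of the idea in the abstract, the sketch
  below combines a few weak per-token cues (capitalization, a
  dictionary lookup, and sentence position inferred from
  punctuation) with a simple learned linear classifier. It is not
  the system described in the paper: the perceptron learner, the
  feature set, the toy word list, the training sentences, and all
  function names are assumptions made for this example, and
  part-of-speech evidence is left out to keep it self-contained.

```python
# Illustrative sketch only. The learner (a perceptron), the features, the
# word list, and the training data are assumptions, not the paper's method.

# Hypothetical toy dictionary of ordinary (non-name) words.
COMMON_WORDS = {"the", "company", "hired", "met", "she"}


def features(tokens, i):
    """Return weak evidence for token i as a list of 0/1 values."""
    tok = tokens[i]
    sentence_initial = i == 0 or tokens[i - 1] in {".", "!", "?"}
    return [
        1.0,                                           # bias term
        1.0 if tok[:1].isupper() else 0.0,             # capitalized
        1.0 if tok.lower() in COMMON_WORDS else 0.0,   # found in dictionary
        1.0 if sentence_initial else 0.0,              # follows end punctuation
    ]


def score(weights, x):
    """Weighted sum of the evidence."""
    return sum(w * v for w, v in zip(weights, x))


def train_perceptron(sentences, epochs=20):
    """Learn one weight per feature from (tokens, labels) pairs."""
    examples = [
        (features(tokens, i), label)
        for tokens, labels in sentences
        for i, label in enumerate(labels)
    ]
    weights = [0.0] * 4
    for _ in range(epochs):
        for x, label in examples:
            predicted = 1 if score(weights, x) > 0 else 0
            if predicted != label:
                # Move the weights toward the correct answer.
                for j, v in enumerate(x):
                    weights[j] += (label - predicted) * v
    return weights


def spot_names(tokens, weights):
    """Label each token 1 (name) or 0 (not a name)."""
    return [
        1 if score(weights, features(tokens, i)) > 0 else 0
        for i in range(len(tokens))
    ]


# Tiny hand-labeled training set (1 = name token).
TRAIN = [
    (["The", "company", "hired", "Alice", "."], [0, 0, 0, 1, 0]),
    (["Bob", "met", "Carol", "."], [1, 0, 1, 0]),
]

if __name__ == "__main__":
    weights = train_perceptron(TRAIN)
    print(spot_names(["She", "met", "Dave", "."], weights))
```

  No single cue is reliable here: capitalization alone would mark
  the sentence-initial "She" as a name, and the dictionary alone
  says nothing about punctuation. The learned weights are what
  let the cues offset one another.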

Contact:
  sbaluja at lycos.com, mittal at jprc.com, rahuls at jprc.com

Questions and comments are welcome!