paper available: fast dictionary search

Lucas, Simon M sml at essex.ac.uk
Wed Oct 18 09:07:57 EDT 2000


 Dear All,

 The following paper on graph-based dictionary
 search (due to appear in Pattern Recognition
 Letters) is available at:

  http://algoval.essex.ac.uk/papers/dictionary/prl.ps

 As far as I am aware, the method described is 
 unique in that the retrieval speed is independent
 of the size (number of entries) in the dictionary.

 The core of the method is based on a lazy matrix
 parser for probabilistic context-free grammars,
 which are a type of probabilistic
 graphical model.  Therefore, it may be interesting
 to explore other possible uses of the method in that
 area.

 As always, comments are welcome.

 Best regards,

   Simon Lucas

----------------------------------------------------
 Title:  Efficient graph-based dictionary search
         and its application to text-image searching

 Keywords: dictionary search, text image indexing,
           graph search

 Abstract
 ---------
  This paper describes a novel method for applying
  dictionary knowledge to optimally interpret the
  confidence-rated hypothesis sets produced 
  by lower-level pattern classifiers. 

  This problem arises whenever image or video
  databases need to be scanned for textual content,
  and where some of the text strings are expected
  to be strings from a dictionary.  The method
  is especially appropriate for large dictionaries,
  as might occur in vehicle registration number
  recognition for example.

  The problem is cast as enumerating the paths in
  a graph in best-first order given the constraint
  that each complete path is a word in
  some specified dictionary.  The solution
  described here is of particular interest
  due to its generality, flexibility and
  because the time to retrieve each path is
  independent of the size of the dictionary.

  Synthetic results are presented for searching
  dictionaries of up to 1 million UK postcodes
  given graphs that correspond to insertion,
  deletion and substitution errors.
  We also present initial results from processing
  real noisy text images.

--------------------------------------------------
Dr. Simon Lucas
Senior Lecturer and MSc E-commerce Director 
Department of Computer Science
University of Essex
Colchester CO4 3SQ
United Kingdom
Email:  sml at essex.ac.uk
http://cswww.essex.ac.uk
--------------------------------------------------






More information about the Connectionists mailing list