PhD Thesis available

Menno van Zaanen mvzaanen at science.uva.nl
Wed May 8 03:44:49 EDT 2002


Dear Connectionists,

My PhD thesis, which I hope may be of interest to some of you, is now
available:

Bootstrapping Structure into Language:
Alignment-Based Learning

Menno M. van Zaanen
School of Computing
University of Leeds
Leeds, UK

It can be found at:
http://www.science.uva.nl/~mvzaanen/docs/t_leeds.ps
http://www.science.uva.nl/~mvzaanen/docs/t_leeds.ps.gz
or via my homepage:
http://www.science.uva.nl/~mvzaanen/

Abstract:
This thesis introduces a new unsupervised learning framework, called
Alignment-Based Learning, which is based on the alignment of sentences
and Harris's (1951) notion of substitutability.  Instances of the
framework can be applied to an untagged, unstructured corpus of
natural language sentences, resulting in a labelled, bracketed version
of that corpus.

Firstly, the alignment learning phase aligns all sentences in the
corpus in pairs, partitioning each pair of sentences into parts that
are equal in both sentences and parts that are unequal.  Unequal parts
of sentences can be seen as substitutable for each other, since
substituting one unequal part for the other results in another valid
sentence.  The unequal parts of the sentences are therefore considered
possible (possibly overlapping) constituents, called hypotheses.
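
As a rough illustration of this step, the sketch below aligns two
word sequences with a plain longest-common-subsequence alignment (a
stand-in chosen for brevity, not necessarily the alignment method used
in the thesis) and pairs up the unequal spans between matched words as
hypotheses.  The function names and the choice to skip gaps that are
empty in one of the two sentences are illustrative assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word positions, end exclusive


def lcs_alignment(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Trace back one optimal alignment from the start of both sentences.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def hypotheses(a: List[str], b: List[str]) -> List[Tuple[Span, Span]]:
    """Pair up the unequal parts of a and b that lie between equal (matched) words."""
    # Sentinel anchors so unequal parts at the start or end are also found.
    anchors = [(-1, -1)] + lcs_alignment(a, b) + [(len(a), len(b))]
    found = []
    for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
        span_a, span_b = (i0 + 1, i1), (j0 + 1, j1)
        # Assumption: only keep gaps that are non-empty in both sentences.
        if span_a[0] < span_a[1] and span_b[0] < span_b[1]:
            found.append((span_a, span_b))
    return found


if __name__ == "__main__":
    a = "what is a family fare".split()
    b = "what is the payload of an african swallow".split()
    for (sa, ea), (sb, eb) in hypotheses(a, b):
        print(a[sa:ea], "<->", b[sb:eb])
    # prints: ['a', 'family', 'fare'] <-> ['the', 'payload', 'of', 'an', 'african', 'swallow']
```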

Secondly, the selection learning phase considers all hypotheses found
by the alignment learning phase and selects the best of these, since
overlapping hypotheses cannot all be part of a single tree structure.
Hypotheses are selected either based on the order in which they were
found or based on a probabilistic function.
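
A minimal sketch of the order-based variant, under the assumption that
"overlapping" means crossing brackets: hypotheses are accepted in the
order they were found, and a later hypothesis is discarded if its span
crosses one already accepted (nested or disjoint spans can coexist in
a tree).  The span representation and function names are hypothetical;
the probabilistic variant would instead rank competing hypotheses by a
score, which is not shown here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word positions, end exclusive


def crosses(x: Span, y: Span) -> bool:
    """True if the spans partially overlap, so both cannot be brackets in one tree."""
    (s1, e1), (s2, e2) = x, y
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def select_first(found: List[Span]) -> List[Span]:
    """Keep hypotheses in discovery order, dropping any that cross an earlier kept one."""
    kept: List[Span] = []
    for h in found:
        if not any(crosses(h, k) for k in kept):
            kept.append(h)
    return kept


if __name__ == "__main__":
    # (1, 3) crosses the earlier (2, 5) and is dropped; the nested spans survive.
    print(select_first([(2, 5), (1, 3), (3, 5), (0, 5)]))
```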

The framework can be extended with a grammar extraction phase.  This
extended framework is called parseABL.  In addition to returning a
structured version of the unstructured input corpus, as the ABL system
does, parseABL also returns a stochastic context-free or tree
substitution grammar.
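
For the stochastic context-free case, one common way to read a grammar
off a bracketed corpus is to count each rule and divide by the count of
its left-hand side (relative-frequency estimation).  The sketch below
assumes that method; it is not necessarily how parseABL computes its
probabilities.  The nested-tuple tree format, the labels such as X1,
and the function names are illustrative assumptions, and tree
substitution grammar extraction is not covered.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union

Tree = Union[str, tuple]  # a word, or (label, child, child, ...)
Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def count_rules(tree: Tree, counts: Counter) -> None:
    """Add one count for every rule used in the tree."""
    if isinstance(tree, str):
        return  # words are leaves and produce no rule
    label, *children = tree
    # A child contributes its label if it is a subtree, or itself if it is a word.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_rules(child, counts)


def extract_scfg(trees: Iterable[Tree]) -> Dict[Rule, float]:
    """Estimate P(rhs | lhs) as count(lhs -> rhs) / count(lhs)."""
    counts: Counter = Counter()
    for t in trees:
        count_rules(t, counts)
    lhs_totals: Counter = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


if __name__ == "__main__":
    corpus = [
        ("S", "what", "is", ("X1", "a", "family", "fare")),
        ("S", "what", "is", ("X1", "the", "payload")),
    ]
    for (lhs, rhs), p in sorted(extract_scfg(corpus).items()):
        print(lhs, "->", " ".join(rhs), p)
```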

Different instances of the framework have been tested on the English
ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal corpus.
Apart from the encouraging numerical results, one interesting finding
is that all instances can (and do) learn recursive structures.
 


Best regards,

Menno van Zaanen

+-------------------------------------+
| Menno van Zaanen                    | "The more it stays the same,
| mvzaanen at science.uva.nl             |  the less it changes."
| http://www.science.uva.nl/~mvzaanen |                           -Spinal Tap