PhD Dissertation available

SCHOLTES at ALF.LET.UVA.NL
Mon Jun 7 10:41:00 EDT 1993


===================================================================
       As I had to disappoint many people because I ran out of
       copies in the first batch, a high-quality reprint has
       been made.

                 ........REPRINT........


                Ph.D. DISSERTATION AVAILABLE

                           on

Neural Networks, Natural Language Processing, Information Retrieval

                292 pages and over 350 references

===================================================================

A copy of the dissertation "Neural Networks in Natural Language Processing
and Information Retrieval" by Johannes C. Scholtes can be obtained at cost
price, including fast airmail delivery, for US$ 25.

Payment by major credit cards (VISA, AMEX, MC, Diners) is accepted and
encouraged. Please include the name on the card, the card number, and the
expiration date. Your credit card will be charged Dfl. 47,50.

Within Europe one can also send a Euro-Cheque for Dfl. 47,50 to:

(include the 4- or 5-digit number on the back of the cheque!)

    University of Amsterdam
    J.C. Scholtes
    Dufaystraat 1
    1075 GR Amsterdam
    The Netherlands
    scholtes at alf.let.uva.nl


Do not forget to include a surface shipping address. Please allow 2-4
weeks for delivery.


                            Abstract

1.0  Machine Intelligence

For over fifty years the two main directions in machine intelligence (MI),
neural networks (NN) and artificial intelligence (AI), have been studied by
researchers with many different backgrounds. NN and AI seemed to conflict
with many of the traditional sciences as well as with each other. The lack
of a long research history and well-defined foundations has always been an
obstacle to the general acceptance of machine intelligence by other fields.

At the same time, traditional schools of science such as mathematics and
physics developed their own tradition of new or "intelligent" algorithms.
Progress in statistical reestimation techniques such as hidden Markov models
(HMMs) started a new phase in speech recognition. Another example is the
application of the Kalman filter to the interpretation of sonar and radar
signals. Many more examples of such "intelligent" algorithms can be found in
the statistical classification and filtering techniques of pattern
recognition (PR).
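
As a purely illustrative sketch (not taken from the dissertation), the
following Python fragment computes the likelihood of an observation sequence
under a toy hidden Markov model with the forward algorithm; all parameters
are invented for the example.

    import numpy as np

    # Toy HMM: 2 hidden states, 3 observation symbols (illustrative values only).
    A  = np.array([[0.7, 0.3],        # state transition probabilities
                   [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1],   # observation probabilities per state
                   [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])         # initial state distribution

    def forward_likelihood(obs):
        """Forward algorithm: P(observation sequence | model)."""
        alpha = pi * B[:, obs[0]]            # initialisation
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # induction step
        return alpha.sum()                   # termination

    print(forward_likelihood([0, 1, 2]))     # likelihood of observing symbols 0, 1, 2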

Here, the field of neural networks is studied with that of pattern
recognition in mind. Although only global, qualitative comparisons are made,
the importance of the relation between the two should not be underestimated.
In addition, it is argued that neural networks do indeed add something to
the fields of MI and PR, instead of competing or conflicting with them.

2.0  Natural Language Processing

The study of natural language processing (NLP) is even older than that of
MI. As early as the beginning of this century, people tried to analyse human
language with machines. However, serious efforts had to wait until the
development of the digital computer in the 1940s, and even then the
possibilities were limited. For over 40 years, symbolic AI has been the most
important approach in the study of NLP. That this has not always been the
case may be concluded from the early work on NLP by Harris. As a matter of
fact, Chomsky's Syntactic Structures was an attack on the lack of structural
properties in the mathematical methods used in those days. But as Chomsky's
work remained the standard in NLP, Harris's was almost completely forgotten
until recently. As the scientific community in NLP devoted all its attention
to symbolic AI-like theories, the only useful practical implementations of
NLP systems were those based on statistics rather than on linguistics. As a
result, more and more scientists are redirecting their attention towards the
statistical techniques available in NLP. The field of connectionist NLP can
be considered a special case of these mathematical methods in NLP.

More than one reason can be given to explain this shift in approach. On the
one hand, many problems in NLP have never been addressed properly by
symbolic AI. Some examples are robust behavior in noisy environments,
disambiguation driven by different kinds of knowledge, commonsense
generalizations, and learning (or training) abilities. On the other hand,
mathematical methods have become much stronger and more sensitive to
specific properties of language such as hierarchical structures.

Last but not least, the relatively high degree of success of mathematical
techniques in commercial NLP systems might have set the trend towards the
implementation of simple, but straightforward algorithms.

In this study, much attention is given to the representation of hierarchical
structures and semantic features in mathematical objects such as vectors and
matrices. These vectors can then be used in models such as neural networks,
but also in sequential statistical procedures with similar characteristics.
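
As a hypothetical illustration of such a representation (the feature set and
values below are invented and are not the encoding developed in the
dissertation), a few words can be mapped to vectors of semantic features and
compared numerically:

    import numpy as np

    # Hypothetical semantic features: [animate, human, concrete, action]
    lexicon = {
        "dog":   np.array([1.0, 0.0, 1.0, 0.0]),
        "child": np.array([1.0, 1.0, 1.0, 0.0]),
        "run":   np.array([1.0, 0.0, 0.0, 1.0]),
    }

    def cosine(u, v):
        """Cosine similarity between two feature vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(lexicon["dog"], lexicon["child"]))  # relatively close
    print(cosine(lexicon["dog"], lexicon["run"]))    # less close

Once words are numbers, any model that operates on vectors, a neural network
as well as a sequential statistical procedure, can process them.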

3.0  Information Retrieval

The study of information retrieval (IR) was traditionally related to
libraries on the one hand and military applications on the other. However,
as PCs grew more popular, most ordinary users lost track of the data they
had produced over the last couple of years. This, together with the
introduction of various "small platform" computer programs, made the field
of IR relevant to ordinary users.

However, most of these systems still use techniques that were developed over
thirty years ago and that implement nothing more than a global surface
analysis of textual (layout) properties. No deep structure whatsoever is
incorporated in the decision whether or not to retrieve a text.

There is one large dilemma in IR research. On the one hand, the data
collections are so incredibly large that any method other than a global
surface analysis would fail. On the other hand, such a global analysis could
never implement a contextually sensitive method to restrict the number of
possible candidates returned by the retrieval system. As a result, all
methods that use some linguistic knowledge exist only in laboratories and
not in the real world. Conversely, all methods that are used in the real
world are based on technological achievements from twenty to thirty years
ago.

Therefore, the field of information retrieval would greatly benefit from a
method that could incorporate more context without slowing down. As
computers can only process numbers within reasonable time limits, such a
method should be based on vectors of numbers rather than on symbol
manipulation. This is exactly where the challenge lies: keeping up the speed
on the one hand, and incorporating more context on the other. If possible,
the data representation of the contextual information should not be
restricted to a single type of medium. It should be possible to incorporate
symbolic language as well as sound, pictures and video concurrently in the
retrieval phase, although it is not yet known exactly how.
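
A minimal sketch of number-based retrieval along these lines, generic
vector-space ranking by cosine similarity rather than the method proposed in
the thesis, might look as follows; the documents and query are invented:

    import numpy as np
    from collections import Counter

    docs = ["neural networks for language",
            "statistical retrieval of documents",
            "language processing with neural models"]

    # Shared vocabulary and term-count vectors: a deliberately crude surface analysis.
    vocab = sorted({w for d in docs for w in d.split()})

    def vectorize(text):
        counts = Counter(text.split())
        return np.array([counts[w] for w in vocab], dtype=float)

    doc_vectors = [vectorize(d) for d in docs]

    def rank(query):
        """Rank documents by cosine similarity to the query vector."""
        q = vectorize(query)
        scores = [float(q @ d / ((np.linalg.norm(q) * np.linalg.norm(d)) or 1.0))
                  for d in doc_vectors]
        return sorted(zip(scores, docs), reverse=True)

    for score, doc in rank("neural language models"):
        print(f"{score:.2f}  {doc}")

At query time only dot products and norms are computed, which is why such
methods scale to large collections while remaining blind to deeper context.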

Here, the emphasis is more on real-time filtering of large amounts of
dynamic data than on document retrieval from large (static) databases. By
incorporating more contextual information, it should be possible to
implement a model that can process large amounts of unstructured text
without overloading the end-user with information.

4.0  The Combination

As this study is a very multi-disciplinary one, there is a risk that it
remains a surface discussion of many different problems without analyzing
any one of them in depth. To avoid this, some central themes, applications
and tools have been chosen. The themes in this work are self-organization,
distributed data representations and context. The applications are NLP and
IR; the tools are (variants of) Kohonen feature maps, a well-known model
from neural network research.
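
For readers unfamiliar with the tool, the following sketch trains a tiny
one-dimensional Kohonen feature map on random two-dimensional points. It is
a generic self-organizing map with invented parameters, not one of the
variants developed in the dissertation:

    import numpy as np

    rng = np.random.default_rng(0)

    # 1-D Kohonen feature map: 10 units, each with a 2-D weight vector.
    n_units, dim = 10, 2
    weights = rng.random((n_units, dim))
    data = rng.random((500, dim))                      # toy input distribution

    for t, x in enumerate(data):
        lr = 0.5 * (1.0 - t / len(data))               # decaying learning rate
        sigma = max(1.0, 3.0 * (1.0 - t / len(data)))  # shrinking neighbourhood
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
        # Units close to the winner on the map are pulled towards the input.
        dist = np.abs(np.arange(n_units) - winner)
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)

    print(weights)  # adjacent units end up with similar weight vectors

The map thus self-organizes: neighbouring units come to represent
neighbouring regions of the input space, which is the property exploited
throughout this work.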

Self-organization and context are more closely related than one might
suspect. First, without the proper natural context, self-organization is not
possible. Second, self-organization enables one to discover contextual
relations that were not known before.

Distributed data representation may solve many of the unsolved problems in
NLP and IR by introducing a powerful and efficient knowledge integration
and generalization tool. However, distributed data representation and
self-organization trigger new problems that should be solved in an
elegant manner.

Both NLP and IR work on symbolic language. The two have properties in common
but focus on different features of language. In NLP, hierarchical structures
and semantic features are important. In IR, the amount of data sets the
limitations of the methods used. However, as computers grow more powerful
and the data sets get larger and larger, the two approaches gain more and
more common ground. By using the same models for both applications, a better
understanding of both may be obtained.

Both neural networks and statistics would be able to implement
self-organization, distributed data and context in the same manner.
In this thesis, the emphasis is on Kohonen feature maps rather than on
statistics. However, it may be possible to implement many of the
techniques used with regular sequential mathematical algorithms.

So, the true aim of this work can be formulated as understanding
self-organization, distributed data representation, and context in NLP and
IR through an in-depth analysis of Kohonen feature maps.


==============================================================================


