paper on distributed semantic representations

Hinrich Schuetze schuetze at csli.stanford.edu
Sun Jan 24 14:16:02 EST 1993


The following paper is now available in the connectionist archive,
archive.cis.ohio-state.edu (128.146.8.52), in pub/neuroprose under the
name:   schuetze.wordspace.ps.Z


                              WORD SPACE

                           Hinrich Schuetze
                      CSLI, Stanford University


                               ABSTRACT
This paper describes an efficient, corpus-based method for inducing
distributed semantic representations for a large number of words
(50,000) from lexical cooccurrence statistics.  Each word is
represented by a 97-dimensional vector that is computed by means of a
singular-value decomposition of a 5000-by-5000 matrix recording
cooccurrence in a large text corpus (The New York Times).  The
representations are successfully applied to word sense disambiguation
using a nearest neighbor method.
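
As a rough illustration of the kind of pipeline the abstract describes
(cooccurrence counts, then a singular-value decomposition, then
nearest-neighbor lookup in the reduced space), here is a minimal Python
sketch using NumPy.  It is not the paper's implementation: the toy
corpus, the window size, the function names, and the choice of k = 3
dimensions are all assumptions made for illustration, and the
nearest-neighbor step shown is plain cosine similarity between word
vectors rather than the disambiguation procedure itself.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words occurs within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo = max(0, i - window)
            hi = min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def reduce_with_svd(counts, k):
    """Keep the top-k singular directions as k-dimensional word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def nearest_neighbors(word, vocab, vectors, n=3):
    """Return the n words whose vectors have the highest cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)  # guard against zero vectors
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != word][:n]

# Hypothetical toy corpus; the paper uses a large New York Times corpus.
sentences = [
    ["the", "bank", "approved", "the", "loan"],
    ["the", "bank", "raised", "interest", "rates"],
    ["the", "river", "bank", "was", "muddy"],
    ["the", "river", "flooded", "the", "valley"],
]
vocab, counts = cooccurrence_matrix(sentences, window=2)
vectors = reduce_with_svd(counts, k=3)  # the paper reports 97 dimensions
print(nearest_neighbors("bank", vocab, vectors))
```

On a corpus this small the neighbors are not meaningful; the sketch
only shows how the three steps fit together.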


to appear in:

S. J. Hanson, J. D. Cowan, and C. L. Giles (Eds.), Advances in
Neural Information Processing Systems 5. San Mateo, CA: Morgan
Kaufmann.


author's address:

Hinrich Schuetze
CSLI, Ventura Hall
Stanford, CA 94305-4115
schuetze at csli.stanford.edu

