paper on distributed semantic representations
Hinrich Schuetze
schuetze at csli.stanford.edu
Sun Jan 24 14:16:02 EST 1993
The following paper is now available in the connectionist archive,
archive.cis.ohio-state.edu (128.146.8.52), in pub/neuroprose under the
name: schuetze.wordspace.ps.Z
WORD SPACE
Hinrich Schuetze
CSLI, Stanford University
ABSTRACT
This paper describes an efficient, corpus-based method for inducing
distributed semantic representations for a large number of words
(50,000) from lexical cooccurrence statistics. Each word is
represented by a 97-dimensional vector that is computed by means of a
singular-value decomposition of a 5000-by-5000 matrix recording
cooccurrence in a large text corpus (The New York Times). The
representations are successfully applied to word sense disambiguation
using a nearest neighbor method.
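The following is a minimal, toy-scale sketch of the pipeline the abstract describes: count cooccurrences, reduce the count matrix with a singular value decomposition, then disambiguate by nearest neighbor. Several details are assumptions for illustration and are not taken from the paper. Counts are taken word-by-word within a fixed symmetric window. Word vectors are the left singular vectors scaled by the singular values. A context is represented by the sum of its word vectors. The sense is chosen by cosine similarity to a few hand-labeled example contexts. The helper names, window size, toy corpus, and sense labels are all hypothetical. The paper itself works with a 5000-by-5000 matrix built from The New York Times and keeps 97 dimensions.

```python
import numpy as np


def cooccurrence_matrix(tokens, vocab, window=2):
    """Count how often each vocab word appears within `window` tokens of another.

    Hypothetical helper: a symmetric word-by-word window count is assumed here,
    not necessarily the exact matrix construction used in the paper.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        if w not in index:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in index:
                counts[index[w], index[tokens[j]]] += 1
    return counts


def reduce_with_svd(counts, k):
    """Keep the top-k singular directions (the paper keeps 97).

    Scaling the left singular vectors by the singular values is an assumed choice.
    """
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


def context_vector(context_words, vectors, index):
    """Represent a context as the sum of its known words' vectors (an assumption)."""
    total = np.zeros(vectors.shape[1])
    for w in context_words:
        if w in index:
            total += vectors[index[w]]
    return total


def nearest_sense(context_words, labeled_contexts, vectors, index):
    """Return the label of the labeled example context closest by cosine."""
    target = context_vector(context_words, vectors, index)
    best_label, best_score = None, -np.inf
    for label, words in labeled_contexts:
        score = cosine(target, context_vector(words, vectors, index))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage: an ambiguous word ("bank") in a tiny hypothetical corpus.
tokens = "the bank raised interest rates the river bank was muddy the bank cut rates".split()
vocab = sorted(set(tokens))
index = {w: i for i, w in enumerate(vocab)}
counts = cooccurrence_matrix(tokens, vocab, window=2)
vectors = reduce_with_svd(counts, k=3)
labeled = [("finance", ["interest", "rates"]), ("river", ["river", "muddy"])]
sense = nearest_sense(["cut", "rates"], labeled, vectors, index)
print(sense)
```

Because the window is symmetric, the count matrix is symmetric. Truncating the SVD is what turns sparse counts into short dense vectors, so words with similar neighbors end up close together. On a corpus this small the chosen sense is not meaningful; the sketch only shows the shape of the computation.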
to appear in:
S. J. Hanson, J. D. Cowan, and C. L. Giles (Eds.), Advances in
Neural Information Processing Systems 5. San Mateo, CA: Morgan
Kaufmann.
author's address:
Hinrich Schuetze
CSLI, Ventura Hall
Stanford, CA 94305-4115
schuetze at csli.stanford.edu