Connectionist symbol processing: any progress?
Tony Plate
Tony.Plate at MCS.VUW.AC.NZ
Sun Aug 16 04:30:59 EDT 1998
Work has been progressing on higher-level connectionist
processing, but progress has not been blindingly fast. As
others have noted, it is a difficult area.
One of the things that has recently renewed my interest in the
idea of using distributed representations for processing
complex information was finding out about Latent Semantic
Analysis/Indexing (LSA/LSI) at NIPS*97. LSA is a method
for taking a large corpus of text and constructing vector
representations for words in such a way that similar words
are represented by similar vectors. LSA works by
representing a word by its context (harking back to a
comment I recently saw attributed to Firth 1957: "You shall
know a word by the company it keeps" :-), and then reducing
the dimensionality of the context using singular value
decomposition (SVD) (v. closely related to principal
component analysis (PCA)). The vectors constructed by LSA
can be of any size, but it seems that moderately high
dimensions work best: 100 to 300 elements. It turns out
that one can do all sorts of surprising things with these
vectors. One can construct vectors which represent
documents and queries by merely summing the vectors for
their words and do information retrieval, automatically
getting around the problem of synonyms (since synonyms tend
to have similar vectors). One can do the same thing with
questions and multiple choice answers and pass exams (e.g.,
first year psychology exams, TOEFL tests). And all this
just by treating texts as unordered bags of words. While these
results are intriguing, they don't achieve the goal of
complex connectionist reasoning. However, they could
provide an excellent source of representations for use in a
more complex connectionist system (using connectionist in a
very broad sense here). LSA is fast enough that it can be
used on tens of thousands of documents to derive vectors for
thousands of words. This is exciting because it could allow
one to start building connectionist systems which deal with
full-range vocabularies and large varied task sets (as in
info. retrieval and related tasks), and which do more
interesting processing than just forming the bag-of-words
content of a document a la vanilla-LSA.
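For anyone who wants to play with the idea, here is a small sketch of vanilla LSA. The corpus below is a made-up toy (real applications use tens of thousands of documents and 100-300 dimensions; here the corpus is so small we keep only 2):

```python
# A toy sketch of vanilla LSA: term-document counts, SVD, and
# bag-of-words retrieval by summing word vectors.
import numpy as np

docs = [
    "human machine interface applications",
    "user interface management system",
    "survey of user opinion of computer system response time",
    "the generation of random binary trees",
    "the intersection graph of paths in trees",
]

# Build the term-document count matrix (rows = words, columns = docs).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Reduce dimensionality with SVD, keeping the top k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]     # one k-dimensional vector per word
doc_vecs = Vt[:k, :].T * s[:k]   # one k-dimensional vector per document

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# A query vector is just the sum of its word vectors, as in the
# bag-of-words retrieval described above.
query = word_vecs[index["user"]] + word_vecs[index["interface"]]
scores = [cosine(query, dv) for dv in doc_vecs]
best = int(np.argmax(scores))
```

The query should land in the human-computer cluster (the first three documents) rather than the graph-theory cluster, even though no document need share both query words.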
As Ross Gayler mentioned, analogy processing is a very
promising area for application of connectionist ideas.
There are a few reasons for this being interesting: people
do it all the time, structural relationships are important
to the task, no explicit variables need be involved, and
rule-based reasoning can be seen as a very specialized
version of the task. One very interesting model of
analogical processing that was presented at the workshop in
Bulgaria (in July) was John Hummel and Keith Holyoak's LISA
model (ref at end). This model uses distributed
representations for roles and fillers, binding them together
with temporal synchrony, and achieves quite impressive
results (John, in case you're listening, this is not to say
that I think temporal binding is the right way to go, but
it's an impressive model and presents a good challenge to
other approaches.)
I have to disagree with two of the technical comments made
in this discussion:
Jerry Feldman wrote:
"Parallel Distributed Processing is a contradiction in
terms. To the extent that representing a concept involves
all of the units in a system, only one concept can be
active at a time."
One can easily represent more than one concept at a time in
distributed representations. One of their beauties is the
soft limit on the number of concepts that can be represented
at once. This limit depends on the dimensionality of the
system, the redundancy in representations, the similarity
structure of the concepts, and so forth. All of the units
in the system might be involved in representing a concept,
but redundancy makes none essential. And of course one can
also have different modules within a system. But, my point
is that even within a single PDP module, one can still
represent (and process) multiple concepts at once.
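A quick numerical illustration of this soft limit (my own toy numbers, not anyone's model): superimpose several random high-dimensional vectors and test membership with the dot product. Stored concepts score near 1, unstored ones near 0, and the separation degrades gracefully as more concepts are added rather than failing at some hard bound.

```python
# Superposition of distributed representations: five concepts active
# at once in a single 1000-dimensional vector, with membership still
# recoverable by dot product.
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                              # dimensionality
concepts = rng.standard_normal((20, n)) / np.sqrt(n)  # approx unit vectors

stored = concepts[:5].sum(axis=0)    # five concepts represented at once

in_scores = concepts[:5] @ stored    # each approx 1 + crosstalk noise
out_scores = concepts[5:] @ stored   # each approx 0 + crosstalk noise
```

The crosstalk noise grows as more vectors are superimposed, so the capacity limit is soft: it depends on dimensionality and on how much noise downstream processing can tolerate.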
Mitsu Hadeishi wrote:
"an input space which consists of a fixed number of
dimensions cannot handle recursive combinations"
A number of people, including myself, have shown that it is
possible to represent arbitrarily nested concepts in a space
with a fixed number of dimensions. Furthermore, the
resulting representations have interesting and useful
properties not shared by their symbolic counterparts. Very
briefly, the way one can do this is by using vector-space
operations for addition and multiplication to implement the
conceptual operations of forming collections and binding
concepts, respectively. For example, one can build a
distributed representation for a shape configuration #33 of
"circle above triangle" as:
config33 = vertical + circle + triangle
+ ontop*circle + below*triangle
By using an appropriate multiplication operation (I used
circular, or wrapped, convolution), the reduced
representation of the compositional concept (e.g., config33)
has the same dimension as its components, and can readily be
used as a component in other higher-level relations. Quite
a few people have devised schemes for this type of
representation, e.g., Paul Smolensky (Tensor Products),
Jordan Pollack (RAAMs), Alessandro Sperduti (LRAAMs), Pentti
Kanerva (Binary Spatter Codes). Another related scheme that
uses distributed representations and tensor product bindings
(but not role-filler bindings) is Halford, Wilson and
Phillips' STAR model.
Some of the useful properties of these types of
distributed representations are as follows:
(a) The reduced, distributed representation (e.g., config33)
functions like a pointer, but is more than a mere pointer
in that information about its contents is available directly
without having to "follow" the "pointer." This makes
it possible to do some types of processing without having to
unpack the structures.
(b) The vector-space similarity of representations (i.e., the
dot-product) reflects both superficial and structural
similarity of structures.
(c) There are fast, approximate, vector-space techniques for
doing "structural" computations like finding corresponding
objects in two analogies, or doing structural transformations.
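Property (b) is easy to check numerically with the same convolution machinery (again a toy example of my own): "circle above triangle" should be more similar to "circle above square" (shared filler in the same role) than to "square above circle" (shared filler, but in the opposite role).

```python
# Dot-product similarity of reduced representations reflects
# structural similarity, not just which fillers are present.
import numpy as np

rng = np.random.default_rng(2)
n = 1024
def rand_vec():
    return rng.standard_normal(n) / np.sqrt(n)

def cconv(a, b):
    # Circular convolution for role-filler binding.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

ontop, below = rand_vec(), rand_vec()
circle, triangle, square = rand_vec(), rand_vec(), rand_vec()

def config(top, bottom):
    return top + bottom + cconv(ontop, top) + cconv(below, bottom)

a = config(circle, triangle)   # circle above triangle
b = config(circle, square)     # circle above square
c = config(square, circle)     # square above circle

sim_ab = a @ b   # circle matches as an item AND in the ontop role
sim_ac = a @ c   # circle matches as an item, but its role differs
```

The structural match (sim_ab) comes out clearly higher than the merely superficial one (sim_ac), without any explicit structure matching.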
Some references:
(Lots of LSA-related papers at:
http://lsa.colorado.edu/
http://superbook.bellcore.com/~std/LSI.papers.html )
@article{deerwester-dumais-landauer-furnas-harshman-90,
author = "S. Deerwester and S. T. Dumais and T. K. Landauer and G. W.
Furnas and R. A. Harshman",
year = "1990",
title = "Indexing by latent semantic analysis",
journal = "Journal of the American Society for Information Science",
volume = "41",
number = "6",
pages = "391-407",
annote = "first technical LSI paper; good background."
}
@inproceedings{landauer-laham-foltz-98,
author = "T. K. Landauer and D. Laham and P. W. Foltz",
title = "Learning Human-like Knowledge with Singular Value
Decomposition: A Progress Report",
booktitle = "Neural Information Processing Systems (NIPS*97)",
year = "1998"
}
@article{landauer-dumais-97,
author = "T. K. Landauer and S. T. Dumais",
year = "1997",
title = "A Solution to Plato's Problem: The Latent Semantic Analysis
Theory of Acquisition, Induction and Representation of
Knowledge",
journal = "Psychological Review",
pages = "211-240",
volume = "104",
number = "2"
}
@inproceedings{bartell-cottrell-belew-92,
author = "B.T. Bartell and G.W. Cottrell and R.K. Belew",
year = "1992",
title = "{Latent Semantic Indexing} is an optimal special case of
multidimensional scaling",
booktitle = "Proc SIGIR-92",
publisher = "ACM Press",
address = "New York"
}
@article{hummel-holyoak-97,
author = "J. E. Hummel and K. J. Holyoak",
title = "Distributed representations of structure: {A} theory of
analogical access and mapping",
journal = "Psychological Review",
year = 1997,
volume = 104,
number = 3,
pages = "427--466",
annote = "LISA paper"
}
@inproceedings{kanerva-96,
author = "P. Kanerva",
year = 1996,
title = "Binary spatter-coding of ordered K-tuples",
volume = 1112,
pages = "869-873",
publisher = "Springer",
editor = "C. von der Malsburg and W. von Seelen and
J.C. Vorbruggen and B. Sendhoff",
booktitle = "Artificial Neural Networks--ICANN Proceedings",
series = "Lecture Notes in Computer Science",
address = "Berlin",
keywords = "HRRs, distributed representations"
}
@unpublished{halford-wilson-phillips-bbs98,
author = "Halford, Graeme and Wilson, William H. and Phillips, Steven",
title = "Processing Capacity Defined by Relational
Complexity: Implications for Comparative,
Developmental, and Cognitive Psychology",
note = "To appear in Behavioral and Brain Sciences"
}
@InBook{plate-97c,
author = "Tony A. Plate",
chapter = "A Common Framework for Distributed
Representation Schemes for Compositional
Structure",
title = "Connectionist Systems for Knowledge
Representation and Deduction",
publisher = "Queensland University of Technology",
year = "1997",
editor = "Fr\'ed\'eric Maire and Ross Hayward and
Joachim Diederich",
pages = "15-34"
}
@incollection{plate-98,
author = "Tony Plate",
title = "Analogy retrieval and processing with distributed
representations",
year = "1998",
booktitle = "Advances in Analogy Research: Integration of Theory and
Data from the Cognitive, Computational, and Neural
Sciences",
pages = "154--163",
editor = "Keith Holyoak and Dedre Gentner and Boicho Kokinov",
publisher = "NBU Series in Cognitive Science, New Bulgarian University, Sofia"
}
Tony Plate, Computer Science Voice: +64-4-495-5233 ext 8578
School of Mathematical and Computing Sciences Fax: +64-4-495-5232
Victoria University, PO Box 600, Wellington, New Zealand tap at mcs.vuw.ac.nz
http://www.mcs.vuw.ac.nz/~tap