Pinker & Prince on Rules & Learning

Wed Aug 24 14:15:27 EDT 1988

On Pinker & Prince on Rules & Learning

Steve: Having read your Cognition paper and twice seen your talk
(latest at cogsci-88), I thought I'd point out what look like some
problems with the argument (as I understand it). In reading my comments,
please bear in mind that I am NOT a connectionist; I am on record as a
sceptic about connectionism's current accomplishments (and how they are
being interpreted and extrapolated) and as an agnostic about its future
possibilities.  (Because I think this issue is of interest to the
connectionist/AI community as a whole, I am branching a copy of this
challenge to connectionists and comp.ai.)

(1) An argument that pattern-associators (henceforth "nets") cannot do
something in principle cannot be based on the fact that a particular
net (Rumelhart & McClelland 86/87) has not done it in practice.

(2) If the argument is that nets cannot learn past tense forms (from
ecologically valid samples) in principle, then it's the "in principle"
part that seems to be missing. For it certainly seems incorrect to claim that past
tense formation is not learnable in principle. I know of no
poverty-of-the-stimulus argument for past tense formation. On the
contrary, the regularities you describe -- both in the irregulars and
the regulars -- are PRECISELY the kinds of invariances you would
expect a statistical pattern learner that was sensitive to
higher-order correlations to be able to learn successfully. In particular, the
form-independent default option for the regulars should be readily
inducible from a representative sample. (This is without even
mentioning that surely no one imagines that past-tense formation is an
independent cognitive module; it is probably learned jointly with
other morphological regularities and irregularities, and there may
well be degrees-of-freedom-reducing cross-talk.)

(3) If the argument is only that nets cannot learn past tense forms without
rules, then the matter is somewhat vaguer and more equivocal, for
there are still ambiguities about what it is to be or represent a "rule."
At the least, there is the issue of "explicit" vs. "implicit"
representation of a rule, and the related Wittgensteinian distinction
between "knowing" a rule and merely being describable as behaving in
accordance with a rule. These are not crisp issues, and hence not a
solid basis for a principled critique. For example, it may well be
that what nets learn in order to form past tenses correctly is
describable as a rule, but not explicitly represented as one (as it
would be in a symbolic program); the rule may simply operate as a causal
I/O constraint. Ultimately, even conditional branching in a symbolic
program is implemented as a causal constraint; "if/then" is really
just an interpretation we can make of the software. The possibility of
making such systematic, decomposable semantic interpretations is, of course,
precisely what distinguishes the symbolic approach from the
connectionistic one (as Fodor/Pylyshyn argue). But at the level of a few
individual "rules," it is not clear that the higher-order interpretation AS
a formal rule, and all of its connotations, is justified. In any case, the
important distinction is that the net's "rules" are LEARNED from statistical
regularities in the data, rather than BUILT IN (as they are,
coincidentally, in both symbolic AI and poverty-of-the-stimulus-governed
linguistics). [The intermediate case of formally INFERRED rules does
not seem to be at issue here.]

So here are some questions:

(a) Do you believe that English past tense formation is NOT learnable
(except as "parameter settings" on an innate structure, from
impoverished data)? If so, what are the supporting arguments for that?

(b) If past tense formation IS learnable in the usual sense (i.e.,
by trial-and-error induction of regularities from the data sample), then do
you believe that it is specifically unlearnable by nets? If so, what
are the supporting arguments for that?

(c) If past tense formation IS learnable by nets, but only if the
invariance that the net learns and that comes to causally constrain its
successful performance is describable as a "rule," what's wrong with that?

Looking forward to your commentary on Lightfoot, where
poverty-of-the-stimulus IS the explicit issue.

Best wishes, Stevan Harnad