Connectionists: Stephen Hanson in conversation with Geoff Hinton
Dietterich, Thomas
tgd at oregonstate.edu
Mon Jul 18 09:49:31 EDT 2022
This depends crucially on the vocabulary (representation). If I look at Fourier components, I can generalize in one way; if I treat each input vector as unique (as in a lookup table), I can't generalize at all. People are able to represent inputs in a wide variety of ways, as shown by their performance on Bongard problems, for example. These can involve relationships over relationships, and other recursive structures. Representation learning still has a ways to go.
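To make that concrete, here is a minimal sketch (my own construction, with made-up sizes; parity stands in for the target function): the same Boolean function is fit linearly under two representations. With Walsh/Fourier features, a sparse fit recovers the rule from a fraction of the inputs; with a lookup-table encoding (one indicator per distinct input), nothing constrains the unseen inputs, so held-out performance sits at chance.

import itertools
import numpy as np
from sklearn.linear_model import Lasso

n = 10
X = np.array(list(itertools.product([0, 1], repeat=n)))   # all 2^n Boolean inputs
y = 2.0 * (X.sum(axis=1) % 2) - 1.0                        # parity target, in {-1, +1}

def walsh_features(X):
    # One column per subset S of bit positions: chi_S(x) = (-1)^(sum of x_i for i in S)
    d = X.shape[1]
    cols = [(-1.0) ** X[:, list(S)].sum(axis=1)
            for r in range(d + 1)
            for S in itertools.combinations(range(d), r)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
train = rng.choice(len(X), size=400, replace=False)
test = np.setdiff1d(np.arange(len(X)), train)

# Fourier representation: the sparse linear fit generalizes to held-out inputs.
F = walsh_features(X)
fourier_fit = Lasso(alpha=0.01, max_iter=10000).fit(F[train], y[train])
acc_fourier = (np.sign(fourier_fit.predict(F[test])) == y[test]).mean()

# Lookup-table representation: one indicator column per distinct input vector.
# Test inputs activate columns never seen in training, so only the intercept speaks.
T = np.eye(len(X))
table_fit = Lasso(alpha=0.01, max_iter=10000).fit(T[train], y[train])
acc_table = (np.sign(table_fit.predict(T[test])) == y[test]).mean()

print(f"held-out accuracy  Fourier features: {acc_fourier:.2f}   lookup table: {acc_table:.2f}")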
--Tom
Thomas G. Dietterich, Distinguished Professor
School of Electrical Engineering and Computer Science
Oregon State Univ., Corvallis, OR 97331-5501
Voice: 541-737-5559    FAX: 541-737-1300
URL: eecs.oregonstate.edu/~tgd
US Mail: 1148 Kelley Engineering Center
Office: 2067 Kelley Engineering Center
-----Original Message-----
From: Connectionists <connectionists-bounces at mailman.srv.cs.cmu.edu> On Behalf Of Gary Marcus
Sent: Monday, July 18, 2022 06:00
To: Barak A. Pearlmutter <barak at pearlmutter.net>
Cc: Gary Cottrell <gary at ucsd.edu>; AIhub <aihuborg at gmail.com>; connectionists at mailman.srv.cs.cmu.edu
Subject: Re: Connectionists: Stephen Hanson in conversation with Geoff Hinton
Identity is just as “easy”. But as I showed long ago (1998 and 2001), what is learned is specific to a space of training examples: there is interpolation within that space, but no reliable extrapolation outside it. For example, if you train a standard multi-layer perceptron on even numbers represented as binary digits, the system will not generalize properly to odd numbers.
Nowadays Bengio and others call this the problem of distribution shift, and you would get the same sort of thing with parity (with a slightly different example), because what is learned and described as “parity” is fairly superficial: example-based rather than fully abstract.
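A minimal sketch of that experiment (my reconstruction, not the original 1998/2001 code; the network size and training settings are invented): train a standard multi-layer perceptron to copy an n-bit binary string, show it only even numbers during training, and then test it on odd numbers.

import numpy as np
from sklearn.neural_network import MLPRegressor

n = 8
nums = np.arange(2 ** n)
bits = ((nums[:, None] >> np.arange(n)[::-1]) & 1).astype(float)  # n-bit binary codes

even = bits[nums % 2 == 0]   # training inputs/targets: low-order bit is always 0
odd  = bits[nums % 2 == 1]   # held-out inputs: low-order bit is always 1

mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
mlp.fit(even, even)          # learn the identity map, but only on even numbers

pred = (mlp.predict(odd) > 0.5).astype(int)
exact = (pred == odd.astype(int)).all(axis=1).mean()
print("exact-copy rate on odd inputs:", exact)
# The output unit for the low-order bit only ever saw a target of 0, so it keeps
# predicting values near 0 on odd inputs: interpolation inside the training
# distribution, but no extrapolation of "copy every bit" outside it.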
gary
> On Jul 18, 2022, at 05:46, Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
>
> On Mon, 18 Jul 2022 at 08:28, Danko Nikolic <danko.nikolic at gmail.com> wrote:
>> In short, learning mechanisms cannot discover generalized XOR functions with simple connectivity -- only with complex connectivity. This problem results in exponential growth of needed resources as the number of bits in the generalized XOR increases.
>
> Assuming that "generalized XOR" means parity, this must rely on some
> unusual definitions which you should probably state in order to avoid
> confusion.
>
> Parity is a poster boy for an *easy* function to learn, albeit a
> nonlinear one. This is because in the (boolean) Fourier domain its
> spectrum consists of a single nonzero coefficient, and functions that
> are sparse in that domain are very easy to learn. See N. Linial, Y.
> Mansour, and N. Nisan, "Constant depth circuits, Fourier Transform and
> learnability", FOCS 1989, or Mansour, Y. (1994). Learning Boolean
> Functions via the Fourier Transform. Theoretical Advances in Neural
> Computation and Learning, 391–424. doi:10.1007/978-1-4615-2696-4_11
>
> --Barak Pearlmutter
>
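To illustrate the point above about parity's Fourier spectrum numerically, here is a small check (my own sketch, not code from the cited papers): computing the Boolean Fourier (Walsh-Hadamard) coefficients of n-bit parity directly shows a single nonzero coefficient, on the character indexed by the full set of input bits, which is the sparsity that makes it easy to learn in that basis.

import itertools
import numpy as np

n = 6
X = np.array(list(itertools.product([0, 1], repeat=n)))   # all 2^n Boolean inputs
f = (-1.0) ** X.sum(axis=1)                               # parity in the +/-1 convention

coeffs = {}
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        chi = (-1.0) ** X[:, list(S)].sum(axis=1)         # character chi_S(x)
        coeffs[S] = (f * chi).mean()                      # f_hat(S) = E[f(x) * chi_S(x)]

nonzero = {S: c for S, c in coeffs.items() if abs(c) > 1e-9}
print(nonzero)   # {(0, 1, 2, 3, 4, 5): 1.0} -- the single nonzero coefficient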