Connectionists: Stephen Hanson in conversation with Geoff Hinton
Gary Marcus
gary.marcus at nyu.edu
Mon Jul 18 10:18:51 EDT 2022

Completely agree on all points.
> On Jul 18, 2022, at 6:49 AM, Dietterich, Thomas <tgd at oregonstate.edu> wrote:
> 
> This depends crucially on the vocabulary (representation). If I look at Fourier components, I can generalize in one way; if I treat each input vector as unique (as in a lookup table), I can't generalize at all. People are able to represent inputs in a wide variety of ways, as shown by their performance on Bongard problems, for example. These can involve relationships over relationships, and other recursive structures. Representation learning still has a ways to go. 
> 
> --Tom
> 
> Thomas G. Dietterich, Distinguished Professor Voice: 541-737-5559
> School of Electrical Engineering              FAX: 541-737-1300
>  and Computer Science                        URL: eecs.oregonstate.edu/~tgd
> US Mail: 1148 Kelley Engineering Center 
> Office: 2067 Kelley Engineering Center
> Oregon State Univ., Corvallis, OR 97331-5501
> 
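To make Tom's representation point concrete, here is a minimal Python sketch (an editorial illustration, not something proposed in the thread). It contrasts a lookup table with a representation in which parity is linear: the parity of a subset S of bits equals w . x mod 2, a linear function over GF(2), and each such parity corresponds to a single Boolean Fourier basis function. The sketch swaps in Gauss-Jordan elimination over GF(2), a standard way to learn noiseless parities. The 10-bit width, the target subset {1, 4, 7}, the 30 random training examples and the helper names (`parity`, `solve_gf2`) are all hypothetical choices.

```python
import itertools
import random

N_BITS = 10
HIDDEN_SUBSET = {1, 4, 7}  # hypothetical target: parity of bits 1, 4 and 7


def parity(x, subset):
    """Parity (XOR) of the bits of x indexed by subset."""
    return sum(x[i] for i in subset) % 2


def solve_gf2(rows, rhs, n):
    """Gauss-Jordan elimination over GF(2).

    Returns one w with w . row == rhs (mod 2) for every row, setting free
    variables to 0. Assumes the system is consistent (noiseless labels).
    """
    A = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivots = []
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, len(A)) if A[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column: free variable
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c]:
                A[i] = [a ^ b for a, b in zip(A[i], A[r])]  # row addition mod 2
        pivots.append(c)
        r += 1
    w = [0] * n
    for row_idx, c in enumerate(pivots):
        w[c] = A[row_idx][n]
    return w


rng = random.Random(0)
train = [tuple(rng.randint(0, 1) for _ in range(N_BITS)) for _ in range(30)]
labels = [parity(x, HIDDEN_SUBSET) for x in train]

# Representation 1: lookup table, each input vector treated as unique.
table = dict(zip(train, labels))

# Representation 2: f(x) = w . x mod 2, a linear function over GF(2).
w = solve_gf2(train, labels, N_BITS)

all_inputs = list(itertools.product([0, 1], repeat=N_BITS))
table_answers = sum(x in table for x in all_inputs)
gf2_correct = sum(
    sum(wi * xi for wi, xi in zip(w, x)) % 2 == parity(x, HIDDEN_SUBSET)
    for x in all_inputs
)
print("recovered subset:", [i for i, wi in enumerate(w) if wi])
print("lookup table can answer", table_answers, "of", len(all_inputs), "inputs")
print("GF(2) rule is correct on", gf2_correct, "of", len(all_inputs), "inputs")
```

With 30 random examples the training vectors span GF(2)^10 with high probability, so the recovered w is the indicator of the hidden subset and the learned rule is correct on all 1,024 inputs, while the lookup table can only answer the (at most 30) inputs it has stored.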
> -----Original Message-----
> From: Connectionists <connectionists-bounces at mailman.srv.cs.cmu.edu> On Behalf Of Gary Marcus
> Sent: Monday, July 18, 2022 06:00
> To: Barak A. Pearlmutter <barak at pearlmutter.net>
> Cc: Gary Cottrell <gary at ucsd.edu>; AIhub <aihuborg at gmail.com>; connectionists at mailman.srv.cs.cmu.edu
> Subject: Re: Connectionists: Stephen Hanson in conversation with Geoff Hinton
> 
> 
> Identity is just as “easy”. But as I showed long ago (1998 and 2001), what is learned is specific to the space of training examples: there is interpolation within that space, but no reliable extrapolation outside it. For example, if you train a standard multi-layer perceptron on the identity function using only even numbers, represented as binary digits, the system will not generalize properly to odd numbers.
> 
> Nowadays Bengio and others call this the problem of distribution shift, and you would get the same sort of thing with parity (with a slightly different example), because what is learned and described as “parity” is fairly superficial: example-based rather than fully abstract.
> 
> gary
> 
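A minimal NumPy sketch of Gary's even/odd identity example, under assumptions that are not taken from the 1998/2001 papers: 8-bit inputs, an 8-16-8 sigmoid network trained by full-batch gradient descent on cross-entropy, a random hidden layer, and a zero-initialized output layer (the helper names `to_bits`, `train_identity` and `predict_bits` are hypothetical). Because bit 0 is 0 in every even training example, the weights leaving that input never receive any gradient, and the output unit for bit 0 only ever sees the target 0. Since the sigmoid hidden activations are nonnegative, that unit's incoming weights can only decrease from zero and its bias only becomes more negative, so it outputs 0 on every input, including all odd numbers.

```python
import numpy as np

WIDTH = 8  # bits per number (assumption: numbers 0..255)


def to_bits(n, width=WIDTH):
    # Little-endian bit vector: index 0 is the least significant ("odd") bit.
    return np.array([(n >> k) & 1 for k in range(width)], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_identity(numbers, hidden=16, lr=0.5, epochs=3000, seed=0):
    """Train a WIDTH-hidden-WIDTH sigmoid MLP to reproduce its input (identity)."""
    rng = np.random.default_rng(seed)
    X = np.stack([to_bits(n) for n in numbers])
    W1 = rng.normal(0.0, 0.5, size=(WIDTH, hidden))
    b1 = np.zeros(hidden)
    # Assumption: output layer starts at zero; the random hidden layer breaks symmetry.
    W2 = np.zeros((hidden, WIDTH))
    b2 = np.zeros(WIDTH)
    W1_init = W1.copy()
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        dZ2 = (Y - X) / len(X)  # grad of mean cross-entropy w.r.t. output pre-activations
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)  # backprop through the hidden sigmoid
        W2 -= lr * H.T @ dZ2
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * X.T @ dZ1
        b1 -= lr * dZ1.sum(axis=0)
    return (W1, b1, W2, b2), W1_init


def predict_bits(params, n):
    W1, b1, W2, b2 = params
    h = sigmoid(to_bits(n) @ W1 + b1)
    return (sigmoid(h @ W2 + b2) > 0.5).astype(int)


params, W1_init = train_identity(range(0, 256, 2))  # train on even numbers only
# Input bit 0 is always 0 in training, so its outgoing weights never change.
print("bit-0 input weights unchanged:", np.array_equal(params[0][0], W1_init[0]))
odd_lsb = [predict_bits(params, n)[0] for n in range(1, 256, 2)]
print("odd inputs whose output bit 0 is 1:", sum(odd_lsb), "of", len(odd_lsb))
```

The zero-initialized output layer makes the failure on odd numbers provable by construction; with a random output-layer initialization that simple argument no longer applies, although the weights leaving input bit 0 still never change during training.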
>>> On Jul 18, 2022, at 05:46, Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
>>> 
>>> On Mon, 18 Jul 2022 at 08:28, Danko Nikolic <danko.nikolic at gmail.com> wrote:
>>> In short, learning mechanisms cannot discover generalized XOR functions with simple connectivity -- only with complex connectivity. This problem results in exponential growth of needed resources as the number of bits in the generalized XOR increases.
>> 
>> Assuming that "generalized XOR" means parity, this must rely on some
>> unusual definitions which you should probably state in order to avoid
>> confusion.
>> 
>> Parity is a poster boy for an *easy* function to learn, albeit a
>> nonlinear one. This is because in the (boolean) Fourier domain its
>> spectrum consists of a single nonzero coefficient, and functions that
>> are sparse in that domain are very easy to learn. See N. Linial, Y.
>> Mansour, and N. Nisan, "Constant depth circuits, Fourier Transform and
>> learnability", FOCS 1989, or Mansour, Y. (1994). Learning Boolean
>> Functions via the Fourier Transform. Theoretical Advances in Neural
>> Computation and Learning, 391–424. doi:10.1007/978-1-4615-2696-4_11
>> 
>> --Barak Pearlmutter
>> 
> 
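Barak's sparsity claim can be checked directly by brute force. The Python sketch below (an illustration, not code from the cited papers) computes the Boolean Fourier coefficients f_hat(S) = E_x[f(x) * chi_S(x)], with chi_S(x) = (-1)^(sum of x_i for i in S), for parity and, as a contrast, for AND, both written as +/-1-valued functions on 4 bits. Parity has exactly one nonzero coefficient (S = all bits), whereas AND has all 2^n of them nonzero. It only illustrates spectral sparsity; it does not implement the learning algorithms in the references, and the function names are hypothetical.

```python
import itertools


def fourier_coefficients(f, n):
    """Boolean Fourier coefficients of f: {0,1}^n -> {-1,+1}.

    f_hat(S) = mean over x of f(x) * chi_S(x), chi_S(x) = (-1)^(sum of x_i, i in S).
    The subset S is encoded as a bitmask. Brute force, O(4^n * n): fine for small n.
    """
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for mask in range(2 ** n):
        total = 0
        for x in points:
            chi = (-1) ** sum(x[i] for i in range(n) if (mask >> i) & 1)
            total += f(x) * chi
        coeffs[mask] = total / 2 ** n
    return coeffs


def parity_pm(x):
    # Parity in +/-1 form: +1 for an even number of ones, -1 for odd.
    return (-1) ** sum(x)


def and_pm(x):
    # AND in +/-1 form: -1 only when every bit is 1.
    return -1 if all(x) else 1


n = 4
for name, func in [("parity", parity_pm), ("AND", and_pm)]:
    coeffs = fourier_coefficients(func, n)
    nonzero = {format(m, f"0{n}b"): c for m, c in coeffs.items() if c != 0}
    print(name, "has", len(nonzero), "nonzero coefficients:", nonzero)
```

In +/-1 form parity is itself the character chi_S for S = all bits, so orthogonality of the characters leaves a single coefficient equal to 1; AND, by contrast, spreads its weight over every subset.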