Connectionists: weight guessing quickly solves n-bit parity

Andrzej Wichert andreas.wichert at tecnico.ulisboa.pt
Thu Jul 21 05:07:22 EDT 2022


Dear Juergen,

Symbols do not, by themselves, represent any utilizable knowledge; they cannot be used to define similarity criteria between themselves. The use of symbols in algorithms that imitate intelligent human behavior led to the famous physical symbol system hypothesis of Newell and Simon (1976): "The necessary and sufficient condition for a physical system to exhibit intelligence is that it be a physical symbol system." Symbols are not present in the world; they are constructs of the human mind that simplify the process of representation used in communication and problem solving.

A Turing machine can simulate any algorithm, and so can an RNN, but neither explains human problem solving.

There is a difference: human problem solving, for example, can be described by production systems. The most successful model is the Soar architecture. Such systems were very successful; they learn, and they can give you an explanation of what they are doing.

In August 1998, Dave Touretzky asked on the Connectionists mailing list:
"Is connectionist symbol processing dead?"


From ml-connectionists-request@mlist-1.sp.cs.cmu.edu  Tue Aug 11 17:35:10 1998
From: Dave_Touretzky@cs.cmu.edu
To: connectionists@cs.cmu.edu
Subject: Connectionist symbol processing: any progress?
Date: Tue, 11 Aug 1998 03:34:27 -0400

I'd like to start a debate on the current state of connectionist symbol
processing.  Is it dead?  Or does progress continue?  ...  People had gotten
some interesting effects with localist networks, by doing spreading activation
and a simple form of constraint satisfaction....  This approach does not create
new structure on the fly, or deal with structured representations or variable
binding.  Those localist networks that did attempt to implement variable binding
did so in a discrete, symbolic way that did not advance the parallel constraint
satisfaction/heuristic reasoning agenda of earlier spreading activation
research.  ...  So I concluded that connectionist symbol processing had reached
a plateau, and further progress would have to await some revolutionary new
insight about representations.  ...  The problems of structured representations
and variable binding have remained unsolved.  No one is trying to build
distributed connectionist reasoning systems any more, like the connectionist
production system I built with Geoff Hinton...


Twenty-four years have passed, and not much progress has been made. It seems that the progress is due only to the pure brute force of computers, with not much insight beyond wishful thinking. The whole DL movement stalls progress in understanding how the brain works. We need some fresh new ideas beyond error minimization.

I think one of the main problems is the publish-or-perish attitude, the famous Google impact factor. One does not care what someone is actually doing; one just checks his Google impact factor. This is like the saying: eat more shit, a million flies cannot be wrong.
As some mathematician said, computer science is not a science at all, yet it forces us to follow its ideas.

Andreas



--------------------------------------------------------------------------------------------------
Prof. Auxiliar Andreas Wichert   

http://web.tecnico.ulisboa.pt/andreas.wichert/
-
https://www.amazon.com/author/andreaswichert

Instituto Superior Técnico - Universidade de Lisboa
Campus IST-Taguspark 
Avenida Professor Cavaco Silva                 Phone: +351  214233231
2744-016 Porto Salvo, Portugal

> On 20 Jul 2022, at 15:55, Schmidhuber Juergen <juergen at idsia.ch> wrote:
> 
> 
> I have never understood the difference between "symbolic" and "sub-symbolic" reasoning. A recurrent neural net (RNN) is a general computer that can do both. n-bit parity for arbitrary n can be solved by a tiny RNN with few connections, sequentially reading bits one by one [1]. The best learning algorithm is NOT gradient descent. Instead keep randomly initializing the RNN weights between -100 and +100 until the RNN solves parity for a few training examples of various large sizes n (this will take just 1000 trials or so). Now the RNN will probably generalize to ANY n. BTW, try that with a Transformer - it will never generalize like that.
> 
> [1] J. Schmidhuber and S. Hochreiter. Guessing can outperform many long time lag algorithms. Technical Note IDSIA-19-96, IDSIA, 1996
> 
> Jürgen
> 
> 
> 
> 
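Jürgen's weight-guessing recipe above can be sketched in a few lines. The snippet below is my own hedged illustration, not the setup of the 1996 technical note: it uses a two-hidden-unit RNN with hard-threshold activations (an idealization of tanh units driven by weights of magnitude up to 100), guesses all eleven weights uniformly in [-100, 100], and also shows one hand-constructed weight setting of the kind such a search can stumble on.

```python
import random
from itertools import product

def run_rnn(w, bits):
    """Two-hidden-unit threshold RNN reading one bit per time step.

    w holds 11 numbers: 4 weights per hidden unit (two recurrent, one
    input, one bias) and 3 readout weights (two hidden-to-output, one bias).
    """
    step = lambda z: 1.0 if z > 0 else 0.0  # hard threshold ~ saturated tanh
    a = b = 0.0
    for x in bits:
        a, b = (step(w[0] * a + w[1] * b + w[2] * x + w[3]),
                step(w[4] * a + w[5] * b + w[6] * x + w[7]))
    return int(step(w[8] * a + w[9] * b + w[10]))

def parity(bits):
    return sum(bits) % 2

# training set: all bit strings of length 1..4
TRAIN = [list(t) for n in range(1, 5) for t in product([0, 1], repeat=n)]

def guess_weights(train, max_trials=1_000_000, seed=0):
    """Keep guessing weights in [-100, 100] until all training strings pass."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        w = [rng.uniform(-100, 100) for _ in range(11)]
        if all(run_rnn(w, bits) == parity(bits) for bits in train):
            return w, trial
    return None, max_trials

# One solution of the kind the search can hit (constructed by hand):
# unit a computes OR(parity-so-far, x), unit b computes AND(parity-so-far, x),
# and the readout recovers parity as (a AND NOT b), i.e. XOR.
HAND = [100, -150, 300, -50, 100, -100, 100, -150, 100, -100, -50]
```

Because the thresholded network is a two-state automaton, a weight vector like HAND computes parity for every n, not just the training lengths. Running guess_weights(TRAIN) reproduces the search; how many trials it takes depends on the seed and the parameterization, and the "1000 trials or so" figure is Jürgen's claim for his setup, not something this toy sketch guarantees.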
> On 19 Jul 2022, at 23:08, Danko Nikolic <danko.nikolic at gmail.com> wrote:
> 
> Dear Barak,
> 
> Thank you for the pointers. I have to read the papers. I then need to understand why, if parity is so easy to learn, my deep learning models had such a hard time that the number of parameters they needed grew exponentially with each additional bit added to the input. Strange.
> 
> I will report back.
> 
> Best
> 
> Danko
> 
> Dr. Danko Nikolić
> www.danko-nikolic.com
> https://www.linkedin.com/in/danko-nikolic/
> -- I wonder, how is the brain able to generate insight? --
> 
> 
> On Mon, Jul 18, 2022 at 1:12 PM Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
> On Mon, 18 Jul 2022 at 08:28, Danko Nikolic <danko.nikolic at gmail.com> wrote:
> In short, learning mechanisms cannot discover generalized XOR functions with simple connectivity -- only with complex connectivity. This problem results in exponential growth of needed resources as the number of bits in the generalized XOR increases.
> 
> Assuming that "generalized XOR" means parity, this must rely on some
> unusual definitions which you should probably state in order to avoid
> confusion.
> 
> Parity is a poster boy for an *easy* function to learn, albeit a
> nonlinear one. This is because in the (boolean) Fourier domain its
> spectrum consists of a single nonzero coefficient, and functions that
> are sparse in that domain are very easy to learn. See N. Linial, Y.
> Mansour, and N. Nisan, "Constant depth circuits, Fourier Transform and
> learnability", FOCS 1989, or Mansour, Y. (1994). Learning Boolean
> Functions via the Fourier Transform. Theoretical Advances in Neural
> Computation and Learning, 391–424. doi:10.1007/978-1-4615-2696-4_11
> 
> --Barak Pearlmutter
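Barak's point about the Boolean Fourier spectrum is easy to check numerically. The sketch below is my own illustration, not code from the cited papers: it computes every Fourier coefficient f_hat(S) = E_x[f(x) * chi_S(x)] of the ±1-valued parity function on three bits and finds a single spike, on the full set of variables.

```python
from itertools import product

def fourier_coefficients(f, n):
    """Boolean Fourier transform: f maps {0,1}^n -> {-1,+1}.

    Returns a dict keyed by subset masks (0/1 tuples), with
    f_hat(S) = E_x[f(x) * chi_S(x)], chi_S(x) = (-1)^(sum of x_i, i in S).
    """
    points = list(product([0, 1], repeat=n))
    coeffs = {}
    for mask in product([0, 1], repeat=n):
        chi = lambda x: (-1) ** sum(xi for xi, m in zip(x, mask) if m)
        coeffs[mask] = sum(f(x) * chi(x) for x in points) / len(points)
    return coeffs

parity = lambda x: (-1) ** sum(x)  # +1 on even parity, -1 on odd

coeffs = fourier_coefficients(parity, 3)
# the spectrum is one spike: f_hat on the full set {1,2,3} is 1, all else 0
```

That one-coefficient spectrum is exactly the sparsity that makes parity easy for Fourier-based learners in the Linial-Mansour-Nisan sense.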
> 
> 
> 
> On 18 Jul 2022, at 18:01, Gary Marcus <gary.marcus at nyu.edu> wrote:
> 
> Sure, but a person can learn the idea of n-bit parity from a few examples with a small number of bits, generalizing it to large values of n. Most current systems learn it for a certain number of bits and don't generalize beyond that number of bits.
> 
> On Jul 18, 2022, at 7:17 AM, Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
> 
> 
> 
> On Mon, 18 Jul 2022 at 14:43, Danko Nikolic <danko.nikolic at gmail.com> wrote:
> <image.png>
> 
> It is a hard problem to learn for a connectionist network.
> 
> We don't need to invent new terminology, like "inverters problem" or "generalized xor." This is parity. Four (4) bit parity.
> 
> https://en.wikipedia.org/wiki/Parity_function
> 
> Parity is *not* a hard function to learn. Even for a connectionist network.
> 
> It is an interesting function for historic reasons (n-bit parity cannot be loaded by a k-th order perceptron, for k<n, although there are loopholes if random bits are available or if you are only required to almost load it) and because it's an interesting function for many mathematical constructions. See the above wikipedia page for some details. But it is not super difficult to learn.
> 
> --Barak Pearlmutter.
> 
> 
> 
> 
> On 18 Jul 2022, at 00:58, gary at ucsd.edu <gary at eng.ucsd.edu> wrote:
> 
> Sorry, I can't let this go by:
> And that is good, because generalized XOR scales worse than a power law. It scales exponentially! This is a more aggressive form of explosion than a power law.
> I'm not sure exactly what you mean by this, but a single-hidden-layer network with N inputs and N hidden units can solve N-bit parity. Each hidden unit has an increasing threshold, so the first turns on if at least one input unit is on, and it drives the output with a weight of +1. If two input units are on, then a second hidden unit comes on and cancels the activation of the first via a weight of -1. Etc.
> 
> g.
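The staircase construction described above can be written down directly. The sketch below is my own rendering of it with hard-threshold units: hidden unit k has all input weights +1 and threshold k - 0.5, so it fires iff at least k inputs are on, and the alternating ±1 output weights leave a net input of 1 exactly when an odd number of inputs are on.

```python
def parity_net(bits):
    """A single hidden layer of len(bits) threshold units computes parity."""
    s = sum(bits)  # every hidden unit sees all inputs with weight +1
    # unit k (k = 1..N, threshold k - 0.5) fires iff at least k inputs are on
    hidden = [1 if s > k - 0.5 else 0 for k in range(1, len(bits) + 1)]
    # output weights alternate +1, -1, +1, ...: consecutive active units cancel
    net = sum((-1) ** k * h for k, h in enumerate(hidden))
    return 1 if net > 0.5 else 0
```

With s inputs on, exactly the first s hidden units fire, and the alternating sum 1 - 1 + 1 - ... over s terms is 1 for odd s and 0 for even s, so the 0.5 output threshold yields parity.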
> 
> 
> On Sat, Jul 16, 2022 at 12:03 AM Danko Nikolic <danko.nikolic at gmail.com> wrote:
> Dear Thomas,
> 
> Thank you for reading the paper and for the comments.
> 
> I cite: "In my experience, supervised classification scales linearly in the number of classes."
> This would be good to quantify as a plot. Maybe a research paper would be a good idea. The reason is that everyone else who has tried to quantify that relation found a power law. At this point, it would be surprising to find a linear relationship, and it would probably make a well-read paper.
> 
> But please do not forget that my argument states that even a linear relationship is not good enough to match biological brains. We need something more like a power law with exponent zero when it comes to model size, i.e., a constant number of parameters in the model. And we need a linear relationship when it comes to learning time: each newly learned object should need about as much learning effort as each previous object.
> 
> I cite: "The real world is not dominated by generalized XOR problems."
> Agreed. And that is good, because generalized XOR scales worse than a power law. It scales exponentially! This is a more aggressive form of explosion than a power law.
> Importantly, a generalized AND operation also scales exponentially (with a smaller exponent, though). I guess we would agree that the real world probably encounters a lot of AND problems. The only logical operation that could be learned with a linear increase in the number of parameters was a generalized OR. Finally, I found that a mixture of AND and OR resulted in power-law-like scaling of the number of parameters. So, a mixture of AND and OR seemed to scale as well (or as badly) as the real world. I have put this information into the Supplementary Materials.
> 
> The conclusion I derived from those analyses is: connectionism cannot sustainably reach human (or animal) levels of intelligence. Therefore, I hunted for an alternative paradigm.
> 
> Greetings,
> 
> Danko
> 
