Connectionists: weight guessing quickly solves n-bit parity

Redd, Emmett R EmmettRedd at MissouriState.edu
Thu Jul 21 11:53:13 EDT 2022


Since a Turing machine, as you say, does "not explain the human problem solving", super-Turing computation should be considered.  Two recent papers explore computation consistent with super-Turing operation: https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.013120 and https://link.springer.com/chapter/10.1007/978-3-030-86380-7_38 .

The APS paper is open access.  Both studies include noise, which the brain also has, and the simulated noise has a quantal nature; neuron excitation is also quantal.

Noise optimizes super-Turing computation in recurrent neural networks (Physical Review Research): https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.3.013120
The authors use noise to restore chaotic behavior and show consistency with super-Turing theory and operation in neural networks.

Noise Quality and Super-Turing Computation in Recurrent Neural Networks (Springer): https://link.springer.com/chapter/10.1007/978-3-030-86380-7_38
Figures 3 and 4 show the number of output sequences which remain consistent with chaos (y-axis) vs. \(\log_2\) of the repeat length of the noise (x-axis) for the logistic and Hénon map digital RNNs, for both optimum and near-optimum noise magnitudes and the two noise types (MATLAB vs. LFSR).




Emmett Redd Ph.D.   mailto:EmmettRedd at missouristate.edu
Professor                                 (417)836-5221
Department of Physics, Astronomy, and Materials Science
Missouri State University             Fax (417)836-6226
901 SOUTH NATIONAL
SPRINGFIELD, MO  65897   USA    Dept (417)836-5131


Perfect logic and faultless deduction make a pleasant theoretical structure, but it may be right or wrong; the experimenter is the only one to decide, and he is always right. –Léon Brillouin, Scientific Uncertainty and Information (1964).


________________________________
From: Connectionists <connectionists-bounces at mailman.srv.cs.cmu.edu> on behalf of Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt>
Sent: Thursday, July 21, 2022 4:07 AM
To: Schmidhuber Juergen <juergen at idsia.ch>
Cc: connectionists at cs.cmu.edu <connectionists at cs.cmu.edu>
Subject: Re: Connectionists: weight guessing quickly solves n-bit parity


Dear Juergen,

Symbols do not, by themselves, represent any utilizable knowledge; they cannot be used to define similarity criteria among themselves. The use of symbols in algorithms that imitate intelligent human behavior led to the famous physical symbol system hypothesis of Newell and Simon (1976): "The necessary and sufficient condition for a physical system to exhibit intelligence is that it be a physical symbol system."  Symbols are not present in the world; they are constructs of the human mind and simplify the process of representation used in communication and problem solving.

A Turing machine can simulate any algorithm, and so can an RNN, but neither explains human problem solving.

There is a difference: human problem solving, for example, can be described by production systems, the most successful model being the SOAR architecture. Such systems were very successful; they learn and can explain what they are doing.

In August 1998, Dave Touretzky asked on the Connectionists mailing list:
"Is connectionist symbol processing dead?"


From ml-connectionists-request at mlist-1.sp.cs.cmu.edu  Tue Aug 11 17:35:10 1998
From: Dave_Touretzky at cs.cmu.edu
To: connectionists at cs.cmu.edu
Subject: Connectionist symbol processing: any progress?
Date: Tue, 11 Aug 1998 03:34:27 -0400

I'd like to start a debate on the current state of connectionist symbol
processing.  Is it dead?  Or does progress continue?  ...  People had gotten
processing?  Is it dead?  Or does progress continue?  ...  People had gotten
some interesting effects with localist networks, by doing spreading activation
and a simple form of constraint satisfaction....  This approach does not create
new structure on the fly, or deal with structured representations or variable
binding.  Those localist networks that did attempt to implement variable binding
did so in a discrete, symbolic way that did not advance the parallel constraint
satisfaction/heuristic reasoning agenda of earlier spreading activation
research.  ...  So I concluded that connectionist symbol processing had reached
a plateau, and further progress would have to await some revolutionary new
insight about representations.  ...  The problems of structured representations
and variable binding have remained unsolved.  No one is trying to build
distributed connectionist reasoning systems any more, like the connectionist
production system I built with Geoff Hinton...


Twenty-four years have passed and not much progress has been made. It seems that the progress is related only to the pure brute force of computers, with not much insight beyond wishful thinking. The whole DL movement stops progress in understanding how the brain works. We need some fresh new ideas besides error minimization.

I think one of the main problems is the publish-or-perish attitude, the famous Google impact factor. One does not care what someone is doing; one just checks his Google impact factor. This is like the saying: eat more shit, one million flies cannot be wrong.
As some mathematician said, computer science is not science at all, but it forces us to follow its ideas.

Andreas



--------------------------------------------------------------------------------------------------
Prof. Auxiliar Andreas Wichert

http://web.tecnico.ulisboa.pt/andreas.wichert/
-
https://www.amazon.com/author/andreaswichert

Instituto Superior Técnico - Universidade de Lisboa
Campus IST-Taguspark
Avenida Professor Cavaco Silva                 Phone: +351  214233231
2744-016 Porto Salvo, Portugal

On 20 Jul 2022, at 15:55, Schmidhuber Juergen <juergen at idsia.ch<mailto:juergen at idsia.ch>> wrote:


I have never understood the difference between "symbolic" and "sub-symbolic" reasoning. A recurrent neural net (RNN) is a general computer that can do both. n-bit parity for arbitrary n can be solved by a tiny RNN with few connections, sequentially reading bits one by one [1]. The best learning algorithm is NOT gradient descent. Instead keep randomly initializing the RNN weights between -100 and +100 until the RNN solves parity for a few training examples of various large sizes n (this will take just 1000 trials or so). Now the RNN will probably generalize to ANY n. BTW, try that with a Transformer - it will never generalize like that.

[1] J. Schmidhuber and S. Hochreiter. Guessing can outperform many long time lag algorithms. Technical Note IDSIA-19-96, IDSIA, 1996
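A rough sketch of the guessing procedure in Python/NumPy. The particular choices here (two tanh units, a guessed initial state, a linear threshold readout, and the training-set sizes) are illustrative assumptions, not the exact setup of [1], so the trial counts will differ:

import numpy as np

rng = np.random.default_rng(0)

def parity(bits):
    return int(sum(bits) % 2)

def run_rnn(p, bits):
    # feed the bits one at a time; threshold a linear readout at the end
    W_x, W_h, b, h0, v, c = p
    h = h0
    for x in bits:
        h = np.tanh(W_x * x + W_h @ h + b)
    return int(v @ h + c > 0)

def guess(train, max_trials=200_000):
    # draw every weight uniformly from [-100, 100] until all training
    # strings are classified correctly -- no gradients anywhere;
    # a larger budget may be needed depending on the architecture
    for trial in range(1, max_trials + 1):
        p = (rng.uniform(-100, 100, 2),       # input weights
             rng.uniform(-100, 100, (2, 2)),  # recurrent weights
             rng.uniform(-100, 100, 2),       # biases
             rng.uniform(-1, 1, 2),           # initial hidden state
             rng.uniform(-100, 100, 2),       # readout weights
             rng.uniform(-100, 100))          # readout bias
        if all(run_rnn(p, s) == parity(s) for s in train):
            return p, trial
    raise RuntimeError("no weight set found within the trial budget")

# training strings of a few different lengths n
train = [list(rng.integers(0, 2, n)) for n in (3, 8, 16) for _ in range(10)]
p, trials = guess(train)
print(f"solved the training strings after {trials} guesses")

# check generalization to much longer, unseen strings
test = [list(rng.integers(0, 2, 200)) for _ in range(100)]
acc = np.mean([run_rnn(p, s) == parity(s) for s in test])
print(f"accuracy on 200-bit strings: {acc:.2%}")

Whether this toy version needs on the order of 1000 trials or a few orders of magnitude more depends on the architecture and the training set; the point is only that pure guessing plus a generalization check is a perfectly well-defined "learning algorithm".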

Jürgen




On 19 Jul 2022, at 23:08, Danko Nikolic <danko.nikolic at gmail.com<mailto:danko.nikolic at gmail.com>> wrote:

Dear Barak,

Thank you for the pointers. I have to read the papers. I need to understand, then, why, if parity is so easy to learn, my deep learning models had such a hard time that the number of parameters needed grew exponentially with each additional bit added to the input. Strange.

I will report back.

Best

Danko

Dr. Danko Nikolić
www.danko-nikolic.com<http://www.danko-nikolic.com>
https://www.linkedin.com/in/danko-nikolic/
-- I wonder, how is the brain able to generate insight? --


On Mon, Jul 18, 2022 at 1:12 PM Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
On Mon, 18 Jul 2022 at 08:28, Danko Nikolic <danko.nikolic at gmail.com> wrote:
In short, learning mechanisms cannot discover generalized XOR functions with simple connectivity -- only with complex connectivity. This problem results in exponential growth of needed resources as the number of bits in the generalized XOR increases.

Assuming that "generalized XOR" means parity, this must rely on some
unusual definitions which you should probably state in order to avoid
confusion.

Parity is a poster boy for an *easy* function to learn, albeit a
nonlinear one. This is because in the (boolean) Fourier domain its
spectrum consists of a single nonzero coefficient, and functions that
are sparse in that domain are very easy to learn. See N. Linial, Y.
Mansour, and N. Nisan, "Constant depth circuits, Fourier Transform and
learnability", FOCS 1989, or Mansour, Y. (1994). Learning Boolean
Functions via the Fourier Transform. Theoretical Advances in Neural
Computation and Learning, 391–424. doi:10.1007/978-1-4615-2696-4_11
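
In the usual \(\pm 1\) encoding this is a one-line identity: with \(x_i \in \{-1,+1\}\),
\[
\mathrm{PAR}_n(x) \;=\; \prod_{i=1}^{n} x_i \;=\; \chi_{\{1,\dots,n\}}(x),
\]
so parity is itself a single Fourier character; exactly one Fourier coefficient is nonzero, \(\widehat{\mathrm{PAR}_n}(\{1,\dots,n\}) = 1\), and every other coefficient is zero.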

--Barak Pearlmutter



On 18 Jul 2022, at 18:01, Gary Marcus <gary.marcus at nyu.edu> wrote:

sure, but a person can learn the idea for n-bits from a few examples with a small number of bits, generalizing it to large values of n. most current systems learn it for a certain number of bits and don’t generalize beyond that number of bits.

On Jul 18, 2022, at 7:17 AM, Barak A. Pearlmutter <barak at pearlmutter.net> wrote:



On Mon, 18 Jul 2022 at 14:43, Danko Nikolic <danko.nikolic at gmail.com> wrote:
<image.png>

It is a hard problem to learn for a connectionist network.

We don't need to invent new terminology, like "inverters problem" or "generalized xor." This is parity. Four (4) bit parity.

https://en.wikipedia.org/wiki/Parity_function

Parity is *not* a hard function to learn. Even for a connectionist network.

It is an interesting function for historical reasons (n-bit parity cannot be loaded by a k-th order perceptron for k < n, although there are loopholes if random bits are available or if you are allowed to only almost load it) and because it is a useful function in many mathematical constructions. See the above Wikipedia page for some details. But it is not super difficult to learn.

--Barak Pearlmutter.




On 18 Jul 2022, at 00:58, gary at ucsd.edu <gary at eng.ucsd.edu> wrote:

Sorry, I can't let this go by:
And that is a good thing, because generalized XOR scales worse than a power law. It scales exponentially! This is a more aggressive form of explosion than a power law.
I'm not sure exactly what you mean by this, but a single-hidden-layer network with N inputs and N hidden units can solve N-bit parity. The hidden units have increasing thresholds: the first turns on if at least one input is on and drives the output with a weight of +1; if two inputs are on, a second unit also comes on and cancels the activation of the first via a weight of -1; and so on.
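
In code, one way to realize this construction (step activations, thresholds at k - 0.5, alternating +1/-1 output weights are my reading of it), checked exhaustively for small N:

import itertools
import numpy as np

def step(z):
    return (z > 0).astype(int)

def parity_net(x):
    n = len(x)
    # hidden unit k (k = 1..n) fires iff at least k inputs are on
    hidden = step(np.sum(x) - (np.arange(1, n + 1) - 0.5))
    # output weights alternate +1, -1, +1, ...; threshold at 0.5
    w_out = np.array([(-1) ** k for k in range(n)])
    return int(step(w_out @ hidden - 0.5))

# exhaustive check over all 2^n inputs for a few sizes
for n in range(1, 9):
    ok = all(parity_net(np.array(bits)) == sum(bits) % 2
             for bits in itertools.product([0, 1], repeat=n))
    print(f"n = {n}: {'ok' if ok else 'FAIL'}")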

g.


On Sat, Jul 16, 2022 at 12:03 AM Danko Nikolic <danko.nikolic at gmail.com> wrote:
Dear Thomas,

Thank you for reading the paper and for the comments.

I cite: "In my experience, supervised classification scales linearly in the number of classes."
This would be good to quantify as a plot; maybe a research paper would be a good idea. The reason is that everyone else who tried to quantify that relation seems to have found a power law. At this point, it would be surprising to find a linear relationship, and it would probably make a well-read paper.

But please do not forget that my argument states that even a linear relationship is not good enough to match biological brains. We need something more like a power law with exponent zero when it comes to model size, i.e., a constant number of parameters in the model. And we need a linear relationship when it comes to learning time: each newly learned object should need about as much learning effort as was needed for each previous object.
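
In symbols, roughly: if \(P(C)\) is the number of model parameters and \(T(C)\) the total learning effort after \(C\) objects have been learned, the requirement is
\[
P(C) \propto C^{0} = \text{const}, \qquad T(C) \propto C,
\]
i.e., a constant parameter budget and a constant marginal learning cost per new object, rather than a power law with a positive exponent.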

I cite: "The real world is not dominated by generalized XOR problems."
Agreed. And that is a good thing, because generalized XOR scales worse than a power law. It scales exponentially! This is a more aggressive form of explosion than a power law.
Importantly, a generalized AND operation also scales exponentially (with a smaller exponent, though). I guess we would agree that the real world probably encounters a lot of AND problems. The only logical operation that could be learned with a linear increase in the number of parameters was a generalized OR. Finally, I found that a mixture of AND and OR resulted in a power-law-like scaling of the number of parameters. So, a mixture of AND and OR seemed to scale as well (or as badly) as the real world. I have put this information into the Supplementary Materials.

The conclusion I derived from those analyses is: connectionism is not sustainable as a path to human (or animal) levels of intelligence. Therefore, I hunted for an alternative paradigm.

Greetings,

Danko


