<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div><div><div>This is widely known and frequently ignored!<br><br></div>I am very familiar with both developmental and evolutionary robotics, but neural modelers often ignore their lessons.<br>
Evolution is blind but also full of surprises and far smarter than engineers in its blind way, so even human designers have much to learn from it. Sure, they can design systems far faster than evolution - and it would be folly to truly try to evolve complex robots for all applications - but the true value of thinking from an evolutionary viewpoint is that it opens up whole new mechanisms, which can then be incorporated, albeit in simplified form, in non-evolutionary engineering. A careful study of phylogenetic histories illuminates many new things about systems that we might think we already understand. Ironically, the person who taught this lesson best was Herb Simon, one of the founders of symbolic AI!

One big problem with the symbolic approach to intelligence was that it assumed mechanisms (algorithms) and just sought to discover how brains instantiated them. Well, it turns out that brains have their own mechanisms, which do not necessarily correspond to the abstractions of logic or automata theory. I think that many of us (including myself) make the same error when we assume we understand abstract "neural" mechanisms underlying mental functions, and just try to instantiate them with abstract neural building blocks like attractor networks, feature maps, neuron layers, etc. That is a fine enterprise as long as we acknowledge what we're doing and proceed with humility. In this, I always think of Emily Dickinson's wonderful lines (which I first heard from Lynn Margulis, who did discover one of evolution's big surprises):

But Nature is a stranger yet;
the ones who cite her most
have never passed her haunted house,
or simplified her ghost.

To pity those who know her not
is helped by the regret,
that those who know her, know her less
the nearer her they get.

Apologies for errors - I am quoting from memory.

Ali

On Mon, Feb 10, 2014 at 4:51 PM, Brian J Mingus <brian.mingus@colorado.edu> wrote:

<div dir="ltr">fyi, there is a field called Developmental Robotics which takes this perspective seriously. For example, an infant goes through the following developmental trajectory over the first several months of life:<div>
<br></div><div>- Born with nice looking reaches but can't reach to target</div><div><br></div><div>- Locks the elbow to limit the number of degrees of freedom and practices pointing to a target</div><div><br></div><div>
- Slowly starts to unlock the elbow, exposing more degrees of freedom, and practices reaching to a target</div><div><br></div><div>The infant does not need to learn how to solve a fully unconstrained inverse kinematics problem. It is born with reaching affordances and a musculoskeletal system which constrain the space into something computationally feasible.</div>
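Here is a toy sketch of that staged strategy, just to make the degrees-of-freedom point concrete. It is purely illustrative: a 2-link planar arm with a naive hill-climber rather than anything like an actual infant controller, and all names and constants are invented.

import numpy as np

L1, L2 = 0.3, 0.25                    # hypothetical upper-arm/forearm lengths (m)
target = np.array([0.35, 0.25])       # hypothetical reach target

def hand_pos(shoulder, elbow):
    """Forward kinematics of the 2-link arm."""
    x = L1 * np.cos(shoulder) + L2 * np.cos(shoulder + elbow)
    y = L1 * np.sin(shoulder) + L2 * np.sin(shoulder + elbow)
    return np.array([x, y])

def refine(angles, free_mask, steps=2000, sigma=0.05, rng=np.random.default_rng(0)):
    """Hill-climb only the unfrozen joints toward the target."""
    best = angles.copy()
    best_err = np.linalg.norm(hand_pos(*best) - target)
    for _ in range(steps):
        cand = best + free_mask * rng.normal(0.0, sigma, size=2)
        err = np.linalg.norm(hand_pos(*cand) - target)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Stage 1: elbow locked straight -> effectively a 1-DOF pointing problem.
angles, err1 = refine(np.array([0.0, 0.0]), free_mask=np.array([1.0, 0.0]))
# Stage 2: unlock the elbow and refine, starting from the stage-1 posture.
angles, err2 = refine(angles, free_mask=np.array([1.0, 1.0]))
print(f"error after stage 1: {err1:.3f} m, after stage 2: {err2:.3f} m")

Stage 1 can only point along the right direction; once the elbow is freed, the search starts from a posture that is already roughly aimed at the target, so the full 2-DOF problem is much easier than starting from scratch.
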
Likewise, if you hold an infant's feet in warm water it will vigorously try to walk.

etc. etc. etc. This general pattern of evolved affordances being used to bootstrap intelligence is extremely widespread in the brain.

Anyone who doesn't take this into consideration when modeling the brain isn't creating a human being, but rather something else.

That said, evolution is a blind designer. A human being can out-design billions of years of evolution in a few years with a nice supercomputer and plenty of lab subjects. So, if your goal is to understand exactly what a human being is, you might study human development. But if your goal is to create something more sophisticated than a human without the annoyance of studying exactly how a human develops intelligence, you might use deep networks with pretraining that automatically extract the features that evolution baked in.

<div class="gmail_extra"><br></div><div class="gmail_extra">btw, this is all widely known.. no?</div><div class="gmail_extra"><br></div><div class="gmail_extra">Brian Mingus</div><div class="gmail_extra"><a href="http://grey.colorado.edu/mingus" target="_blank">http://grey.colorado.edu/mingus</a><div>
<div class="h5"><br>
<br><div class="gmail_quote">On Mon, Feb 10, 2014 at 2:37 PM, Ali Minai <span dir="ltr"><<a href="mailto:minaiaa@gmail.com" target="_blank">minaiaa@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>I think Gary's last paragraph is absolutely key. Unless we take both the evolutionary and the developmental processes into account, we will neither understand complex brains fully nor replicate their functionality too well in our robots etc. We build complex robots that know nothing and then ask them to learn complex things, setting up a hopelessly difficult learning problem. But that isn't how animals learn, or why animals have the brains and bodies they have. A purely abstract computational approach to neural models makes the same category error that connectionists criticized symbolists for making, just at a different level.<span><font color="#888888"><br>
<br></font></span></div><span><font color="#888888"><div>Ali<br></div></font></span></div><div class="gmail_extra"><div><div><br><br><div class="gmail_quote">On Mon, Feb 10, 2014 at 11:38 AM, Gary Marcus <span dir="ltr"><<a href="mailto:gary.marcus@nyu.edu" target="_blank">gary.marcus@nyu.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word">Juergen and others,<div>
<br></div><div>I am with John on his two basic concerns, and think that your appeal to computational universality is a red herring; I cc the entire group because I think that these issues lay at the center of why many of the hardest problems in AI and neuroscience continue to lay outside of reach, despite in-principle proofs about computational universality. </div>
John’s basic points, which I have also made before (e.g., in my books The Algebraic Mind and The Birth of the Mind and in my periodic New Yorker posts), are two:

a. It is unrealistic to expect that hierarchies of pattern recognizers will suffice for the full range of cognitive problems that humans (and strong AI systems) face. Deep learning, to take one example, excels at classification, but has thus far had relatively little to contribute to inference or natural language understanding. Socher et al.'s impressive CVG work, for instance, is parasitic on a traditional (symbolic) parser, not a soup-to-nuts neural net induced from input.

b. It is unrealistic to expect that all the relevant information can be extracted by any general-purpose learning device.

Yes, you can reliably map any arbitrary input-output relation onto a multilayer perceptron or recurrent net, but *only* if you know the complete input-output mapping in advance. Alas, you can’t be guaranteed to do that in general given arbitrary subsets of the complete space; in the real world, learners see subsets of possible data and have to make guesses about what the rest will be like. Wolpert’s No Free Lunch work is instructive here (and also in line with how cognitive scientists like Chomsky, Pinker, and myself have thought about the problem). For any problem, I presume that there exists an appropriately configured net, but there is no guarantee that in the real world you are going to be able to correctly induce the right system via a general-purpose learning algorithm given a finite amount of data, with a finite amount of training. Empirically, neural nets of roughly the form you are discussing have worked fine for some problems (e.g., backgammon) but been no match for their symbolic competitors in other domains (chess), and worked only as an adjunct rather than a central ingredient in still others (parsing, question-answering a la Watson, etc.); in other domains, like planning and common-sense reasoning, there has been essentially no serious work at all.

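To make the underdetermination point concrete, here is a toy illustration (my own, purely schematic; it enumerates Boolean functions rather than training nets, and is not Wolpert's formal result): given only part of an input-output mapping, every completion of the unseen part is equally consistent with the observed data, so no general-purpose learner can prefer the right one without some extra bias.

from itertools import product

inputs = list(product([0, 1], repeat=3))     # all 8 possible 3-bit inputs

def true_fn(x):                              # hypothetical "true" target: 3-bit parity
    return x[0] ^ x[1] ^ x[2]

observed = inputs[:5]                        # the learner only ever sees these 5
unseen = inputs[5:]

# Enumerate every Boolean function on 3 bits that agrees with the observed data.
consistent = []
for labels in product([0, 1], repeat=len(inputs)):
    fn = dict(zip(inputs, labels))
    if all(fn[x] == true_fn(x) for x in observed):
        consistent.append(fn)

print(f"{len(consistent)} distinct functions fit the training data perfectly")
for x in unseen:
    votes = sum(fn[x] for fn in consistent)
    print(f"unseen input {x}: {votes} of {len(consistent)} consistent functions output 1")

# Eight functions fit the five observed cases, and each unseen input is labeled 1 by
# exactly half of them: the data alone cannot single out the intended mapping.
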
My own take, informed by evolutionary and developmental biology, is that no single general-purpose architecture will ever be a match for the end product of a billion years of evolution, which includes, I suspect, a significant amount of customized architecture that need not be induced anew in each generation. We learn as well as we do precisely because evolution has preceded us, and endowed us with custom tools for learning in different domains. Until the field of neural nets more seriously engages in understanding what the contribution from evolution to neural wetware might be, I will remain pessimistic about the field’s prospects.

Best,
Gary Marcus

Professor of Psychology
New York University
Visiting Cognitive Scientist
Allen Institute for Brain Science
Allen Institute for Artificial Intelligence

Co-edited book coming late 2014:
The Future of the Brain: Essays By The World’s Leading Neuroscientists
http://garymarcus.com/

On Feb 10, 2014, at 10:26 AM, Juergen Schmidhuber <juergen@idsia.ch> wrote:

John,

Perhaps your view is a bit too pessimistic. Note that a single RNN already is a general computer. In principle, dynamic RNNs can map arbitrary observation sequences to arbitrary computable sequences of motoric actions and internal attention-directing operations, e.g., to process cluttered scenes, or to implement development (the examples you mentioned). From my point of view, the main question is how to exploit this universal potential through learning. A stack of dynamic RNNs can sometimes facilitate this. What it learns can later be collapsed into a single RNN [3].

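For readers less familiar with the setting, a minimal sketch of the architecture being discussed (illustrative only: random, untrained weights and made-up dimensions, not the systems in [3]) showing how a single recurrent net maps an observation sequence to an action sequence through its internal state:

import numpy as np

rng = np.random.default_rng(1)
obs_dim, hid_dim, act_dim = 4, 16, 3               # hypothetical sizes
W_in = rng.normal(0, 0.5, (hid_dim, obs_dim))
W_rec = rng.normal(0, 0.5, (hid_dim, hid_dim))
W_out = rng.normal(0, 0.5, (act_dim, hid_dim))

def rnn_run(observations):
    """Map an observation sequence to an action sequence, one step at a time."""
    h = np.zeros(hid_dim)                          # internal state carries the history
    actions = []
    for o in observations:
        h = np.tanh(W_in @ o + W_rec @ h)          # state update from input + memory
        actions.append(W_out @ h)                  # action read-out at every step
    return actions

acts = rnn_run([rng.normal(size=obs_dim) for _ in range(10)])
print(len(acts), acts[0].shape)                    # 10 action vectors of size 3

The learning question in the message is precisely how to find weights for such a net, which is where the pre-trained stack in [3] comes in.
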
Juergen

http://www.idsia.ch/~juergen/whatsnew.html

On Feb 7, 2014, at 12:54 AM, Juyang Weng <weng@cse.msu.edu> wrote:

<br><blockquote type="cite">Juergen:<br><br>You wrote: A stack of recurrent NN. But it is a wrong architecture as far as the brain is concerned.<br><br>Although my joint work with Narendra Ahuja and Thomas S. Huang at UIUC was probably the first<br>
learning network that used the deep Learning idea for learning from clutter scenes (Cresceptron ICCV 1992 and IJCV 1997),<br>I gave up this static deep learning idea later after we considered the Principle 1: Development.<br>
<br>The deep learning architecture is wrong for the brain. It is too restricted, static in architecture, and cannot learn directly from cluttered scenes required by Principle 1. The brain is not a cascade of recurrent NN.<br>
I quote from Antonio Damasio's "Descartes' Error", p. 93: "But intermediate communication occurs also via large subcortical nuclei such as those in the thalamus and basal ganglia, and via small nuclei such as those in the brain stem."

Of course, the cerebral pathways themselves are not a stack of recurrent NN either.

There are many fundamental reasons for that. I give only one here, based on our DN brain model: looking at a human, the brain must dynamically attend to the tip of the nose, the entire nose, the face, or the entire human body on the fly. For example, when the network attends to the nose, the entire human body becomes the background! Without a brain network that has both shallow and deep connections (unlike your stack of recurrent NN), your network is only good for recognizing a set of static patterns against a clean background. This is still an overworked pattern recognition problem, not a vision problem.

-John

On 2/6/14 7:24 AM, Schmidhuber Juergen wrote:

Deep Learning in Artificial Neural Networks (NN) is about credit assignment across many subsequent computational stages, in deep or recurrent NN.

A popular Deep Learning NN is the Deep Belief Network (2006) [1,2]. A stack of feedforward NN (FNN) is pre-trained in unsupervised fashion. This can facilitate subsequent supervised learning.

Let me re-advertise a much older, very similar, but more general, working Deep Learner of 1991. It can deal with temporal sequences: the Neural Hierarchical Temporal Memory or Neural History Compressor [3]. A stack of recurrent NN (RNN) is pre-trained in unsupervised fashion. This can greatly facilitate subsequent supervised learning.

The RNN stack is more general in the sense that it uses sequence-processing RNN instead of FNN with unchanging inputs. In the early 1990s, the system was able to learn many previously unlearnable Deep Learning tasks, one of them requiring credit assignment across 1200 successive computational stages [4].

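A rough sketch of the history-compression principle behind [3] (illustrative only: the lower-level predictor here is an online bigram table rather than the 1991 RNN, and the single "higher level" just collects whatever the lower level fails to predict):

from collections import defaultdict

def compress(seq):
    """Pass only unpredicted symbols up to the next level."""
    pred = defaultdict(lambda: defaultdict(int))   # next-symbol counts per current symbol
    unexpected = []
    prev = None
    for s in seq:
        if prev is None:
            unexpected.append(s)                   # first symbol is always unexpected
        else:
            counts = pred[prev]
            guess = max(counts, key=counts.get) if counts else None
            if guess != s:                         # prediction error -> send upward
                unexpected.append(s)
            counts[s] += 1                         # online update of the predictor
        prev = s
    return unexpected

seq = list("abcabcabcabcabXabcabc")                # mostly predictable, one surprise
higher_level_input = compress(seq)
print(len(seq), "->", len(higher_level_input), higher_level_input)

The higher level receives a much shorter sequence (the start of the pattern plus the rare surprise 'X' and its aftermath), i.e., a compressed history on a slower time scale, which is what makes credit assignment over long gaps easier.
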
Related developments: In the 1990s there was a trend from partially unsupervised [3] to fully supervised recurrent Deep Learners [5]. In recent years, there has been a similar trend from partially unsupervised to fully supervised systems. For example, several recent competition-winning and benchmark record-setting systems use supervised LSTM RNN stacks [6-9].

References:

[1] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313, No. 5786, pp. 504-507, 2006. http://www.cs.toronto.edu/~hinton/science.pdf

[2] G. W. Cottrell. New Life for Neural Networks. Science, Vol. 313, No. 5786, pp. 454-455, 2006. http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks

[3] J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. (Based on TR FKI-148-91, 1991.) ftp://ftp.idsia.ch/pub/juergen/chunker.pdf Overview: http://www.idsia.ch/~juergen/firstdeeplearner.html

[4] J. Schmidhuber. Habilitation thesis, TUM, 1993. ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf. Includes an experiment with credit assignment across 1200 subsequent computational stages for a Neural Hierarchical Temporal Memory or History Compressor or RNN stack with unsupervised pre-training [3] (try Google Translate in your mother tongue): http://www.idsia.ch/~juergen/habilitation/node114.html

[5] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. (Based on TR FKI-207-95, 1995.) ftp://ftp.idsia.ch/pub/juergen/lstm.pdf. Lots of follow-up work on LSTM under http://www.idsia.ch/~juergen/rnn.html

[6] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI'07, pp. 774-779, Hyderabad, India, 2007. ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf

[7] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. NIPS 22, pp. 545-552, Vancouver, MIT Press, 2009. http://www.idsia.ch/~juergen/nips2009.pdf

[8] 2009: First very deep (and recurrent) learner to win international competitions with secret test sets: deep LSTM RNN (1995-) won three connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi), performing simultaneous segmentation and recognition. http://www.idsia.ch/~juergen/handwriting.html

[9] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013. http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf

Juergen Schmidhuber
http://www.idsia.ch/~juergen/whatsnew.html

--
Juyang (John) Weng, Professor
Department of Computer Science and Engineering
MSU Cognitive Science Program and MSU Neuroscience Program
428 S Shaw Ln Rm 3115
Michigan State University
East Lansing, MI 48824 USA
Tel: 517-353-4388
Fax: 517-432-1061
Email: weng@cse.msu.edu
URL: http://www.cse.msu.edu/~weng/

</blockquote></div><br><br clear="all"><br>-- <br>Ali A. Minai, Ph.D.<br>Professor<br>Complex Adaptive Systems Lab<br>Department of Electrical Engineering & Computing Systems<br>University of Cincinnati<br>Cincinnati, OH 45221-0030<br>
<br>Phone: (513) 556-4783<br>Fax: (513) 556-7326<br>Email: <a href="mailto:Ali.Minai@uc.edu" target="_blank">Ali.Minai@uc.edu</a><br> <a href="mailto:minaiaa@gmail.com" target="_blank">minaiaa@gmail.com</a><br><br>
WWW: <a href="http://www.ece.uc.edu/%7Eaminai/" target="_blank">http://www.ece.uc.edu/~aminai/</a>
</div>