<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    I really like Gary Marcus' last paragraph as well.  It ties in with
    what I have been hoping to see for some time - vast networks of
    ensembles (NOT just hierarchical - which is a very limited view)
    that use "multiple conflicting hypotheses" (see the rough sketch
    after these lists) involving: <br>
    <ul>
      <li>data - sensory, learned/evolved.  In other words, both
        personal and evolutionary (extra-generational) sources of a huge
        amount of information <br>
      </li>
      <li>functional or associative structures - highly specialized (not
        just sensory, motor, etc.) to highly generalized (not just logic,
        specialized classifiers, etc.) <br>
      </li>
      <li>architectures - static (often used, always available),
        adaptive (for new challenges that can use existing
        capabilities), dynamic (newly developed for one-time or future
        use)<br>
      </li>
    </ul>
    and "brought to life" through :<br>
    <ul>
      <li>processes - memory, parameter adjustment, learning, evolution
        (David Fogel has suggested that at some point learning requires
        evolutionary processes; Nik Kasabov's Evolving Connectionist
        Systems; etc.), goal-setting, decisions, optimisations (NOTE
        that it isn't necessary to always pick a winning idea, even for
        actions!).  </li>
      <li>systems </li>
      <li>capabilities - and their adaptation to other situations,
        sometimes through analogy/metaphor/abduction</li>
      <li>consciousness</li>
      <li>goal-setting <br>
      </li>
      <li>behaviors</li>
      <li>personalities (human, or in a very limited way, what computer
        systems do)</li>
    </ul>
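    As a rough illustration of the "multiple conflicting hypotheses"
    idea, here is a minimal, hypothetical Python sketch of an ensemble
    that keeps several competing hypotheses alive rather than always
    picking a winner; the member models, scoring, and combination rule
    are illustrative assumptions of mine, not anyone's published
    system: <br>
    <pre>
# A minimal, hypothetical sketch: an ensemble that keeps several conflicting
# hypotheses alive instead of forcing a single winner. Member models, names,
# and the combination rule are illustrative assumptions only.

import math
import random

def softmax(scores):
    """Turn raw support scores into a probability distribution over hypotheses."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class HypothesisEnsemble:
    def __init__(self, members):
        # 'members' are callables mapping an observation to (hypothesis, score).
        self.members = members

    def evaluate(self, observation):
        # Collect every member's hypothesis and its support; losers are kept.
        proposals = [m(observation) for m in self.members]
        hypotheses = [h for h, _ in proposals]
        weights = softmax([s for _, s in proposals])
        return list(zip(hypotheses, weights))

    def act(self, observation, commit=False):
        ranked = self.evaluate(observation)
        if commit:
            # Commit to a single winner only when an action truly demands it.
            return max(ranked, key=lambda hw: hw[1])[0]
        # Otherwise sample, so minority hypotheses still influence behaviour.
        hypotheses, weights = zip(*ranked)
        return random.choices(hypotheses, weights=weights, k=1)[0]

# Toy usage: two "specialists" disagree about the same noisy observation.
edge_detector = lambda x: ("edge", 2.0 if x > 0.5 else 0.5)
texture_model = lambda x: ("texture", 1.5)
ensemble = HypothesisEnsemble([edge_detector, texture_model])
print(ensemble.evaluate(0.7))  # both hypotheses survive, with weights
print(ensemble.act(0.7))       # sampled action; not always the top-weighted one
</pre>
    The point is only that commitment to a single hypothesis can be
    deferred or avoided entirely, as per the NOTE in the "processes"
    item above. <br>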
    These are all there (and much more!) biologically, and all have been
    addressed, but the integration of effective overall systems is
    perhaps more limited.  The ability to utilise and morph specialized
    functional nets to new or dynamic challenges, and the development of
    overall architectures, seem to me to be less developed at present.
    Some of the cognitive work seems to address this more than the
    "engineering" type of ANN applications.  <br>
    <br>
    It seems to me that biology, through the dramatic but mostly ignored
    example of "instinct", goes far beyond current thinking.  How genetic
    & non-genetic DNA (and other "molecules") and epigenetics can
    code all of this in the form of "Mendelian heredity", and how there
    may also be "Lamarckian heredity" coding (perhaps suggested by the work
    of Michael Meaney and others), are questions that biologists look at,
    but this seems to be only partly addressed in the area of
    [evolutionary computation-ANNs] (e.g. genetic algorithms).  It is
    the coding of all of the above that actually interests me the most
    (mostly related to ANNs; at present I'm too confused about the
    biological coding!).  I keep thinking that the "unused DNA" alone
    in each cell provides a huge repository of powerful programming
    available for other biological applications, and that includes brain
    & mind. <br>
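    To make the [evolutionary computation-ANNs] remark concrete, here is a
    minimal, hypothetical sketch of a genetic algorithm searching the
    weight vector of a tiny feedforward net; the task (XOR), network
    size, population size, and mutation scale are illustrative
    assumptions only, not a model of biological coding: <br>
    <pre>
# A minimal, hypothetical neuroevolution sketch: a genetic algorithm searches
# the weight vector of a tiny feedforward net. Task (XOR), network size,
# population size, and mutation scale are illustrative assumptions only.

import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_WEIGHTS = 9  # 2 inputs -> 2 tanh hidden units (6 weights incl. biases) -> 1 output (3)

def forward(w, x):
    """2-2-1 network with tanh hidden units and a sigmoid output."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    o = w[6] * h1 + w[7] * h2 + w[8]
    return 1.0 / (1.0 + math.exp(-o))

def fitness(w):
    """Negative squared error over the XOR table (higher is better)."""
    return -sum((forward(w, x) - y) ** 2 for x, y in XOR)

def mutate(w, scale=0.3):
    # Gaussian perturbation of every weight (the "coding" is just a flat vector here).
    return [wi + random.gauss(0.0, scale) for wi in w]

def evolve(pop_size=50, generations=200):
    population = [[random.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 5]          # truncation selection
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

best = evolve()
for x, y in XOR:
    print(x, y, round(forward(best, x), 2))
</pre>
    Even this toy example shows how far such a flat weight "genome" is
    from the kind of developmental and epigenetic coding discussed
    above. <br>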
    <br>
    It is natural to focus on one aspect to advance one's research,
    and to favour one approach or the other.  It is perhaps too easy to
    ignore the greater picture of what we can possibly infer at present:
    that the overall capabilities result from an enormous, powerful, and
    beautiful collection of diverse options that will grow as we learn
    more about this area over the decades.  <br>
    <br>
    <br>
    <a class="moz-txt-link-abbreviated" href="http://www.BillHowell.ca">www.BillHowell.ca</a><br>
    <br>
    <br>
    <div class="moz-cite-prefix">Ali Minai wrote, On 14-02-10 02:37 PM:<br>
    </div>
    <blockquote
cite="mid:CABG3s4uaC6yetV+-u+rsHsJbvEv4jCATFJrVdcQbsoAmhvMusw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>I think Gary's last paragraph is absolutely key. Unless we
          take both the evolutionary and the developmental processes
          into account, we will neither understand complex brains fully
          nor replicate their functionality too well in our robots etc.
          We build complex robots that know nothing and then ask them to
          learn complex things, setting up a hopelessly difficult
          learning problem. But that isn't how animals learn, or why
          animals have the brains and bodies they have. A purely
          abstract computational approach to neural models makes the
          same category error that connectionists criticized symbolists
          for making, just at a different level.<br>
          <br>
        </div>
        <div>Ali<br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Mon, Feb 10, 2014 at 11:38 AM, Gary
          Marcus <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:gary.marcus@nyu.edu" target="_blank">gary.marcus@nyu.edu</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div style="word-wrap:break-word">Juergen and others,
              <div><br>
              </div>
              <div>I am with John on his two basic concerns, and think
                that your appeal to computational universality is a red
                herring; I cc the entire group because I think that
                these issues lie at the center of why many of the
                hardest problems in AI and neuroscience continue to lie
                outside of reach, despite in-principle proofs about
                computational universality. </div>
              <div><br>
              </div>
              <div>John’s basic points, which I have also made before
                (e.g. in my books The Algebraic Mind and The Birth of
                the Mind and in my periodic New Yorker posts) are two:</div>
              <div><br>
              </div>
              <div>a. It is unrealistic to expect that hierarchies of
                pattern recognizers will suffice for the full range of
                cognitive problems that humans (and strong AI systems)
                face. Deep learning, to take one example, excels at
                classification, but has thus far had relatively little
                to contribute to inference or natural language
                understanding.  Socher et al’s impressive CVG work, for
                instance, is parasitic on a traditional (symbolic)
                parser, not a soup-to-nuts neural net induced from
                input. </div>
              <div><br>
              </div>
              <div>b. it is unrealistic to expect that all the relevant
                information can be extracted by any general purpose
                learning device.</div>
              <div><br>
              </div>
              <div>Yes, you can reliably map any arbitrary input-output
                relation onto a multilayer perceptron or recurrent net,
                but <i>only</i> if you know the complete input-output
                mapping in advance. Alas, you can’t be guaranteed to do
                that in general given arbitrary subsets of the complete
                space; in the real world, learners see subsets of
                possible data and have to make guesses about what the
                rest will be like. Wolpert’s No Free Lunch work is
                instructive here (and also in line with how cognitive
                scientists like Chomsky, Pinker, and myself have thought
                about the problem). For any problem, I presume that
                there exists an appropriately-configured net, but there
                is no guarantee that in the real world you are going to
                be able to correctly induce the right system via
                a general-purpose learning algorithm given a finite amount
                of data, with a finite amount of training. Empirically,
                neural nets of roughly the form you are discussing have
                worked fine for some problems (e.g. backgammon) but been
                no match for their symbolic competitors in other domains
                (chess) and worked only as an adjunct rather than a
                central ingredient in still others (parsing,
                question-answering a la Watson, etc); in other domains,
                like planning and common-sense reasoning, there has been
                essentially no serious work at all.</div>
              <div><br>
              </div>
              <div>My own take, informed by evolutionary and
                developmental biology, is that no single general purpose
                architecture will ever be a match for the endproduct of
                a billion years of evolution, which includes, I suspect,
                a significant amount of customized architecture that
                need not be induced anew in each generation.  We learn
                as well as we do precisely because evolution has
                preceded us, and endowed us with custom tools for
                learning in different domains. Until the field of neural
                nets more seriously engages in understanding what the
                contribution from evolution to neural wetware might be,
                I will remain pessimistic about the field’s prospects.</div>
              <div><br>
              </div>
              <div>Best,</div>
              <div>Gary Marcus</div>
              <div><br>
              </div>
              <font face="HelveticaNeue-Light"><span>Professor of
                  Psychology</span><br>
              </font>
              <div>
                <div
                  style="text-align:-webkit-auto;word-wrap:break-word">
                  <div
                    style="text-align:-webkit-auto;word-wrap:break-word">
                    <div
                      style="text-align:-webkit-auto;word-wrap:break-word"><font
                        face="HelveticaNeue-Light">
                        <div>
                          <div>
                            <div style="margin:0in 0in 0.0001pt">New
                              York University</div>
                            <div style="margin:0in 0in 0.0001pt">Visiting
                              Cognitive Scientist</div>
                            <div style="margin:0in 0in 0.0001pt">Allen
                              Institute for Brain Science</div>
                            <div style="margin:0in 0in 0.0001pt">Allen
                              Institute for Artificial Intelligence</div>
                          </div>
                          <div>
                          </div>
                        </div>
                        co-edited book coming late 2014:</font></div>
                    <div
                      style="text-align:-webkit-auto;word-wrap:break-word"><span
                        style="text-align:-webkit-auto"><font
                          face="HelveticaNeue-Light">The Future of the
                          Brain: Essays By The World’s Leading
                          Neuroscientists</font></span></div>
                    <div
                      style="text-align:-webkit-auto;word-wrap:break-word"><a
                        moz-do-not-send="true"
                        href="http://garymarcus.com/"
                        style="font-family:HelveticaNeue-Light"
                        target="_blank">http://garymarcus.com/</a></div>
                    <div><br>
                    </div>
                  </div>
                </div>
              </div>
              <div>
                <div class="h5">
                  <div>
                    <div>On Feb 10, 2014, at 10:26 AM, Juergen
                      Schmidhuber <<a moz-do-not-send="true"
                        href="mailto:juergen@idsia.ch" target="_blank">juergen@idsia.ch</a>>
                      wrote:</div>
                    <br>
                    <blockquote type="cite">John,<br>
                      <br>
                      perhaps your view is a bit too pessimistic. Note
                      that a single RNN already is a general computer.
                      In principle, dynamic RNNs can map arbitrary
                      observation sequences to arbitrary computable
                      sequences of motoric actions and internal
                      attention-directing operations, e.g., to process
                      cluttered scenes, or to implement development (the
                      examples you mentioned). From my point of view,
                      the main question is how to exploit this universal
                      potential through learning. A stack of dynamic RNN
                      can sometimes facilitate this. What it learns can
                      later be collapsed into a single RNN [3].<br>
                      <br>
                      Juergen<br>
                      <br>
                      <a moz-do-not-send="true"
                        href="http://www.idsia.ch/%7Ejuergen/whatsnew.html"
                        target="_blank">http://www.idsia.ch/~juergen/whatsnew.html</a><br>
                      <br>
                      <br>
                      <br>
                      On Feb 7, 2014, at 12:54 AM, Juyang Weng <<a
                        moz-do-not-send="true"
                        href="mailto:weng@cse.msu.edu" target="_blank">weng@cse.msu.edu</a>>
                      wrote:<br>
                      <br>
                      <blockquote type="cite">Juergen:<br>
                        <br>
                        You wrote: A stack of recurrent NN.  But it is a
                        wrong architecture as far as the brain is
                        concerned.<br>
                        <br>
                        Although my joint work with Narendra Ahuja and
                        Thomas S. Huang at UIUC was probably the first<br>
                        learning network that used the deep Learning
                        idea for learning from cluttered scenes
                        (Cresceptron ICCV 1992 and IJCV 1997),<br>
                        I gave up this static deep learning idea later
                        after we considered the Principle 1:
                        Development.<br>
                        <br>
                        The deep learning architecture is wrong for the
                        brain.  It is too restricted, static in
                        architecture, and cannot learn directly from
                        cluttered scenes required by Principle 1.  The
                        brain is not a cascade of recurrent NN.<br>
                        <br>
                        I quote from Antonio Damasio's "Descartes' Error",
                        p. 93: "But intermediate communications occurs
                        also via large subcortical nuclei such as those
                        in the thalamus and basal ganglia, and via small
                        nuclei such as those in the brain stem."<br>
                        <br>
                        Of course, the cerebral pathways themselves are
                        not a stack of recurrent NN either.<br>
                        <br>
                        There are many fundamental reasons for that.  I
                        give only one here, based on our DN brain model:
                         Looking at a human, the brain must dynamically
                        attend the tip of the nose, the entire nose, the
                        face, or the entire human body on the fly.  For
                        example, when the network attends the nose, the
                        entire human body becomes the background!
                         Without a brain network that has both shallow
                        and deep connections (unlike your stack of
                        recurrent NN), your network is only for
                        recognizing a set of static patterns in a clean
                        background.  This is still an overworked pattern
                        recognition problem, not a vision problem.<br>
                        <br>
                        -John<br>
                        <br>
                        On 2/6/14 7:24 AM, Schmidhuber Juergen wrote:<br>
                        <blockquote type="cite">Deep Learning in
                          Artificial Neural Networks (NN) is about
                          credit assignment across many subsequent
                          computational stages, in deep or recurrent NN.<br>
                          <br>
                          A popular Deep Learning NN is the Deep Belief
                          Network (2006) [1,2].  A stack of feedforward
                          NN (FNN) is pre-trained in unsupervised
                          fashion. This can facilitate subsequent
                          supervised learning.<br>
                          <br>
                          Let me re-advertise a much older, very
                          similar, but more general, working Deep
                          Learner of 1991. It can deal with temporal
                          sequences: the Neural Hierarchical Temporal
                          Memory or Neural History Compressor [3]. A
                          stack of recurrent NN (RNN) is pre-trained in
                          unsupervised fashion. This can greatly
                          facilitate subsequent supervised learning.<br>
                          <br>
                          The RNN stack is more general in the sense
                          that it uses sequence-processing RNN instead
                          of FNN with unchanging inputs. In the early
                          1990s, the system was able to learn many
                          previously unlearnable Deep Learning tasks,
                          one of them requiring credit assignment across
                          1200 successive computational stages [4].<br>
                          <br>
                          Related developments: In the 1990s there was a
                          trend from partially unsupervised [3] to fully
                          supervised recurrent Deep Learners [5]. In
                          recent years, there has been a similar trend
                          from partially unsupervised to fully
                          supervised systems. For example, several
                          recent competition-winning and benchmark
                          record-setting systems use supervised LSTM RNN
                          stacks [6-9].<br>
                          <br>
                          <br>
                          References:<br>
                          <br>
                          [1] G. E. Hinton, R. R. Salakhutdinov.
                          Reducing the dimensionality of data with
                          neural networks. Science, Vol. 313. no. 5786,
                          pp. 504 - 507, 2006. <a
                            moz-do-not-send="true"
                            href="http://www.cs.toronto.edu/%7Ehinton/science.pdf"
                            target="_blank">http://www.cs.toronto.edu/~hinton/science.pdf</a><br>
                          <br>
                          [2] G. W. Cottrell. New Life for Neural
                          Networks. Science, Vol. 313. no. 5786, pp.
                          454-455, 2006. <a moz-do-not-send="true"
href="http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks"
                            target="_blank">http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks</a><br>
                          <br>
                          [3] J. Schmidhuber. Learning complex, extended
                          sequences using the principle of history
                          compression, Neural Computation, 4(2):234-242,
                          1992. (Based on TR FKI-148-91, 1991.)  <a
                            moz-do-not-send="true"
                            href="ftp://ftp.idsia.ch/pub/juergen/chunker.pdf"
                            target="_blank">ftp://ftp.idsia.ch/pub/juergen/chunker.pdf</a>
                           Overview: <a moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/firstdeeplearner.html"
                            target="_blank">http://www.idsia.ch/~juergen/firstdeeplearner.html</a><br>
                          <br>
                          [4] J. Schmidhuber. Habilitation thesis, TUM,
                          1993. <a moz-do-not-send="true"
                            href="ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf"
                            target="_blank">ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf</a>
                          . Includes an experiment with credit
                          assignment across 1200 subsequent
                          computational stages for a Neural Hierarchical
                          Temporal Memory or History Compressor or RNN
                          stack with unsupervised pre-training [2] (try
                          Google Translate in your mother tongue): <a
                            moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/habilitation/node114.html"
                            target="_blank">http://www.idsia.ch/~juergen/habilitation/node114.html</a><br>
                          <br>
                          [5] S. Hochreiter, J. Schmidhuber. Long
                          Short-Term Memory. Neural Computation,
                          9(8):1735-1780, 1997. Based on TR FKI-207-95,
                          1995.  <a moz-do-not-send="true"
                            href="ftp://ftp.idsia.ch/pub/juergen/lstm.pdf"
                            target="_blank">ftp://ftp.idsia.ch/pub/juergen/lstm.pdf</a>
                          . Lots of follow-up work on LSTM under <a
                            moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/rnn.html"
                            target="_blank">http://www.idsia.ch/~juergen/rnn.html</a><br>
                          <br>
                          [6] S. Fernandez, A. Graves, J. Schmidhuber.
                          Sequence labelling in structured domains with
                          hierarchical recurrent neural networks. In
                          Proc. IJCAI'07, p. 774-779, Hyderabad, India,
                          2007.  <a moz-do-not-send="true"
                            href="ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf"
                            target="_blank">ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf</a><br>
                          <br>
                          [7] A. Graves, J. Schmidhuber. Offline
                          Handwriting Recognition with Multidimensional
                          Recurrent Neural Networks. NIPS'22, p 545-552,
                          Vancouver, MIT Press, 2009.  <a
                            moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/nips2009.pdf"
                            target="_blank">http://www.idsia.ch/~juergen/nips2009.pdf</a><br>
                          <br>
                          [8] 2009: First very deep (and recurrent)
                          learner to win international competitions with
                          secret test sets: deep LSTM RNN (1995-) won
                          three connected handwriting contests at ICDAR
                          2009 (French, Arabic, Farsi), performing
                          simultaneous segmentation and recognition.  <a
                            moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/handwriting.html"
                            target="_blank">http://www.idsia.ch/~juergen/handwriting.html</a><br>
                          <br>
                          [9] A. Graves, A. Mohamed, G. E. Hinton.
                          Speech Recognition with Deep Recurrent Neural
                          Networks. ICASSP 2013, Vancouver, 2013.   <a
                            moz-do-not-send="true"
                            href="http://www.cs.toronto.edu/%7Ehinton/absps/RNN13.pdf"
                            target="_blank">http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf</a><br>
                          <br>
                          <br>
                          <br>
                          Juergen Schmidhuber<br>
                          <a moz-do-not-send="true"
                            href="http://www.idsia.ch/%7Ejuergen/whatsnew.html"
                            target="_blank">http://www.idsia.ch/~juergen/whatsnew.html</a><br>
                        </blockquote>
                        <br>
                        -- <br>
                        --<br>
                        Juyang (John) Weng, Professor<br>
                        Department of Computer Science and Engineering<br>
                        MSU Cognitive Science Program and MSU
                        Neuroscience Program<br>
                        428 S Shaw Ln Rm 3115<br>
                        Michigan State University<br>
                        East Lansing, MI 48824 USA<br>
                        Tel: <a moz-do-not-send="true"
                          href="tel:517-353-4388" value="+15173534388"
                          target="_blank">517-353-4388</a><br>
                        Fax: <a moz-do-not-send="true"
                          href="tel:517-432-1061" value="+15174321061"
                          target="_blank">517-432-1061</a><br>
                        Email: <a moz-do-not-send="true"
                          href="mailto:weng@cse.msu.edu" target="_blank">weng@cse.msu.edu</a><br>
                        URL: <a moz-do-not-send="true"
                          href="http://www.cse.msu.edu/%7Eweng/"
                          target="_blank">http://www.cse.msu.edu/~weng/</a><br>
                        ----------------------------------------------<br>
                        <br>
                      </blockquote>
                      <br>
                      <br>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <br>
        -- <br>
        Ali A. Minai, Ph.D.<br>
        Professor<br>
        Complex Adaptive Systems Lab<br>
        Department of Electrical Engineering & Computing Systems<br>
        University of Cincinnati<br>
        Cincinnati, OH 45221-0030<br>
        <br>
        Phone: (513) 556-4783<br>
        Fax: (513) 556-7326<br>
        Email: <a moz-do-not-send="true" href="mailto:Ali.Minai@uc.edu"
          target="_blank">Ali.Minai@uc.edu</a><br>
                  <a moz-do-not-send="true"
          href="mailto:minaiaa@gmail.com" target="_blank">minaiaa@gmail.com</a><br>
        <br>
        WWW: <a moz-do-not-send="true"
          href="http://www.ece.uc.edu/%7Eaminai/" target="_blank">http://www.ece.uc.edu/~aminai/</a>
      </div>
    </blockquote>
    <br>
  </body>
</html>