<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix"><br>

      For more work from the early 1990s that aims to bridge the gap

      between symbolic grammars and neurally-inspired models see<br>

      <a

href="http://www1.icsi.berkeley.edu/ftp/global/global/pub/ai/stolcke/cogsci91.pdf">Syntactic

        category formation with vector space grammars</a> (CogSci

      conference, 1991).  Since grammars potentially generate deep

      structures I suppose you can say that what you get is a case of

      "deep nets".<br>

      <br>

      Cheers,<br>

      <br>

      Andreas<br>

      <br>

      On 3/15/2015 11:22 AM, Lucas, Simon M wrote:<br>

    </div>

    <blockquote

cite="mid:DB3PR06MB201861E5B36F6F8BE8A8793EC050@DB3PR06MB201.eurprd06.prod.outlook.com"

      type="cite">

      <pre wrap="">Hi Juergen,

In your excellent survey there is one type of deep

learning that you've missed out, but I'd argue should

be included for completeness: work on Grammar-Based 

Neural Networks.

Baker (1979) showed how the Forward / Backward

algorithm used to train HMMs could be generalised

to train stochastic context free grammars (representing

structures of arbitrary depth).  He called this the Inside / Outside 

algorithm.  Problem: the algorithm scaled as O(n^3 m^3) 

where n = length of input, m = number of non-terminals.  

(Lorraine Dodd had a nice paper in Speech '88 showing

how the I/O algorithm could be used to learn the spelling

structure of English words.)

What's this got to do with Deep NNs?

Lucas (PhD Thesis, 1991) and Lucas and Damper (Connection Science, 1990)

showed how these grammars could be mapped onto multi-layered neural 

nets, and how the training / recognition algorithms could be made more efficient 

by specialising the structure in particular ways.

This area would be worth another look given the massively more

powerful machines we have now, and also using different activation

functions.

Simon M. Lucas, "Connectionist Architectures for Syntactic Pattern Recognition", PhD Thesis, University of Southampton (1991).

Simon M. Lucas and Robert I. Damper, Syntactic neural networks, Connection Science (1990), volume 2, pages 199-225.

Lorraine Dodd, Grammatical inference for automatic speech recognition, an application of the inside/outside

algorithm to the spelling of English words, Proceedings of Speech '88, pages 1061 - 1068, Institute of Acoustics, Edinburgh.

Baker's I/O algorithm:  <a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Inside%E2%80%93outside_algorithm">http://en.wikipedia.org/wiki/Inside%E2%80%93outside_algorithm</a>

Best wishes,

  Simon Lucas

Professor Simon Lucas

Head of School

Computer Science and Electronic Engineering

University of Essex, UK

-----Original Message-----

From: Connectionists [<a class="moz-txt-link-freetext" href="mailto:connectionists-bounces@mailman.srv.cs.cmu.edu">mailto:connectionists-bounces@mailman.srv.cs.cmu.edu</a>] On Behalf Of Schmidhuber Juergen

Sent: 13 March 2015 16:53

To: Connectionists List

Subject: Re: Connectionists: Who introduced the term "Deep Learning" to NNs?

Sorry, but the “semantics of what researchers nowadays call deep learning” are much older. In RNNs, the deepest of all NNs, your “strictly unsupervised followed by supervised finetuning” goes back to Schmidhuber's hierarchical deep RNN stacks of 1991 (the neural history compressors). They were largely replaced (still in the 1990s) by deep supervised LSTM RNNs. History repeated itself between 2006 and 2010, when deep unsupervised FNN stacks (kudos to Hinton et al.) were replaced by deep standard supervised FNNs, as you pointed out. (It's hardly clear, however, that the re-popularization of supervised NNs wouldn't have occurred without the work on unsupervised NNs.)

Antoine Bordes' Google-generated graph seems to indicate that the usage of the term went up right after Aizenberg et al.’s book came out (2000). As Yoshua Bengio pointed out, however, it includes all kinds of ancient usages of “Deep Learning,” and is not limited to NN-specific usage in the sense of this thread. 

Again, I am just trying to locate the introduction of the term. It's an interesting question in its own right, separate from when the principles of deep learning came into being. 

Juergen 

<a class="moz-txt-link-freetext" href="http://people.idsia.ch/~juergen/deep-learning-overview.html">http://people.idsia.ch/~juergen/deep-learning-overview.html</a>

</pre>

      <blockquote type="cite">

        <pre wrap="">On 13 Mar 2015, at 15:23, Marc'Aurelio Ranzato <a class="moz-txt-link-rfc2396E" href="mailto:ranzato@cs.toronto.edu">&lt;ranzato@cs.toronto.edu&gt;</a> wrote:

Although the term has been used before, the semantics of what researchers nowadays call "deep learning" really comes from the Hinton Osindero and Teh 2006 paper. By semantics I mean: 1) types of network considered, 2) interpretation of feature hierarchy and 3) training procedure (which was strictly unsupervised followed by supervised finetuning until 2010 or so, and then just simply supervised backprop as in the older literature).

It's a re-discovery but it was anything but obvious at that time, and it seems reasonable to me to give credit to those people who initiated this process/scientific "revolution".

Marc’Aurelio

</pre>

      </blockquote>

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">On 13 Mar 2015, at 15:15, Yoshua Bengio <a class="moz-txt-link-rfc2396E" href="mailto:yoshua.bengio@gmail.com">&lt;yoshua.bengio@gmail.com&gt;</a> wrote:

There have been many, many uses of the term 'deep learning' even before (even since 1840), but of course they are mostly not relevant to the current use of the term in the machine learning community. See how many in this Google-generated graph (from Google Books):

-- Yoshua Bengio

P.S. Thanks Antoine Bordes for pointing this out.

</pre>

      </blockquote>

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">On Fri, 13 Mar 2015, Juergen Schmidhuber wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">Ali,

thanks! Of course, Fukushima had deep learning nets in the 1970s, and Ivakhnenko had them in the 1960s. But they did not use the term “deep learning.”

Dechter (1986) used the term all over the place, writing not only about “deep learning”, but also “deep first-order learning” and “second-order deep learning.” Dechter’s paper, however, is not really about NNs.

My own team started to use such terms only in the new millennium (the GECCO 2005 paper with Faustino Gomez had the word combination “learn deep” in the title, and was about deep learning in the modern sense). But apparently Aizenberg et al (2000) were really the first to use “deep learning” in an NN context.

Again: my question was not about who invented deep learning half a 

century ago, only about which publication introduced the terminology 

to NNs :-)

Juergen

<a class="moz-txt-link-freetext" href="http://people.idsia.ch/~juergen/deep-learning-overview.html">http://people.idsia.ch/~juergen/deep-learning-overview.html</a>

</pre>

          <blockquote type="cite">

            <pre wrap="">On 13 Mar 2015, at 05:17, Ali Minai <a class="moz-txt-link-rfc2396E" href="mailto:minaiaa@gmail.com">&lt;minaiaa@gmail.com&gt;</a> wrote:

Juergen,

I would say that the instances you point out are not really examples of "deep learning" in the sense the term is being used today. The way we use it now, it refers really to "learning in deep networks", whereas "deep learning" (as opposed to "shallow learning") would mean learning something in a deep sense, e.g., at a conceptual, relational or causal level, rather than in a shallow sense, e.g., at a purely correlational level. This latter sense of "deep learning" may also be implicit in some "deep learning" models, but I don't think the "deep" today refers to this aspect of depth.

Any discussion of early "deep networks" must surely also refer to Fukushima's Neocognitron.

Ali

</pre>

          </blockquote>

          <pre wrap="">

</pre>

          <blockquote type="cite">

            <pre wrap="">

On Thu, Mar 12, 2015 at 5:35 PM, Juergen Schmidhuber <a class="moz-txt-link-rfc2396E" href="mailto:juergen@idsia.ch">&lt;juergen@idsia.ch&gt;</a> wrote:

</pre>

          </blockquote>

          <pre wrap="">

</pre>

          <blockquote type="cite">

            <pre wrap="">Thanks. Hm, sure, “deep neural nets” are old, and Ivakhnenko’s deep nets worked well even in the 1960s. But what I’d like to know is: who was the first to use the term “deep learning” in an NN publication?

Aizenberg et al (2000) wrote about “deep learning of the features of threshold Boolean functions, one of the most important objects considered in the theory of perceptrons …”

Brian Mingus, however, pointed me to a paper by Rina Dechter (1986). Brian wrote: "Deep learning as compared to shallow learning is terminology used in the study of constraint satisfaction. Constraint satisfaction networks then became RBMs. I would argue this is a good basis for the origin of the modern usage. I like this paper for provenance: <a class="moz-txt-link-freetext" href="http://www.aaai.org/Papers/AAAI/1986/AAAI86-029.pdf">http://www.aaai.org/Papers/AAAI/1986/AAAI86-029.pdf</a> "

But perhaps the term occurred even earlier in the NN literature?

Juergen

</pre>

          </blockquote>

          <pre wrap="">

</pre>

          <blockquote type="cite">

            <blockquote type="cite">

              <pre wrap="">On 12 Mar 2015, at 21:16, Geoffrey Hinton <a class="moz-txt-link-rfc2396E" href="mailto:geoffrey.hinton@gmail.com">&lt;geoffrey.hinton@gmail.com&gt;</a> wrote:

I think the current popularity of the term started with the paper 

by Hinton Osindero and Teh in 2006 called "A fast learning 

algorithm for deep belief nets".  After this paper there was a lot 

of talk about deep belief nets.  In about 2007 the term "deep 

belief net" started changing its meaning and was used (rather 

sloppily) to refer to deep neural nets that were pre-trained as 

deep belief nets. The term gained a lot of popularity because these 

nets were used to make good acoustic models and that triggered the 

re-introduction of neural nets into mainline speech recognizers. 

People eventually made a clear terminological distinction between 

deep belief nets (DBNs) and deep neural nets that were initialized 

as deep belief nets (DNNs or DBN-DNNs). Then they discovered that 

with large datasets and sensible initial scales for the weights the 

pre-training was not needed and they generalized DNNs to any old deep neural net.

It's clearly true that people had previously used the term deep 

neural net but that was not the origin of the resurgence of the 

term in about 2007.

It's pretty obvious by now that deep neural networks of the type 

that people were using in the 1980s work very well when they have 

enough data and enough computation, and it's pretty obvious that the 

deep convnets that Yann has been using since about 1987 are deep 

neural nets, so what does it matter where the name came from?  Deep 

neural nets are finally living up to their promise so let's all enjoy it.

Geoff

</pre>

            </blockquote>

          </blockquote>

          <pre wrap="">

</pre>

          <blockquote type="cite">

            <blockquote type="cite">

              <pre wrap="">On Thu, Mar 12, 2015 at 1:58 PM, Schmidhuber Juergen <a class="moz-txt-link-rfc2396E" href="mailto:juergen@idsia.ch">&lt;juergen@idsia.ch&gt;</a> wrote:

</pre>

            </blockquote>

          </blockquote>

          <pre wrap="">

</pre>

          <blockquote type="cite">

            <blockquote type="cite">

              <blockquote type="cite">

                <pre wrap="">Dear connectionists,

to my knowledge, the ancient term "Deep Learning" was introduced to the NN field by Aizenberg & Aizenberg & Vandewalle's book (2000): "Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications."

Is anyone aware of older NN papers using it?

(Of course, the field itself is much older - Ivakhnenko started 

his work on deep learning networks in the mid 1960s.)

Thanks!

Juergen

<a class="moz-txt-link-freetext" href="http://people.idsia.ch/~juergen/whatsnew.html">http://people.idsia.ch/~juergen/whatsnew.html</a>

</pre>

              </blockquote>

            </blockquote>

            <pre wrap="">

</pre>

          </blockquote>

          <pre wrap="">

</pre>

        </blockquote>

      </blockquote>

      <pre wrap="">

</pre>

    </blockquote>

    <br>

  </body>

</html>