Connectionists: Deep Belief Nets (2006) / Neural History Compressor (1991) or Hierarchical Temporal Memory

Mon Feb 10 10:26:59 EST 2014

John,

perhaps your view is a bit too pessimistic. Note that a single RNN is already a general computer. In principle, dynamic RNNs can map arbitrary observation sequences to arbitrary computable sequences of motor actions and internal attention-directing operations, e.g., to process cluttered scenes, or to implement development (the examples you mentioned). From my point of view, the main question is how to exploit this universal potential through learning. A stack of dynamic RNNs can sometimes facilitate this. What the stack learns can later be collapsed into a single RNN [3].

Juergen

http://www.idsia.ch/~juergen/whatsnew.html

On Feb 7, 2014, at 12:54 AM, Juyang Weng <weng at cse.msu.edu> wrote:

> Juergen:
> 
> You wrote: A stack of recurrent NN.  But it is a wrong architecture as far as the brain is concerned.
> 
> Although my joint work with Narendra Ahuja and Thomas S. Huang at UIUC was probably the first
> learning network that used the deep learning idea for learning from cluttered scenes (Cresceptron, ICCV 1992 and IJCV 1997),
> I gave up this static deep learning idea later, after we considered Principle 1: Development.
> 
> The deep learning architecture is wrong for the brain.  It is too restricted, static in architecture, and cannot learn directly from cluttered scenes, as Principle 1 requires.  The brain is not a cascade of recurrent NN.
> 
> I quote from Antonio Damasio, "Descartes' Error", p. 93: "But intermediate communication occurs also via large subcortical nuclei such as those in the thalamus and basal ganglia, and via small nuclei such as those in the brain stem."
> 
> Of course, the cerebral pathways themselves are not a stack of recurrent NN either.
> 
> There are many fundamental reasons for that.  I give only one here, based on our DN brain model:  Looking at a human, the brain must dynamically attend to the tip of the nose, the entire nose, the face, or the entire human body on the fly.  For example, when the network attends to the nose, the entire human body becomes the background!  Without a brain network that has both shallow and deep connections (unlike your stack of recurrent NN), your network is only for recognizing a set of static patterns against a clean background.  This is still an overworked pattern recognition problem, not a vision problem.
> 
> -John
> 
> On 2/6/14 7:24 AM, Schmidhuber Juergen wrote:
>> Deep Learning in Artificial Neural Networks (NN) is about credit assignment across many subsequent computational stages, in deep or recurrent NN.
>> 
>> A popular Deep Learning NN is the Deep Belief Network (2006) [1,2].  A stack of feedforward NN (FNN) is pre-trained in unsupervised fashion. This can facilitate subsequent supervised learning.
>> 
>> Let me re-advertise a much older, very similar, but more general, working Deep Learner of 1991. It can deal with temporal sequences: the Neural Hierarchical Temporal Memory or Neural History Compressor [3]. A stack of recurrent NN (RNN) is pre-trained in unsupervised fashion. This can greatly facilitate subsequent supervised learning.
>> 
>> The RNN stack is more general in the sense that it uses sequence-processing RNN instead of FNN with unchanging inputs. In the early 1990s, the system was able to learn many previously unlearnable Deep Learning tasks, one of them requiring credit assignment across 1200 successive computational stages [4].
>> 
>> Related developments: In the 1990s there was a trend from partially unsupervised [3] to fully supervised recurrent Deep Learners [5]. In recent years, there has been a similar trend from partially unsupervised to fully supervised systems. For example, several recent competition-winning and benchmark record-setting systems use supervised LSTM RNN stacks [6-9].
>> 
>> 
>> References:
>> 
>> [1] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 2006. http://www.cs.toronto.edu/~hinton/science.pdf
>> 
>> [2] G. W. Cottrell. New Life for Neural Networks. Science, Vol. 313. no. 5786, pp. 454-455, 2006. http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks
>> 
>> [3] J. Schmidhuber. Learning complex, extended sequences using the principle of history compression, Neural Computation, 4(2):234-242, 1992. (Based on TR FKI-148-91, 1991.)  ftp://ftp.idsia.ch/pub/juergen/chunker.pdf  Overview: http://www.idsia.ch/~juergen/firstdeeplearner.html
>> 
>> [4] J. Schmidhuber. Habilitation thesis, TUM, 1993. ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf . Includes an experiment with credit assignment across 1200 subsequent computational stages for a Neural Hierarchical Temporal Memory or History Compressor or RNN stack with unsupervised pre-training [3] (try Google Translate in your mother tongue): http://www.idsia.ch/~juergen/habilitation/node114.html
>> 
>> [5] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, 1995.  ftp://ftp.idsia.ch/pub/juergen/lstm.pdf . Lots of follow-up work on LSTM under http://www.idsia.ch/~juergen/rnn.html
>> 
>> [6] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI'07, p. 774-779, Hyderabad, India, 2007.  ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf
>> 
>> [7] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. NIPS'22, p 545-552, Vancouver, MIT Press, 2009.  http://www.idsia.ch/~juergen/nips2009.pdf
>> 
>> [8] 2009: First very deep (and recurrent) learner to win international competitions with secret test sets: deep LSTM RNN (1995-) won three connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi), performing simultaneous segmentation and recognition.  http://www.idsia.ch/~juergen/handwriting.html
>> 
>> [9] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013.   http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf
>> 
>> 
>> 
>> Juergen Schmidhuber
>> http://www.idsia.ch/~juergen/whatsnew.html
> 
> -- 
> Juyang (John) Weng, Professor
> Department of Computer Science and Engineering
> MSU Cognitive Science Program and MSU Neuroscience Program
> 428 S Shaw Ln Rm 3115
> Michigan State University
> East Lansing, MI 48824 USA
> Tel: 517-353-4388
> Fax: 517-432-1061
> Email: weng at cse.msu.edu
> URL: http://www.cse.msu.edu/~weng/
> ----------------------------------------------
>