Connectionists: Deep Belief Nets (2006) / Neural History Compressor (1991) or Hierarchical Temporal Memory

Thu Feb 6 07:24:03 EST 2014

Deep Learning in Artificial Neural Networks (NN) is about credit assignment across many subsequent computational stages, in deep or recurrent NN.

A popular Deep Learning NN is the Deep Belief Network (2006) [1,2]. A stack of feedforward NN (FNN) is pre-trained in unsupervised fashion. This can facilitate subsequent supervised learning.

Let me re-advertise a much older, very similar, but more general, working Deep Learner of 1991. It can deal with temporal sequences: the Neural Hierarchical Temporal Memory or Neural History Compressor [3]. A stack of recurrent NN (RNN) is pre-trained in unsupervised fashion. This can greatly facilitate subsequent supervised learning.

The RNN stack is more general in the sense that it uses sequence-processing RNN instead of FNN with unchanging inputs. In the early 1990s, the system was able to learn many previously unlearnable Deep Learning tasks, one of them requiring credit assignment across 1200 successive computational stages [4]. 
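
The following is a minimal sketch of the history-compression principle behind [3], not the 1991 system itself. A lower level learns to predict its next input, and only the inputs it fails to predict are passed on, together with their time steps, to the next level, which therefore sees a much shorter sequence. A simple lookup table stands in for the RNN predictor here, which is a simplifying assumption, and the names compress_history and START are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

START = object()  # hypothetical marker for "no previous input yet"


def compress_history(sequence: Sequence[Hashable]) -> List[Tuple[int, Hashable]]:
    """Return the (time_step, symbol) pairs that the predictor got wrong."""
    successor = {}   # table predictor (assumption): last seen successor of each symbol
    unexpected = []
    previous = START
    for t, symbol in enumerate(sequence):
        # An input counts as expected only if the table already predicts it.
        if successor.get(previous, START) != symbol:
            unexpected.append((t, symbol))
        successor[previous] = symbol  # online update of the predictor
        previous = symbol
    return unexpected


# Only the surprises would be handed to the next level of the stack.
print(compress_history("abcabcabcabx"))
```

On this 12-step input only 5 steps reach the next level. There, the final x sits directly after the last earlier surprise, although 8 time steps separate the two in the raw sequence. Stacking would mean feeding these surprises into another predictor of the same kind.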

Related developments: In the 1990s there was a trend from partially unsupervised [3] to fully supervised recurrent Deep Learners [5]. In recent years, there has been a similar trend from partially unsupervised to fully supervised systems. For example, several recent competition-winning and benchmark record-setting systems use supervised LSTM RNN stacks [6-9].

References:

[1] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504-507, 2006. http://www.cs.toronto.edu/~hinton/science.pdf

[2] G. W. Cottrell. New Life for Neural Networks. Science, Vol. 313. no. 5786, pp. 454-455, 2006. http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks

[3] J. Schmidhuber. Learning complex, extended sequences using the principle of history compression, Neural Computation, 4(2):234-242, 1992. (Based on TR FKI-148-91, 1991.)  ftp://ftp.idsia.ch/pub/juergen/chunker.pdf  Overview: http://www.idsia.ch/~juergen/firstdeeplearner.html

[4] J. Schmidhuber. Habilitation thesis, TUM, 1993. ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf . Includes an experiment with credit assignment across 1200 subsequent computational stages for a Neural Hierarchical Temporal Memory or History Compressor or RNN stack with unsupervised pre-training [3] (try Google Translate in your mother tongue): http://www.idsia.ch/~juergen/habilitation/node114.html

[5] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, 1995. ftp://ftp.idsia.ch/pub/juergen/lstm.pdf . Lots of follow-up work on LSTM under http://www.idsia.ch/~juergen/rnn.html

[6] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI'07, p. 774-779, Hyderabad, India, 2007.  ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf

[7] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. NIPS'22, p. 545-552, Vancouver, MIT Press, 2009. http://www.idsia.ch/~juergen/nips2009.pdf

[8] 2009: First very deep (and recurrent) learner to win international competitions with secret test sets: deep LSTM RNN (1995-) won three connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi), performing simultaneous segmentation and recognition.  http://www.idsia.ch/~juergen/handwriting.html

[9] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013.   http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf

Juergen Schmidhuber
http://www.idsia.ch/~juergen/whatsnew.html