Connectionists: Deep Belief Nets (2006) / Neural History Compressor (1991) or Hierarchical Temporal Memory

Kenneth Stanley kstanley at eecs.ucf.edu
Mon Feb 10 18:07:11 EST 2014


It's worth mentioning, given the recent direction of the conversation, that
our group and others have been working for many years on the question of how
brain-like structures can be evolved artificially.  While this research
area, called neuroevolution, operates at a high level of abstraction, it
concretely begins to address some of the key questions being raised here
about how the messy a priori constraints on the learner itself can
practically be achieved.  While early work in neuroevolution consisted
mainly of simple extensions of conventional evolutionary algorithms, more
recent work takes seriously deeper issues closer to the interface with
neuroscientific concerns, such as the relationship of neural geometry to
functionality.

 

Just as an example, our 2010 Neural Computation publication, "Autonomous
Evolution of Topographic Regularities in Artificial Neural Networks"
(abstract at
http://www.mitpressjournals.org/doi/abs/10.1162/neco.2010.06-09-1042;
manuscript at http://eplex.cs.ucf.edu/papers/gauci_nc10.pdf), shows how
topographic maps can emerge if the neurons in an evolved network are allowed
to exist at defined locations.  (The particular algorithm is called HyperNEAT.)
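
For readers who want a feel for the mechanism, here is a minimal sketch of
the idea in Python (a hand-written function stands in for the evolved CPPN
that HyperNEAT actually uses, and all names here are illustrative rather
than taken from our code):

    import numpy as np

    # Two 5x5 sheets of neurons laid out on the plane, as in HyperNEAT's
    # "substrate" of neurons at defined (x, y) locations.
    coords = [(x, y) for x in np.linspace(-1, 1, 5)
                     for y in np.linspace(-1, 1, 5)]

    def cppn(x1, y1, x2, y2):
        # Stand-in for an evolved CPPN: the weight between two neurons is a
        # function of their coordinates.  A distance-based pattern like this
        # yields a topographic (nearby-to-nearby) map.
        d = np.hypot(x1 - x2, y1 - y2)
        return np.exp(-d ** 2) - 0.3   # strong nearby, weak or negative far away

    # Query the function for every source/target pair; connectivity is
    # generated from geometry rather than specified weight by weight.
    W = np.array([[cppn(x1, y1, x2, y2) for (x2, y2) in coords]
                  for (x1, y1) in coords])

    # Each source neuron most strongly drives the target at its own
    # location, a topographic regularity.
    print(np.argmax(W, axis=1)[:5])    # -> [0 1 2 3 4]

The point of the paper is that evolution discovers such geometric patterns
on its own, rather than having them handed to it as the fixed function above
does.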

 

While this kind of work does not immediately converge with low-level neural
models, it would be shortsighted to assume these areas will not eventually
benefit from converging.  Given the hunch, which many share, that much of
learning in nature is contingent on ad hoc heuristics and tendencies
implanted through eons of evolution, pure learning models may eventually
need the support of sophisticated evolutionary infrastructure to take
advantage of messy real-world contexts most effectively (and realistically).

Best Regards,

 

Ken Stanley

Associate Professor of Computer Science 

University of Central Florida

From: Connectionists [mailto:connectionists-bounces at mailman.srv.cs.cmu.edu]
On Behalf Of Ali Minai
Sent: Monday, February 10, 2014 4:37 PM
To: Connectionists List
Subject: Re: Connectionists: Deep Belief Nets (2006) / Neural History
Compressor (1991) or Hierarchical Temporal Memory

 

I think Gary's last paragraph is absolutely key. Unless we take both the
evolutionary and the developmental processes into account, we will neither
understand complex brains fully nor replicate their functionality well in
our robots etc. We build complex robots that know nothing and then ask them
to learn complex things, setting up a hopelessly difficult learning problem.
But that isn't how animals learn, or why animals have the brains and bodies
they have. A purely abstract computational approach to neural models makes
the same category error that connectionists criticized symbolists for
making, just at a different level.

Ali

 

On Mon, Feb 10, 2014 at 11:38 AM, Gary Marcus <gary.marcus at nyu.edu> wrote:

Juergen and others,

 

I am with John on his two basic concerns, and think that your appeal to
computational universality is a red herring; I cc the entire group because I
think that these issues lie at the center of why many of the hardest
problems in AI and neuroscience continue to lie out of reach, despite
in-principle proofs about computational universality.

 

John's basic points, which I have also made before (e.g., in my books The
Algebraic Mind and The Birth of the Mind and in my periodic New Yorker
posts), are two:

 

a. It is unrealistic to expect that hierarchies of pattern recognizers will
suffice for the full range of cognitive problems that humans (and strong AI
systems) face. Deep learning, to take one example, excels at classification,
but has thus far had relatively little to contribute to inference or natural
language understanding.  Socher et al.'s impressive CVG work, for instance,
is parasitic on a traditional (symbolic) parser, not a soup-to-nuts neural
net induced from input.

 

b. It is unrealistic to expect that all the relevant information can be
extracted by any general-purpose learning device.

 

Yes, you can reliably map any arbitrary input-output relation onto a
multilayer perceptron or recurrent net, but only if you know the complete
input-output mapping in advance. Alas, you can't be guaranteed to do that in
general given arbitrary subsets of the complete space; in the real world,
learners see subsets of possible data and have to make guesses about what
the rest will be like. Wolpert's No Free Lunch work is instructive here (and
also in line with how cognitive scientists like Chomsky, Pinker, and myself
have thought about the problem). For any problem, I presume that there
exists an appropriately-configured net, but there is no guarantee that in
the real world you are going to be able to correctly induce the right system
via a general-purpose learning algorithm given a finite amount of data, with
a finite amount of training. Empirically, neural nets of roughly the form
you are discussing have worked fine for some problems (e.g. backgammon) but
have been no match for their symbolic competitors in other domains (chess)
and have worked only as an adjunct rather than a central ingredient in still
others (parsing, question-answering a la Watson, etc.); in other domains,
like planning and common-sense reasoning, there has been essentially no
serious work at all.
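
To make the subset problem concrete, consider a deliberately simple sketch
(a toy, not anyone's published model): a small tanh network is trained on
the identity function, but only on inputs drawn from [-1, 1], and is then
queried outside that range.

    import numpy as np

    rng = np.random.default_rng(0)

    # Training data: the identity function, but only on the subset [-1, 1].
    X = np.linspace(-1, 1, 50).reshape(-1, 1)
    Y = X.copy()

    # One-hidden-layer tanh network, trained by full-batch gradient descent.
    W1 = rng.normal(0, 1, (1, 8)); b1 = np.zeros(8)
    W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
    lr = 0.05
    for _ in range(5000):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        P = H @ W2 + b2                     # predictions
        dP = 2 * (P - Y) / len(X)           # gradient of mean squared error
        dW2 = H.T @ dP; db2 = dP.sum(0)
        dH = (dP @ W2.T) * (1 - H ** 2)     # backprop through tanh
        dW1 = X.T @ dH; db1 = dH.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    def f(x):
        return (np.tanh(np.array([[x]]) @ W1 + b1) @ W2 + b2).item()

    print(f(0.5))   # close to 0.5: interpolation inside the training range works
    print(f(3.0))   # far from 3.0: saturated units cannot extrapolate the rule

Nothing in the training data distinguishes "the identity function" from "the
identity function on [-1, 1], and something else elsewhere"; the net simply
picks one of the infinitely many hypotheses consistent with its finite
sample, which is exactly the No Free Lunch point.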

 

My own take, informed by evolutionary and developmental biology, is that no
single general-purpose architecture will ever be a match for the end product
of a billion years of evolution, which includes, I suspect, a significant
amount of customized architecture that need not be induced anew in each
generation.  We learn as well as we do precisely because evolution has
preceded us, and endowed us with custom tools for learning in different
domains. Until the field of neural nets more seriously engages in
understanding what the contribution from evolution to neural wetware might
be, I will remain pessimistic about the field's prospects.

 

Best,

Gary Marcus

 

Professor of Psychology

New York University

Visiting Cognitive Scientist

Allen Institute for Brain Science

Allen Institute for Artificial Intelligence

co-edited book coming late 2014:

The Future of the Brain: Essays By The World's Leading Neuroscientists

http://garymarcus.com/

 

On Feb 10, 2014, at 10:26 AM, Juergen Schmidhuber <juergen at idsia.ch> wrote:

John,

perhaps your view is a bit too pessimistic. Note that a single RNN already
is a general computer. In principle, dynamic RNNs can map arbitrary
observation sequences to arbitrary computable sequences of motoric actions
and internal attention-directing operations, e.g., to process cluttered
scenes, or to implement development (the examples you mentioned). From my
point of view, the main question is how to exploit this universal potential
through learning. A stack of dynamic RNN can sometimes facilitate this. What
it learns can later be collapsed into a single RNN [3].

Juergen

http://www.idsia.ch/~juergen/whatsnew.html



On Feb 7, 2014, at 12:54 AM, Juyang Weng <weng at cse.msu.edu> wrote:

Juergen:

You wrote: a stack of recurrent NN.  But that is the wrong architecture as
far as the brain is concerned.

Although my joint work with Narendra Ahuja and Thomas S. Huang at UIUC was
probably the first
learning network that used the deep learning idea for learning from
cluttered scenes (Cresceptron, ICCV 1992 and IJCV 1997),
I gave up this static deep learning idea later after we considered
Principle 1: Development.

The deep learning architecture is wrong for the brain.  It is too
restricted, static in architecture, and cannot learn directly from cluttered
scenes, as required by Principle 1.  The brain is not a cascade of recurrent NN.

I quote from Antonio Damasio's "Descartes' Error," p. 93: "But intermediate
communications occur also via large subcortical nuclei such as those in the
thalamus and basal ganglia, and via small nuclei such as those in the brain
stem."

Of course, the cerebral pathways themselves are not a stack of recurrent NN
either.

There are many fundamental reasons for that.  I give only one here, based on
our DN brain model:  Looking at a human, the brain must dynamically attend
to the tip of the nose, the entire nose, the face, or the entire human body
on the fly.  For example, when the network attends to the nose, the entire
human body becomes the background!  Without a brain network that has both
shallow and deep connections (unlike your stack of recurrent NN), your
network is only for recognizing a set of static patterns against a clean
background.  This is still an overworked pattern recognition problem, not a
vision problem.

-John

On 2/6/14 7:24 AM, Schmidhuber Juergen wrote:

Deep Learning in Artificial Neural Networks (NN) is about credit assignment
across many subsequent computational stages, in deep or recurrent NN.

A popular Deep Learning NN is the Deep Belief Network (2006) [1,2].  A stack
of feedforward NN (FNN) is pre-trained in unsupervised fashion. This can
facilitate subsequent supervised learning.

Let me re-advertise a much older, very similar, but more general, working
Deep Learner of 1991. It can deal with temporal sequences: the Neural
Hierarchical Temporal Memory or Neural History Compressor [3]. A stack of
recurrent NN (RNN) is pre-trained in unsupervised fashion. This can greatly
facilitate subsequent supervised learning.

The RNN stack is more general in the sense that it uses sequence-processing
RNN instead of FNN with unchanging inputs. In the early 1990s, the system
was able to learn many previously unlearnable Deep Learning tasks, one of
them requiring credit assignment across 1200 successive computational stages
[4].
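
The principle: each level learns to predict its next input, and only the
inputs it fails to predict (the unexpected events) are passed up, so the
next level sees a shorter sequence operating on a slower timescale. A toy
sketch of this compression step (a table-lookup predictor stands in for the
pre-trained RNN of [3]; the names are illustrative):

    def compress(seq, predictor):
        # Keep only the inputs the current level failed to predict; these
        # "surprises" become the much shorter input sequence of the level
        # above, which can then learn longer-range structure.
        surprises, prev = [], None
        for x in seq:
            if predictor.predict(prev) != x:
                surprises.append(x)
            predictor.update(prev, x)
            prev = x
        return surprises

    class TablePredictor:
        # Toy stand-in for the RNN: predicts the successor most recently
        # observed after each symbol.
        def __init__(self):
            self.table = {}
        def predict(self, prev):
            return self.table.get(prev)
        def update(self, prev, x):
            self.table[prev] = x

    seq = list("abcabcabcabxabcabc")
    print(''.join(compress(seq, TablePredictor())))  # "abcaxac": 7 of 18 symbols

Once compressed in this way, credit assignment at higher levels spans far
fewer computational stages, and what the stack has learned can later be
collapsed back into a single RNN [3].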

Related developments: In the 1990s there was a trend from partially
unsupervised [3] to fully supervised recurrent Deep Learners [5]. In recent
years, there has been a similar trend from partially unsupervised to fully
supervised systems. For example, several recent competition-winning and
benchmark record-setting systems use supervised LSTM RNN stacks [6-9].


References:

[1] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data
with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 2006.
http://www.cs.toronto.edu/~hinton/science.pdf

[2] G. W. Cottrell. New Life for Neural Networks. Science, Vol. 313, no.
5786, pp. 454-455, 2006.
http://www.academia.edu/155897/Cottrell_Garrison_W._2006_New_life_for_neural_networks

[3] J. Schmidhuber. Learning complex, extended sequences using the principle
of history compression, Neural Computation, 4(2):234-242, 1992. (Based on TR
FKI-148-91, 1991.)  ftp://ftp.idsia.ch/pub/juergen/chunker.pdf  Overview:
http://www.idsia.ch/~juergen/firstdeeplearner.html

[4] J. Schmidhuber. Habilitation thesis, TUM, 1993.
ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf . Includes an experiment
with credit assignment across 1200 subsequent computational stages for a
Neural Hierarchical Temporal Memory or History Compressor or RNN stack with
unsupervised pre-training [3] (the thesis is in German; try Google Translate):
http://www.idsia.ch/~juergen/habilitation/node114.html

[5] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural
Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, 1995.
ftp://ftp.idsia.ch/pub/juergen/lstm.pdf . Lots of follow-up work on LSTM
under http://www.idsia.ch/~juergen/rnn.html

[6] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in
structured domains with hierarchical recurrent neural networks. In Proc.
IJCAI'07, p. 774-779, Hyderabad, India, 2007.
ftp://ftp.idsia.ch/pub/juergen/IJCAI07sequence.pdf

[7] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with
Multidimensional Recurrent Neural Networks. NIPS'22, pp. 545-552, Vancouver,
MIT Press, 2009.  http://www.idsia.ch/~juergen/nips2009.pdf

[8] 2009: First very deep (and recurrent) learner to win international
competitions with secret test sets: deep LSTM RNN (1995-) won three
connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi),
performing simultaneous segmentation and recognition.
http://www.idsia.ch/~juergen/handwriting.html

[9] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep
Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013.
http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf



Juergen Schmidhuber
http://www.idsia.ch/~juergen/whatsnew.html


-- 
--
Juyang (John) Weng, Professor
Department of Computer Science and Engineering
MSU Cognitive Science Program and MSU Neuroscience Program
428 S Shaw Ln Rm 3115
Michigan State University
East Lansing, MI 48824 USA
Tel: 517-353-4388
Fax: 517-432-1061
Email: weng at cse.msu.edu
URL: http://www.cse.msu.edu/~weng/
----------------------------------------------

-- 
Ali A. Minai, Ph.D.
Professor
Complex Adaptive Systems Lab
Department of Electrical Engineering & Computing Systems
University of Cincinnati
Cincinnati, OH 45221-0030

Phone: (513) 556-4783
Fax: (513) 556-7326
Email: Ali.Minai at uc.edu
          minaiaa at gmail.com

WWW: http://www.ece.uc.edu/~aminai/
