Connectionists: Deep Belief Nets (2006) / Neural History Compressor (1991) or Hierarchical Temporal Memory

Wed Feb 12 02:37:37 EST 2014

Gary, 
an RNN is a general computer. Its weights are its program. The main question is: how to find good weights? Of course, this requires additional code, namely, the program searcher or learning algorithm (a central theme of AI research) - what you call "a whole bunch of machinery". Elegant code is tiny in comparison to the set of weights that it learns. What counts as a good learning algorithm depends on the problem class.

Summarising my posts:
1. Universal optimal program search applies to all well-defined problems and all general computers, including RNNs.
2. Artificial evolution is useful for reinforcement learning RNNs (in partially observable environments) where plain supervised backprop is useless (last post).
3. Unsupervised deep RNN stacks (1991) can improve supervised RNNs (original post).
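
A minimal sketch of the "weights are the program" point, under stated assumptions: the tiny threshold network, the names run_rnn, search_weights, W_HIDDEN and B_HIDDEN, and the coarse weight grid are all hypothetical illustrations, not code from any of the papers cited in this thread. The fixed run_rnn code plays the role of the general computer, the output weights play the role of the program, and search_weights is a deliberately naive program searcher (exhaustive enumeration, which is neither universal search, nor evolution, nor backprop).

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# (hidden weights, hidden biases, output weights, output bias)
Weights = Tuple[List[List[float]], List[float], List[float], float]


def step(v: float) -> int:
    """Heaviside threshold unit."""
    return 1 if v > 0 else 0


def run_rnn(weights: Weights, inputs: Sequence[int]) -> int:
    """Run a fixed threshold RNN over a bit sequence; return the final state bit.

    This code never changes. Only `weights` decides what is computed.
    """
    w_hidden, b_hidden, w_out, b_out = weights
    state = 0
    for x in inputs:
        hidden = [
            step(w[0] * state + w[1] * x + b)
            for w, b in zip(w_hidden, b_hidden)
        ]
        state = step(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return state


# Assumption: the hidden layer is fixed by hand.
# Hidden unit 0 computes OR(state, x); hidden unit 1 computes AND(state, x).
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [-0.5, -1.5]


def search_weights(
    target: Callable[[Sequence[int]], int], max_len: int = 3
) -> Optional[Weights]:
    """Naive program search: try every output-weight setting on a small grid
    and return the first one that reproduces `target` on all short sequences."""
    examples = [
        list(seq)
        for n in range(1, max_len + 1)
        for seq in product([0, 1], repeat=n)
    ]
    grid = product([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-0.5, 0.5])
    for w0, w1, b in grid:
        candidate: Weights = (W_HIDDEN, B_HIDDEN, [w0, w1], b)
        if all(run_rnn(candidate, seq) == target(seq) for seq in examples):
            return candidate
    return None


def parity(seq: Sequence[int]) -> int:
    """Target behaviour 1: is the number of 1s odd?"""
    return sum(seq) % 2


def latch(seq: Sequence[int]) -> int:
    """Target behaviour 2: has a 1 occurred so far?"""
    return int(any(seq))


if __name__ == "__main__":
    parity_net = search_weights(parity)
    latch_net = search_weights(latch)
    assert parity_net is not None and latch_net is not None
    print(run_rnn(parity_net, [1, 0, 1, 0]))  # prints 0 (two 1s: even)
    print(run_rnn(latch_net, [1, 0, 1, 0]))   # prints 1 (a 1 has occurred)
```

Swapping the target from parity to the latch changes only the weights that the search returns; run_rnn itself stays untouched. In this toy case the searcher is about as long as the network code, because there are only three weights to find; the exhaustive grid is exactly the part that a better learning algorithm would replace.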
Cheers,
Juergen

On Feb 11, 2014, at 4:06 PM, Stephen José Hanson <jose at rubic.rutgers.edu> wrote:

> Gary, Gary, Gary... why does this seem like déjà vu?
> 
> In any case, this thread might be interested in a paper we published in
> 1996 in Neural Computation
> which used a general RNN to learn FSMs from scratch (others were later
> able to learn FSMs with 1000s of states; see Lee Giles's work).
> 
> In our case we wanted to learn the syntax independently of the lexicon
> and then transfer between languages, showing that the RNN was able to
> generalize *across* languages, and could bootstrap a state space that
> automatically accommodated the new lexicon/grammar simply through
> learning and relearning the same grammar with different lexicons.
> 
> take a look:  
> On the Emergence of Rules in Neural Networks
> Stephen José Hanson, Michiro Negishi, Neural Computation, 
> September 2002, Vol. 14, No. 9, Pages 2245-2268
> 
> http://www.mitpressjournals.org/doi/abs/10.1162/089976602320264079?journalCode=neco
> 
> 
> 
> 
> In any case, this genetics/learning argument is the red herring, as you
> well know.
> 
> Best
> 
> Steve
> 
> On Tue, 2014-02-11 at 09:01 -0500, Gary Marcus wrote:
>> Juergen: 
>> 
>> Nice papers - but the goalposts are definitely shifting here. We started with your claim that "a single RNN already is a general computer. In principle, dynamic RNNs can map arbitrary observation sequences to arbitrary computable sequences of motoric actions and internal attention-directing operations, e.g., to process cluttered scenes"; I acknowledged the truth of the claim, but suggested that the appeal to universality was a red herring.
>> 
>> What you’ve offered in response are two rather different architectures: a lambda calculus-based learning system that makes no contact (at least in the paper I read) with RNNs at all, and an evolutionary system that uses a whole bunch of machinery besides RNNs to derive RNNs that can do the right mapping. My objection was to the notion that all you need is an RNN; by pointing to various external gadgets, you reinforce my belief that RNNs aren’t enough by themselves.
>> 
>> Of course, you are absolutely right that at some level of abstraction “evolution is another form of learning”, but I think it behooves the field to recognize that that other form of learning is likely to have very different properties from, say, back-prop. Evolution shapes cascades of genes that build complex cumulative systems in a distributed but algorithmic fashion; currently popular learning algorithms tune individual weights based on training examples. To assimilate the two is to do a disservice to the evolutionary contribution.
>> 
>> Best,
>> Gary
>> 
>> 
>> On Feb 11, 2014, at 6:21 AM, Juergen Schmidhuber <juergen at idsia.ch> wrote:
>> 
>>> Gary (Marcus), you wrote: "it is unrealistic to expect that all the relevant information can be extracted by any general purpose learning device." You might be a bit too pessimistic about general purpose systems. Unbeknownst to many NN researchers, there are _universal_ problem solvers that are time-optimal in various theoretical senses [10-12] (not to be confused with universal incomputable AI [13]). For example, there is a meta-method [10] that solves any well-defined problem as quickly as the unknown fastest way of solving it, save for an additive constant overhead that becomes negligible as problem size grows. Note that most problems are large; only a few are small. (AI researchers are still in business because many are interested in problems so small that it is worth trying to reduce the overhead.)
>>> 
>>> Several posts addressed the subject of evolution (Gary Marcus, Ken Stanley, Brian Mingus, Ali Minai, Thomas Trappenberg). Evolution is another form of learning, of searching the parameter space. Not provably optimal in the sense of the methods above, but often quite practical. It is used all the time for reinforcement learning without a teacher. For example, an RNN with over a million weights recently learned through evolution to drive a simulated car based on a high-dimensional video-like visual input stream [14,15]. The RNN learned both control and visual processing from scratch, without being aided by unsupervised techniques (which may speed up evolution by reducing the search space through compact sensory codes).
>>> 
>>> Jim, you wrote: "this could actually be an interesting opportunity for some cross disciplinary thinking about how one would use an active sensory data acquisition controller to select the sensory data that is ideal given an internal model of the world." Well, that's what intrinsic reward-driven curiosity and attention direction is all about - reward the controller for selecting data that maximises learning/compression progress of the world model - lots of work on this since 1990 [16,17]. (See also posts on developmental robotics by Brian Mingus and Gary Cottrell.)
>>> 
>>> [10] Marcus Hutter. The Fastest and Shortest Algorithm for All Well-Defined Problems. International Journal of Foundations of Computer Science, 13(3):431-443, 2002. (On J. Schmidhuber's SNF grant 20-61847.)
>>> [11] http://www.idsia.ch/~juergen/optimalsearch.html
>>> [12] http://www.idsia.ch/~juergen/goedelmachine.html
>>> [13] http://www.idsia.ch/~juergen/unilearn.html
>>> [14] J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. Proc. GECCO'13, Amsterdam, July 2013.
>>> [15] http://www.idsia.ch/~juergen/compressednetworksearch.html
>>> [16] http://www.idsia.ch/~juergen/interest.html
>>> [17] http://www.idsia.ch/~juergen/creativity.html
>>> 
>>> Juergen
>>> 
>>> 
>> 
>> 
> 
> -- 
> Stephen José Hanson
> Director RUBIC (Rutgers Brain Imaging Center)
> Professor of Psychology
> Member of Cognitive Science Center (NB)
> Member EE Graduate Program (NB)
> Member CS Graduate Program (NB)
> Rutgers University 
> 
> email: jose at psychology.rutgers.edu
> web: psychology.rutgers.edu/~jose
> lab: www.rumba.rutgers.edu
> fax: 866-434-7959
> voice: 973-353-3313 (RUBIC)
>