Connectionists: generalizing language in neural networks [was Re: Computational Modeling of Bilingualism Special Issue]

Juergen Schmidhuber juergen at idsia.ch
Tue Mar 26 12:48:00 EDT 2013


More than a decade ago, Long Short-Term Memory recurrent neural  
networks (LSTM) learned certain context-free and context-sensitive  
languages that cannot be recognized by finite-state automata such as  
HMMs. Parts of the network became stacks or event counters.
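
For readers who want to try this, here is a minimal next-symbol-prediction  
sketch in the same spirit (a toy setup in present-day PyTorch, not the 2001  
experiments; the architecture, data sizes, and training schedule are  
illustrative assumptions, and whether it extrapolates to longer strings  
depends on hyperparameters and the random seed):

import torch
import torch.nn as nn
import random

SYMS = ['S', 'a', 'b', 'c', 'T']             # start marker, letters, terminator
IDX = {s: i for i, s in enumerate(SYMS)}

def string(n):                               # S a^n b^n c^n T
    return ['S'] + ['a'] * n + ['b'] * n + ['c'] * n + ['T']

def encode(seq):
    ids = torch.tensor([IDX[s] for s in seq])
    return ids[:-1], ids[1:]                 # inputs and next-symbol targets

class NextSymbolLSTM(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(len(SYMS), 8)
        self.lstm = nn.LSTM(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, len(SYMS))
    def forward(self, x):
        h, _ = self.lstm(self.emb(x).unsqueeze(0))
        return self.out(h).squeeze(0)

model = NextSymbolLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(4000):                     # train on strings with n = 1..10
    x, y = encode(string(random.randint(1, 10)))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Test on longer, unseen strings: at the deterministic positions (inputs 'b'
# and 'c') the next symbol is fully determined by the counts seen so far.
with torch.no_grad():
    for n in (12, 15, 20):
        seq = string(n)
        x, y = encode(seq)
        pred = model(x).argmax(dim=1)
        det = torch.tensor([i for i, s in enumerate(seq[:-1]) if s in ('b', 'c')])
        print(n, bool((pred[det] == y[det]).all()))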

F. A. Gers and J. Schmidhuber. LSTM Recurrent Networks Learn Simple  
Context Free and Context Sensitive Languages. IEEE Transactions on  
Neural Networks 12(6):1333-1340, 2001.

J. Schmidhuber, F. Gers, D. Eck. Learning nonregular languages: A  
comparison of simple recurrent networks and LSTM. Neural Computation,  
14(9):2039-2041, 2002.

F. A. Gers, J. A. Perez-Ortiz, D. Eck, and J. Schmidhuber. Learning  
Context Sensitive Languages with LSTM Trained with Kalman Filters.  
Proceedings of ICANN'02, Madrid, pp. 655-660, Springer, Berlin, 2002.


Old slides on this:
http://www.idsia.ch/~juergen/lstm/sld028.htm

Juergen



On Mar 26, 2013, at 5:09 AM, Gary Marcus wrote:

> I posed some important challenges for language-like generalization  
> in PDP and SRN models in a 1998 article in Cognitive Psychology,  
> with further discussion in a 1999 Science article (providing data from  
> human infants) and in a 2001 MIT Press book, The Algebraic Mind.
>
> For example, if one trains a standard PDP autoassociator on identity,  
> with integers represented by a distributed representation consisting  
> of binary digits, and exposes the model only to even numbers, the  
> model will not generalize to odd numbers (i.e., it will not  
> generalize identity to the least significant bit), even though  
> (depending on the details of implementation) it can generalize to  
> some new even numbers. Another way to put this is that these sorts of  
> models can interpolate within some cloud around a space of training  
> examples, but cannot generalize universally-quantified one-to-one  
> mappings outside that space.
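>
> A minimal sketch of this sort of test (toy details assumed, in PyTorch;  
> not the original simulations): train a small autoassociator on 8-bit  
> codes of even numbers only, then probe it with odd numbers and inspect  
> the least significant bit.
>
> import torch
> import torch.nn as nn
>
> BITS = 8
>
> def binary(n):                       # 8-bit distributed code for an integer
>     return torch.tensor([(n >> i) & 1 for i in range(BITS)], dtype=torch.float32)
>
> train = torch.stack([binary(n) for n in range(0, 256, 2)])   # even numbers only
> test = torch.stack([binary(n) for n in range(1, 256, 2)])    # odd numbers
>
> model = nn.Sequential(nn.Linear(BITS, 16), nn.Tanh(),
>                       nn.Linear(16, BITS), nn.Sigmoid())
> opt = torch.optim.Adam(model.parameters(), lr=1e-2)
>
> for step in range(2000):             # autoassociation: reproduce the input
>     opt.zero_grad()
>     nn.functional.binary_cross_entropy(model(train), train).backward()
>     opt.step()
>
> # The least significant bit is 0 in every training pattern, so on odd inputs
> # the network typically keeps outputting 0 there, even when the other bits
> # are reproduced well: identity is not extended to the untrained bit.
> with torch.no_grad():
>     out = (model(test) > 0.5).float()
>     print('LSB copied on odd inputs:   ', (out[:, 0] == test[:, 0]).float().mean().item())
>     print('other bits copied correctly:', (out[:, 1:] == test[:, 1:]).float().mean().item())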
>
> Likewise, training an Elman-style SRN with localist inputs (one  
> word, one node, as in Elman's work on SRNs) on a set of sentences  
> like "a rose is a rose" and "a tulip is a tulip" leads the model to  
> learn those individual relationships, but not to generalize to "a  
> blicket is a blicket", where blicket represents an untrained node.
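>
> A similar sketch for the SRN case (again with assumed toy details, not the  
> original setup): localist one-hot words, next-word prediction on "a X is a X"  
> sentences, with the "blicket" node held out of training.
>
> import torch
> import torch.nn as nn
>
> VOCAB = ['a', 'is', 'rose', 'tulip', 'lily', 'daisy', 'blicket']
> IDX = {w: i for i, w in enumerate(VOCAB)}
>
> def sent(w):                          # "a X is a X", as word indices
>     return torch.tensor([IDX[t] for t in ('a', w, 'is', 'a', w)])
>
> class SRN(nn.Module):                 # Elman-style simple recurrent network
>     def __init__(self, hidden=20):
>         super().__init__()
>         self.rnn = nn.RNN(len(VOCAB), hidden, batch_first=True)
>         self.out = nn.Linear(hidden, len(VOCAB))
>     def forward(self, ids):           # localist input: one node per word
>         x = nn.functional.one_hot(ids, len(VOCAB)).float().unsqueeze(0)
>         h, _ = self.rnn(x)
>         return self.out(h).squeeze(0)
>
> model = SRN()
> opt = torch.optim.Adam(model.parameters(), lr=1e-2)
> loss_fn = nn.CrossEntropyLoss()
>
> for step in range(1000):              # train on the familiar words only
>     for w in ('rose', 'tulip', 'lily', 'daisy'):
>         ids = sent(w)
>         opt.zero_grad()
>         loss_fn(model(ids[:-1]), ids[1:]).backward()
>         opt.step()
>
> # Probe: after "a blicket is a", the untrained blicket output node is almost
> # never predicted; the network falls back on one of the trained words.
> with torch.no_grad():
>     ids = sent('blicket')
>     pred = model(ids[:-1]).argmax(dim=1)[-1]
>     print('prediction for the final word:', VOCAB[int(pred)], '(target: blicket)')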
>
> These problems have to do with a kind of localism that is inherent  
> in the back-propagation rule. In the 2001 book, I discuss some of  
> the ways around them and the compromises that known workarounds  
> lead to. I believe that some alternative kind of architecture is  
> called for.
>
> Since the human brain is pretty quick to generalize  
> universally-quantified one-to-one mappings, even to novel elements,  
> and even on the basis of small amounts of data, I consider these to be  
> important, but largely unsolved, problems. The brain must do it, but we  
> still don't really understand how. (J. P. Thivierge and I made one  
> suggestion in this paper in TINS.)
>
> Sincerely,
>
> Gary Marcus
>
>
> Gary Marcus
> Professor of Psychology
> New York University
> Author of Guitar Zero
> http://garymarcus.com/
> New Yorker blog
>
> On Mar 25, 2013, at 11:30 PM, Janet Wiles <janetw at itee.uq.edu.au>  
> wrote:
>
>> Recurrent neural networks can represent, and in some cases learn  
>> and generalise, classes of languages beyond finite-state machines.  
>> For a review of their capabilities, see the excellent edited book  
>> by Kolen and Kremer: e.g., ch. 8 is on "Representation beyond Finite  
>> States" and ch. 9 is "Universal Computation and Super-Turing  
>> Capabilities".
>>
>> Kolen and Kremer (2001), A Field Guide to Dynamical Recurrent  
>> Networks, IEEE Press.
>>
>> From: connectionists-bounces at mailman.srv.cs.cmu.edu On Behalf Of Juyang Weng
>> Sent: Sunday, 24 March 2013 9:17 AM
>> To: connectionists at mailman.srv.cs.cmu.edu
>> Subject: Re: Connectionists: Computational Modeling of Bilingualism  
>> Special Issue
>>
>> Ping Li:
>>
>> As far as I understand, traditional connectionist architectures  
>> cannot do abstraction well, as Marvin Minsky, Michael Jordan,  
>> and many others have correctly stated. For example, traditional neural  
>> networks could not learn a finite automaton (FA) until recently (i.e.,  
>> the proof for our Developmental Network). We all know that the FA is  
>> the basis for all probabilistic symbolic networks (e.g., Markov  
>> models), but those are not connectionist.
>>
>> After seeing your announcement, I am confused by the title  
>> "Bilingualism Special Issue: Computational Modeling of  
>> Bilingualism" taken together with your comment that "most of the  
>> models are based on connectionist architectures."
>>
>> Without further clarification from you, I have to predict that  
>> these connectionist architectures in the book are all grossly wrong  
>> in terms of brain-capable connectionist natural language processing,  
>> since they cannot learn an FA. This means that they cannot  
>> generalize to state-equivalent but unobserved word sequences.  
>> Without this basic capability, which is required for natural language  
>> processing, how can they claim connectionist natural language  
>> processing, let alone bilingualism?
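>>
>> To illustrate what is meant by state-equivalent but unobserved word  
>> sequences (a toy example of my own, not from any particular model): in  
>> an FA, any two prefixes that reach the same state are treated identically  
>> from then on, so unseen sequences are handled for free.
>>
>> ACCEPT = {'q2'}
>> DELTA = {                       # a toy FA over a five-word vocabulary
>>     ('q0', 'the'):  'q1',
>>     ('q1', 'dog'):  'q2',
>>     ('q1', 'dogs'): 'q2',       # "the dog" and "the dogs" reach the same state
>>     ('q2', 'runs'): 'q2',
>>     ('q2', 'run'):  'q2',
>> }
>>
>> def accepts(words, state='q0'):
>>     for w in words:
>>         state = DELTA.get((state, w))
>>         if state is None:
>>             return False
>>     return state in ACCEPT
>>
>> # Because "the dog" and "the dogs" are state-equivalent prefixes, the FA
>> # handles "the dogs run" even if only "the dog runs" ever occurred in the
>> # data from which the FA was built.
>> print(accepts('the dog runs'.split()))    # True
>> print(accepts('the dogs run'.split()))    # True
>> print(accepts('dog the runs'.split()))    # False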
>>
>> I am concerned that many papers proceed with specific problems  
>> without understanding the fundamental problems of traditional  
>> connectionism. The fact that the biological brain is connectionist  
>> does not necessarily mean that all connectionist researchers know  
>> about the brain's connectionism.
>>
>> -John Weng
>>
>> On 3/22/13 6:08 PM, Ping Li wrote:
>> Dear Colleagues,
>>
>> A Special Issue on Computational Modeling of Bilingualism has been  
>> published. Most of the models are based on connectionist  
>> architectures.
>>
>> All the papers are available for free viewing until April 30, 2013  
>> (follow the link below to its end):
>>
>> http://cup.linguistlist.org/2013/03/bilingualism-special-issue-computational-modeling-of-bilingualism/
>>
>> Please let me know if you have difficulty accessing the above link  
>> or viewing any of the PDF files on Cambridge University Press's  
>> website.
>>
>> With kind regards,
>>
>> Ping Li
>>
>>
>> =================================================================
>> Ping Li, Ph.D. | Professor of Psychology, Linguistics, Information  
>> Sciences & Technology  |  Co-Chair, Inter-College Graduate Program  
>> in Neuroscience | Co-Director, Center for Brain, Behavior, and  
>> Cognition | Pennsylvania State University  | University Park, PA  
>> 16802, USA  |
>> Editor, Bilingualism: Language and Cognition, Cambridge University  
>> Press | Associate Editor: Journal of Neurolinguistics, Elsevier  
>> Science Publisher
>> Email: pul8 at psu.edu  | URL: http://cogsci.psu.edu
>> =================================================================
>>
>>
>>
>> --
>> Juyang (John) Weng, Professor
>> Department of Computer Science and Engineering
>> MSU Cognitive Science Program and MSU Neuroscience Program
>> 428 S Shaw Ln Rm 3115
>> Michigan State University
>> East Lansing, MI 48824 USA
>> Tel: 517-353-4388
>> Fax: 517-432-1061
>> Email: weng at cse.msu.edu
>> URL: http://www.cse.msu.edu/~weng/
>> ----------------------------------------------
>>
>


