LSTM paper announcement

Josef Hochreiter hochreit at informatik.tu-muenchen.de
Mon Dec 30 06:32:48 EST 1996



                        LONG SHORT-TERM MEMORY

  Sepp Hochreiter, TUM                     Juergen Schmidhuber, IDSIA

  Substantially  revised  and  extended  Version 3.0 of TR FKI-207-95
  (32 pages 130 KB; formerly 8 pages 50 KB), with numerous additional
  experiments and details. 

Abstract.    Learning to store information over extended time intervals
via  recurrent  backpropagation  takes a very long time,  mostly due to
insufficient, decaying error back flow.  We briefly review Hochreiter's
1991 analysis of this problem,  then address it by introducing a novel,
efficient method called "Long Short-Term Memory" (LSTM). LSTM can learn
to bridge time lags in excess of 1000 steps by enforcing constant error
flow through  "constant error carrousels"  (CECs) within special units.
Multiplicative gate units learn to open and close access to CEC. LSTM's
update  complexity  per  time step is  O(W), where  W  is the number of
weights.  In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation,
Elman nets, and  Neural  Sequence  Chunking,  LSTM  leads to many  more
successful runs, and learns much faster.  LSTM also solves complex long  
time lag tasks that  have never been  solved  by previous recurrent net
algorithms.  LSTM works with local, distributed, real-valued, and noisy
pattern representations.
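
The contrast drawn in the abstract, decaying error back flow in an
ordinary recurrent unit versus constant error flow through a CEC guarded
by multiplicative gates, might be sketched roughly as follows. This is
only an illustration under stated assumptions, not the implementation
from the report: the function names, the tanh squashing functions, the
single-cell layout, and the chosen net inputs are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic squashing function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def plain_unit_backflow(w, steps, net=0.0):
    """Scaling of an error signal sent back `steps` time steps through one
    self-connected logistic unit whose net input stays at `net`."""
    y = sigmoid(net)
    local_factor = w * y * (1.0 - y)  # self-weight times logistic derivative
    return local_factor ** steps


def cec_backflow(steps):
    """Same quantity for a linear unit with a fixed self-weight of 1.0."""
    return 1.0 ** steps


def memory_cell_step(state, net_c, net_in, net_out):
    """One forward step of a single hypothetical memory cell.
    Assumption: tanh squashing; the abstract does not specify it."""
    y_in = sigmoid(net_in)    # input gate: may new input be written?
    y_out = sigmoid(net_out)  # output gate: may the content be read?
    state = state + y_in * math.tanh(net_c)  # CEC update, self-weight 1.0
    return state, y_out * math.tanh(state)


def store_and_hold(lag):
    """Write once with the input gate open, then keep it shut for `lag` steps."""
    state, _ = memory_cell_step(0.0, net_c=3.0, net_in=20.0, net_out=-20.0)
    stored = state
    for _ in range(lag):
        # Distractor input arrives, but the closed input gate blocks it.
        state, _ = memory_cell_step(state, net_c=-3.0, net_in=-20.0, net_out=-20.0)
    return stored, state
```

With these assumptions, plain_unit_backflow(1.0, 100) is vanishingly
small, cec_backflow(1000) stays at 1.0, and store_and_hold(1000) returns
two nearly identical values, since the closed input gate shields the
stored state from the distractor input.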


Recent spin-off papers:     

LSTM can solve hard long time lag problems.   To appear in NIPS 9,  MIT 
Press, Cambridge MA, 1997.

Bridging long time lags by weight guessing and "Long Short-Term Memory".  
In F. L. Silva, J. C. Principe, L. B. Almeida, eds.,  Frontiers in Arti-
ficial Intelligence and Applications, Volume 37, pages 65-72, IOS Press,
Amsterdam, Netherlands, 1996.

_______________________________________________________________________

WWW/FTP pointers:                 

       ftp://flop.informatik.tu-muenchen.de/pub/fki/fki-207-95rev.ps.gz
                              ftp://ftp.idsia.ch/pub/juergen/lstm.ps.gz

For additional recurrent net papers see our home pages.   For instance, 
the original analysis  of  recurrent nets' error flow and long time lag 
problems is in Sepp's 1991 thesis (pp. 19-21).

               http://www7.informatik.tu-muenchen.de/~hochreit/pub.html
                            http://www.idsia.ch/~juergen/onlinepub.html

Happy New Year!

Sepp & Juergen

PS:    Why don't you stop by IDSIA and give a talk the next time you are
       near Switzerland or Italy?




