LSTM paper announcement
Josef Hochreiter
hochreit at informatik.tu-muenchen.de
Mon Dec 30 06:32:48 EST 1996
LONG SHORT-TERM MEMORY
Sepp Hochreiter, TUM Juergen Schmidhuber, IDSIA
Substantially revised and extended Version 3.0 of TR FKI-207-95
(32 pages, 130 KB; formerly 8 pages, 50 KB), with numerous additional
experiments and details.
Abstract. Learning to store information over extended time intervals
via recurrent backpropagation takes a very long time, mostly due to
insufficient, decaying error back flow. We briefly review Hochreiter's
1991 analysis of this problem, then address it by introducing a novel,
efficient method called "Long Short-Term Memory" (LSTM). LSTM can learn
to bridge time lags in excess of 1000 steps by enforcing constant error
flow through "constant error carrousels" (CECs) within special units.
Multiplicative gate units learn to open and close access to the CECs. LSTM's
update complexity per time step is O(W), where W is the number of
weights. In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation,
Elman nets, and Neural Sequence Chunking, LSTM leads to many more
successful runs, and learns much faster. LSTM also solves complex long
time lag tasks that have never been solved by previous recurrent net
algorithms. LSTM works with local, distributed, real-valued, and noisy
pattern representations.
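The two mechanisms named in the abstract can be illustrated with a small
Python sketch. This is not code from the report. It assumes a single scalar
unit, logistic sigmoid gates, tanh for the cell's input and output squashing
(the report's exact squashing functions may differ), no bias terms, and no
recurrent connections into the gates; all function names are hypothetical.
backflow_factor() multiplies the per-step factor f'(net) * w that scales an
error signal sent back through a self-connected unit. cell_step() is one
forward step of a memory cell whose CEC is a linear self-connection of fixed
weight 1.0, guarded by multiplicative input and output gates.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, used here for the gate units."""
    return 1.0 / (1.0 + math.exp(-x))


def backflow_factor(w: float, net: float, steps: int, linear: bool = False) -> float:
    """Scale applied to an error signal sent back `steps` time steps through
    a single unit with self-connection weight `w` and constant net input `net`.
    Each step multiplies the error by f'(net) * w."""
    factor = 1.0
    for _ in range(steps):
        if linear:
            deriv = 1.0  # identity activation: f'(net) = 1 everywhere
        else:
            s = sigmoid(net)
            deriv = s * (1.0 - s)  # logistic derivative, never above 0.25
        factor *= deriv * w
    return factor


def cell_step(s_prev: float, x: float, w_in: float, w_out: float, w_c: float):
    """One forward step of a single memory cell (hypothetical scalar version).
    Returns the new CEC state and the gated cell output."""
    y_in = sigmoid(w_in * x)    # input gate: controls writes to the CEC
    y_out = sigmoid(w_out * x)  # output gate: controls reads from the CEC
    g = math.tanh(w_c * x)      # squashed cell input (tanh is an assumption)
    s = s_prev + y_in * g       # CEC: linear self-connection fixed at 1.0
    y = y_out * math.tanh(s)    # squashed, gated output (tanh is an assumption)
    return s, y


if __name__ == "__main__":
    # Error flow: logistic unit with w = 1 vs. linear unit with w = 1 (a CEC).
    print(backflow_factor(1.0, 0.0, 20))                 # equals 0.25**20
    print(backflow_factor(1.0, 0.0, 1000, linear=True))  # stays at 1.0

    # Store a value at t = 0, then feed zero input for 1000 steps.
    s, _ = cell_step(0.0, 1.0, w_in=1.0, w_out=1.0, w_c=1.0)
    stored = s
    for _ in range(1000):
        s, _ = cell_step(s, 0.0, w_in=1.0, w_out=1.0, w_c=1.0)
    print(stored, s)  # identical: zero input adds nothing to the CEC
```

With a linear activation and w = 1, the per-step factor f'(net) * w is
exactly 1.0, so error neither explodes nor vanishes; with a logistic unit
the factor is at most 0.25 per step when w = 1, and the error decays
exponentially. In the sketch the gates only scale what enters and leaves the
cell; they never alter the self-connection itself.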
Recent spin-off papers:
LSTM can solve hard long time lag problems. To appear in NIPS 9, MIT
Press, Cambridge MA, 1997.
Bridging long time lags by weight guessing and "Long Short-Term Memory".
In F. L. Silva, J. C. Principe, L. B. Almeida, eds., Frontiers in Artificial
Intelligence and Applications, Volume 37, pages 65-72, IOS Press,
Amsterdam, Netherlands, 1996.
_______________________________________________________________________
WWW/FTP pointers:
ftp://flop.informatik.tu-muenchen.de/pub/fki/fki-207-95rev.ps.gz
ftp://ftp.idsia.ch/pub/juergen/lstm.ps.gz
For additional recurrent net papers see our home pages. For instance,
the original analysis of recurrent nets' error flow and long time lag
problems is in Sepp's 1991 thesis (pp. 19-21).
http://www7.informatik.tu-muenchen.de/~hochreit/pub.html
http://www.idsia.ch/~juergen/onlinepub.html
Happy new year!
Sepp & Juergen
PS: Why don't you stop by at IDSIA and give a talk next time you are
near Switzerland or Italy?
More information about the Connectionists mailing list