LSTM recurrent nets, PhD Thesis, Papers & Code

Felix Gers felix at idsia.ch
Fri Sep 14 10:40:09 EDT 2001



Dear Connectionists,

I am glad to announce my PhD thesis on Long Short-Term Memory (LSTM)
in Recurrent Neural Networks (RNNs), several LSTM papers, and LSTM
source code.

Felix Gers, IDSIA                                          www.idsia.ch

-------------------------------PHD THESIS------------------------------

Long Short-Term Memory in Recurrent Neural Networks:

http://www.idsia.ch/~felix/My_papers/phd.ps.gz
http://www.idsia.ch/~felix/My_papers/phd.pdf

On-line abstract:
http://www.idsia.ch/~felix/My_papers/phd/node3.html


-------------------------------JOURNAL PAPERS--------------------------

F. A. Gers, J. Schmidhuber, and F. Cummins.  Learning to Forget:
Continual Prediction with LSTM. Neural Computation, 2000.

http://www.idsia.ch/~felix/My_papers/FgGates-NC.ps.gz
http://www.idsia.ch/~felix/My_papers/FgGates-NC.pdf

Abstract. Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997)
can solve numerous tasks not solvable by previous learning algorithms
for recurrent neural networks (RNNs). We identify a weakness of LSTM
networks processing continual input streams that are not a priori
segmented into subsequences with explicitly marked ends at which the
network's internal state could be reset.  Without resets, the state
may grow indefinitely and eventually cause the network to break down.
Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell
to learn to reset itself at appropriate times, thus releasing internal
resources.  We review illustrative benchmark problems on which standard
LSTM outperforms other RNN algorithms. All algorithms (including LSTM)
fail to solve continual versions of these problems. LSTM with forget
gates, however, easily solves them in an elegant way.

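For readers who want the mechanism at a glance, the forget-gate cell
update can be sketched as follows, in the now-common compact notation
(an approximation; the paper's formulation differs in the exact gate
inputs and squashing functions):

  f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
  i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
  c_t = f_t \odot c_{t-1} + i_t \odot g(W_c x_t + U_c h_{t-1} + b_c)

With f_t fixed at 1, as in standard LSTM, the cell state c_t can only
accumulate over a continual input stream; a learned f_t near 0 resets
the cell, which is the release of internal resources described above.
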
---

F. A. Gers and J. Schmidhuber.  LSTM recurrent networks learn simple
context-free and context-sensitive languages. IEEE Transactions on
Neural Networks, 2001.

  http://www.idsia.ch/~felix/My_papers/L-IEEE.ps.gz
  http://www.idsia.ch/~felix/My_papers/L-IEEE.pdf

Abstract. Previous work on learning regular languages from exemplary
training sequences showed that Long Short-Term Memory (LSTM) outperforms
traditional recurrent neural networks (RNNs).  Here we demonstrate
LSTM's superior performance on context-free language (CFL) benchmarks
for RNNs, and show that it works even better than previous hardwired
or highly specialized architectures.  To the best of our knowledge,
LSTM variants are also the first RNNs to learn a simple
context-sensitive language (CSL), namely a^n b^n c^n.

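To make the CSL benchmark concrete, here is a minimal, hypothetical C++
sketch (not the experiment code from the paper) that generates strings
of a^n b^n c^n; the network is fed one symbol at a time and trained to
predict the next symbol, which after the first 'b' is only possible if
it has effectively counted the preceding 'a's:

  #include <iostream>
  #include <string>

  // Illustration only: build one string of the context-sensitive
  // language a^n b^n c^n.
  std::string anbncn(int n) {
      return std::string(n, 'a') + std::string(n, 'b') + std::string(n, 'c');
  }

  int main() {
      for (int n = 1; n <= 4; ++n)
          std::cout << anbncn(n) << '\n';   // abc, aabbcc, aaabbbccc, ...
      return 0;
  }
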
-------------------------------OTHER PAPERS----------------------------

Numerous additional LSTM conference papers and technical reports (TRs)
are available at:

http://www.idsia.ch/~felix/Publications.html


-------------------------------LSTM CODE-------------------------------

C++ and Matlab implementations of the LSTM algorithm are available at:

http://www.idsia.ch/~felix/SourceCode_Data.html

-------------------------------PhD POSITION----------------------------

New LSTM PhD position at IDSIA:

http://www.idsia.ch/~juergen/phd2001.html

-----------------------------------------------------------------------
