TR on recurrent networks and long-term dependencies

Lee Giles giles at research.nj.nec.com
Thu Aug 15 09:49:00 EDT 1996



The following Technical Report is available via the University of 
Maryland Department of Computer Science and the NEC Research 
Institute archives:

____________________________________________________________________



             HOW EMBEDDED MEMORY IN RECURRENT NEURAL NETWORK
           ARCHITECTURES HELPS LEARNING LONG-TERM DEPENDENCIES          

Technical Report CS-TR-3626 and UMIACS-TR-96-28, Institute for 
Advanced Computer Studies, University of Maryland, College Park, MD 
20742


     Tsungnan Lin{1,2}, Bill G. Horne{1}, C. Lee Giles{1,3}

  {1}NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
  {2}Department of Electrical Engineering, Princeton University, 
     Princeton, NJ 08540
  {3}UMIACS, University of Maryland, College Park, MD 20742


                             ABSTRACT

Learning long-term temporal dependencies with recurrent neural
networks can be a difficult problem.  It has recently been
shown that a class of recurrent neural networks called NARX
networks performs much better than conventional recurrent
neural networks at learning certain simple long-term dependency
problems.  The intuitive explanation for this behavior is that
the output memories of a NARX network can be manifested as
jump-ahead connections in the time-unfolded network.  These
jump-ahead connections can propagate gradient information more
efficiently, thus reducing the network's sensitivity to the
problem of long-term dependencies.
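
(For readers who have not seen the NARX formulation, here is a minimal
plain-NumPy sketch of a NARX forward pass.  The layer sizes, tanh
nonlinearities, and the names narx_step, x_taps, and y_taps are
illustrative assumptions on our part, not the report's notation.  The
D_y output taps are the connections that appear as jump-ahead links
when the network is unfolded in time.)

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes (assumptions, not the report's settings).
    n_in, n_hid, n_out = 1, 4, 1
    D_x, D_y = 2, 4            # input and output (embedded) memory orders

    # The hidden layer sees tapped delay lines of the inputs AND of past outputs.
    W_h = rng.normal(scale=0.1, size=(n_hid, D_x * n_in + D_y * n_out))
    b_h = np.zeros(n_hid)
    W_o = rng.normal(scale=0.1, size=(n_out, n_hid))
    b_o = np.zeros(n_out)

    def narx_step(x_taps, y_taps):
        # One step: y(t) = f(x(t), ..., x(t-D_x+1), y(t-1), ..., y(t-D_y)).
        z = np.concatenate(x_taps + y_taps)
        h = np.tanh(W_h @ z + b_h)
        return np.tanh(W_o @ h + b_o)

    # Toy run: the y_taps buffer is the order-D_y embedded output memory;
    # unfolded in time, these feedback links skip up to D_y steps at once.
    x_seq = rng.normal(size=(20, n_in))
    x_taps = [np.zeros(n_in)] * D_x
    y_taps = [np.zeros(n_out)] * D_y
    for x_t in x_seq:
        x_taps = [x_t] + x_taps[:-1]
        y_t = narx_step(x_taps, y_taps)
        y_taps = [y_t] + y_taps[:-1]
    print(y_t)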

This work provides empirical justification for our
hypothesis that similar improvements in learning long-term
dependencies can be achieved with other classes of recurrent
neural network architectures simply by increasing the order of
the embedded memory.

In particular, we explore how well three classes of recurrent
neural network architectures learn simple long-term dependency
problems:  globally recurrent networks, locally recurrent
networks, and NARX (output feedback) networks.
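
(A rough sketch of the three classes, written in generic notation that
is an assumption on our part rather than the report's own definitions,
with D denoting the order of the embedded memory:

    globally recurrent:  s(t) = f( W_in x(t) + \sum_{d=1}^{D} W_d s(t-d) ),   y(t) = g( s(t) )
    locally recurrent:   each unit i feeds back only its own past activations,
                         s_i(t) = f( w_i x(t) + \sum_{d=1}^{D} a_{i,d} s_i(t-d) )
    NARX:                y(t) = f( x(t), ..., x(t-D_x), y(t-1), ..., y(t-D) )

Increasing D in any of these feedback paths is what we mean by
increasing the order of the embedded memory.)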

Comparing the performance of these architectures with different
orders of embedded memory on two simple long-term dependency
problems shows that all of these classes of network
architectures demonstrate significant improvement in learning
long-term dependencies when the order of the embedded memory is
increased.  These results can be important to a user comfortable
with a specific recurrent neural network architecture, because
simply increasing the embedded memory order will make the
architecture more robust to the problem of long-term dependency
learning.

-------------------------------------------------------------------

KEYWORDS: discrete-time, memory, long-term dependencies, recurrent 
neural networks, training, gradient-descent

PAGES:  15                      FIGURES:  7             TABLES:  2

-------------------------------------------------------------------

http://www.neci.nj.nec.com/homepages/giles.html
http://www.cs.umd.edu/TRs/TR-no-abs.html

or

ftp://ftp.nj.nec.com/pub/giles/papers/UMD-CS-TR-3626.recurrent.arch.long.term.ps.Z

------------------------------------------------------------------------------------

--                                 
C. Lee Giles / Computer Sciences / NEC Research Institute / 
4 Independence Way / Princeton, NJ 08540, USA / 609-951-2642 / Fax 2482
www.neci.nj.nec.com/homepages/giles.html
==



