TR announcement - long-term dependencies

Tue Jul 18 18:27:57 EDT 1995

The following Technical Report is available via the University of Maryland 
Department of Computer Science and the NEC Research Institute archives:

_____________________________________________________________________________

          LEARNING LONG-TERM DEPENDENCIES IS NOT AS DIFFICULT 
                  WITH NARX RECURRENT NEURAL NETWORKS

Technical Report UMIACS-TR-95-78 and CS-TR-3500, Institute for 
Advanced Computer Studies, University of Maryland, College Park, MD 
20742

     Tsungnan Lin{1,2}, Bill G. Horne{1}, Peter Tino{1,3}, C. Lee Giles{1,4}

  {1}NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
  {2}Department of Electrical Engineering, Princeton University, Princeton, 
     NJ 08540
  {3}Dept. of Computer Science and Engineering, Slovak Technical University, 
     Ilkovicova 3, 812 19 Bratislava, Slovakia
  {4}UMIACS, University of Maryland, College Park, MD 20742

                             ABSTRACT

It has recently been shown that gradient descent learning algorithms for 
recurrent neural networks can perform poorly on tasks that involve 
long-term dependencies, i.e. those problems for which the desired output 
depends on inputs presented at times far in the past. 
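
As a rough illustration of why such tasks are hard for gradient descent, 
the sketch below tracks how the sensitivity of a state to a much earlier 
state shrinks over time. It is not taken from the report: the single-unit 
recurrence x(t) = tanh(w * x(t-1)), the weight w = 0.9, the starting state, 
and the function name gradient_through_time are all assumptions chosen 
for illustration.

```python
import math

def gradient_through_time(w, steps, x0=0.5):
    """Return |d x(T) / d x(0)| for the toy recurrence x(t) = tanh(w * x(t-1)).

    Hypothetical single-unit example; the report studies full networks.
    """
    x = x0
    grad = 1.0
    for _ in range(steps):
        x = math.tanh(w * x)
        # Chain rule: d tanh(w * x_prev) / d x_prev = w * (1 - tanh(w * x_prev)^2)
        grad *= w * (1.0 - x * x)
    return abs(grad)

if __name__ == "__main__":
    for steps in (1, 10, 50):
        print(steps, gradient_through_time(w=0.9, steps=steps))
```

With |w| < 1 every factor in the product has magnitude below 1, so the 
sensitivity to the initial state decays at least geometrically with the 
number of time steps.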

In this paper we explore the long-term dependencies problem for a class of 
architectures called NARX recurrent neural networks, which have powerful 
representational capabilities. We have previously reported that gradient 
descent learning is more effective in NARX networks than in recurrent 
neural network architectures that have ``hidden states'' on problems 
including grammatical inference and nonlinear system identification. 
Typically, the network converges much faster and generalizes better than 
other networks. The results in this paper are an attempt to explain this 
phenomenon. 
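
For readers unfamiliar with the architecture, here is a minimal sketch of 
a NARX forward pass, in which the output is computed from tapped delay 
lines of past inputs and past outputs rather than from a hidden state. 
The announcement does not specify these details, so the one-hidden-layer 
tanh network, the random weights, the delay orders n_u and n_y (both 
assumed to be at least 1), and the names make_narx and narx_run are 
assumptions for illustration, not the authors' implementation.

```python
import math
import random

def make_narx(n_u, n_y, n_hidden, seed=0):
    """Create random weights for a small NARX network (illustrative only)."""
    rng = random.Random(seed)
    n_in = n_u + n_y
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return W1, b1, W2

def narx_run(params, inputs, n_u, n_y):
    """Compute y(t) = f(u(t), ..., u(t-n_u+1), y(t-1), ..., y(t-n_y))."""
    W1, b1, W2 = params
    u_taps = [0.0] * n_u  # delayed inputs, most recent first
    y_taps = [0.0] * n_y  # delayed outputs, most recent first
    outputs = []
    for u in inputs:
        u_taps = [u] + u_taps[:-1]   # shift the new input into the delay line
        z = u_taps + y_taps          # the network sees only the two delay lines
        h = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
             for row, b in zip(W1, b1)]
        y = math.tanh(sum(w * v for w, v in zip(W2, h)))
        y_taps = [y] + y_taps[:-1]   # feed the output back through its delay line
        outputs.append(y)
    return outputs

if __name__ == "__main__":
    params = make_narx(n_u=2, n_y=3, n_hidden=4, seed=1)
    print(narx_run(params, [1.0, 0.0, 0.0, 0.0, 0.0], n_u=2, n_y=3))
```

The only recurrence is through the output delay line, so an output from 
n_y steps back reaches the current output in a single step.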

We present some experimental results which show that NARX networks 
can often retain information for two to three times as long as conventional 
recurrent neural networks. We show that although NARX networks do not 
circumvent the problem of long-term dependencies, they can greatly 
improve performance on long-term dependency problems.

We also describe in detail some of the assumptions regarding what it means 
to latch information robustly and suggest possible ways to loosen these 
assumptions.

----------------------------------------------------------------------------------


http://www.neci.nj.nec.com/homepages/giles.html
http://www.cs.umd.edu/TRs/TR-no-abs.html

or

ftp://ftp.nj.nec.com/pub/giles/papers/UMD-CS-TR-3500.long-term.dependencies.narx.ps.Z

------------------------------------------------------------------------------------

--                                 
C. Lee Giles / NEC Research Institute / 4 Independence Way
Princeton, NJ 08540, USA / 609-951-2642 / Fax 2482
URL  http://www.neci.nj.nec.com/homepages/giles.html
==