tech report available

Kathy Farrelly farrelly%ics at ucsd.edu
Tue Jun 6 06:52:25 EDT 2006


If you'd like a copy of the following tech report, please write, call,
or send e-mail to:

Kathy Farrelly
Cognitive Science, C-015
University of California, San Diego
La Jolla, CA 92093-0115
(619) 534-6773
farrelly%ics at ucsd.edu


Report Info:

         A LEARNING ALGORITHM FOR CONTINUALLY RUNNING 
                FULLY RECURRENT NEURAL NETWORKS

          Ronald J. Williams, Northeastern University
       David Zipser, University of California, San Diego

The exact form of a gradient-following learning algorithm for
completely recurrent networks running in continually sampled time is
derived. Practical learning algorithms based on this result are shown
to learn complex tasks requiring recurrent connections. In the
recurrent networks studied here, any unit can be connected to any
other, and any unit can receive external input. These networks run
continually in the sense that they sample their inputs on every
update cycle, and any unit can have a training target on any cycle.
The storage required and the computation time on each step are
independent of time and are completely determined by the size of the
network, so no prior knowledge of the temporal structure of the task
being learned is required. The algorithm is nonlocal in the sense
that each unit must have knowledge of the complete recurrent weight
matrix and error vector. The algorithm is computationally intensive
on sequential computers, requiring storage of order the 3rd power of
the number of units and computation time on each cycle of order the
4th power of the number of units. The simulations include examples in
which networks are taught tasks not possible with tapped delay lines;
that is, tasks that require the preservation of state. The most
complex example of this kind is learning to emulate a Turing machine
that does a parenthesis-balancing problem. Examples are also given of
networks that do feedforward computations with unknown delays,
requiring them to organize into the correct number of layers.
Finally, examples are given in which networks are trained to
oscillate in various ways, including sinusoidal oscillation.
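
The algorithm the abstract describes is what is now usually called
real-time recurrent learning (RTRL). The Python/NumPy sketch below is
an illustrative reconstruction based only on the description above
(a sensitivity array giving the O(n^3) storage, and a sum over units
inside its update giving the O(n^4) cost per cycle); it is not code
from the report, and the network size, learning rate, and logistic
nonlinearity are assumptions.

    # A minimal RTRL sketch (assumed reconstruction, not the report's code).
    import numpy as np

    rng = np.random.default_rng(0)
    n_units, n_in = 4, 2      # network size and input size (assumed)
    eta = 0.1                 # learning rate (assumed)

    # Every unit receives connections from all units and all external inputs.
    W = rng.normal(scale=0.1, size=(n_units, n_units + n_in))
    y = np.zeros(n_units)     # unit outputs

    # Sensitivities p[k, i, j] = d y_k / d W[i, j]: O(n^3) storage.
    p = np.zeros((n_units, n_units, n_units + n_in))

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def step(x, target=None, mask=None):
        """One continually-running cycle: sample input x, update the unit
        outputs, propagate the sensitivities, and learn if a target exists."""
        global y, p, W
        z = np.concatenate([y, x])            # previous outputs plus input
        y_new = sigmoid(W @ z)
        fprime = y_new * (1.0 - y_new)

        # p_new[k,i,j] = f'(net_k) * (sum_l W[k,l] p[l,i,j] + delta_{k,i} z[j]).
        # The sum over l for every (k, i, j) makes each cycle O(n^4).
        p_new = np.einsum('kl,lij->kij', W[:, :n_units], p)
        p_new[np.arange(n_units), np.arange(n_units), :] += z
        p_new *= fprime[:, None, None]

        if target is not None:
            if mask is None:
                mask = np.ones(n_units, dtype=bool)
            e = np.where(mask, target - y_new, 0.0)   # error only where targeted
            W += eta * np.einsum('k,kij->ij', e, p_new)

        y, p = y_new, p_new
        return y

A training run would simply call step(x_t, target_t, mask_t) once per
sampled time step; no history of past activations is kept, which is
why the memory cost does not grow with sequence length.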


