Higher-order recurrent neural networks
Lee Giles
giles at research.nec.com
Tue Nov 26 18:03:57 EST 1991
More references for higher-order recurrent nets and some general comments:
John Kolen mentions:
*****************************************
Higher-order recurrent networks are recurrent networks with higher-order
connections (e.g., i[1]*i[2]*w[1,2] instead of i[1]*w[1]). An example of a
higher-order recurrent network is Pollack's sequential cascaded network,
which appears, I believe, in the latest issue of Machine Learning. This
network can be described by two three-dimensional weight tensors, W and V,
and the following equations:
O[t]   = Sigmoid( (W . S[t]) . I[t] )
S[t+1] = Sigmoid( (V . S[t]) . I[t] )
where I[t] is the input vector, O[t] is the output vector, and S[t] is the
state vector, each at time t ( . denotes inner product).
**********************************************
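To make Kolen's description concrete, here is a rough NumPy sketch of one
time step of such a second-order network. The dimensions, the random
weights, and the function names are my own illustrative assumptions, not
anything from Pollack's paper:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n, m, p = 4, 3, 2                      # state, input, output sizes (arbitrary)
    rng = np.random.default_rng(0)
    W = rng.standard_normal((p, n, m))     # output weight tensor W[k,i,j]
    V = rng.standard_normal((n, n, m))     # state weight tensor  V[k,i,j]

    def step(S, I):
        # (W . S[t]) . I[t]  ->  sum over i,j of W[k,i,j] * S[i] * I[j]
        O      = sigmoid(np.einsum('kij,i,j->k', W, S, I))
        S_next = sigmoid(np.einsum('kij,i,j->k', V, S, I))
        return O, S_next

    S = np.zeros(n)                        # initial state
    for I in rng.standard_normal((5, m)):  # a short, arbitrary input sequence
        O, S = step(S, I)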
For other references on higher-order recurrent nets, see the following:
(This list is not meant to be exhaustive, but to give some
flavor of the diversity of work in this area.)
Y.C. Lee, et al., 1986, Physica D.
H.H. Chen, et al., 1986, AIP Conference Proceedings on Neural Networks
for Computing.
F. Pineda, 1988, AIP Conference Proceedings for NIPS.
Psaltis, et al., 1988, Neural Networks.
Giles, et al., 1990, NIPS 2; 1991 IJCNN Proceedings; and 1992,
Neural Computation.
Mozer and Bachrach, 1991, Machine Learning.
Hush, et al., 1991, Proceedings of Neural Networks for
Signal Processing.
Watrous and Kuhn, 1992, Neural Computation.
In particular, the work by Giles et al. describes a 2nd-order,
forward-propagation RTRL algorithm for learning grammars from grammatical
strings.* What may be of interest is that, using a heuristic extraction
method, one can extract the "learned" grammar from the recurrent network
both during and after training.
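Since the extraction method is only sketched here, the following is just one
way the heuristic could look: quantize the continuous state vectors into
bins and record, for each input symbol, the transitions between quantized
states, which yields a finite-state transition table. The binning level q,
the symbol-to-vector dictionary, and the reuse of the step function from the
sketch above are all my own illustrative assumptions:

    import numpy as np

    def quantize(S, q=2):
        # map each sigmoid state unit (in [0,1]) to one of q bins,
        # giving a discrete state label
        return tuple(np.minimum((S * q).astype(int), q - 1))

    def extract_transitions(step, S0, strings, symbols, q=2):
        # symbols: dict mapping each input symbol to its input vector
        transitions = {}                   # (state, symbol) -> next state
        for s in strings:
            S = S0
            for sym in s:
                _, S_next = step(S, symbols[sym])
                transitions[(quantize(S, q), sym)] = quantize(S_next, q)
                S = S_next
        return transitions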
It's worth noting that higher-order nets usually include lower
orders as special cases, i.e., 2nd order includes 1st order.
In addition, sigma-pi units are just a subset of higher-order
models and in some cases do not have the representational
power of higher-order models. For example, the term (using Kolen's
notation above)
S[i,t] . I[j,t]
would have the same weight coefficient in the original
sigma-pi notation as the term
S[j,t] . I[i,t].
Higher-order notation would distinguish between these terms
using the tensor weights W[k,i,j] and W[k,j,i].
*(Similar work has been done by Watrous & Kuhn and by Pollack.)
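To make the sigma-pi point above concrete, here is a small check one can
run (assuming, so that the indices are comparable, that state and input
have the same dimension). A sigma-pi unit that assigns one weight per
unordered product corresponds to a tensor symmetric in its last two
indices, so it is forced to give S[i]*I[j] and S[j]*I[i] the same
coefficient; a general higher-order tensor is not:

    import numpy as np

    n = 3
    rng = np.random.default_rng(1)
    W    = rng.standard_normal((n, n, n))    # full higher-order tensor W[k,i,j]
    W_sp = 0.5 * (W + W.transpose(0, 2, 1))  # sigma-pi-style symmetrized weights

    i, j, k = 0, 1, 2
    print(W[k, i, j], W[k, j, i])            # generally different coefficients
    print(W_sp[k, i, j], W_sp[k, j, i])      # forced to be equal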
C. Lee Giles
NEC Research Institute
4 Independence Way
Princeton, NJ 08540
USA
Internet: giles at research.nj.nec.com
UUCP: princeton!nec!giles
PHONE: (609) 951-2642
FAX: (609) 951-2482