Reprint: LEARNING A CLASS OF LARGE FINITE STATE MACHINES WITH A RECURRENT NEURAL NETWORK
Lee Giles
giles at research.nj.nec.com
Thu Aug 4 18:00:26 EDT 1994
The following reprint is available via the University of Maryland Department
of Computer Science and the NEC Research Institute archives:
_____________________________________________________________________________
Learning a Class of Large Finite State Machines
with a Recurrent Neural Network
UNIVERSITY OF MARYLAND TECHNICAL REPORT UMIACS-TR-94-94 AND CS-TR-3328
C. L. Giles[1,2], B. G. Horne[1], T. Lin[1,3]
[1] NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
[2] UMIACS, University of Maryland, College Park, MD 20742
[3] EE Department, Princeton University, Princeton, NJ 08540
{giles,horne,lin}@research.nj.nec.com
One of the issues in any learning model is how it scales with problem size.
Neural networks have not been immune to scaling issues. We show that a
dynamically driven discrete-time recurrent network (DRNN) can learn rather
large grammatical inference problems when the strings of a finite memory
machine (FMM) are encoded as temporal sequences. FMMs are a subclass of
finite state machines which have a finite memory or a finite order of
inputs and outputs. The DRNN that learns the FMM is a neural network that
maps directly from the sequential machine implementation of the FMM. It has
feedback only from the output and not from any hidden units; an example is
the recurrent network of Narendra and Parthasarathy. (FMMs that have zero
order in the feedback of outputs are called definite memory machines and
are analogous to Time-delay or Finite Impulse Response neural networks.)
Due to their topology, these DRNNs are at least as powerful as any
sequential machine implementation of a FMM and should be capable of
representing any FMM. We choose to learn "particular" FMMs. Specifically,
these FMMs have a large number of states (simulations are for 256 and 512
state FMMs) but have minimal order, relatively small depth, and little logic
when the FMM is implemented as a sequential machine.
Simulations for the number of training examples versus generalization
performance and FMM extraction size show that the number of training
samples necessary for perfect generalization is less than that necessary to
completely characterize the FMM to be learned. This is, in a sense, a
best-case learning problem, since an arbitrarily chosen FMM with a minimal
number of states would have much greater order and string depth and would
most likely require more logic in its sequential machine implementation.
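
As an informal illustration (not part of the report), the short Python
sketch below shows the kind of output-feedback network described above:
each output is a one-hidden-layer feedforward function of a tapped delay
line of recent inputs and of the network's own past outputs, with no
feedback from hidden units. The orders, hidden-layer size, and random
weights are assumptions chosen only for this example.

import numpy as np

# Minimal sketch (an assumption, not the authors' code) of a DRNN whose only
# feedback comes from the output unit, in the style of Narendra and
# Parthasarathy: y(t) is a feedforward function of the last n inputs and the
# last m outputs.
class OutputFeedbackDRNN:
    def __init__(self, input_order=4, output_order=2, hidden=10, seed=0):
        rng = np.random.default_rng(seed)
        self.n, self.m = input_order, output_order      # tap-line lengths
        d = self.n + self.m                             # feedforward input size
        self.W1 = rng.normal(0.0, 0.5, (hidden, d))     # taps -> hidden
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.5, hidden)          # hidden -> output
        self.b2 = 0.0

    def run(self, symbols):
        """Feed a string of 0/1 symbols through the network; return the
        output produced at each time step."""
        u_taps = np.zeros(self.n)       # u(t), u(t-1), ..., u(t-n+1)
        y_taps = np.zeros(self.m)       # y(t-1), ..., y(t-m)
        outputs = []
        for u in symbols:
            u_taps = np.concatenate(([float(u)], u_taps[:-1]))
            h = np.tanh(self.W1 @ np.concatenate((u_taps, y_taps)) + self.b1)
            y = 1.0 / (1.0 + np.exp(-(self.W2 @ h + self.b2)))   # sigmoid
            y_taps = np.concatenate(([y], y_taps[:-1]))
            outputs.append(float(y))
        return outputs

# Example: run an (untrained) network over one input string.
net = OutputFeedbackDRNN(input_order=4, output_order=2, hidden=10)
print(net.run([1, 0, 1, 1, 0, 0, 1]))

Calling run() produces one output per input symbol; training (for example,
gradient descent on these weights) is omitted from the sketch.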
--------------------------------------------------------------------------
FTP INSTRUCTIONS
unix> ftp cs.umd.edu (128.8.128.8)
Name: anonymous
Password: (your_userid at your_site)
ftp> cd pub/papers/TRs
ftp> binary
ftp> get 3328.ps.Z
ftp> quit
unix> uncompress 3328.ps.Z
OR
unix> ftp external.nj.nec.com (138.15.10.100)
Name: anonymous
Password: (your_userid at your_site)
ftp> cd pub/giles/papers
ftp> binary
ftp> get large.fsm.ps.Z
ftp> quit
unix> uncompress large.fsm.ps.Z
--------------------------------------------------------------------------
--
C. Lee Giles / NEC Research Institute / 4 Independence Way
Princeton, NJ 08540 / 609-951-2642 / Fax 2482