Why does the error rise in an SRN?

Tue Jun 6 06:52:25 EDT 2006

Gary writes:
>  
>  Yes, it seems that Elman nets can't learn in batch mode.
>

I have tried recurrent networks with an Elman structure, but trained by
complete gradient descent through time. I did this on several problems,
including Morse code recognition, handwritten digit recognition, and
prediction of a ball trajectory. I used the Connection Machine, batch
mode, and a very small learning rate (things are fast on a Connection
Machine), and I did not observe the error on the training set starting
to increase. However, I did observe that the networks often converged
to useless local minima. Finding a meaningful representation for the
context layer seems to be an order of magnitude more difficult than
identifying weights and biases in a feed-forward network.
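
For concreteness, the setup described above (Elman structure, complete
gradient through time, batch updates, small learning rate) might be
sketched roughly as follows. This is only a minimal pure-Python
illustration, not the Connection Machine code: the one-step delay task,
the hidden size H, the learning rate, the flat parameter layout, and all
function names are assumptions made for the example.

```python
import math
import random

H = 3                          # hidden/context units (illustrative choice)
N_PARAMS = H * H + 3 * H + 1   # w_in, w_ctx, b_h, w_out, b_out in one flat list
BASE = H + H * H               # offset of b_h within the flat list


def init_params(seed=0):
    """Small random initial weights (hypothetical initialisation range)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(N_PARAMS)]


def unpack(p):
    """Split the flat parameter list into named pieces."""
    w_in = p[0:H]                                              # input -> hidden
    w_ctx = [p[H + i * H: H + (i + 1) * H] for i in range(H)]  # context j -> hidden i
    b_h = p[BASE: BASE + H]
    w_out = p[BASE + H: BASE + 2 * H]                          # hidden -> output
    b_out = p[BASE + 2 * H]
    return w_in, w_ctx, b_h, w_out, b_out


def loss_and_grad(p, xs, ys):
    """Squared error for one sequence and its complete gradient through time."""
    w_in, w_ctx, b_h, w_out, b_out = unpack(p)
    hs = [[0.0] * H]   # hs[t] is the context at step t, hs[t + 1] the new hidden state
    outs = []
    for x in xs:
        prev = hs[-1]
        h = [math.tanh(w_in[i] * x + b_h[i]
                       + sum(w_ctx[i][j] * prev[j] for j in range(H)))
             for i in range(H)]
        hs.append(h)
        outs.append(b_out + sum(w_out[i] * h[i] for i in range(H)))
    loss = 0.5 * sum((o - y) ** 2 for o, y in zip(outs, ys))

    g = [0.0] * N_PARAMS
    dh_next = [0.0] * H   # error arriving from later steps via the context layer
    for t in reversed(range(len(xs))):
        h, prev = hs[t + 1], hs[t]
        err = outs[t] - ys[t]
        g[BASE + 2 * H] += err
        for i in range(H):
            g[BASE + H + i] += err * h[i]
        # Error at the hidden layer: from the output now plus from the future.
        da = [(err * w_out[i] + dh_next[i]) * (1.0 - h[i] ** 2) for i in range(H)]
        for i in range(H):
            g[i] += da[i] * xs[t]
            g[BASE + i] += da[i]
            for j in range(H):
                g[H + i * H + j] += da[i] * prev[j]
        dh_next = [sum(w_ctx[i][j] * da[i] for i in range(H)) for j in range(H)]
    return loss, g


def train_batch(p, data, lr=0.005, epochs=300):
    """Batch mode: sum the gradient over all sequences, then update once."""
    history = []
    for _ in range(epochs):
        total, g_sum = 0.0, [0.0] * N_PARAMS
        for xs, ys in data:
            loss, g = loss_and_grad(p, xs, ys)
            total += loss
            g_sum = [a + b for a, b in zip(g_sum, g)]
        history.append(total)   # training error before this epoch's update
        p = [w - lr * d for w, d in zip(p, g_sum)]
    return p, history


def make_delay_data(n_seqs=4, length=6, seed=1):
    """Toy task (assumed for illustration): output the previous input."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_seqs):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        data.append((xs, [0.0] + xs[:-1]))
    return data


if __name__ == "__main__":
    params, history = train_batch(init_params(), make_delay_data())
    print("training error, first and last epoch:", history[0], history[-1])
```

The point of the sketch is the structure of train_batch: the gradient
is accumulated over the whole training set before the single weight
update per epoch, and dh_next carries the error back through the context
layer over the entire sequence rather than cutting it off after one step.
It says nothing about which minimum the network ends up in.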

Sebastian