Why does the error rise in a SRN?

Peter Turney peter at ai.iit.nrc.ca
Thu Apr 2 08:39:02 EST 1992


> I have been working with the Simple Recurrent Network (Elman style)
> and variants thereof for some time. Something which seems to happen
> with surprising frequency is that the error will decrease for a period
> and then will start to increase again. 

> (1) Has anyone else noticed this? 

I have had the same experience. Here is an example:

lrate	lgrain		mu	momentum	epoch	tss
------------------------------------------------------------------
0.05	pattern		0.5	0.1		26	40.8941
0.05	pattern		0.5	0.1		53	29.8656
0.05	pattern		0.5	0.1		86	26.2229
0.05	pattern		0.5	0.1		391	11.6567
0.05	pattern		0.5	0.1		458	12.1636
0.05	pattern		0.5	0.1		513	14.0021

The data consist of 16 separate sequences of 700 patterns per
sequence. Thus one epoch consists of 11,200 patterns. 
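
To make the setup concrete, here is a rough sketch of the kind of training
loop that produced the numbers above, written in Python/NumPy rather than
the simulator I actually used. The layer sizes and the random stand-in data
are assumptions; only the learning rate, momentum, and the per-pattern
(lgrain = pattern) update scheme match the table.

# Elman-style SRN trained with per-pattern updates, logging tss per epoch.
# Layer sizes and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 20, 9                 # assumed layer sizes
lrate, momentum = 0.05, 0.1                    # as in the table above

W_ih = rng.normal(0.0, 0.1, (n_hid, n_in + n_hid))   # (input + context) -> hidden
W_ho = rng.normal(0.0, 0.1, (n_out, n_hid))          # hidden -> output
dW_ih = np.zeros_like(W_ih)
dW_ho = np.zeros_like(W_ho)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# stand-in for the real data: 16 sequences of 700 (input, target) patterns
seqs = [(rng.random((700, n_in)), rng.random((700, n_out))) for _ in range(16)]

for epoch in range(1, 601):
    tss = 0.0
    for inputs, targets in seqs:
        context = np.zeros(n_hid)              # reset context at sequence start
        for x, t in zip(inputs, targets):
            z = np.concatenate([x, context])
            h = sigmoid(W_ih @ z)
            y = sigmoid(W_ho @ h)
            err = t - y
            tss += float(err @ err)
            # one backprop step; the context is treated as a fixed extra input
            d_out = err * y * (1.0 - y)
            d_hid = (W_ho.T @ d_out) * h * (1.0 - h)
            dW_ho = lrate * np.outer(d_out, h) + momentum * dW_ho
            dW_ih = lrate * np.outer(d_hid, z) + momentum * dW_ih
            W_ho += dW_ho
            W_ih += dW_ih
            context = h                        # copy hidden activations to context
    print(epoch, tss)                          # tss per epoch, as in the table

With the real sequences in place of the random ones, the printed tss is the
quantity in the last column of the table, and the rise shows up in this
per-epoch trace.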

> (2) Is it task dependent?

My task looks quite different from yours. I am trying to train an Elman-style SRN
to generate sensor readings for an accelerating jet engine. The sensors
include thrust, exhaust gas temperature, shaft rpm, and six others.
The sensor readings are the target outputs. The inputs are the ambient
conditions (humidity, atmospheric pressure, outside temperature, ...)
and the idle shaft rpm. All training data comes from a single jet engine
under a variety of ambient conditions. The same throttle motion is
used for each of the 16 sequences of patterns.
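
To make the data layout explicit, here is a small sketch of how one of the
16 sequences might be assembled. The nine sensors and the input fields come
from the description above; every number, the scaling, and the exact count
of ambient inputs are made up.

# One sequence for the engine task (all values are invented).
# Inputs: ambient conditions plus idle shaft rpm, held constant over the sequence.
# Targets: the nine recorded sensor readings (thrust, EGT, shaft rpm, ...) per step.
import numpy as np

n_steps, n_sensors = 700, 9
ambient = np.array([0.62, 0.98, 0.55])        # e.g. humidity, pressure, temperature (scaled)
idle_rpm = np.array([0.31])                   # idle shaft rpm (scaled)
inputs = np.tile(np.concatenate([ambient, idle_rpm]), (n_steps, 1))   # shape (700, 4)
targets = np.zeros((n_steps, n_sensors))      # would hold the measured sensor traces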

> (3) Why does it happen?

I don't know. I have tried lgrain (learning grain) = epoch (see the sketch
below), but then the net does not seem to converge at all -- tss (total sum
of squares) stays around 300. I have tried momentum = 0.9, but, again, tss
seems
to stay around 300. I suspect -- without any real justification -- that
this phenomenon is related to catastrophic interference. I am in the
process of applying Fahlman's Recurrent Cascade-Correlation
algorithm to the same problem. I hope that RCC may work better than
the SRN in this case.
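
For comparison, this is roughly what lgrain = epoch amounts to, reusing the
network and data names from the first sketch above: error derivatives are
accumulated over the entire epoch and the weights are changed only once,
rather than after every pattern. Again, this is an illustration of the idea,
not the code I actually ran.

# lgrain = epoch: same toy network and data as the first sketch, but the
# gradient is accumulated over the whole epoch and applied in one update.
for epoch in range(1, 601):
    tss = 0.0
    grad_ih = np.zeros_like(W_ih)
    grad_ho = np.zeros_like(W_ho)
    for inputs, targets in seqs:
        context = np.zeros(n_hid)
        for x, t in zip(inputs, targets):
            z = np.concatenate([x, context])
            h = sigmoid(W_ih @ z)
            y = sigmoid(W_ho @ h)
            err = t - y
            tss += float(err @ err)
            d_out = err * y * (1.0 - y)
            d_hid = (W_ho.T @ d_out) * h * (1.0 - h)
            grad_ho += np.outer(d_out, h)
            grad_ih += np.outer(d_hid, z)
            context = h
    dW_ho = lrate * grad_ho + momentum * dW_ho     # single update per epoch
    dW_ih = lrate * grad_ih + momentum * dW_ih
    W_ho += dW_ho
    W_ih += dW_ih
    print(epoch, tss)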



