No subject

n karunanithi karunani at CS.ColoState.EDU
Thu Sep 26 01:09:05 EDT 1991


To those who use connectionist networks for sequential prediction applications
-------------------------------------------------------------------------------

Background:
-----------
   I have been using neural network models
(both feed-forward nets and recurrent nets) in a prediction
application and I am getting pretty good results. In fact, the
neural network approach outperformed many well-known analytic
models. Similar results have been reported by many researchers
in (chaotic) time series prediction.

 Suppose that X is the independent variable and Y is the
dependent variable. Let (x(i),y(i)) represent a sequence
of actual input/output values observed at
time i = 0,1,2,..,t of a temporal process. Assume further that both
the input and the output variables are one-dimensional and
take on positive integer values up to a maximum of 2000.
Once we train a network on the
history of the system up to time "t", we can use the network
to predict the output y(t+h), h=1,..,n, for any future input x(t+h).
In my application I already have the complete sequence and
hence I know the maximum values of x and y.
Using these maxima I normalized both X and Y to the range 0.1 to 0.9.
(I call such normalization "scaled representation".)
Since I have the complete sequence, it is possible for me to evaluate
how good the networks' predictions are.
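
For concreteness, here is a minimal sketch of the kind of min-max
scaling I mean (Python is just my choice for the illustration; the
function names are hypothetical, and the maximum of 2000 and the
0.1-0.9 range are the ones described above):

    def scale(v, v_max, lo=0.1, hi=0.9):
        # Map a raw value in [0, v_max] into the [lo, hi] range.
        return lo + (hi - lo) * (float(v) / v_max)

    def unscale(s, v_max, lo=0.1, hi=0.9):
        # Invert the scaling to read a prediction back in original units.
        return (s - lo) / (hi - lo) * v_max

    # Example: with a known maximum of 2000, x = 500 maps to 0.3,
    # and a network output of 0.3 is read back as 500.
    x_scaled = scale(500, 2000)      # 0.3
    y_raw    = unscale(0.3, 2000)    # 500.0

The point to note is that both functions take v_max as an argument:
the scaling is only well defined once a maximum value has been fixed.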

Now some basic issues: 
---------------------
1) How should these variables be represented if we don't know in advance
what the maximum values are?
 Scaled representation presupposes that a maximum value is known.
 Some may suggest that linear units can be used at the output layer
 to get rid of output scaling. If so, how do I represent the input variable?
 The standard sigmoidal unit (with temperature = 1.0) gets saturated (or
 railed to 1.0) once its net input reaches about 14 or more; see the
 sketch below. One may suggest that changing the output range of the
 sigmoid can help to get rid of this saturation effect. Is that a
 correct approach?
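
To make the saturation effect concrete, here is a minimal numerical
sketch (again Python, my own illustration; the sigmoid and temperature
follow the standard definitions rather than any particular simulator):

    import math

    def sigmoid(net, temp=1.0):
        # Standard logistic unit with temperature temp.
        return 1.0 / (1.0 + math.exp(-net / temp))

    for net in (1, 5, 10, 14, 20):
        print(net, sigmoid(net))
    # sigmoid(14) is about 0.9999992, i.e. railed to 1.0 for all
    # practical purposes, so feeding raw inputs in the hundreds or
    # thousands drives the unit deep into saturation.

    # Rescaling the output, e.g. y = lo + (hi - lo) * sigmoid(net),
    # widens the range of y but does not change the net-input value
    # at which the unit saturates.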

2) In such prediction applications, people (including me)
compare the predictive accuracy of neural networks with
that of parametric models (which are based on analytical reasoning).
One main advantage of parametric models is that
their parameters can be estimated with any of the usual
parameter estimation techniques: least squares,
maximum likelihood, Bayesian methods, genetic algorithms, or any other
method. These techniques do not require any scaling, and hence
there is no need to guess the maximum values in advance.
With the scaled representation in neural networks, however, one
cannot proceed without making guesses about the maximum (or a future)
input and/or output. In many real-life situations such guesses are
infeasible or dangerous. How do we address this situation?
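
As a point of contrast, here is a minimal sketch (an illustration with
made-up numbers, not one of the models from my comparison) of fitting
a simple parametric model y = a + b*x by ordinary least squares on the
raw, unscaled observations; no maximum value has to be guessed anywhere:

    def least_squares_line(xs, ys):
        # Fit y = a + b*x by ordinary least squares on raw values.
        n = float(len(xs))
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, ys))
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        a = (sy - b * sx) / n
        return a, b

    xs = [1, 2, 3, 4, 5]                 # raw integer inputs
    ys = [120, 230, 340, 450, 560]       # raw integer outputs
    a, b = least_squares_line(xs, ys)
    print(a + b * 6)                     # extrapolated prediction for x = 6

A sigmoidal-output network trained on a scaled representation, by
contrast, is effectively capped near the maximum assumed by its
scaling, so extrapolating much beyond the guessed maximum is not even
expressible.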

____________________________________________________________________________
N.  KARUNANITHI              E-Mail: karunani at handel.CS.ColoState.EDU
Computer Science Dept,       
Colorado State University,
Fort Collins, CO 80523.
____________________________________________________________________________

