some questions on training neural nets...

William Finnoff finnoff at predict.com
Thu Feb 3 11:40:51 EST 1994


 Charles X. Ling writes:   

>   Hi neural net experts,
>
>   I am using backprop (and variations of it) quite often although I have
>   not followed neural net (NN) research as well as I wanted. Some rather 
>   basic issues in training NN still puzzle me a lot, and I hope to get advice 
>   and help from the experts in the area. Sorry for being ignorant....

In addition to Tom's pertinent comments (tgd at chert.cs.orst.edu, Thu Feb 3), I
would suggest consulting the following references, which contain discussions
of various issues pertaining to model selection, overfitting, stopped training,
complexity control and the bias/variance dilemma.  (This list is by no means
complete.)  References 2), 4), 13), 15) and 17) are particularly relevant
to the questions raised.
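Since several of the references below (in particular 2 and 13) deal with stopped
(nonconvergent) training, here is a rough sketch of the idea in modern Python/NumPy.
This is my own illustration, not code from any of the papers: train by backprop,
monitor error on a held-out validation set, keep the weights with the lowest
validation error, and stop once that error has not improved for a while.

```python
import numpy as np

def stopped_training(X_tr, y_tr, X_val, y_val, n_hidden=8,
                     lr=0.05, max_epochs=2000, patience=50):
    """Backprop on a one-hidden-layer tanh net (squared error).

    Keeps the weights with the lowest validation error seen so far and
    stops once validation error has not improved for `patience` epochs
    (the stopped-training / early-stopping idea).  All names and
    parameter values here are illustrative choices, not prescribed ones.
    """
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.5, (X_tr.shape[1], n_hidden))
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))

    def forward(X, W1, W2):
        h = np.tanh(X @ W1)          # hidden activations
        return h, h @ W2             # network output

    best_err, best_W1, best_W2 = np.inf, W1.copy(), W2.copy()
    wait = 0
    for epoch in range(max_epochs):
        # Gradient of (1/2) * mean squared error w.r.t. W2 and W1
        h, out = forward(X_tr, W1, W2)
        err = out - y_tr
        gW2 = h.T @ err / len(X_tr)
        gW1 = X_tr.T @ ((err @ W2.T) * (1.0 - h**2)) / len(X_tr)
        W2 -= lr * gW2
        W1 -= lr * gW1

        # Monitor generalization on the validation set
        _, val_out = forward(X_val, W1, W2)
        val_err = float(np.mean((val_out - y_val) ** 2))
        if val_err < best_err:
            best_err, best_W1, best_W2, wait = val_err, W1.copy(), W2.copy(), 0
        else:
            wait += 1
            if wait >= patience:     # validation error stopped improving
                break
    return best_err, best_W1, best_W2
```

Note that training is deliberately *not* run to convergence on the training set;
the returned weights are the ones that generalized best, which is the point the
stopped-training references make.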


1) Baldi, P. and Chauvin, Y. (1991). Temporal evolution of generalization during learning in linear networks, {\it Neural Computation} 3, 589-603.

2) Finnoff, W., Hergert, F. and Zimmermann, H.G. (1993). Improving generalization performance by nonconvergent model selection methods, {\it Neural Networks}, vol. 6, nr. 6, pp. 771-783.

3) Finnoff, W. and Zimmermann, H.G. (1991). Detecting structure in small datasets by network fitting under complexity constraints. To appear in {\it Proc. of 2nd Ann. Workshop on Computational Learning Theory and Natural Learning Systems}, Berkeley.

4) Geman, S., Bienenstock, E. and Doursat, R. (1992). Neural networks and the bias/variance dilemma, {\it Neural Computation} 4, 1-58.

5) Guyon, I., Vapnik, V., Boser, B., Bottou, L. and Solla, S. (1992). Structural risk minimization for character recognition. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufmann.

6) Hanson, S. J. and Pratt, L. Y. (1989). Comparing biases for minimal network construction with back-propagation. In D. S. Touretzky (Ed.), {\it Advances in Neural Information Processing Systems I} (pp. 177-185). San Mateo: Morgan Kaufmann.

7) Hergert, F., Finnoff, W. and Zimmermann, H.G. (1992). A comparison of weight elimination methods for reducing complexity in neural networks. {\it Proc. Int. Joint Conf. on Neural Networks}, Baltimore.

8) Hergert, F., Zimmermann, H.G., Kramer, U. and Finnoff, W. (1992). Domain independent testing and performance comparisons for neural networks. In I. Aleksander and J. Taylor (Eds.), {\it Artificial Neural Networks II} (pp. 1071-1076). London: North Holland.

9) Le Cun, Y., Denker, J. and Solla, S. (1990). Optimal brain damage. In D. Touretzky (Ed.), {\it Advances in Neural Information Processing Systems II} (pp. 598-605). San Mateo: Morgan Kaufmann.

10) MacKay, D. (1991). {\it Bayesian Modelling and Neural Networks}, Dissertation, Computational and Neural Systems, California Inst. of Tech. 139-74, Pasadena.

11) Moody, J. (1992). Generalization, weight decay and architecture selection for nonlinear learning systems. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 471-479). San Mateo: Morgan Kaufmann.

12) Morgan, N. and Bourlard, H. (1990). Generalization and parameter estimation in feedforward nets: Some experiments. In D. Touretzky (Ed.), {\it Advances in Neural Information Processing Systems II} (pp. 598-605). San Mateo: Morgan Kaufmann.

13) Sj\"oberg, J. and Ljung, L. (1992). Overtraining, regularization and searching for minimum in neural networks. Report LiTH-ISY-I-1297, Dep. of Electrical Engineering, Link\"oping University, S-581 83 Link\"oping, Sweden.

14) Stone, C.J. (1977). Cross-validation: A review. {\it Math. Operations Res. Statist. Ser.}, 9, 1-51.

15) Vapnik, V. (1992). Principles of risk minimization for learning theory. In J. Moody, J. Hanson and R. Lippmann (Eds.), {\it Advances in Neural Information Processing Systems IV} (pp. 831-838). San Mateo: Morgan Kaufmann.

16) Weigend, A. and Rumelhart, D. (1991). The effective dimension of the space of hidden units. In {\it Proc. Int. Joint Conf. on Neural Networks}, Singapore.

17) Weigend, A., Rumelhart, D. and Huberman, B. (1991). Generalization by weight elimination with application to forecasting. In R. Lippmann, J. Moody and D. Touretzky (Eds.), {\it Advances in Neural Information Processing Systems III} (pp. 875-882). San Mateo: Morgan Kaufmann.

18) White, H. (1989). Learning in artificial neural networks: A statistical perspective, {\it Neural Computation} 1, 425-464.



-William


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

William Finnoff
Prediction Co.
320 Aztec St., Suite B
Santa Fe, NM, 87501, USA

Tel.: (505)-984-3123
Fax:  (505)-983-0571

e-mail: finnoff at predict.com

