Preprint available

Thu Nov 25 11:55:04 EST 1999

Dear Colleagues,

The technical report below is available from our website

    http://www.cs.sun.ac.za/projects/tech_reports/US-CS-TR-99-14.ps.gz

We welcome any comments you may have.

With kind regards,

Christian

Christian W. Omlin                 e-mail: omlin at cs.sun.ac.za
Department of Computer Science     phone (direct): +27-21-808-4308
University of Stellenbosch         phone (secretary): +27-21-808-4232
Private Bag X1                     fax: +27-21-808-4416
Stellenbosch 7602                  http://www.cs.sun.ac.za/people/staff/omlin
SOUTH AFRICA                       http://www.neci.nj.nec.com/homepages/omlin 

------------------------------- cut here ------------------------------

                 What Inductive Bias Gives Good
              Neural Network Training Performance?

		    S. Snyders   C.W. Omlin
	         Department of Computer Science
	           University of Stellenbosch
	               7602 Stellenbosch
                         South Africa
             E-mail: {snyders,omlin}@cs.sun.ac.za

		          ABSTRACT

There  has  been an increased interest in the use of prior knowl-
edge for training neural networks. Prior knowledge in the form of
Horn  clauses  has  been  the predominant paradigm for knowledge-
based neural networks.  Given a set of training examples  and  an
initial  domain theory, a neural network is constructed that fits
the training examples by preprogramming some of the weights.  The
initialized  neural network is then trained using backpropagation
to refine the knowledge.  The prior knowledge presumably  defines
a  good  starting point in weight space and provides an inductive
bias leading to faster  convergence;  it  overrides  backpropaga-
tion's  bias  toward  a  smooth  interpolation resulting in small
weights. This paper proposes  a  heuristic  for  determining  the
strength of the inductive bias by making use of gradient informa-
tion in weight space in the direction of the programmed  weights.
The  network starts its search in weight space where the gradient
is maximal, thus speeding up convergence.  Tests  on  a benchmark
problem  from molecular biology demonstrate that our heuristic on
average reduces the training time by 60%  compared  to  a  random
choice of the strength of the inductive bias; this performance is
within 20% of the training time that can be achieved with optimal
inductive  bias.  The difference in generalization performance is
not statistically significant.
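
As a rough illustration of the idea described in the abstract, the sketch
below scans candidate strengths H for the programmed weights and keeps the
one where the loss changes fastest along the direction of those weights. It
is not the procedure from the report: the two-rule domain theory, the
KBANN-style encoding (weights +H / -H, bias -(P - 0.5) * H), the four
training examples, the candidate grid, and the names loss,
slope_along_programmed_weights and pick_strength are all assumptions made
for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(H, X, y):
    """Mean squared error of a tiny two-layer network programmed with strength H.

    Hypothetical domain theory (an assumption for this sketch):
        out :- h.
        h   :- a, not b.
    Each rule becomes a sigmoid unit with weight +H for a positive antecedent,
    -H for a negated one, and bias -(P - 0.5) * H, where P is the number of
    positive antecedents.
    """
    a, b = X[:, 0], X[:, 1]
    h = sigmoid(H * (a - b - 0.5))   # hidden unit for "a and not b" (P = 1)
    out = sigmoid(H * (h - 0.5))     # output unit for "h" (P = 1)
    return 0.5 * np.mean((out - y) ** 2)


def slope_along_programmed_weights(H, X, y, eps=1e-4):
    """Central finite-difference estimate of dLoss/dH.

    Scaling H moves the network along the direction of the programmed
    weights, so this is the gradient component in that direction.
    """
    return (loss(H + eps, X, y) - loss(H - eps, X, y)) / (2.0 * eps)


def pick_strength(X, y, candidates):
    """Return the candidate H at which |dLoss/dH| is largest."""
    slopes = np.array(
        [abs(slope_along_programmed_weights(H, X, y)) for H in candidates]
    )
    return float(candidates[int(np.argmax(slopes))])


# Toy training set: all combinations of (a, b), labelled by "a and not b".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 1, 0], dtype=float)

candidates = np.linspace(0.0, 10.0, 101)
H_star = pick_strength(X, y, candidates)
print("selected strength H =", H_star)
```

In this toy setting the slope vanishes at H = 0, where every unit outputs
0.5, and again for very large H, where the sigmoids saturate, so the
selected strength lies strictly between the two. Training with
backpropagation would then start from the weights programmed with that H.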