Overfitting in learning discrete patterns
john kolen
kolen-j at cis.ohio-state.edu
Sun Mar 6 10:39:14 EST 1994
Fabien.Moutarde at aar.alcatel-alsthom.fr wrote:
> I would like to know how the weights were initialized. Were they
> taken from a uniform distribution in some fixed interval regardless
> of the network architecture? Which interval?
You are asking the right questions. Are you aware of Kolen and
Pollack (1990), which explores the effects of initial weights on
backpropagation?
> If you begin learning with some neurons already in their nonlinear
> regime somewhere in the input space, then the initial function
> realized by the network is not smooth, and the irregularities are
> likely to remain between training points and to produce overfitting.
> This implies that the bigger the network, the lower the initial
> weights should be.
The last sentence of the quoted passage does not necessarily follow
from the ones before it. The magnitude of the weights is less
important than the magnitude of the *net input* reaching the unit.
For instance, if the network operates in an environment with
between-unit correlations in the input, then large-magnitude weights
can effectively behave like small-magnitude weights from the
perspective of the nonlinear squashing function. In this situation,
I would predict that large weights actually help in distributing
error to the previous layer.
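A small numerical sketch of this point, assuming a pair of strongly
correlated inputs and a sigmoid squashing function (the particular
weight values and correlation strength are illustrative only): two
large weights of opposite sign on nearly redundant inputs produce a
small net input, so the unit stays in the near-linear part of the
sigmoid, whereas the same weights on independent inputs saturate it.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 2000

# Two highly correlated input features: x2 is almost a copy of x1.
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X_corr = np.column_stack([x1, x2])

# The same weights applied to two independent features, for contrast.
X_indep = rng.uniform(-1.0, 1.0, size=(n, 2))

w_large = np.array([10.0, -10.0])  # large magnitudes that cancel on correlated inputs

for name, X in (("correlated inputs ", X_corr), ("independent inputs", X_indep)):
    net = X @ w_large
    y = sigmoid(net)
    # sigmoid'(net) = y*(1 - y); values near 0.25 mean the unit sits in its
    # near-linear regime, values near 0 mean it is saturated.
    print(f"{name}: mean |net| = {np.abs(net).mean():5.2f},  "
          f"mean sigmoid' = {np.mean(y * (1.0 - y)):.3f}")

Since the error back-propagated to the previous layer is scaled by
both the unit's derivative and the weight magnitudes, large but
cancelling weights keep the derivative near its maximum while still
multiplying the error signal by a large factor, which is one way to
read the prediction above.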
John Kolen
References
J. F. Kolen and J. B. Pollack (1990). Backpropagation is Sensitive
to Initial Conditions. _Complex Systems_ 4(3), pp. 269-280.
Available from neuroprose as kolen.bpsic.*.ps.Z (8 files).