Weight Decay
Yann Le Cun
neural!yann
Wed Jan 25 15:13:58 EST 1989
Consider a single layer linear network with N inputs.
When the number of training patterns is smaller than N, the
set of solutions (in weight space) is a proper affine subspace.
Adding weight decay will select the minimum norm solution in this subspace
(if the weight decay coefficient is decreased with time).
The minimum norm solution happens to be the solution given by the
pseudo-inverse technique (cf. Kohonen), and is also the solution that
optimally cancels out uncorrelated zero-mean additive noise on the input.
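
The claim above can be checked numerically. The following is only a
minimal sketch in Python with NumPy, under assumptions that are not in
the message: weight decay is modeled as an L2 penalty lam * ||w||^2
added to the squared error, the penalized problem is solved in closed
form rather than by iterative training, and "decreased with time" is
imitated by trying smaller and smaller values of lam. The names
weight_decay_solution, lam, X, y, P and the random data are all
hypothetical.

```python
import numpy as np


def weight_decay_solution(X, y, lam):
    """Closed-form minimizer of ||X w - y||^2 + lam * ||w||^2."""
    n_inputs = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_inputs), X.T @ y)


rng = np.random.default_rng(0)
P, N = 5, 20                       # fewer training patterns than inputs
X = rng.standard_normal((P, N))    # one training pattern per row (made-up data)
y = rng.standard_normal(P)         # one target per pattern (made-up data)

# Minimum norm exact solution given by the pseudo-inverse.
w_pinv = np.linalg.pinv(X) @ y

# Shrinking the decay coefficient moves the solution toward w_pinv.
gaps = {}
for lam in (1e-1, 1e-3, 1e-6):
    w_decay = weight_decay_solution(X, y, lam)
    gaps[lam] = np.linalg.norm(w_decay - w_pinv)
    print(f"lam={lam:g}  distance to pseudo-inverse solution={gaps[lam]:.2e}")

# Another exact solution: add a component from the null space of X.
v = rng.standard_normal(N)
null_part = v - np.linalg.pinv(X) @ (X @ v)   # projection of v onto null(X)
w_other = w_pinv + null_part
print("norms:", np.linalg.norm(w_pinv), np.linalg.norm(w_other))
```

Both w_pinv and w_other fit the training patterns exactly, but w_other
has the larger norm. If the input noise is assumed to have the same
variance s^2 on every component, the output perturbation w . n has
variance s^2 * ||w||^2, which is one way to read the noise-cancelling
remark: among exact solutions, the smallest norm gives the smallest
noise contribution.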
- Yann Le Cun