Weight Decay

Wed Jan 25 15:13:58 EST 1989

Consider a single-layer linear network with N inputs.
When the number of training patterns is smaller than N, the
set of zero-error solutions (in weight space) is a proper affine subspace.
Adding weight decay will select the minimum-norm solution in this subspace
(if the weight decay coefficient is decreased toward zero over time).
The minimum-norm solution happens to be the solution given by the
pseudo-inverse technique (cf. Kohonen), and it is also the solution that
optimally cancels out uncorrelated zero-mean additive noise on the input.
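
The claim can be checked numerically. The sketch below is only an
illustration and is not taken from the note: it assumes a squared-error
cost with an L2 weight decay term, and it uses the closed-form minimizer
of that cost in place of gradient descent with a slowly decreasing decay
coefficient. The names ridge_solution, X, y, decay and the sizes P and N
are hypothetical choices made for the example.

```python
import numpy as np


def ridge_solution(X, y, decay):
    """Closed-form minimizer of ||X w - y||^2 + decay * ||w||^2.

    Stand-in (an assumption of this sketch) for training a linear
    network with weight decay until convergence.
    """
    n_inputs = X.shape[1]
    return np.linalg.solve(X.T @ X + decay * np.eye(n_inputs), X.T @ y)


rng = np.random.default_rng(0)
P, N = 5, 20                      # fewer training patterns than inputs
X = rng.standard_normal((P, N))   # one training pattern per row
y = rng.standard_normal(P)        # one target per pattern

# Minimum-norm exact solution given by the pseudo-inverse.
w_pinv = np.linalg.pinv(X) @ y

# Distance between the weight-decay solution and the pseudo-inverse
# solution for smaller and smaller decay coefficients.
decays = (1e-1, 1e-3, 1e-6)
gaps = [np.linalg.norm(ridge_solution(X, y, d) - w_pinv) for d in decays]

# Another exact solution: add a component from the null space of X.
# It fits the data equally well but has a larger norm.
null_dir = (np.eye(N) - np.linalg.pinv(X) @ X) @ rng.standard_normal(N)
w_other = w_pinv + null_dir

for d, g in zip(decays, gaps):
    print(f"decay={d:g}  distance to pseudo-inverse solution={g:.3e}")
print("norm of pseudo-inverse solution:", np.linalg.norm(w_pinv))
print("norm of another exact solution: ", np.linalg.norm(w_other))
```

The printed distances should shrink as the decay coefficient shrinks,
while the second exact solution shows that the subspace contains other
zero-error weight vectors of larger norm.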

- Yann Le Cun