No subject

kanderso@BBN.COM kanderso at BBN.COM
Thu Jan 5 16:30:15 EST 1989


I enjoyed John's summary of weight decay, but it raised a few
questions.  Just as John did, I'll be glad to summarize the responses
to the group.

1.  <hinton at ai.toronto.edu> mentioned that "Weight-decay is a version
of what statisticians call "Ridge Regression"."  What do you mean by
"version"?  Is it exactly the same, or just slightly related?  I think I
know what Ridge Regression is, but I don't see an obvious strong
connection.  I see a weak one, and after I think about it more maybe I'll
say something about it.
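For what it's worth, the connection for a linear model can be checked
numerically: gradient descent on squared error with a weight-decay term
converges to the ridge-regression solution (X'X + lambda*I)^{-1} X'y.  A
minimal sketch (the data, names, and lambda value are my own illustrative
choices, not anything from John's summary):

```python
import numpy as np

# Toy data (illustrative choices)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 0.5  # weight-decay / ridge penalty strength

# Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Gradient descent on E(w) = 1/2 ||Xw - y||^2 + (lam/2) ||w||^2;
# the lam*w term in the gradient is exactly "weight decay".
w = np.zeros(3)
eta = 0.01
for _ in range(20000):
    grad = X.T @ (X @ w - y) + lam * w
    w -= eta * grad

print(np.allclose(w, w_ridge, atol=1e-6))  # the two solutions agree
```

So for linear least squares the two appear to be exactly the same
objective; for nonlinear nets the penalty is the same but the analogy is
looser, which may be what "version" was meant to cover.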

The ideas behind Ridge regression probably came from Levenberg and
Marquardt, who used it in nonlinear least squares:

Levenberg, K., A method for the solution of certain nonlinear problems
  in least squares, Q. Appl. Math., Vol. 2, pp. 164-168, 1944.

Marquardt, D.W., An algorithm for least-squares estimation of
  nonlinear parameters, J. Soc. Industrial and Applied Math.,
  Vol. 11, pp. 431-441, 1963.
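The formal link in those papers is the damping term: a
Levenberg-Marquardt step solves (J'J + lambda*I) d = -J'r, i.e. the
Gauss-Newton normal equations with a ridge-style penalty added to the
diagonal.  For a linear residual, one such step from w = 0 is exactly the
ridge estimate.  A toy check (all names and values are mine):

```python
import numpy as np

# For a linear model r(w) = X w - y the Jacobian is J = X, so one
# Levenberg-Marquardt step from w = 0 solves (J'J + lam*I) d = -J'r(0),
# which is the ridge-regression estimate (X'X + lam*I)^{-1} X'y.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0]) + 0.05 * rng.normal(size=40)
lam = 1.0

J = X
r0 = -y  # residual at w = 0 is X @ 0 - y
step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r0)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(np.allclose(step, w_ridge))  # the LM step equals the ridge solution
```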

2.  John quoted Dave Rumelhart as saying that standard weight decay
distributes weights more evenly over the given connections, thereby
increasing robustness.  Why does smearing out large weights increase
robustness?  And what does robustness mean here, the ability to generalize?

k

