Paper Announcement (Neuroprose)
Wray Buntine
wray at ptolemy.arc.nasa.gov
Wed Oct 23 18:33:42 EDT 1991
> Simplifying Neural Networks by
> Soft Weight-Sharing
>
> Barak Pearlmutter
> Department of Psychology
> P.O. Box 11A Yale Station
I enjoyed this take-off immensely.
Determining good regularisers (or priors) is a major problem facing
feed-forward network research (and related representations), so I also
enjoyed the original Nowlan-Hinton paper. Dramatic performance
improvements can be obtained by careful choice of regulariser/prior (I
know this from my tree research), and it's a bit of a black art right
now, though I have some good directions. Nowlan & Hinton suggest a
strong theoretical basis exists for their approach (see their section
8), so perhaps we'll see more of this style, and "cleaner" versions to
keep the theoreticians happy.
By the way, at CLNL in Berkeley in August I expressed the view that
this problem:
Regularizers
------------
for a given network/activation-function configuration,
what are suitable parameterised families of regularizers,
and how might the parameters be set from knowledge
of the particular application being addressed
NB. the setting of the $\lambda$ tradeoff term in Nowlan & Hinton's
equation (1) has several fairly elegant and practical solutions
along with:
Training
--------
decision-theoretic/bounded-rationality approaches to
batch vs. block (sub-batch) vs. pattern updates during gradient
descent (i.e. in back-prop.)
(i.e. the Fahlman-LeCun-English-Grajski-et-al. discussion,
or the batch update vs. stochastic update problem),
and the subsequent addition of second-order gradient methods
are two of the most pressing problems to be solved in making
feed-forward networks a "mature" technology that will then supersede
many earlier non-neural methods.
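For readers unfamiliar with the Nowlan-Hinton scheme mentioned above: its
complexity penalty is the negative log-probability of the weights under a
mixture of Gaussians, traded off against the data error by the $\lambda$
term of their equation (1). A minimal sketch in modern terms (the mixture
parameters and $\lambda$ value below are made up for illustration, not
taken from the paper):

```python
import numpy as np

def soft_weight_sharing_penalty(w, pi, mu, sigma):
    """Complexity term of soft weight-sharing: the negative log-likelihood
    of the network weights under a mixture of Gaussians with mixing
    proportions pi, means mu, and std devs sigma."""
    w = np.asarray(w, dtype=float)[:, None]          # shape (n_weights, 1)
    # per-component densities, shape (n_weights, n_components)
    dens = pi * np.exp(-0.5 * ((w - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.sum(np.log(dens.sum(axis=1)))

# Hypothetical two-component mixture: one cluster near 0, one near 0.5.
pi, mu, sigma = np.array([0.7, 0.3]), np.array([0.0, 0.5]), np.array([0.1, 0.1])
weights = np.array([0.02, -0.01, 0.51, 0.49, 0.0])   # weights near the clusters
penalty = soft_weight_sharing_penalty(weights, pi, mu, sigma)

lam = 0.01   # assumed value for the lambda tradeoff; setting it well is
             # exactly the "Regularizers" problem posed above
# total cost = data_error + lam * penalty
```

Weights that cluster near the mixture means incur a smaller penalty than
weights scattered between clusters, which is what pushes the network toward
shared weight values.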
Wray Buntine
NASA Ames Research Center phone: (415) 604 3389
Mail Stop 244-17 fax: (415) 604 6997
Moffett Field, CA, 94035 email: wray at ptolemy.arc.nasa.gov
PS. Thanks also to Martin Moller for adding some meat to the Training
problem:
> An interesting observation is that the number of blocks needed
> to make an update is growing during learning so that after a certain
> number of epochs the blocksize is equal to the number of patterns.
> When this happens the algorithm is equal to a traditional batch-mode
> algorithm and no validation is needed anymore.
When explaining batch update vs. stochastic update to people,
I always use this behaviour as an example of what a decision-theoretic
training scheme **should** do, so I'm glad you've confirmed it
experimentally.
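The behaviour Moller describes, block size growing until updates become
plain batch updates, can be sketched with a toy least-squares problem. The
doubling schedule below is an assumption for illustration (his block size
grew from the data, not from a fixed schedule), as are all the problem
sizes and the learning rate:

```python
import numpy as np

# Toy linear regression: 64 patterns, 3 inputs, known target weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
block = 4                            # start with small blocks (near-stochastic)
for epoch in range(40):
    order = rng.permutation(len(X))
    for start in range(0, len(X), block):
        idx = order[start:start + block]
        # least-squares gradient on the current block only
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad
    # grow the block each epoch; once block == n_patterns, every update
    # is a traditional batch-mode update, as in Moller's observation
    block = min(2 * block, len(X))
```

The run interpolates from pattern-like updates to batch updates; a
decision-theoretic training scheme would choose this growth from the data
rather than from a fixed doubling rule.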
More information about the Connectionists mailing list