Connectionists: New paper on Deep-Learning and Noise: dramatic speedup and accuracy.

Stephen José Hanson jose at rubic.rutgers.edu
Tue Aug 14 07:31:38 EDT 2018


https://arxiv.org/abs/1808.03578


  Dropout is a special case of the stochastic delta rule: faster and
  more accurate deep learning

Noah Frazier-Logue
<https://arxiv.org/search/cs?searchtype=author&query=Frazier-Logue%2C+N>,
Stephen José Hanson
<https://arxiv.org/search/cs?searchtype=author&query=Hanson%2C+S+J>
(Submitted on 10 Aug 2018)

    Multi-layer neural networks have led to remarkable performance on
    many kinds of benchmark tasks in text, speech and image processing.
    Nonlinear parameter estimation in hierarchical models is known to be
    subject to overfitting. One approach to this overfitting and related
    problems (local minima, collinearity, feature discovery, etc.) is
    called dropout (Srivastava et al., 2014; Baldi et al., 2016). This
    method removes hidden units with a Bernoulli random variable with
    probability p over updates. In this paper we show that dropout is
    a special case of a more general model published originally in 1990
    called the stochastic delta rule (SDR; Hanson, 1990). SDR
    parameterizes each weight in the network as a random variable with
    mean μ_wij and standard deviation σ_wij. These random variables are
    sampled on each forward activation, consequently creating an
    exponential number of potential networks with shared weights. Both
    parameters are updated according to prediction error, thus
    implementing weight noise injections that reflect a local history of
    prediction error and efficient model averaging. SDR therefore
    implements a local gradient-dependent simulated annealing per weight,
    converging to a Bayes-optimal network. Tests on standard benchmarks
    (CIFAR) using a modified version of DenseNet show that SDR
    outperforms standard dropout in error by over 50% and in loss by
    over 50%. Furthermore, the SDR implementation converges on a
    solution much faster, reaching a training error of 5 in just 15
    epochs with DenseNet-40, compared to standard DenseNet-40's 94 epochs.
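
For readers who want the gist in code: below is a minimal sketch of the
mechanism the abstract describes, written in PyTorch. It is an
illustration based only on the abstract, not the authors' implementation;
the layer name SDRLinear and the parameter init_sigma are my own
placeholders. Each weight is treated as a Gaussian random variable with a
learnable mean and standard deviation, and a fresh realization is sampled
on every forward activation.

import math
import torch
import torch.nn.functional as F

# Sketch of the stochastic delta rule (SDR) idea from the abstract; not
# the authors' code. Each weight w_ij is a Gaussian random variable with
# learnable mean mu_wij and standard deviation sigma_wij, resampled on
# every forward pass.
class SDRLinear(torch.nn.Module):
    def __init__(self, in_features, out_features, init_sigma=0.05):
        super().__init__()
        self.mu = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        # Learn log(sigma) so the standard deviation stays positive.
        self.log_sigma = torch.nn.Parameter(
            torch.full((out_features, in_features), math.log(init_sigma)))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Sample one weight realization per forward activation; over many
        # updates this implicitly averages an exponential number of
        # shared-weight networks.
        w = self.mu + torch.exp(self.log_sigma) * torch.randn_like(self.mu)
        return F.linear(x, w, self.bias)

In this sketch the prediction-error gradients reach both mu and log_sigma
through the sampled weights, which is one simple way to realize the
abstract's "both parameters are updated according to prediction error."
The original 1990 rule, if I recall it correctly, additionally shrinks
each sigma over training, which is what produces the per-weight simulated
annealing the abstract mentions; that decay schedule is omitted here.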


-- 
Stephen José Hanson
Full Professor
Director RUBIC (University-Wide)
Department of Psychology (NK)
Cognitive Science Center (NB)
