In the PDP books, batch learning accumulates the error derivatives from each pattern, rather than simply each pattern's contribution to the total error, before making any weight changes. It seems that gradient descent ought to add up all the errors before taking any derivatives. Any comments?

Howard Card
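Since differentiation is linear, the two orderings can be compared directly: the sum of the per-pattern derivatives and the derivative of the summed error should agree. A minimal numerical sketch, using a hypothetical one-weight linear model and made-up data (all names and values here are illustrative assumptions, not from the post):

```python
import numpy as np

# Hypothetical 1-D linear model y = w * x with squared error per pattern.
X = np.array([0.5, -1.0, 2.0])   # input patterns (made up)
T = np.array([1.0, 0.2, -0.5])   # targets (made up)
w = 0.3                          # current weight

def pattern_error(w, x, t):
    return 0.5 * (w * x - t) ** 2

# (a) Accumulate the derivative from each pattern before any weight change,
#     as in the PDP batch procedure: dE_p/dw = (w*x - t) * x.
grad_accumulated = sum((w * x - t) * x for x, t in zip(X, T))

# (b) Add up all the errors first, then differentiate the total
#     (here numerically, by central difference).
def total_error(w):
    return sum(pattern_error(w, x, t) for x, t in zip(X, T))

eps = 1e-6
grad_of_total = (total_error(w + eps) - total_error(w - eps)) / (2 * eps)

# Differentiation is linear, so the two orderings give the same gradient.
print(np.isclose(grad_accumulated, grad_of_total))  # True
```

Either way the weight update uses the same batch gradient; only the bookkeeping differs.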