## Stochastic Gradient Tricks

*Abstract*:
The first chapter of *Neural Networks, Tricks of the Trade*
strongly advocates the *stochastic back-propagation* method to
train neural networks. This is in fact an instance of a more general
technique called *stochastic gradient descent*. This chapter provides
background material, explains why SGD is a good learning algorithm
when the training set is large, and provides useful recommendations.
This chapter appears in the “reloaded” edition of the tricks book (Amazon, Springer).
It complements the material presented in the initial chapter "Efficient Backprop".

Léon Bottou: **Stochastic Gradient Tricks**, *Neural Networks, Tricks of the Trade, Reloaded*, 430–445, edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.

    @incollection{bottou-tricks-2012,
      author = {Bottou, L\'{e}on},
      title = {Stochastic Gradient Tricks},
      booktitle = {Neural Networks, Tricks of the Trade, Reloaded},
      pages = {430--445},
      editor = {Montavon, Gr\'{e}goire and Orr, Genevieve B. and M\"{u}ller, Klaus-Robert},
      series = {Lecture Notes in Computer Science (LNCS 7700)},
      publisher = {Springer},
      year = {2012},
      url = {http://leon.bottou.org/papers/bottou-tricks-2012},
    }

### Erratum

The online version of this paper was slightly modified on May 24, 2013. The description of the sparse version of the averaged stochastic gradient now includes an explanation of the case \( \mu_t=1 \).