===== Stochastic Gradient Tricks =====

//Abstract//: The first chapter of //Neural Networks, Tricks of the Trade// strongly advocates the //stochastic back-propagation// method to train neural networks. This is in fact an instance of a more general technique called //stochastic gradient descent//. This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations. This chapter appears in the "reloaded" edition of the tricks book [[http://www.amazon.com/Neural-Networks-Lecture-Computer-Theoretical/dp/364235288X|(amazon)]] [[http://www.springer.com/computer/theoretical+computer+science/book/978-3-642-35288-1|(springer)]]. It completes the material presented in the initial chapter [[lecun-98x|"Efficient Backprop"]].

Léon Bottou: **Stochastic Gradient Tricks**, //Neural Networks, Tricks of the Trade, Reloaded//, 430--445, Edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.

[[http://leon.bottou.org/publications/djvu/tricks-2012.djvu|tricks-2012.djvu]] [[http://leon.bottou.org/publications/pdf/tricks-2012.pdf|tricks-2012.pdf]] [[http://leon.bottou.org/publications/psgz/tricks-2012.ps.gz|tricks-2012.ps.gz]]

<code>
@incollection{bottou-tricks-2012,
  author = {Bottou, L\'{e}on},
  title = {Stochastic Gradient Tricks},
  booktitle = {Neural Networks, Tricks of the Trade, Reloaded},
  pages = {430--445},
  editor = {Montavon, Gr\'{e}goire and Orr, Genevieve B. and M\"{u}ller, Klaus-Robert},
  series = {Lecture Notes in Computer Science (LNCS 7700)},
  publisher = {Springer},
  year = {2012},
  url = {http://leon.bottou.org/papers/bottou-tricks-2012},
}
</code>

==== Erratum ====

The online version of this paper was slightly modified on 5/24/2013. The description of the sparse version of the averaged stochastic gradient has been completed with an explanation of the case \( \mu_t = 1 \).
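
For readers unfamiliar with the averaged stochastic gradient mentioned above, here is a minimal illustrative sketch, not taken from the chapter itself: plain SGD on a regularized least-squares objective, with a running average of the iterates whose rate \( \mu_t \) equals 1 until averaging starts (the case addressed by the erratum) and then decays. The function name ''asgd'', the step-size schedule, and the choice of ''t0'' are assumptions made for illustration; consult the chapter for the recommended settings.

<code python>
# Illustrative sketch of averaged SGD (ASGD) on regularized least squares.
# All names and parameter choices here are hypothetical examples.
import numpy as np

def asgd(X, y, gamma0=0.1, lam=1e-4, t0=None, epochs=5, seed=0):
    n, d = X.shape
    if t0 is None:
        t0 = n          # assumption: start averaging after about one epoch
    rng = np.random.default_rng(seed)
    w = np.zeros(d)     # current SGD iterate
    w_avg = np.zeros(d) # running average of the iterates
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            gamma = gamma0 / (1.0 + gamma0 * lam * t)  # decreasing step size
            # stochastic gradient of 0.5*(x.w - y)^2 + 0.5*lam*||w||^2 on example i
            grad = (X[i] @ w - y[i]) * X[i] + lam * w
            w -= gamma * grad
            # averaging rate: mu_t = 1 while t <= t0 + 1, so the average simply
            # tracks the current iterate (the mu_t = 1 case the erratum explains);
            # afterwards mu_t decays as 1/(t - t0).
            mu = 1.0 / max(1, t - t0)
            w_avg += mu * (w - w_avg)
    return w_avg

# Example usage on synthetic data:
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)
w_hat = asgd(X, y)
</code>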