===== Stochastic Gradient Tricks =====

//Abstract//: The first chapter of //Neural Networks, Tricks of the Trade// strongly advocates the //stochastic back-propagation// method to train neural networks. This is in fact an instance of a more general technique called //stochastic gradient descent//. This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations. This chapter appears in the "reloaded" edition of the tricks book [[http://www.amazon.com/Neural-Networks-Lecture-Computer-Theoretical/dp/364235288X|(amazon)]] [[http://www.springer.com/computer/theoretical+computer+science/book/978-3-642-35288-1|(springer)]]. It completes the material presented in the initial chapter [[lecun-98x|"Efficient Backprop"]].

Léon Bottou: **Stochastic Gradient Tricks**, //Neural Networks, Tricks of the Trade, Reloaded//, 430--445, Edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.

[[http://leon.bottou.org/publications/djvu/tricks-2012.djvu|tricks-2012.djvu]] [[http://leon.bottou.org/publications/pdf/tricks-2012.pdf|tricks-2012.pdf]] [[http://leon.bottou.org/publications/psgz/tricks-2012.ps.gz|tricks-2012.ps.gz]]

<code>
@incollection{bottou-tricks-2012,
  author = {Bottou, L\'{e}on},
  title = {Stochastic Gradient Tricks},
  booktitle = {Neural Networks, Tricks of the Trade, Reloaded},
  pages = {430--445},
  editor = {Montavon, Gr\'{e}goire and Orr, Genevieve B. and M\"{u}ller, Klaus-Robert},
  series = {Lecture Notes in Computer Science (LNCS 7700)},
  publisher = {Springer},
  year = {2012},
  url = {http://leon.bottou.org/papers/bottou-tricks-2012},
}
</code>

==== Erratum ====

The online version of this paper was slightly modified on 5/24/2013. The description of the sparse version of the averaged stochastic gradient has been completed with an explanation of the case \( \mu_t = 1 \).
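
For readers unfamiliar with the averaged stochastic gradient mentioned above, here is a minimal illustrative sketch, not taken from the chapter itself: plain SGD on a regularized least-squares objective, with a running average of the iterates whose rate \( \mu_t \) equals 1 until averaging starts (the case addressed by the erratum) and then decays. The function name ''asgd'', the step-size schedule, and the choice of ''t0'' are assumptions made for illustration; consult the chapter for the recommended settings.

<code python>
# Illustrative sketch of averaged SGD (ASGD) on regularized least squares.
# All names and parameter choices here are hypothetical examples.
import numpy as np

def asgd(X, y, gamma0=0.1, lam=1e-4, t0=None, epochs=5, seed=0):
    n, d = X.shape
    if t0 is None:
        t0 = n          # assumption: start averaging after about one epoch
    rng = np.random.default_rng(seed)
    w = np.zeros(d)     # current SGD iterate
    w_avg = np.zeros(d) # running average of the iterates
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            gamma = gamma0 / (1.0 + gamma0 * lam * t)  # decreasing step size
            # stochastic gradient of 0.5*(x.w - y)^2 + 0.5*lam*||w||^2 on example i
            grad = (X[i] @ w - y[i]) * X[i] + lam * w
            w -= gamma * grad
            # averaging rate: mu_t = 1 while t <= t0 + 1, so the average simply
            # tracks the current iterate (the mu_t = 1 case the erratum explains);
            # afterwards mu_t decays as 1/(t - t0).
            mu = 1.0 / max(1, t - t0)
            w_avg += mu * (w - w_avg)
    return w_avg

# Example usage on synthetic data:
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)
w_hat = asgd(X, y)
</code>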