*Abstract*:
During the last decade, data sizes have grown faster than the speed of
processors. In this context, the capabilities of statistical machine learning
methods are limited by the computing time rather than by the sample size. A more
precise analysis uncovers qualitatively different trade-offs for
small-scale and large-scale learning problems. The large-scale case involves
the computational complexity of the underlying optimization algorithm in
non-trivial ways. Unlikely optimization algorithms such as stochastic gradient
descent show amazing performance for large-scale problems. In particular,
second-order stochastic gradient and averaged stochastic gradient are
asymptotically efficient after a single pass on the training set.
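The "single pass" claim can be illustrated with a minimal sketch: plain SGD on a toy 1-D least-squares problem, visiting each example once. The data, learning-rate schedule, and variable names below are illustrative assumptions, not taken from the paper:

```python
import random

random.seed(0)

# Toy 1-D regression data: y = 2x + 1 plus Gaussian noise (an assumption).
xs = [random.uniform(-1.0, 1.0) for _ in range(10000)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0.0, 0.1)) for x in xs]

# A single pass of SGD on the squared loss 0.5 * (w*x + b - y)^2.
w, b = 0.0, 0.0
for t, (x, y) in enumerate(data, start=1):
    lr = 0.5 / (1.0 + 0.1 * t)   # decreasing step size (schedule is an assumption)
    err = (w * x + b) - y        # derivative of the loss w.r.t. the prediction
    w -= lr * err * x            # gradient step on the weight
    b -= lr * err                # gradient step on the bias

print(w, b)   # should land near the true parameters (2, 1)
```

Each example is used exactly once, so the cost is linear in the data size, which is the regime the abstract is concerned with.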

Léon Bottou: **Large-Scale Machine Learning with Stochastic Gradient Descent**, *Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010)*, 177–187, Edited by Yves Lechevallier and Gilbert Saporta, Paris, France, August 2010, Springer.

@inproceedings{bottou-2010, author = {Bottou, L\'{e}on}, title = {Large-Scale Machine Learning with Stochastic Gradient Descent}, year = {2010}, booktitle = {Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010)}, editor = {Lechevallier, Yves and Saporta, Gilbert}, address = {Paris, France}, month = {August}, publisher = {Springer}, pages = {177--187}, url = {http://leon.bottou.org/papers/bottou-2010}, }

This short paper reviews stochastic gradient descent for machine learning, justifies it using the same argument as (Bottou & Bousquet, 2008), and compares two interesting accelerated algorithms: SGDQN (Bordes et al., 2009) and averaged stochastic gradient (Polyak & Juditsky, 1992).
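The Polyak–Juditsky averaging idea compared in the paper can be sketched on a toy mean-estimation problem: run SGD with a step size that decays slower than 1/t and report the running average of the iterates rather than the last one. The step-size exponent and all names below are illustrative assumptions:

```python
import random

random.seed(1)

# Toy problem: minimize E[0.5 * (w - y)^2] with y ~ N(3, 1);
# the minimizer is the mean mu = 3 (a made-up example).
mu = 3.0
w = 0.0        # SGD iterate
w_avg = 0.0    # Polyak-Ruppert running average of the iterates
for t in range(1, 20001):
    y = random.gauss(mu, 1.0)
    lr = 1.0 / t ** 0.75       # decays slower than 1/t (an assumed schedule)
    w -= lr * (w - y)          # SGD step on the instantaneous gradient
    w_avg += (w - w_avg) / t   # incremental average of w_1 .. w_t

print(w_avg)   # close to mu = 3
```

The averaged iterate `w_avg` smooths out the noise in the individual SGD steps; this averaging is what gives the asymptotic efficiency mentioned in the abstract.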

A preprint of (Xu, 2010) is available on arXiv: arXiv:1107.2490v1 [cs.LG].