*Abstract*:
This contribution develops a theoretical framework
that takes into account the effect of approximate
optimization on learning algorithms.
The analysis shows distinct tradeoffs for the
case of small-scale and large-scale learning problems.
Small-scale learning problems are subject to the
usual approximation–estimation tradeoff.
Large-scale learning problems are subject to
a qualitatively different tradeoff involving the computational
complexity of the underlying optimization
algorithms in non-trivial ways.

Léon Bottou and Olivier Bousquet: **The Tradeoffs of Large Scale Learning**, *
Advances in Neural Information Processing Systems*, 20:161–168, Edited by J.C.
Platt, D. Koller, Y. Singer and S. Roweis, NIPS Foundation (http://books.nips.cc),
2008.

@incollection{bottou-bousquet-2008, author = {Bottou, L\'{e}on and Bousquet, Olivier}, title = {The Tradeoffs of Large Scale Learning}, booktitle = {Advances in Neural Information Processing Systems}, pages = {161--168}, publisher = {NIPS Foundation (http://books.nips.cc)}, year = {2008}, editor = {Platt, J.C. and Koller, D. and Singer, Y. and Roweis, S.}, volume = {20}, url = {http://leon.bottou.org/papers/bottou-bousquet-2008}, }

This paper establishes a formal distinction between small-scale and large-scale learning systems. The generalization performance of small-scale systems is entirely defined by the statistical properties of their estimation procedure. The generalization performance of large-scale systems also depends on the chosen optimization algorithm. In particular, stochastic gradient algorithms display very good generalization performances despite being very poor optimization algorithms. Here is a very clear experimental validation.