The Tradeoffs of Large Scale Learning

Abstract: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways.

Léon Bottou and Olivier Bousquet: The Tradeoffs of Large Scale Learning, Advances in Neural Information Processing Systems, 20:161–168, Edited by J.C. Platt, D. Koller, Y. Singer and S. Roweis, NIPS Foundation (, 2008.

nips-2007.djvu nips-2007.pdf

  author = {Bottou, L\'{e}on and Bousquet, Olivier},
  title = {The Tradeoffs of Large Scale Learning},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {161--168},
  publisher = {NIPS Foundation (},
  year = {2008},
  editor = {Platt, J.C. and Koller, D. and Singer, Y. and Roweis, S.},
  volume = {20},
  url = {},


This paper establishes a formal distinction between small-scale and large-scale learning systems. The generalization performance of small-scale systems is entirely defined by the statistical properties of their estimation procedure. The generalization performance of large-scale systems also depends on the chosen optimization algorithm. In particular, stochastic gradient algorithms display very good generalization performances despite being very poor optimization algorithms. Here is a very clear experimental validation.

papers/bottou-bousquet-2008.txt · Last modified: 2008/12/15 12:04 by leonb
Driven by DokuWiki Recent changes RSS feed Valid CSS Valid XHTML 1.0