===== The Tradeoffs of Large Scale Learning =====

//Abstract//:
This contribution develops a theoretical framework
that takes into account the effect of approximate
optimization on learning algorithms.
The analysis shows distinct tradeoffs for the
case of small-scale and large-scale learning problems.
Small-scale learning problems are subject to the
usual approximation--estimation tradeoff.
Large-scale learning problems are subject to
a qualitatively different tradeoff involving the computational
complexity of the underlying optimization
algorithms in non-trivial ways.

<box 99% orange>
Léon Bottou and Olivier Bousquet:  **The Tradeoffs of Large Scale Learning**,  //
Advances in Neural Information Processing Systems 20 (NIPS 2007)//, 161--168, Edited by J.C.
Platt, D. Koller, Y. Singer and S. Roweis, NIPS Foundation (http://books.nips.cc),
2008.

[[http://leon.bottou.org/publications/djvu/nips-2007.djvu|nips-2007.djvu]]
[[http://leon.bottou.org/publications/pdf/nips-2007.pdf|nips-2007.pdf]]
[[http://leon.bottou.org/publications/psgz/nips-2007.ps.gz|nips-2007.ps.gz]]
</box>

  @incollection{bottou-bousquet-2008,
    author = {Bottou, L\'{e}on and Bousquet, Olivier},
    title = {The Tradeoffs of Large Scale Learning},
    booktitle = {Advances in Neural Information Processing Systems 20 (NIPS 2007)},
    pages = {161--168},
    publisher = {NIPS Foundation (http://books.nips.cc)},
    year = {2008},
    editor = {Platt, J.C. and Koller, D. and Singer, Y. and Roweis, S.},
    url = {http://leon.bottou.org/papers/bottou-bousquet-2008},
  }


==== Notes ====

This paper establishes a formal distinction between small-scale and large-scale learning systems. The generalization performance of small-scale systems is entirely defined by the statistical properties of their estimation procedure. The 
generalization performance of large-scale systems also depends on the chosen
optimization algorithm.  
In particular, [[:research:stochastic|stochastic gradient algorithms]] display
very good generalization performances despite being very poor optimization algorithms.
Here is a [[:projects:sgd|very clear experimental validation]].