====== The Tradeoffs of Large-scale Learning ======

This lecture was prepared for the NIPS 2007 tutorial. Variants of this lecture were given at [[http://www.ipam.ucla.edu/programs/sews2/|IPAM]], [[http://google.com|Google]], [[http://research.microsoft.com/|Microsoft Research]], the [[http://ncp07.insa-rouen.fr/|International Conference on Nonconvex Programming (NCP'07)]], the [[http://capfr08.googlepages.com/|Conference Francophone sur l'Apprentissage Automatique (CAP'08)]], the NIPS workshop [[http://opt2008.kyb.tuebingen.mpg.de/|Optimization for Machine Learning (OPT'08)]], the [[http://ceremade.communication-pro.fr/|Symposium on Learning and Data Sciences (SLDS'09)]], and the [[http://ismp2009.eecs.northwestern.edu/|International Symposium on Mathematical Programming (ISMP'09)]].

===== Summary =====

{{wall.png?180 }}

Pervasive and networked computers have reduced the cost of collecting and distributing large-scale datasets. Because the computing time of typical machine learning algorithms grows faster than the volume of the data, computing time is now the bottleneck in real-life applications.

The first part of this presentation clarifies the relation between statistical efficiency, the design of learning algorithms, and their computational cost. The analysis reveals distinct tradeoffs for small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation--estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff that involves the computational complexity of the underlying optimization algorithm in non-trivial ways. For instance, a mediocre optimization algorithm, stochastic gradient descent, is shown to perform very well on large-scale learning problems.

The second part is a detailed exploration of stochastic gradient descent learning algorithms. Simple and complex experiments are discussed, and an illustrative sketch appears at the end of this page. [[:projects:sgd|The source code and datasets are available online]].

The third part discusses algorithms that learn with a single pass over the data.

===== Links =====

  * The slides: [[http://leon.bottou.org/slides/largescale/lstut.djvu|(djvu, 236k)]], [[http://leon.bottou.org/slides/largescale/lstut.pdf|(pdf, 376k)]].
  * [[:projects:sgd|Stochastic Gradient source code for SVM and CRF]].
  * [[:research/approxalgo|Learning with Approximative Optimization]].
  * [[:research/stochastic|Learning with Stochastic Gradient]].

===== Papers =====

Léon Bottou and Olivier Bousquet: **The Tradeoffs of Large Scale Learning**, //Advances in Neural Information Processing Systems//, 20, MIT Press, Cambridge, MA, 2008. [[:papers/bottou-bousquet-2008|more...]]
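
===== Illustration =====

To give a concrete flavor of the stochastic gradient descent algorithms discussed in the second part, here is a minimal sketch for a linear SVM (hinge loss with L2 regularization). This is //not// the released [[:projects:sgd|sgd]] code: the function name ''sgd_svm'', the learning-rate schedule, and the toy data are assumptions made for this illustration only.

<code python>
# Minimal sketch of stochastic gradient descent for a linear SVM
# (hinge loss plus L2 regularization). Not the released sgd project
# code: the function name, the learning-rate schedule, and the toy
# data below are illustrative assumptions.
import numpy as np

def sgd_svm(X, y, lam=1e-3, eta0=0.1, epochs=5, seed=0):
    """Train a weight vector w on examples X (n x d), labels y in {-1, +1}."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):          # one pass over shuffled examples
            eta = eta0 / (1.0 + eta0 * lam * t)   # decreasing learning rate (assumed schedule)
            margin = y[i] * np.dot(w, X[i])
            w *= 1.0 - eta * lam              # gradient step on the L2 regularizer
            if margin < 1.0:                  # subgradient step on the hinge loss
                w += eta * y[i] * X[i]
            t += 1
    return w

# Toy usage on linearly separable 2D data.
rng = np.random.RandomState(1)
X = rng.randn(200, 2)
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
w = sgd_svm(X, y)
print("training accuracy:", np.mean(np.sign(X.dot(w)) == y))
</code>

Each update touches a single example, so the cost per iteration is independent of the dataset size; the decreasing learning rate shown here is one common choice among several possible schedules.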