====== Large Scale Learning ======

===== Motivation and Goals =====
The methods of conventional statistics ...

For the sake of the argument, assume that
there are only two categories of computers.
The first category, the //makers//, directly mediates human activity.
They implement the dialogue between sellers and buyers,
run the accounting department, control industrial processes,
route telecommunications, and in the process produce
huge quantities of data. The second category, the //thinkers//,
analyze this data, build models, and draw conclusions that
direct the activity of the makers.

Makers produce datasets whose size grows linearly with
their collective computing power. We can roughly write
this size as //n d// where //n// is the number of examples
and //d// the number of features per example.
The computational needs of state-of-the-art learning
algorithms typically grow like //n^2 d// or //n d^2//.
If thinkers were to use these algorithms,
their collective computing power would have to
grow much faster than that of the makers ...
... sell enough to recoup their expense.

//Hence we need algorithms that scale linearly with the size of the training data.//
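
To make the scaling argument concrete, here is a tiny back-of-the-envelope sketch (not part of the original page; the growth factors are illustrative assumptions): when the makers' data grows by a factor //k//, a linear-cost learner needs //k// times more work, while a quadratic-cost learner needs //k^2// times more work.

<code python>
# Toy illustration (assumption, not from the page): cost growth for a learner
# whose running time is O(n d) versus one whose running time is O(n^2 d)
# when the number of examples n grows by a factor k (d held fixed).
for k in (2, 10, 100):
    print(f"data x{k:>3}: linear-cost learner x{k:>5}, quadratic-cost learner x{k**2:>7}")
</code>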

===== Approximate Optimization =====

{{ wall2.png}}
Large-scale machine learning was first approached as an engineering problem. For instance, to leverage a
larger training set, we can use a parallel computer to run a known machine learning algorithm
or adapt more advanced numerical methods to optimize a known machine learning
objective function. Such approaches rely on the appealing
assumption that one can decouple the statistical aspects from the computational aspects of the machine
learning problem.

This work shows that this assumption is incorrect, and that giving it up leads to considerably
more effective learning algorithms. A new theoretical framework
takes into account the effect of approximate
optimization on learning algorithms.

The analysis shows distinct tradeoffs for the
case of small-scale and large-scale learning problems.
Small-scale learning problems are subject to the
usual approximation--estimation tradeoff.
Large-scale learning problems are subject to
a qualitatively different tradeoff involving the computational
complexity of the underlying optimization
algorithms in non-trivial ways.
For instance, stochastic gradient descent algorithms
appear to be mediocre optimization algorithms and yet are shown to
perform extremely well on large-scale learning problems.
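
As an illustration of such an algorithm, here is a minimal stochastic gradient descent sketch for least-squares linear regression. It is not code from this page; the model, step-size schedule, and constants are arbitrary assumptions. Each update looks at a single example, so one pass over the data costs roughly //n d// operations.

<code python>
# Minimal SGD sketch for least-squares linear regression (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 20                          # many examples, few features
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)   # synthetic labels with noise

def sgd(X, y, epochs=1, lr0=0.1):
    """One pass costs O(n d): each update looks at a single example."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            lr = lr0 / (1.0 + lr0 * t / n)  # slowly decreasing step size
            err = X[i] @ w - y[i]
            w -= lr * err * X[i]            # gradient of 0.5 * (x_i . w - y_i)^2
    return w

w_sgd = sgd(X, y)
print("mean squared error after one pass:", np.mean((X @ w_sgd - y) ** 2))
</code>

Each individual step uses a noisy gradient estimate, so the objective is only approximately optimized, yet a single pass is often enough to reach a good generalization error, which is the tradeoff discussed above.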
+ | |||
+ | |||
+ | |||

===== Tutorials =====
  * NIPS 2007 tutorial "Learning with Large Datasets"

===== Related =====

===== Papers =====

<box 99% orange>
Léon Bottou and Olivier Bousquet: "The Tradeoffs of Large Scale Learning",
//Advances in Neural Information Processing Systems//, 20,
MIT Press, Cambridge, MA, 2008.
</box>