Abstract: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways. For instance, a mediocre optimization algorithm, stochastic gradient descent, is shown to perform very well on large-scale learning problems.
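The last sentence refers to plain stochastic gradient descent, which updates the model from one example at a time. As a point of reference only (this is not the authors' code), the minimal Python sketch below applies the algorithm to l2-regularized logistic regression; the decreasing step-size schedule eta0 / (1 + eta0 * lam * t), the loss, and the synthetic data are illustrative assumptions.

    # Minimal sketch of stochastic gradient descent (not the authors' code):
    # one example per update, so the cost of an iteration is independent of
    # the number of training examples.
    import numpy as np

    def sgd_logistic(X, y, lam=1e-4, eta0=1.0, epochs=5, seed=0):
        """X: (n, d) features, y: (n,) labels in {-1, +1}. Returns weights w."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        t = 0
        for _ in range(epochs):
            for i in rng.permutation(n):
                t += 1
                # decreasing step size (one common schedule, assumed here)
                eta = eta0 / (1.0 + eta0 * lam * t)
                margin = y[i] * X[i].dot(w)
                # gradient of  lam/2 ||w||^2 + log(1 + exp(-y_i w.x_i))  at example i
                grad = lam * w - y[i] * X[i] / (1.0 + np.exp(margin))
                w -= eta * grad
        return w

    if __name__ == "__main__":
        # Tiny synthetic check: noisy linearly separable data.
        rng = np.random.default_rng(1)
        n, d = 2000, 20
        w_true = rng.normal(size=d)
        X = rng.normal(size=(n, d))
        y = np.sign(X.dot(w_true) + 0.1 * rng.normal(size=n))
        w = sgd_logistic(X, y)
        print("training accuracy:", np.mean(np.sign(X.dot(w)) == y))

Because each update touches a single example, the per-iteration cost does not grow with the size of the training set, which is the property the abstract appeals to when arguing that such a "mediocre" optimizer performs well on large-scale learning problems.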
@incollection{bottou-bousquet-2008b,
  author    = {Bottou, L\'{e}on and Bousquet, Olivier},
  title     = {Learning Using Large Datasets},
  booktitle = {{Mining Massive DataSets for Security}},
  publisher = {{IOS} {Press}},
  address   = {Amsterdam},
  year      = {2008},
  series    = {{NATO} {ASI} Workshop Series},
  note      = {to appear},
  url       = {http://leon.bottou.org/papers/bottou-bousquet-2008b},
}
Slightly expanded version of the original NIPS paper, reporting some experimental results from the SGD pages.