Large Scale Online Learning
Abstract: We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidences are presented.
@incollection{bottou-lecun-2004,
  author = {Bottou, L\'{e}on and {LeCun}, Yann},
  title = {Large Scale Online Learning},
  booktitle = {Advances in Neural Information Processing Systems 16 (NIPS 2003)},
  editor = {Thrun, Sebastian and Saul, Lawrence and Bernhard {Sch\"{o}lkopf}},
  publisher = {MIT Press},
  address = {Cambridge, MA},
  year = {2004},
  url = {http://leon.bottou.org/papers/bottou-lecun-2004},
}
Notes
The ASMB version of this work (Bottou and LeCun, 2005) contains the complete proof. The NIPS version was written several months after the ASMB. It contains only a proof sketch but reports experimental results. It also offers a better discussion of the previous results obtained by Murata and Amari (1998) which are in fact much more general than I initially realized. Relative to these results, this work contains three contributions: (a) spelling out the computational consequences of the efficiency results, (b) expressing the solution of the batch learning problems as a stochastic process that is fundamentally similar to a second order stochastic gradient algorithm, and (c) providing a more rigorous proof, taking into account, for instance, how the gain matrix approximates the inverse Hessian.
