===== SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent =====
//Abstract//:
The SGDQN algorithm is a stochastic gradient descent
algorithm that makes careful use of second-order information
and splits the parameter update into independently scheduled
components. Thanks to this design, SGDQN iterates nearly as
fast as a first-order stochastic gradient
descent but requires fewer iterations
to achieve the same accuracy.
This algorithm won the "Wild Track" of the first
PASCAL Large Scale Learning Challenge.
//Errata//:
Please see section [[#Errata]] below.
Antoine Bordes, Léon Bottou and Patrick Gallinari: **SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent**, //Journal of Machine Learning Research//, 10:1737--1754, July 2009.
[[http://jmlr.csail.mit.edu/papers/v10/bordes09a.html|JMLR Link]]
[[http://jmlr.csail.mit.edu/papers/v11/bordes10a.html|JMLR Erratum]]
[[http://leon.bottou.org/publications/djvu/jmlr-2009.djvu|jmlr-2009.djvu]]
[[http://leon.bottou.org/publications/pdf/jmlr-2009.pdf|jmlr-2009.pdf]]
[[http://leon.bottou.org/publications/psgz/jmlr-2009.ps.gz|jmlr-2009.ps.gz]]
@article{bordes-bottou-gallinari-2009,
author = {Bordes, Antoine and Bottou, L\'{e}on and Gallinari, Patrick},
title = {SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent},
journal = {Journal of Machine Learning Research},
year = {2009},
volume = {10},
pages = {1737--1754},
month = {July},
url = {http://leon.bottou.org/papers/bordes-bottou-gallinari-2009},
}
==== Implementation ====
The complete source code of
[[http://webia.lip6.fr/~bordes/mywiki/doku.php?id=sgdqn|LibSGDQN]]
is available on
[[http://webia.lip6.fr/~bordes/mywiki/doku.php|Antoine's]] web site.
This source code comes with a script that replicates the
experiments discussed in this paper.
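For readers who want the gist without digging into the C++ sources, here is a minimal Python sketch of an SGD-QN-style update for a linear SVM with hinge loss. It only illustrates the two independently scheduled components (a rescaled first-order step at every iteration, and a secant re-estimate of the diagonal matrix **B** every ''skip'' iterations); it is not a transcription of LibSGDQN, and the hyperparameters ''lam'', ''t0'', ''skip'' as well as the smoothing and clipping of **B** are illustrative choices.
<code python>
import numpy as np

def sgdqn_sketch(X, y, lam=1e-4, t0=1e4, skip=16, epochs=5):
    """Illustrative SGD-QN-style training of a linear SVM (hinge loss).

    Not the LibSGDQN code: the B update below is a simplified
    secant estimate with ad-hoc smoothing and clipping."""
    n, d = X.shape
    w = np.zeros(d)
    B = np.full(d, 1.0 / lam)   # diagonal rescaling matrix, kept as a vector
    count, t = skip, 0          # B is refreshed only every `skip` steps
    for _ in range(epochs):
        for i in range(n):
            xi, yi = X[i], y[i]
            margin = yi * np.dot(w, xi)
            # stochastic gradient of (lam/2)|w|^2 + max(0, 1 - margin)
            g = lam * w - (yi * xi if margin < 1 else 0.0)
            w_new = w - B * g / (t + t0)       # rescaled first-order step
            count -= 1
            if count <= 0:
                # secant estimate of the inverse Hessian diagonal,
                # using the gradient change on the *same* example
                margin_new = yi * np.dot(w_new, xi)
                g_new = lam * w_new - (yi * xi if margin_new < 1 else 0.0)
                dw, dg = w_new - w, g_new - g
                safe = np.abs(dg) > 1e-12
                ratio = np.where(safe, dw / np.where(safe, dg, 1.0), B)
                # smooth and clip to keep B positive and bounded
                B = np.clip(0.9 * B + 0.1 * ratio, 0.01 / lam, 100.0 / lam)
                count = skip
            w, t = w_new, t + 1
    return w
</code>
The split schedule is what keeps the per-iteration cost close to plain SGD: the extra gradient evaluation needed by the secant estimate is amortized over ''skip'' cheap first-order steps.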
==== Appendix ====
The appendix contains a derivation of upper and lower bounds
on the asymptotic convergence speed of stochastic gradient algorithms.
The constants are exact in the case of second-order stochastic gradient descent.
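As a worked special case (assuming, as in Theorem 1, that \(\lambda_{\max}\) and \(\lambda_{\min}\) denote the extreme eigenvalues of \(\mathbf{HB}\)), choosing the second-order rescaling \(\mathbf{B}=\mathbf{H}^{-1}\) makes \(\mathbf{HB}\) the identity, so the two bounds reproduced in the Errata section below coincide and the constant is exact:
\[
\mathbf{B}=\mathbf{H}^{-1} \;\Rightarrow\; \mathbf{HB}=\mathbf{I},\quad
\lambda_{\min}=\lambda_{\max}=1,\quad
\mathbb{E}_{\sigma}\big[\,\mathcal{P}_n(\mathbf{w}_t)-\mathcal{P}_n(\mathbf{w}^*_n)\,\big]
= \frac{\mathrm{tr}(\mathbf{G}\mathbf{H}^{-1})}{2}\,t^{-1} + \mathrm{o}(t^{-1}).
\]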
==== Errata ====
The SGDQN algorithm as described in this paper contains a subtle flaw,
which is described in a subsequent [[:papers:bordes-2010|erratum]].
In addition, the bounds of Theorem 1 were missing a factor 1/2; the corrected bounds read:
\[
\def\w{\mathbf{w}}
\frac{1}{2}\,\frac{\mathrm{tr}(\mathbf{HBGB})}{2\lambda_{\max}-1}\,t^{-1} + \mathrm{o}(t^{-1})
~\leq~ \mathbb{E}_{\sigma}\big[\,\mathcal{P}_n(\w_t)-\mathcal{P}_n(\w^*_n)\,\big] ~\leq~
\frac{1}{2}\,\frac{\mathrm{tr}(\mathbf{HBGB})}{2\lambda_{\min}-1}\,t^{-1} + \mathrm{o}(t^{-1})
\]
The version of the paper available on this site contains the corrected theorem and proof.