*Abstract*:
The SGDQN algorithm is a stochastic gradient descent
algorithm that makes careful use of second-order information
and splits the parameter update into independently scheduled
components. Thanks to this design, SGDQN iterates nearly as
fast as a first-order stochastic gradient
descent but requires less iterations
to achieve the same accuracy.
This algorithm won the “Wild Track” of the first
PASCAL Large Scale Learning Challenge.

*Errata*:
Please see section Errata below.

Antoine Bordes, Léon Bottou and Patrick Gallinari: **SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent**, *Journal of Machine Learning Research*, 10:1737–1754, July 2009.

JMLR Link JMLR Erratum jmlr-2009.djvu jmlr-2009.pdf jmlr-2009.ps.gz

@article{bordes-bottou-gallinari-2009, author = {Bordes, Antoine and Bottou, L\'{e}on and Gallinari, Patrick}, title = {SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent}, journal = {Journal of Machine Learning Research}, year = {2009}, volume = {10}, pages = {1737--1754}, month = {July}, url = {http://leon.bottou.org/papers/bordes-bottou-gallinari-2009}, }

The complete source code of LibSGDQN is available on Antoine's web site. This source code comes with a script that replicates the experiments discussed in this paper.

The appendix contains a derivation of upper and lower bounds on the asymptotic convergence speed of stochastic gradient algorithm. The constants are exact in the case of second order stochastic gradient.

The SGDQN algorithm as described in this paper contains a subtle flaw described in a subsequent erratum.

There is a missing 1/2 factor in the bounds of theorem 1.

\[ \def\w{\mathbf{w}} {\frac{1}{2}} \frac{{\mathrm tr}(\mathbf{HBGB})}{2\lambda_{\max}-1}\,t^{-1} + {\mathrm o}(t^{-1}) ~\leq~ \mathbb{E}_{\sigma}\big[\:{\cal P}_n(\w_t)-{\cal P}_n(\w^*_n)\:\big] ~\leq~ {\frac{1}{2}} \frac{{\mathrm tr}(\mathbf{HBGB})}{2\lambda_{\min}-1}\,t^{-1} + {\mathrm o}(t^{-1}) \]

The version of the paper found on this site contains the correct theorem and proof.