### Table of Contents

## SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

*Abstract*:
The SGDQN algorithm is a stochastic gradient descent
algorithm that makes careful use of second-order information
and splits the parameter update into independently scheduled
components. Thanks to this design, SGDQN iterates nearly as
fast as a first-order stochastic gradient
descent but requires less iterations
to achieve the same accuracy.
This algorithm won the “Wild Track” of the first
PASCAL Large Scale Learning Challenge.

*Errata*:
Please see section Errata below.

**SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent**,

*Journal of Machine Learning Research*, 10:1737–1754, July 2009.

JMLR Link JMLR Erratum jmlr-2009.djvu jmlr-2009.pdf jmlr-2009.ps.gz

@article{bordes-bottou-gallinari-2009, author = {Bordes, Antoine and Bottou, L\'{e}on and Gallinari, Patrick}, title = {SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent}, journal = {Journal of Machine Learning Research}, year = {2009}, volume = {10}, pages = {1737--1754}, month = {July}, url = {http://leon.bottou.org/papers/bordes-bottou-gallinari-2009}, }

### Implementation

The complete source code of LibSGDQN is available on Antoine's web site. This source code comes with a script that replicates the experiments discussed in this paper.

### Appendix

The appendix contains a derivation of upper and lower bounds on the asymptotic convergence speed of stochastic gradient algorithm. The constants are exact in the case of second order stochastic gradient.

### Errata

The SGDQN algorithm as described in this paper contains a subtle flaw described in a subsequent erratum.

There is a missing 1/2 factor in the bounds of theorem 1.

\[ \def\w{\mathbf{w}} {\frac{1}{2}} \frac{{\mathrm tr}(\mathbf{HBGB})}{2\lambda_{\max}-1}\,t^{-1} + {\mathrm o}(t^{-1}) ~\leq~ \mathbb{E}_{\sigma}\big[\:{\cal P}_n(\w_t)-{\cal P}_n(\w^*_n)\:\big] ~\leq~ {\frac{1}{2}} \frac{{\mathrm tr}(\mathbf{HBGB})}{2\lambda_{\min}-1}\,t^{-1} + {\mathrm o}(t^{-1}) \]

The version of the paper found on this site contains the correct theorem and proof.