## Curiously fast convergence of some stochastic gradient descent algorithms

This document reports an empirical finding about the convergence speed
of stochastic gradient descent algorithms applied to a finite training set, and
challenges the attendees of SLDS 2009
with its theoretical implications. Stochastic gradient optimization theory
states that the residual error decreases no faster than *t^{-1}* when
the examples are picked randomly with replacement. The main observation is that
picking the examples without replacement (i.e. shuffling the data before each epoch and
going through it sequentially) achieves a speed curiously close to *t^{-2}*.

To my knowledge (as of 2014), the convergence speed of SGD sampled without replacement has not been theoretically established. Although the open problems were not published in the SLDS 2009 conference book, this finding was mentioned and replicated in the journal version of the Pegasos paper (2011, section 7.5). This open problem was also mentioned by Ben Recht during the IPAM Workshop on Stochastic Gradient Methods (2014) in relation to the much more impressive speeds achieved by recent algorithms for the optimization of finite sums (SAG, SVRG).
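
To see the phenomenon concretely, here is a minimal sketch (not the original experiment: the toy least-squares problem, the `1/t` step-size schedule, and the `run_sgd` helper are illustrative assumptions) that compares SGD sampling with replacement against random reshuffling and estimates the empirical decay exponent of the excess training error.

```python
# Sketch only: toy problem and step sizes are assumptions, not Bottou's setup.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)   # noisy targets: no interpolation

w_opt, *_ = np.linalg.lstsq(X, y, rcond=None)   # exact minimizer on the finite set
f_opt = np.mean((X @ w_opt - y) ** 2) / 2

def run_sgd(reshuffle, epochs=200, t0=50.0):
    """SGD on the finite-sum objective (1/2n) * sum_i (x_i.w - y_i)^2."""
    w = np.zeros(d)
    t = 0.0
    errs = []
    for _ in range(epochs):
        if reshuffle:
            order = rng.permutation(n)          # without replacement: one pass per epoch
        else:
            order = rng.integers(0, n, size=n)  # with replacement: i.i.d. picks
        for i in order:
            t += 1.0
            g = (X[i] @ w - y[i]) * X[i]        # gradient of the one-example loss
            w -= g / (t + t0)                   # ~1/t step size (illustrative choice)
        errs.append(np.mean((X @ w - y) ** 2) / 2 - f_opt)  # excess training error
    return np.array(errs)

for name, reshuffle in [("with replacement", False), ("random reshuffling", True)]:
    errs = run_sgd(reshuffle)
    epochs = np.arange(1, len(errs) + 1)
    slope = np.polyfit(np.log(epochs[20:]), np.log(errs[20:]), 1)[0]
    print(f"{name:20s}  final excess error {errs[-1]:.3e}   empirical exponent ~ {slope:.2f}")
```

The fitted exponent is only a rough per-epoch estimate, but under these assumptions the reshuffled run should decay markedly faster than the with-replacement run, in the spirit of the observation above.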

New (Dec 2015): Asu Ozdaglar gave a proof during her NIPS 2015 invited talk (video recording) (arXiv).

**Curiously fast convergence of some stochastic gradient descent algorithms**

*Léon Bottou: Curiously fast convergence of some stochastic gradient descent algorithms. Unpublished open problem offered to the attendance of the SLDS 2009 conference, 2009.*

    @unpublished{bottou-slds-open-problem-2009,
      author = {Bottou, L\'{e}on},
      title  = {Curiously fast convergence of some stochastic gradient descent algorithms},
      note   = {Unpublished open problem offered to the attendance of the SLDS 2009 conference},
      year   = {2009},
      url    = {http://leon.bottou.org/papers/bottou-slds-open-problem-2009}
    }