===== Curiously fast convergence of some stochastic gradient descent algorithms =====
This document reports an empirical finding about the convergence speed
of stochastic gradient algorithms applied to a finite training set and
challenges the attendees of [[https://www.ceremade.dauphine.fr/SLDS2009|SLDS 2009]]
with its theoretical implications. Stochastic gradient optimization theory
states that the residual error decreases no faster than //t^-1// when
the examples are picked randomly with replacement. The main observation is that
picking the examples without replacement (i.e. shuffling the data before each epoch and
going through it sequentially) achieves a speed curiously close to //t^-2//.
To my knowledge (as of 2014), the convergence speed for SGD sampled
without replacement has not been theoretically established.
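The two sampling schemes are easy to contrast numerically. The sketch below is a hypothetical illustration (not from the paper): it runs SGD with step sizes //1/t// on a one-dimensional least-squares finite sum, comparing independent sampling with replacement against reshuffling the data once per epoch and sweeping it sequentially.

```python
# Hypothetical illustration (not from the paper): SGD with 1/t step sizes
# on the one-dimensional least-squares finite sum
#     f(w) = (1/2n) * sum_i (w - x_i)^2,
# whose minimizer is the mean of the x_i.
import random

def sgd(xs, epochs, with_replacement, seed=0):
    rng = random.Random(seed)
    n, w, t = len(xs), 0.0, 0
    order = list(range(n))
    for _ in range(epochs):
        if with_replacement:
            # pick n independent examples per epoch
            batch = [rng.randrange(n) for _ in range(n)]
        else:
            rng.shuffle(order)   # shuffle once per epoch,
            batch = order        # then go through it sequentially
        for i in batch:
            t += 1
            w -= (1.0 / t) * (w - xs[i])  # gradient step on (w - x_i)^2 / 2
    return w

xs = [float(i) for i in range(10)]        # minimizer is mean(xs) = 4.5
err_with = abs(sgd(xs, 200, True) - 4.5)
err_without = abs(sgd(xs, 200, False) - 4.5)
print(err_with, err_without)
```

With //1/t// step sizes the iterate is exactly the running average of the examples seen so far, so each without-replacement epoch ends essentially at the optimum, while with-replacement sampling leaves the usual stochastic residual. The toy case is degenerate in the without-replacement direction, but the gap it exhibits is the phenomenon the open problem asks to explain.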
Although the open problem was not published in the
[[summa-2011|SLDS 2009 conference book]],
this finding was mentioned and replicated in the
[[http://dl.acm.org/citation.cfm?id=1966812|journal]]
[[http://www.cs.huji.ac.il/~shais/papers/ShalevSiSrCo10.pdf|version of the Pegasos paper]] (2011, section 7.5).
This open problem was also mentioned by [[http://www.eecs.berkeley.edu/~brecht|Ben Recht]] during the
[[http://www.ipam.ucla.edu/programs/sgm2014|IPAM Workshop on Stochastic Gradient Methods]] (2014)
in relation to the much more impressive speeds achieved by recent algorithms
for the optimization of finite sums ([[http://arxiv.org/abs/1309.2388|SAG]],
[[http://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction|SVRG]]).
New (Dec 2015): [[https://asu.mit.edu/|Asu Ozdaglar]] gave a proof during her NIPS 2015 invited talk ([[http://research.microsoft.com/apps/video/?id=259592|video recording]], [[http://arxiv.org/abs/1510.08560|arXiv]]).
\\
Léon Bottou: //**Curiously fast convergence of some stochastic gradient descent algorithms**//,
Unpublished open problem offered to the attendance of the SLDS 2009 conference, 2009.
[[http://leon.bottou.org/publications/djvu/slds-2009.djvu|slds-2009.djvu]]
[[http://leon.bottou.org/publications/pdf/slds-2009.pdf|slds-2009.pdf]]
[[http://leon.bottou.org/publications/psgz/slds-2009.ps.gz|slds-2009.ps.gz]]
@unpublished{bottou-slds-open-problem-2009,
author = {Bottou, L\'{e}on},
title = {Curiously fast convergence of some stochastic gradient descent algorithms},
note = {Unpublished open problem offered to the attendance of the SLDS 2009 conference},
year = {2009},
url = {http://leon.bottou.org/papers/bottou-slds-open-problem-2009},
}