This talk was first given at the Sixth Annual Machine Learning Symposium of the New York Academy of Sciences and at the NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning.
The goal of the presentation is to describe practical stochastic gradient algorithms that process each training example only once, yet asymptotically match the performance of the true optimum. This statement, of course, needs to be made more precise. To do so, we will review the works of Nevel'son & Has'minskij (1972), Fabian (1973, 1978), Murata & Amari (1998), Bottou & LeCun (2004), Polyak & Juditsky (1992), Wei Xu (2010), and Bach & Moulines (2011). We will then show how these ideas lead to practical algorithms and to new challenges.
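To make the single-pass setting concrete, here is a minimal sketch of one of the ideas reviewed, the iterate averaging studied by Polyak & Juditsky (1992), applied to a small least-squares problem. It is an illustration under stated assumptions, not the algorithm presented in the talk: the function names, the synthetic data stream, and the step-size schedule `gamma0 / t ** power` are all hypothetical choices made for this example.

```python
import random

def averaged_sgd_one_pass(examples, dim, gamma0=0.2, power=0.6):
    """Single pass of SGD on the squared loss, with averaging of the iterates.

    The schedule gamma0 / t**power is an assumed choice for this sketch.
    """
    w = [0.0] * dim        # current SGD iterate
    w_avg = [0.0] * dim    # running average of the iterates
    for t, (x, y) in enumerate(examples, start=1):
        gamma = gamma0 / t ** power
        # Gradient of 0.5 * (w.x - y)^2 with respect to w is (w.x - y) * x.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - gamma * err * xi for wi, xi in zip(w, x)]
        # Incremental mean: avg_t = avg_{t-1} + (w_t - avg_{t-1}) / t.
        w_avg = [ai + (wi - ai) / t for ai, wi in zip(w_avg, w)]
    return w, w_avg

def make_stream(n, w_true, noise=0.1, seed=0):
    """Hypothetical synthetic data: y = w_true.x + Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise)
        yield x, y

w_true = [2.0, -1.0]
# Each example is generated, used for one update, and then discarded.
w_last, w_avg = averaged_sgd_one_pass(make_stream(20000, w_true), dim=2)
print("last iterate:    ", w_last)
print("averaged iterate:", w_avg)
```

Because the examples come from a generator, the loop cannot revisit any of them, which is the "each training example only once" constraint. The averaged vector is the quantity whose asymptotic behaviour the cited averaging results concern.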