====== Large Scale Learning ======


===== Motivation and Goals =====

The methods of conventional statistics ...

For the sake of the argument, assume that
there are only two categories of computers.
The first category, the //makers//, directly mediate human activity.
They implement the dialogue between sellers and buyers,
run the accounting department, control industrial processes,
route telecommunications, etc.  The makers generate and collect
huge quantities of data.  The second category, the //thinkers//,
analyze this data, build models, and draw conclusions that
direct the activity of the makers.
Makers produce datasets whose size grows linearly with
their collective computing power. We can roughly write
this size as //n d//, where //n// is the number of examples
and //d// the number of features per example.
The computational needs of state-of-the-art learning
algorithms typically grow like //n^2 d// or //n d^2//.
If thinkers were to use these algorithms,
their collective computing power would have to ...

... sell enough to recoup their expense.
  
// Hence we need algorithms that scale linearly with the size of the training data. //
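To make these growth rates concrete, here is a small back-of-the-envelope sketch; the values of //n// and //d// are purely illustrative and each feature access is counted as one unit of work.

<code python>
# Illustrative comparison of training costs that grow like n*d versus n^2*d.
# The numbers are arbitrary; only the growth rates matter.

d = 1000                         # features per example (illustrative)
for n in (10**6, 10**7, 10**8):  # number of examples (illustrative)
    linear = n * d               # cost of an algorithm scaling like n*d
    quadratic = n**2 * d         # cost of an algorithm scaling like n^2*d
    print(f"n={n:.0e}  n*d={linear:.1e}  n^2*d={quadratic:.1e}")

# Every tenfold increase in n multiplies the n*d cost by ten but the
# n^2*d cost by a hundred: superlinear algorithms cannot keep up with
# datasets whose size grows linearly with the makers' computing power.
</code>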
  
===== Approximate Optimization =====

{{ wall2.png}}
Large-scale machine learning was first approached as an engineering problem. For instance, to leverage a
larger training set, we can use a parallel computer to run a known machine learning algorithm
or adapt more advanced numerical methods to optimize a known machine learning
objective function. Such approaches rely on the appealing
assumption that one can decouple the statistical aspects from the computational aspects of the machine
learning problem.
  
This work shows that this assumption is incorrect, and that giving it up leads to considerably
more effective learning algorithms. A new theoretical framework
takes into account the effect of approximate
optimization on learning algorithms.
  
The analysis shows distinct tradeoffs for
small-scale and large-scale learning problems.
Small-scale learning problems are subject to the
usual approximation--estimation tradeoff.
Large-scale learning problems are subject to
a qualitatively different tradeoff involving the computational
complexity of the underlying optimization
algorithms in non-trivial ways.
For instance, [[:research:stochastic|Stochastic Gradient Descent (SGD)]] algorithms
appear to be mediocre optimization algorithms and yet are shown to
[[:projects/sgd|perform extremely well]] on large-scale learning problems.
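As a concrete illustration, the following minimal sketch runs plain SGD on a least-squares objective. This is a generic toy example, not the code of the SGD project linked above; the constant learning rate, the synthetic data, and the function name are illustrative assumptions. The point is that each update touches a single example, so a pass over the data costs on the order of //n d// operations.

<code python>
import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=5, seed=0):
    """Plain SGD on the average of the per-example losses (w.x_i - y_i)^2 / 2.

    Each update looks at one example, so its cost is O(d); a full pass over
    the data therefore costs O(n*d) rather than anything superlinear in n.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):         # visit examples in random order
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of one example's loss
            w -= lr * grad                   # noisy but very cheap update
    return w

# Toy usage on synthetic data (sizes are illustrative only).
rng = np.random.default_rng(0)
n, d = 10_000, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)
w_hat = sgd_least_squares(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_true))
</code>

Each step follows a noisy single-example gradient, which makes SGD look like a poor optimizer of the training objective, yet its negligible cost per example is exactly what the large-scale tradeoff rewards.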
===== Tutorials =====

  * NIPS 2007 tutorial "[[:talks/largescale|Large Scale Learning]]"

===== Related =====

  * [[:research:stochastic|Stochastic gradient learning algorithms]]

===== Papers =====

<box 99% orange>
Léon Bottou and Olivier Bousquet: **The Tradeoffs of Large Scale Learning**, //Advances in Neural Information Processing Systems//, 20, MIT Press, Cambridge, MA, 2008.

[[:papers/bottou-bousquet-2008|more...]]
</box>

<box 99% orange>
Léon Bottou and Yann LeCun: **On-line Learning for Very Large Datasets**, //Applied Stochastic Models in Business and Industry//, 21(2):137-151, 2005.

[[:papers/bottou-lecun-2004a|more...]]
</box>

<box 99% orange>
Léon Bottou: **Online Algorithms and Stochastic Approximations**, //Online Learning and Neural Networks//, edited by David Saad, Cambridge University Press, Cambridge, UK, 1998.

[[:papers/bottou-98x|more...]]
</box>

<box 99% blue>
Léon Bottou: //**Une Approche théorique de l'Apprentissage Connexionniste: Applications à la Reconnaissance de la Parole**//, Orsay, France, 1991.

[[:papers/bottou-91a|more...]]
</box>