===== From Machine Learning to Machine Reasoning =====
  
//Abstract//: A plausible definition of "reasoning" could be "algebraically manipulating previously acquired knowledge in order to answer a new question". This definition covers first-order logical inference or probabilistic inference. It also includes the simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a character segmenter, an isolated character recognizer, and a language model, using appropriate labeled training sets. Adequately concatenating these modules and fine tuning the resulting system can be viewed as an algebraic operation in a space of models. The resulting model answers a new question, that is, converting the image of a text page into a computer readable text.
This observation suggests a conceptual continuity between algebraically rich inference systems, such as logical or probabilistic inference, and simple manipulations, such as the mere concatenation of trainable learning systems. Therefore, instead of trying to bridge the gap between machine learning systems and sophisticated "all-purpose" inference mechanisms, we can instead algebraically enrich the set of manipulations applicable to training systems, and build reasoning capabilities from the ground up.
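The OCR pipeline described in the abstract can be caricatured in a few lines of code. This is purely a toy sketch, not the paper's implementation: the function names and the string-based "page image" are invented here, and each stand-alone function stands in for a separately trained module.

```python
# Toy illustration of "concatenation in a space of models": each function
# stands in for a separately trained module; composing them yields a new
# model that answers a new question (page image -> readable text).
# All names and the '|'-separated "page image" encoding are invented.

def segmenter(page_image):
    # Pretend segmenter: split the "page image" into isolated glyphs.
    return page_image.split("|")

def recognizer(glyph):
    # Pretend isolated-character recognizer (identity, for illustration).
    return glyph.strip()

def language_model(chars):
    # Pretend language model: a real one would rescore character hypotheses;
    # here it simply assembles the recognized characters into text.
    return "".join(chars)

def ocr(page_image):
    # The "algebraic" concatenation of the three trained modules.
    return language_model(recognizer(g) for g in segmenter(page_image))

print(ocr("h|e|l|l|o"))  # -> hello
```

In the paper's view, the interesting step is that this composition (plus fine tuning of the whole) is itself an operation in a space of models, not just glue code.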
  
arXiv:1102.1808, February 2011.
  
\\
[[http://arxiv.org/abs/1102.1808|arXiv link]]
[[http://leon.bottou.org/publications/djvu/tr-2011-02-08.djvu|tr-2011-02-08.djvu]]
    url = {http://leon.bottou.org/papers/tr-bottou-2011},
  }

A revision of this text was published in [[bottou-mlj-2013| 2014 (MLJ)]].

===== Notes =====

This document cites the work of Vincent Etter (2009), carried out during his NEC Labs internship. Vincent's master's report is now available on [[http://vincent.etter.io/publications/etter2009master.pdf|his home page]] ({{vincentetter.pdf|local copy}}). Section 5 explores ideas that were extensively discussed among Ronan Collobert, Jason Weston, and me. We hoped to discover relevant recursive sentence representations in an unsupervised manner. Alas, we found that the shape of the structure of a recursive network has very little impact on its representation abilities, something that was clearly confirmed by [[https://arxiv.org/pdf/1301.2811.pdf|Scheible and Schütze (2013)]] on a sentiment classification task. Even a left-to-right tree (which in fact amounts to using a recurrent neural network) works essentially as well, something that was cleanly confirmed by [[http://arxiv.org/pdf/1506.01057v1.pdf|Li et al. (2015)]] on a broad collection of NLP tasks. I still had hopes of making this work when I wrote this tech report in 2010. However, these two works convinced me that structure discovery won't happen without a new idea.

  
papers/tr-bottou-2011.1347451868.txt.gz · Last modified: 2012/09/12 08:11 by leonb