leon.bottou.org

From Machine Learning to Machine Reasoning

Abstract: A plausible definition of “reasoning” could be “algebraically manipulating previously acquired knowledge in order to answer a new question”. This definition covers first-order logical inference or probabilistic inference. It also includes the simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a character segmenter, an isolated character recognizer, and a language model, using appropriate labeled training sets. Adequately concatenating these modules and fine-tuning the resulting system can be viewed as an algebraic operation in a space of models. The resulting model answers a new question, that is, converting the image of a text page into computer-readable text. This observation suggests a conceptual continuity between algebraically rich inference systems, such as logical or probabilistic inference, and simple manipulations, such as the mere concatenation of trainable learning systems. Therefore, instead of trying to bridge the gap between machine learning systems and sophisticated “all-purpose” inference mechanisms, we can algebraically enrich the set of manipulations applicable to training systems, and build reasoning capabilities from the ground up.

Léon Bottou: From Machine Learning to Machine Reasoning, arXiv:1102.1808, February 2011.

arXiv link tr-2011-02-08.djvu tr-2011-02-08.pdf tr-2011-02-08.ps.gz

@techreport{tr-bottou-2011,
  author = {Bottou, L\'eon},
  title = {From Machine Learning to Machine Reasoning},
  institution = {arXiv.1102.1808},
  month = {February},
  year = {2011},
  url = {http://leon.bottou.org/papers/tr-bottou-2011},
}

A revision of this text was published in 2014 (MLJ).

Notes

This document cites the work of Vincent Etter (2009), carried out during his NEC Labs internship. Vincent's master's report is now available on his home page (local copy). Section 5 explores ideas that were extensively discussed between Ronan Collobert, Jason Weston, and me. We hoped to discover relevant recursive sentence representations in an unsupervised manner. Alas, we found that the shape of the structure of a recursive network has very little impact on its representation abilities, something that was clearly confirmed by Scheible and Schütze (2013) on a sentiment classification task. Even a left-to-right tree (which in fact amounts to using a recurrent neural network) works essentially as well, something that was cleanly confirmed by Li et al. (2015) on a broad collection of NLP tasks. I still hoped to make it work when I wrote this tech report in 2010. However, these two works have convinced me that structure discovery won't happen without a new idea.
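
The remark that a left-to-right tree amounts to a recurrent network can be illustrated with a small sketch. This is not the model from the report: the composition function `compose`, the dimension `DIM`, and the random untrained weights `W` are assumptions chosen only for illustration. The sketch encodes a sentence once by recursing over a left-branching tree and once by a plain recurrent scan, and checks that both give the same vector.

```python
import math
import random
from functools import reduce

DIM = 4  # hypothetical embedding size, chosen only for illustration

random.seed(0)
# One shared composition matrix of shape DIM x 2*DIM (random, untrained).
W = [[random.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]

def compose(left, right):
    """Merge two DIM-vectors into one DIM-vector: tanh(W [left; right])."""
    x = left + right  # list concatenation of the two child vectors
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def encode_tree(tree):
    """Recursively encode a binary tree given as nested 2-tuples of leaf vectors."""
    if isinstance(tree, tuple):
        left, right = tree
        return compose(encode_tree(left), encode_tree(right))
    return tree  # a leaf is already a vector (a list of floats)

def left_to_right_tree(leaves):
    """Build the left-branching tree (((w1, w2), w3), ...)."""
    return reduce(lambda acc, leaf: (acc, leaf), leaves)

def encode_recurrent(leaves):
    """Recurrent scan: the state starts at the first word and absorbs each next word."""
    state = leaves[0]
    for leaf in leaves[1:]:
        state = compose(state, leaf)
    return state

# Five random "word" vectors standing in for a sentence.
words = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
tree_repr = encode_tree(left_to_right_tree(words))
rnn_repr = encode_recurrent(words)
same = all(math.isclose(a, b) for a, b in zip(tree_repr, rnn_repr))
```

Under these assumptions the two encodings coincide because the left-branching tree applies the shared composition function in exactly the order a recurrent network would. The sketch says nothing about how well either structure performs on a task.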