I am a research scientist with broad interests in practical and theoretical machine learning. My work on large-scale learning and stochastic gradient algorithms has received attention in recent years. I am also known for the DjVu document compression system. I joined Facebook AI Research in March 2015.
Use the sidebar to navigate this site.
Announcing the Fourth Symposium on Statistical Learning and Data Science (SLDS), to be held at Sciences Po Paris, July 11-13, 2018. The SLDS symposia are held once every three years and historically have had strong themes. Given the involvement of Sciences Po and CEVIPOF, the theme of the fourth edition is shaping up to be “Data Science and Democracy”. We are expecting a stimulating and multi-disciplinary meeting that will focus on the scientific advances in statistical learning and data science as well as on their impact on society.
Attention: This is not the same event as the 2018 Conference of the ASA Statistical Learning and Data Science section (formerly SLDM) which is held every other year. The next name collision is expected in 2024.
I was scavenging my old emails a couple of weeks ago and found a copy of an early technical report that not only describes Graph Transformer Networks in a couple of pages but also explains why they are defined the way they are.
Although Graph Transformer Networks were introduced twenty years ago, they are considerably more powerful than most structured output machine learning methods. Not only do they handle the label bias problem as well as CRFs, but their hierarchical and modular structure lends itself to many refinements: they can be trained with weak supervision; they can handle pruned search strategies and adapt training to make the pruning work better; and they also provide a proven framework to reuse existing code and heuristics.
Graph Transformer Networks were also very successful in the real world. They have been used in commercially deployed check reading machines for more than a decade, processing about one billion checks per year. Unfortunately they are described in hard-to-read papers, such as (Bottou et al., CVPR 1997) or the rarely read second half of (LeCun et al., IEEE 1998).
This tech report is now available on my web site.
Why settle for 60000 MNIST training examples when you can have one trillion?
The MNIST8M dataset was generated using the elastic deformation code originally written for (Loosli, Canu, and Bottou, 2007). Unfortunately the original MNIST8M files were accidentally deleted from the NEC servers a couple of weeks ago. Instead of regenerating the files, I have repackaged the generation code in a convenient form. You can now generate arbitrary amounts of pseudo-random MNIST training examples. You can even use this code to generate your training data on the fly. We call this the infinite MNIST dataset.
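To give a rough idea of what generating deformed examples on the fly can look like, here is a minimal Python sketch. It is not the repackaged generation code: the function names (`smooth`, `elastic_deform`, `infinite_examples`), the box-filter smoothing, the `alpha` and `passes` parameters, and the stand-in images are all assumptions made for illustration.

```python
import numpy as np

def smooth(field, passes=4):
    """Blur a 2-D field with repeated 3x3 box filters (a cheap stand-in for a Gaussian)."""
    h, w = field.shape
    for _ in range(passes):
        p = np.pad(field, 1, mode="edge")
        field = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return field

def elastic_deform(image, rng, alpha=2.0, passes=4):
    """Warp `image` with a smooth random displacement field of at most `alpha` pixels."""
    h, w = image.shape
    dx = smooth(rng.uniform(-1, 1, (h, w)), passes)
    dy = smooth(rng.uniform(-1, 1, (h, w)), passes)
    # Rescale so that the largest displacement is exactly `alpha` pixels.
    scale = alpha / max(np.abs(dx).max(), np.abs(dy).max(), 1e-12)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x = np.clip(xs + scale * dx, 0, w - 1)
    y = np.clip(ys + scale * dy, 0, h - 1)
    # Bilinear interpolation of the source image at the displaced coordinates.
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0, x0] * (1 - fx) + image[y0, x1] * fx
    bottom = image[y1, x0] * (1 - fx) + image[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def infinite_examples(images, labels, seed=0):
    """Yield an endless stream of (deformed image, label) pairs."""
    n = len(images)
    k = 0
    while True:
        # Seeding with (seed, k) makes example number k reproducible on demand.
        rng = np.random.default_rng([seed, k])
        i = k % n
        yield elastic_deform(images[i], rng), labels[i]
        k += 1

# Stand-in data: a real run would load the 60000 MNIST training images here.
base_images = np.zeros((2, 28, 28))
base_images[:, 8:20, 12:16] = 1.0   # a crude vertical bar, vaguely like a "1"
base_labels = np.array([1, 1])

stream = infinite_examples(base_images, base_labels)
sample, label = next(stream)
```

Because each example is derived from its index and a seed, nothing needs to be stored on disk: the same index always yields the same deformed digit, and the stream can be consumed directly by a training loop.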
Our paper “Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising” has appeared in JMLR. This paper takes the example of ad placement to illustrate how one can leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance. In particular, the paper demonstrates the connection between the classic explore–exploit and correlation–causation issues in machine learning and statistics.
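One standard tool for this kind of prediction is importance sampling over logged, randomized decisions. The sketch below is only a toy illustration under stated assumptions: the two ad placements, their click rates, the two policies, and the helper name `ips_estimate` are all hypothetical and are not taken from the paper or its data.

```python
import numpy as np

def ips_estimate(rewards, logged_probs, new_probs):
    """Importance-sampling estimate of the average reward under a new policy.

    rewards[i]      : reward observed for logged decision i
    logged_probs[i] : probability with which the logging policy made that decision
    new_probs[i]    : probability with which the new policy would have made it
    """
    weights = new_probs / logged_probs
    return float(np.mean(weights * rewards))

rng = np.random.default_rng(0)
n = 200_000
click_rate = np.array([0.1, 0.3])   # hypothetical click probability of each placement
logging_p1, new_p1 = 0.2, 0.8       # probability of choosing placement 1 under each policy

# Simulate the logs collected while the randomized logging policy was running.
actions = (rng.random(n) < logging_p1).astype(int)
rewards = (rng.random(n) < click_rate[actions]).astype(float)

logged_probs = np.where(actions == 1, logging_p1, 1 - logging_p1)
new_probs = np.where(actions == 1, new_p1, 1 - new_p1)

# Counterfactual question: what click rate would the new policy have obtained?
estimate = ips_estimate(rewards, logged_probs, new_probs)

# Ground truth, available here only because the toy click rates are known.
truth = (1 - new_p1) * click_rate[0] + new_p1 * click_rate[1]
print(f"logged average: {rewards.mean():.3f}  estimate: {estimate:.3f}  truth: {truth:.3f}")
```

The estimate is only defined where `logged_probs` is nonzero, so the logged system has to randomize over every decision the new policy might make. In this toy setting, that requirement is where exploration and causal inference meet.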