I am a research scientist with broad interests in practical and theoretical machine learning. My work on large-scale learning and stochastic gradient algorithms has received attention in recent years. I am also known for the DjVu document compression system.
Use the sidebar to navigate this site.
Many machine learning authors write that a certain fundamental combinatorial result was independently established by Vapnik and Chervonenkis (1971), Sauer (1972), Shelah (1972), and sometimes Perles and Shelah (reference unknown). In fact, Vapnik and Chervonenkis published a version of their result in the Proceedings of the USSR Academy of Sciences four years earlier, in 1968. It also appears that Sauer and Shelah pursued this result for very different purposes.
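For reference, the result in question is now commonly known as the Sauer–Shelah lemma. A standard modern statement (my phrasing, not that of the original papers) is:

```latex
% Sauer–Shelah lemma: a set family of bounded VC dimension
% has a polynomially bounded growth function.
\textbf{Lemma.}\ Let $\mathcal{F}$ be a family of subsets of an
$n$-element set, and suppose $\mathcal{F}$ has VC dimension at
most $d$. Then
\[
  |\mathcal{F}| \;\le\; \sum_{i=0}^{d} \binom{n}{i}.
\]
```

For $n \ge d$ this bound grows like $O(n^d)$, which is what makes it so useful in learning theory.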
Patrice Simard and I have been friends since the old AT&T Bell Labs times. He eventually convinced me to work for him at Microsoft. He told me to expect “interesting times”.
I can see several reasons for these interesting times.
- The scientific point of view. There are few places where I can find machine learning problems with similar scale, similar challenges, and similar impact. This practical experience will surely feed my future machine learning research. In fact, I believe that such experiences are necessary to do research. One needs to see the world…
- The social point of view. The Internet is the largest encyclopedia of knowledge ever known to mankind, and this is great. On the other hand, everything you do on the Internet is recorded by someone somewhere. Large online services such as Google or Microsoft concentrate unprecedented amounts of such information. Our society is not ready for that. Very good things or very bad things can happen equally easily. They will affect all of us. We cannot just stand by and keep score.
- The competitive point of view. Microsoft combines a difficult competitive position with considerable resources: it has both the will and the means to do new things on the scientific, engineering, economic, and social levels. How could I resist? Of course nothing is ever certain…
Rob Schapire and David Blei gave me the opportunity to teach the COS 424 course at Princeton University during the spring 2010 semester. In fact, Rob is on sabbatical leave at Yahoo! and David is on parental leave. Running the orphaned course was a useful experience. One thousand slides later, I am really eager to see the student projects…
It is the nineties again. Ronan Collobert from NEC Labs just released a noncommercial version of his neural network system for semantic extraction. Given an input sentence in plain English, Senna outputs a host of Natural Language Processing (NLP) tags: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), and semantic role labeling (SRL). Senna does this with state-of-the-art accuracies, roughly two hundred times faster than competing approaches.
The Senna source code comprises about 2000 lines of C. This is probably one thousand times smaller than your usual natural language processing program. In fact, all the Senna tagging tasks are performed by the same neural network simulation code.