More news

Joining the Flatiron Institute

After eleven exciting years at FAIR, I joined the Flatiron Institute at the Simons Foundation.

What is the most important letter in “AI”?

Much of the agitation these days is on letter “A” of “Artificial”. Because an artificial intelligence is one that he can own or sell, the Schumpeterian entrepreneur sees a chance to rewire our society, to “found a private kingdom”, “the nearest approach to medieval lordship possible to modern man”, to “prove oneself superior to others”, and also to experience the “joy of creating” and of “exercising one’s energy and ingenuity.” (Schumpeter, The Theory of Economic Development, 1934). We read this in the news every day.

However, the real prize is with the letter “I” of “Intelligence”. We cannot overstate the role of intelligence in the development of mankind over many millennia. Yet we remain unable to precisely describe how intelligence works, or even define it without loopy concepts. We now have machines that display some level of intelligence. We do not understand precisely how they work, but we know much more about the operation of these machines than we know about our cognitive processes. We can use them as models. Any salient discussion of the nature of artificial intelligence is also a discussion of the nature of intelligence, and eventually a discussion that impacts all forms of human activity or knowledge. Such an outcome has a much bigger impact and lasts much longer than any business venture.

My conclusion was that understanding AI is more important than just building it.

The Flatiron Institute focuses on computational methods for both mathematics and empirical sciences. Therefore it provides a double opportunity to better understand AI, first because AI is an instance of computational mathematics, second because we can observe how AI will transform computational methods for empirical sciences and maybe draw useful conclusions.

2026/02/25 21:16 · leonb

The Fiction Machine

This SIAM News article by Bernhard and me summarizes the argument of Borges and AI and makes additional points about the curse of alignment and the machine within the machine.

2026/02/25 17:15 · leonb

Two lessons from ICLR 2025

Machine learning conferences nowadays are too large for my enjoyment. I made the trip to Singapore for two posters and a talk in the associative memory workshop. I spent my time listening to the morning keynotes, walking briskly in the poster room, and catching up with friends and colleagues from both industry and academia.

Fifteen minutes before the morning keynote

On my way back, I met Kyunghyun Cho at the airport. We had a drink and talked over what we had learned. Cho always has great insights; he has invented attention mechanisms; he dines with Korean stars. Therefore, I know what I must do when he tells me “You should tweet that!”

Two weakly related points.

→ Read more...

2025/04/30 18:03 · leonb

Borges and AI

Léon Bottou and Bernhard Schölkopf https://arxiv.org/abs/2310.01425

We started this work mid-2022. AI was already turning into a mainstream topic. Both as a scientist and as a member of society, I was troubled by the ambient confusion between the actual AI technology and the AI of our dreams or nightmares. We seem unable to grasp this technology and its impact without referring to an AI mythology that maybe starts with Homer's golden maidens and was popularized by modern science fiction.

Therefore we decided to instead interpret the advances of AI using a very different lens: the fiction of Jorge Luis Borges, whose subtly ironical stories illuminate how language works and relates to reality. This intellectual exercise turned out to be very fruitful, and has reframed our outlook on AI:

It clarifies the relation between AI and language models, or fiction machines.
It explains how humans perceive these technologies, searching for vindications that comfort our preconceptions, vainly attempting to purify the fiction machine, or trusting this modern Pythia over our own reason.
It also explains how fiction machines should be seen as tools to construct theories for both real and imagined worlds. The ability to create fictional stories (so-called “hallucinations”) is crucially important. For instance, to understand a factual story, say a historical battle, we must be able to imagine how different circumstances or decisions would have changed the events. This provides a new meaning to Pat Winston's claim about the centrality of story making and story telling.
And finally, it shows the importance of understanding the world through the right story. For instance, understanding the weather patterns through the mood of the Gods only went so far. Yet it took centuries to readjust.

→ Read more...

2023/12/19 16:38 · leonb

From Causal Graphs to Causal Invariance

Pointing out the very well written report Causality for Machine Learning recently published by Cloudera's Fast Forward Labs. Nisha Muktewar and Chris Wallace must have put a lot of work into this. This report stands out because they have a complete section about Causal Invariance and they neatly summarize the purpose of our own Invariant Risk Minimization with beautiful experimental results.

→ Read more...

2020/06/16 15:44 · leonb

NYC Data Science Seminar

Alex Peysakhovich and I represent Facebook on the organizing committee of the NYC Data Science Seminar Series. This rotating seminar organized by Columbia, CornellTech, Facebook, Microsoft Research NYC, and New York University has featured a number of prominent speakers.

→ Read more...

2018/03/02 15:09 · leonb

Graph Transducer Networks explained

I was scavenging my old emails a couple weeks ago and found a copy of an early technical report that not only describes Graph Transformer Networks in a couple pages but also explains why they are defined the way they are.

→ Read more...

2015/05/14 15:40 · leonb

The infinite MNIST dataset

Why settle for 60000 MNIST training examples when you can have one trillion? The MNIST8M dataset was generated using the elastic deformation code originally written for (Loosli, Canu, and Bottou, 2007). Unfortunately the original MNIST8M files were accidentally deleted from the NEC servers a couple weeks ago. Instead of regenerating the files, I have repackaged the generation code in a convenient form.

→ Read more...

2014/07/11 22:52 · leonb

Explore/Exploit = Correlation/Causation!

Our paper “Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising” has appeared in JMLR. This paper takes the example of ad placement to illustrate how one can leverage causal inference to understand the behavior of complex learning systems interacting with their environment.

→ Read more...

2013/12/12 13:26 · leonb

Nips 2013

Nips just took place near Lake Tahoe. Many people have written about how things are changing in machine learning. There were also many interesting papers and invited talks. Thanks to the program chairs Max and Zoubin for producing this exciting conference program. Thanks to the workshop chairs Rich Caruana and Gunnar Rätsch for the stimulating workshops. Thanks to Terry Sejnowski for creating NIPS, and special thanks to Mary-Ellen Perry without whom nothing would happen.

2013/12/12 13:24 · leonb

Counterfactual Reasoning and Learning Systems

The report “Counterfactual Reasoning and Learning Systems” shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.

2012/09/12 12:29 · leonb

SGD-2.0 released

Announcing version 2.0 of my Stochastic Gradient Descent package. This release provides implementations of the Stochastic Gradient Descent and Averaged Stochastic Gradient Descent algorithms for Linear SVMs and CRFs. The latter sometimes shows vastly superior performance. See the SGD package pages for details.
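To make the difference between the two algorithms concrete, here is a minimal pure-Python sketch of plain and averaged stochastic gradient descent on the L2-regularized hinge loss of a linear SVM. It is not the code of the SGD package: the function names (`hinge_sgd`, `make_toy_data`, `accuracy`), the step size schedule, the absence of a bias term, and the toy data are all assumptions made for illustration.

```python
import random


def hinge_sgd(data, lam=0.01, epochs=20, eta0=0.1, average=False, seed=0):
    """Illustrative SGD / averaged SGD for a linear SVM without bias term.

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    Minimizes lam/2 * |w|^2 + mean(max(0, 1 - y * w.x)).
    With average=True, returns the running average of the iterates.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    w_avg = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            # Decreasing step size (an assumed, commonly used schedule).
            eta = eta0 / (1.0 + lam * eta0 * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            for j in range(dim):
                # Subgradient of the regularized hinge loss on one example.
                g = lam * w[j] - (y * x[j] if margin < 1.0 else 0.0)
                w[j] -= eta * g
            t += 1
            # Running average of all iterates seen so far.
            for j in range(dim):
                w_avg[j] += (w[j] - w_avg[j]) / t
    return w_avg if average else w


def make_toy_data(n=200, seed=1):
    """Hypothetical two-class data: class means at (+2, 0) and (-2, 0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        data.append(([2.0 * y + rng.gauss(0, 1), rng.gauss(0, 1)], y))
    return data


def accuracy(w, data):
    """Fraction of examples classified with the correct sign."""
    hits = sum(1 for x, y in data
               if y * sum(a * b for a, b in zip(w, x)) > 0)
    return hits / len(data)


if __name__ == "__main__":
    data = make_toy_data()
    print("SGD :", accuracy(hinge_sgd(data), data))
    print("ASGD:", accuracy(hinge_sgd(data, average=True), data))
```

The only difference between the two variants is the returned vector: the last iterate for SGD, the average of the iterates for ASGD. On a toy problem this easy both do well, so the sketch illustrates the mechanics rather than any performance gap.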

2011/10/11 18:51 · leonb

Natural Language Processing from Scratch

Ronan's masterpiece, "Natural Language Processing (Almost) from Scratch", has been published in JMLR. This paper describes how to use a unified neural network architecture to solve a collection of natural language processing tasks with near state-of-the-art accuracies and ridiculously fast processing speed. A couple thousand lines of C code process English sentences at more than 10000 words per second and output part-of-speech tags, named entity tags, chunk boundaries, semantic role labeling tags, and, in the latest version, syntactic parse trees. Download SENNA!

2011/10/09 03:30 · leonb

Learning Semantics

Learning Semantics, Nips 2011 Workshop, Saturday December 17, 2011. Melia Sierra Nevada & Melia Sol y Nieve, Sierra Nevada, Spain.

This workshop is organized in collaboration with Antoine Bordes, Jason Weston, and Ronan Collobert. This event should be very interesting: I believe that recent machine learning advances indicate new connections between machine learning and machine reasoning and lead to new opportunities for learning the semantics of the world.

2011/09/01 00:50 · leonb

From machine learning to machine reasoning

Over the last couple of years, I progressively formulated an unusual idea about the connection between machine learning and machine reasoning. I have discussed this idea with many friends and I even gave a seminar in Montreal in 2008. It is described in this technical report.

→ Read more...

2011/02/09 08:13

On the Vapnik-Chervonenkis-Sauer lemma

Many machine learning authors write that a certain fundamental combinatorial result was independently established by Vapnik and Chervonenkis (1971), Sauer (1972), Shelah (1972), and sometimes Perles and Shelah (reference unknown). Vapnik and Chervonenkis published a version of their results in the Proceedings of the USSR Academy of Sciences four years earlier in 1968. It also appears that Sauer and Shelah pursued this result for very different purposes.
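For reference, the combinatorial result in question is usually stated as follows. The notation is a common textbook choice and is not taken from this post: $\mathcal{F}$ is a family of subsets of a set $X$ with VC dimension $d$, and $\Pi_{\mathcal{F}}(n)$ counts the largest number of distinct subsets that $\mathcal{F}$ can pick out of $n$ points.

```latex
\[
  \Pi_{\mathcal{F}}(n)
  \;=\; \max_{x_1,\dots,x_n \in X}
  \bigl|\{\, F \cap \{x_1,\dots,x_n\} \,:\, F \in \mathcal{F} \,\}\bigr|
  \;\le\; \sum_{i=0}^{d} \binom{n}{i},
\]
\[
  \text{and, for } n \ge d \ge 1, \qquad
  \sum_{i=0}^{d} \binom{n}{i} \;\le\; \Bigl(\frac{e\,n}{d}\Bigr)^{d}.
\]
```

In words, a family with finite VC dimension can pick out only polynomially many subsets of $n$ points instead of all $2^n$.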

→ Read more...

2010/12/20 17:52 · leonb

Microsoft

Patrice Simard and I have been friends since the old AT&T Bell Labs times. He eventually convinced me to work for him at Microsoft. He told me to expect “interesting times”.

I can see several reasons for these interesting times.

The scientific point of view. There are few places where I can find machine learning problems with similar scale, similar challenges, and similar impact. This practical experience will surely feed my future machine learning research. In fact I believe that such experiences are necessary to do research. One needs to see the world…

The social point of view. The Internet is the largest encyclopedia of knowledge ever known to mankind, and this is great. On the other hand, everything you do on the Internet is recorded by someone somewhere. Large online services such as Google or Microsoft concentrate unprecedented amounts of such information. Our society is not ready for that. Very good things or very bad things can happen equally easily. They will affect all of us. We cannot just watch and count the points.

The competitive point of view. Microsoft combines a difficult competitive position with considerable resources: it has both the will and the means to do new things on the scientific, engineering, economic, and social levels. How to resist that? Of course nothing is ever certain…

2010/05/14 19:17

Cos424

Rob Schapire and David Blei gave me the opportunity to teach the cos424 course at Princeton University for the spring 2010 semester. In fact Rob is on sabbatical leave at Yahoo! and David is parenting. Running the orphan course was a useful experience. One thousand slides later, I am really eager to see the student projects…

2010/04/30 03:00

Semantic Extraction with a Neural Network Architecture

Use BLAS, not PERL!

It is the nineties again. Ronan Collobert from NEC Labs just released a noncommercial version of his neural network system for semantic extraction. Given an input sentence in plain English, Senna outputs a host of Natural Language Processing (NLP) tags: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), and semantic role labeling (SRL). Senna does this with state-of-the-art accuracies, roughly two hundred times faster than competing approaches.

The Senna source code represents about 2000 lines of C. This is probably one thousand times smaller than your usual natural language processing program. In fact all the Senna tagging tasks are performed using the same neural network simulation code.

Download Senna here. A Senna paper has been submitted to JMLR.

2010/02/16 15:16

SGDQN

The SGDQN paper has been published on the JMLR site. This variant of stochastic gradient got very good results during the first PASCAL Large Scale Learning Challenge. The paper gives a lot of explanation on the design of the algorithm. Source code is available from Antoine's web site.

2009/08/07 14:32

ICML 2009

ICML 2009 took place in June. Michael Littman and I were the program co-chairs. Since we were expecting a lot of work, we tried to make it interesting by experimenting with a number of changes in the review process. Read more for a little explanation and a few conclusions…

→ Read more...

2009/07/24 16:31

OLaRank Implementation Released

Antoine Bordes provides an implementation of the OLaRank algorithm.

OLaRank is an online solver of the dual formulation of support vector machines for structured output spaces. The algorithm can use exact or greedy inference. Its running time scales linearly with the data size, competitive with a perceptron based on the same inference procedure. Its accuracy however is much better as it replicates the accuracy of a structured SVM. See the ECML/PKDD paper "Sequence Labelling SVMs Trained in One Pass" for details.

2008/10/06 20:18

LaRank Implementation Released

Antoine Bordes provides an implementation of the LaRank algorithm, together with the datasets. This new implementation runs slightly faster than the code we have used for the LaRank paper. In addition there is a special version for the case of linear kernels.

2008/03/12 15:49

NIPS 2007: Learning with Large Datasets

A page has been allocated for my segment of the NIPS 2007 Tutorials. The second part of the tutorial Learning with Large Datasets was given by Alex Gray. Alex had to replace Andrew Moore on short notice because airplane delays conspired against our initial plans. The page contains the slides and a video recording of the lecture I gave at Microsoft Research a few days after NIPS.

2007/12/12 22:35

Blavatnik Award

During the 4th Annual Gala of the New York Academy of Sciences, I became one of the happy winners of the first Blavatnik Award for Young Scientists. The other finalists were very impressive. Choosing the winners must have been difficult. Leonard Blavatnik told me he attended the Nobel ceremony a few years ago and thought that something similar should be done in New York for younger scientists. Apparently he plans to fund a similar award every year.

2007/12/12 22:27

Talks online

The talks page contains pointers to my most significant lectures. Slides are available in both PDF and DjVu formats.

2011/06/25 23:11

Stochastic Gradient for SVM and CRF

You can now download fast stochastic gradient optimizers for linear Support Vector Machines (SVMs) and Conditional Random Fields (CRFs). Stochastic Gradient Descent has been historically associated with back-propagation algorithms in multilayer neural networks. These nonlinear nonconvex problems can be very difficult. Therefore it is useful to see how Stochastic Gradient Descent performs on such simple linear and convex problems. The benchmarks are very clear!

2007/08/23 14:22

Large-Scale Kernel Machines

MIT Press has announced the availability of the book Large-Scale Kernel Machines, edited by Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston. This book expands the theme of our NIPS 2005 workshop. The book homepage contains useful information. You can even find the complete BibTeX file that was used to generate the list of references.

2007/08/10 13:22

Publication database updated

Thanks to a small Lush script to parse BibTeX files, all my publications are now indexed here. Most of them are available online. I still need to scan the oldest ones. My little BibTeX parser now lives in the Lush CVS repository.

2011/06/25 23:11

Old website offline

Browsing http://leon.bottou.com now redirects you to this new website. There is still much work to be done.

2011/06/25 23:11
