NYC Data Science Seminar
Alex Peysakhovich and I represent Facebook on the organizing committee of the NYC Data Science Seminar Series. This rotating seminar, organized by Columbia, CornellTech, Facebook, Microsoft Research NYC, and New York University, has featured a number of prominent speakers. Although there is a
Towards principled methods for training generative adversarial networks
Abstract: The goal of this paper is not to introduce a single algorithm or method, but to make theoretical steps towards fully understanding the training dynamics of generative adversarial networks. In order to substantiate our theoretical analysis, we perform targeted experiments to verify our assumptions, illustrate our claims, and quantify the phenomena. This paper is divided into three sections. The first section introd…
Neuristique s.a.
Neuristique was founded in 1988 by a dozen friends with big dreams.
The mission statement was a very long sentence that mentioned
the application of artificial neural networks, the development of artificial brains,
and the exploration of space. We were very young and inexperienced.
The BackPropagation Cook Book
This lecture, co-authored with Yann Le Cun,
took place at the 1996 NIPS Workshop
Tricks of the Trade,
organized by Klaus-Robert Müller and Genevieve Orr.
* See the slides (djvu 199KB) (pdf 2.7MB).
Graph Transformer Networks
This lecture describes Graph Transformer Networks.
It took place at the 2001 ICML workshop Machine Learning for Spatial and Temporal Data organized by Tom Dietterich.
Graph Transformer Networks are one of the most powerful and successful methods for learning sequential data. About 10% to 20% of the checks written in the U.S. since 1996 have been processed by a Graph Transformer Network. Graph Transformer Networks are related to
DjVu: Scanned Documents on the Web
DjVu is a document compression system that allows the distribution of scanned documents on the web. DjVu files are very compact. A typical 300dpi bitonal page takes 10-15KB. A typical 300dpi color page takes 40-60KB. The presentation discusses the main technical innovations that made DjVu possible.
On the Vapnik-Chervonenkis-Sauer lemma
Many machine learning authors write that a certain fundamental combinatorial result
was independently established by Vapnik and Chervonenkis (1971), Sauer (1972),
Shelah (1972), and sometimes Perles and Shelah (reference unknown).
Vapnik and Chervonenkis published a version of their results in the
Proceedings of the USSR Academy of Sciences four years earlier in 1968.
It also appears that Sauer and Shelah pursued this result
for very different purposes.
Stochastic Gradient Tricks
Abstract:
The first chapter of Neural Networks, Tricks of the Trade
strongly advocates the stochastic back-propagation method to
train neural networks. This is in fact an instance of a more general
technique called stochastic gradient descent…
Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
Abstract:
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad p…
On-line Learning for Very Large Datasets
Excerpt:
The main point of this paper is to show that, in situations where the supply
of training samples is essentially unlimited, a well designed on-line
algorithm converges toward the minimum of the expected cost…
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent
Abstract:
The SGD-QN algorithm is a stochastic gradient descent
algorithm that makes careful use of second-order information
and splits the parameter update into independently scheduled
components. Thanks to this design, SGD-QN iterates nearly as
fast as a first-order stochastic gradient
descent but requires fewer iterations
to achieve the same accuracy.
This algorithm won the…
Fast Kernel Classifiers with Online and Active Learning
Abstract:
Very high dimensional learning systems become theoretically possible when
training examples are abundant. The computing cost then becomes the limiting
factor. Any efficient learning algorithm should at least take a brief look at
each example. But should all examples be given equal attention?
This contribution proposes an empirical answer.
We first present an online SVM algorithm based on this premise.
LASVM yields competitive…
