## Training Invariant Support Vector Machines using Selective Sampling

Abstract: Bordes et al. (2005) describe the efficient online LASVM algorithm using selective sampling. On the other hand, Loosli et al. (2005) propose a strategy for handling invariance in SVMs, also using selective sampling. This paper combines the two approaches to build a very large SVM. We present state-of-the-art results obtained on a handwritten digit recognition problem with 8 million examples on a single processor. This work also demonstrates that online SVMs can effectively handle really large databases.

Gaëlle Loosli, Stéphane Canu and Léon Bottou: Training Invariant Support Vector Machines using Selective Sampling, in Large Scale Kernel Machines, Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston, editors, 301–320, MIT Press, Cambridge, MA, 2007.

Technical report: loosli-2006.djvu loosli-2006.pdf loosli-2006.ps.gz

@incollection{loosli-canu-bottou-2006,
author = {Loosli, Ga\"{e}lle and Canu, St\'{e}phane and Bottou, L\'{e}on},
title = {Training Invariant Support Vector Machines using Selective Sampling},
pages = {301--320},
editor = {Bottou, L\'{e}on and Chapelle, Olivier and {DeCoste}, Dennis and Weston, Jason},
booktitle = {Large Scale Kernel Machines},
publisher = {MIT Press},
year = {2007},
url = {http://leon.bottou.org/papers/loosli-canu-bottou-2006},
}

### Implementation Details

In response to various inquiries regarding the experimental setup:

All experiments were carried out on a dual Opteron machine running at 2.4GHz and equipped with 16GB of main memory. The cache sizes were chosen to ensure that any experiment would fit in 8GB, allowing us to run two simultaneous experiments on this computer. The memory usage consists of roughly 700MB of data to generate the training examples on-the-fly (MNIST digits, Lie derivatives, precomputed random vector fields), 500MB to cache transformed digits, and 6.5GB of kernel cache.
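As a quick check of the memory budget described above, the three components can be summed and compared with the 8GB per-experiment limit. This is only a back-of-the-envelope sketch: the dictionary keys are illustrative labels rather than identifiers from the LASVM code, and 1GB is taken as 1000MB because the figures are approximate.

```python
# Memory budget per experiment, using the approximate figures quoted above (in MB).
# The keys are illustrative labels, not names from the LASVM source code.
budget_mb = {
    "example_generation_data": 700,  # MNIST digits, Lie derivatives, random vector fields
    "transformed_digit_cache": 500,
    "kernel_cache": 6500,            # 6.5GB, assuming 1GB = 1000MB for this estimate
}

total_mb = sum(budget_mb.values())
limit_mb = 8000     # 8GB allotted to each experiment
machine_mb = 16000  # 16GB of main memory on the machine

print(f"total per experiment: {total_mb / 1000:.1f}GB")
print(f"fits in 8GB: {total_mb <= limit_mb}")
print(f"two simultaneous runs fit in 16GB: {2 * total_mb <= machine_mb}")
```

The kernel cache dominates the budget, which is consistent with sizing the caches so that two runs share the 16GB machine.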

The LASVM algorithm is implemented by reusing a few files from the distributed LASVM source code (messages.c, kcache.c, and lasvm.c). Documentation for these files is provided in the corresponding header files. On-the-fly generation and caching of the training examples were implemented inside a highly optimized kernel function (undocumented). This kernel function is simply passed to the kernel cache constructor `lasvm_kcache_create()`. The glue code was written in Lush using the standard LASVM bindings.
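The idea of a kernel function that generates and caches its own examples can be illustrated with a small Python sketch. This is not the undocumented kernel used in the experiments, and it does not call the LASVM C library: `make_example`, `kernel`, `KernelCache`, `DIM`, and `GAMMA` are hypothetical names and values chosen for illustration, and the random vectors stand in for transformed digits.

```python
import math
import random
from functools import lru_cache

DIM = 16     # hypothetical feature dimension (stand-in for digit pixels)
GAMMA = 0.5  # hypothetical RBF kernel bandwidth


@lru_cache(maxsize=1024)  # stands in for the cache of transformed digits
def make_example(index):
    """Deterministically generate example `index` on demand."""
    rng = random.Random(index)  # the same index always yields the same vector
    return tuple(rng.random() for _ in range(DIM))


def kernel(i, j):
    """RBF kernel defined on example indices; examples are created lazily."""
    xi, xj = make_example(i), make_example(j)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-GAMMA * sq_dist)


class KernelCache:
    """Toy kernel cache built from a kernel function on indices."""

    def __init__(self, kernel_fn):
        self.kernel_fn = kernel_fn
        self.values = {}  # unbounded here; a real cache would limit its size

    def query(self, i, j):
        key = (min(i, j), max(i, j))  # the kernel is symmetric
        if key not in self.values:
            self.values[key] = self.kernel_fn(i, j)
        return self.values[key]


cache = KernelCache(kernel)
print(cache.query(3, 7) == cache.query(7, 3))
```

The point of the design is that the solver only ever handles example indices: the cache asks the kernel function for values, and the kernel function decides when an example has to be generated or can be reused.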

### Datasets

The datasets used for these experiments were generated on the fly by performing careful elastic deformation of the original MNIST training set.
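For readers unfamiliar with elastic deformation, the following pure-Python sketch shows the general technique: a random displacement field is smoothed, scaled, and used to resample the image. It is a simplified illustration under our own assumptions, not the generator used for these experiments (which also relies on Lie derivatives and precomputed vector fields); the function names, the box-blur smoothing, and the `alpha` and `seed` parameters are hypothetical.

```python
import random


def smooth(field, passes=2):
    """Box-blur a 2D field a few times so neighbouring displacements agree."""
    h, w = len(field), len(field[0])
    for _ in range(passes):
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc, n = 0.0, 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += field[yy][xx]
                            n += 1
                out[y][x] = acc / n
        field = out
    return field


def bilinear(img, y, x):
    """Sample img at a real-valued position, clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def elastic_deform(img, alpha=1.5, seed=0):
    """Warp img with a smoothed random displacement field scaled by alpha."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)])
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)])
    return [[bilinear(img, y + alpha * dy[y][x], x + alpha * dx[y][x])
             for x in range(w)] for y in range(h)]


# A tiny 8x8 "digit": a vertical bar two pixels wide.
image = [[1.0 if x in (3, 4) else 0.0 for x in range(8)] for _ in range(8)]
warped = elastic_deform(image, alpha=1.5, seed=42)
```

Because the displacement field is derived from a seed, the same deformed example can be regenerated at any time instead of being stored, which is what makes on-the-fly generation practical.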

We used to provide two files containing the 8,100,000 examples generated for our final experiment. Unfortunately, these files were accidentally deleted in 2014 from the NEC server that used to host them. Instead of regenerating these examples, we found it more useful to package the code that was used to generate them in the first place. We call this the infinite MNIST dataset.

Note that there is no point in trying to load such large files into the distributed LASVM program. The distributed code uses a kernel representation that was designed to perform like LIBSVM and is completely unsuitable for this purpose. See the implementation details above.