Training Invariant Support Vector Machines using Selective Sampling

Abstract: Bordes et al. (2005) describe the efficient online LASVM algorithm using selective sampling. On the other hand, Loosli et al. (2005) propose a strategy for handling invariance in SVMs, also using selective sampling. This paper combines the two approaches to build a very large SVM. We present state-of-the-art results obtained on a handwritten digit recognition problem with 8 million examples on a single processor. This work also demonstrates that online SVMs can effectively handle really large databases.

Gaëlle Loosli, Stéphane Canu and Léon Bottou: Training Invariant Support Vector Machines using Selective Sampling, in Large Scale Kernel Machines, Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston editors, 301–320, MIT Press, Cambridge, MA., 2007.

Technical report: loosli-2006.djvu loosli-2006.pdf loosli-2006.ps.gz

@incollection{loosli-canu-bottou-2006,
  author = {Loosli, Ga\"{e}lle and Canu, St\'{e}phane and Bottou, L\'{e}on},
  title = {Training Invariant Support Vector Machines using Selective Sampling},
  pages = {301-320},
  editor = {Bottou, L\'{e}on and Chapelle, Olivier and {DeCoste}, Dennis and Weston, Jason},
  booktitle = {Large Scale Kernel Machines},
  publisher = {MIT Press},
  address = {Cambridge, MA.},
  year = {2007},
  url = {http://leon.bottou.org/papers/loosli-canu-bottou-2006},
}

Implementation Details

In response to various inquiries regarding the experimental setup:

All experiments were carried out on a dual Opteron machine running at 2.4GHz and equipped with 16GB of main memory. The cache sizes were chosen to ensure that any experiment would fit in 8GB, allowing us to run two experiments simultaneously on this computer. The memory usage consists of roughly 700MB of data to generate the training examples on the fly (MNIST digits, Lie derivatives, precomputed random vector fields), 500MB to cache transformed digits, and 6.5GB of kernel cache, for a total of about 7.7GB per experiment.

The LASVM algorithm is implemented by reusing a few files from the distributed LASVM source code (messages.c, kcache.c, and lasvm.c). Documentation for these files is provided in the corresponding header files. On-the-fly generation and caching of the training examples were implemented inside a highly optimized kernel function (undocumented). This kernel function is simply passed to the kernel cache constructor lasvm_kcache_create(). The glue code was written in Lush using the standard LASVM bindings.
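The idea of a kernel function that generates and caches its own training examples can be sketched as follows. This is a minimal illustration in Python, not the undocumented kernel used for the experiments: the class name OnTheFlyKernel, the stand-in generator deformed_digit, the RBF kernel with its gamma value, and the cache capacity are all assumptions made for the example. The callback is assumed to receive two example indices and return one kernel value.

```python
import math
import random
from collections import OrderedDict


class OnTheFlyKernel:
    """Kernel callback that builds examples on demand and keeps recent ones."""

    def __init__(self, make_example, gamma=0.01, max_cached=1000):
        self.make_example = make_example   # hypothetical: index -> feature vector
        self.gamma = gamma                 # illustrative RBF width, not from the paper
        self.max_cached = max_cached       # illustrative cache capacity
        self._cache = OrderedDict()        # index -> vector, kept in LRU order
        self.generated = 0                 # number of examples actually generated

    def _example(self, i):
        if i in self._cache:
            self._cache.move_to_end(i)     # mark as most recently used
            return self._cache[i]
        x = self.make_example(i)
        self.generated += 1
        self._cache[i] = x
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # evict the least recently used
        return x

    def __call__(self, i, j):
        xi, xj = self._example(i), self._example(j)
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
        return math.exp(-self.gamma * sq_dist)


def deformed_digit(i, dim=16):
    """Stand-in generator: a deterministic pseudo-random vector per index."""
    rng = random.Random(i)                 # same index always gives the same example
    return [rng.random() for _ in range(dim)]


kernel = OnTheFlyKernel(deformed_digit, gamma=0.5, max_cached=2)
k01 = kernel(0, 1)   # generates examples 0 and 1
k10 = kernel(1, 0)   # served entirely from the example cache
```

The solver only ever sees indices, so the example cache stays hidden behind the kernel callback and can be sized independently of the kernel value cache.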

Datasets

The datasets used for these experiments were generated on the fly by applying careful elastic deformations to the original MNIST training set. For convenience, we provide two files containing the 8,100,000 examples generated for our final experiment, in the same format as the original MNIST files. Tests were performed on the standard MNIST test files.

There is no point in trying to load these files into the distributed LASVM program. The distributed code uses a kernel representation that was designed to perform like LIBSVM and is completely unsuitable for this purpose. See the implementation details above.
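For inspecting the files with other tools, the MNIST file format (a big-endian header followed by raw unsigned bytes) can be read with a few lines of code. The sketch below is an illustration only: the function names iter_idx_images and read_idx_labels are hypothetical, and it is exercised on a tiny synthetic file built in memory rather than on the real data. Images are streamed one at a time because, assuming 28x28 digits, 8,100,000 examples amount to about 6.4GB of pixel data.

```python
import io
import struct


def iter_idx_images(stream):
    """Yield images one by one from an MNIST-format image stream."""
    header = stream.read(16)
    if len(header) != 16:
        raise ValueError("truncated header")
    magic, count, rows, cols = struct.unpack(">IIII", header)
    if magic != 0x00000803:                # unsigned bytes, 3 dimensions
        raise ValueError("not an MNIST image file")
    size = rows * cols
    for _ in range(count):
        image = stream.read(size)          # row-major pixels, one byte each
        if len(image) != size:
            raise ValueError("truncated image data")
        yield image


def read_idx_labels(stream):
    """Return all labels from an MNIST-format label stream as a list of ints."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated header")
    magic, count = struct.unpack(">II", header)
    if magic != 0x00000801:                # unsigned bytes, 1 dimension
        raise ValueError("not an MNIST label file")
    labels = stream.read(count)
    if len(labels) != count:
        raise ValueError("truncated label data")
    return list(labels)


# Synthetic two-example file (2x2 "images") used only to exercise the readers.
demo_images = struct.pack(">IIII", 0x00000803, 2, 2, 2) + bytes(range(8))
demo_labels = struct.pack(">II", 0x00000801, 2) + bytes([7, 3])

images = list(iter_idx_images(io.BytesIO(demo_images)))
labels = read_idx_labels(io.BytesIO(demo_labels))
```

With real files, the in-memory streams would be replaced by a file opened in binary mode, or by gzip.open() if the files are compressed.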

papers/loosli-canu-bottou-2006.txt · Last modified: 2008/04/23 10:53 by leonb