====== Fast Kernel Learning Algorithms ======
This is a subtopic of [[:research:largescale|Large-Scale Learning]].
Kernel machines are often associated with dual techniques
that handle very large parameter spaces at the expense
of scalability in the number of examples.
However, kernel machines can scale nicely by cleverly approximating
the conventional optimization problem solved during learning.
These ideas are not specific to kernel machines
and are being actively explored in the context of
other learning algorithms.
Starting with relatively well understood kernel machines
provides a sounder way to assess the impact of these approximations
on the statistical efficiency of learning algorithms.
===== LaSVM and related algorithms =====
Earlier work has established that second-order [[:research:stochastic|stochastic gradient]] learning algorithms need only a single pass over the training data to achieve a generalization performance comparable to that of an exact optimization algorithm. Alas, second-order stochastic gradient algorithms become quite memory intensive when the
dimension of the problem increases.
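The memory issue can be made concrete with a minimal sketch, written here in Python with NumPy and not taken from our actual experiments: a single second-order stochastic gradient step on a regularized hinge loss, where the scaling matrix approximating the inverse Hessian already holds d*d entries.
<code python>
# Minimal sketch (not the implementation used in our experiments):
# one second-order stochastic gradient step for a regularized hinge loss.
import numpy as np

def second_order_sgd_step(w, B, x, y, lam=1e-4):
    """Update w using a scaling matrix B that approximates the inverse Hessian."""
    margin = y * np.dot(w, x)
    grad = lam * w - (y * x if margin < 1 else 0.0)  # hinge loss subgradient
    return w - B @ grad

d = 2000
w = np.zeros(d)
B = np.eye(d) / 1e-4          # B alone stores d*d floats: memory grows as d^2
x, y = np.random.randn(d), 1.0
w = second_order_sgd_step(w, B, x, y)
</code>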
{{:papers:lasvm-epsiloncache.png }}
//Reprocessing// training examples is a simple workaround.
Whenever we process a new training example,
we can also revisit past training examples
in order to approach the performance of a full optimization
on all the training examples, including the new one.
Of course storing all these examples could be extremely inefficient.
We found that we can store a small subset of the past training examples
that summarizes them well. Furthermore, after processing a new training example,
we found it sufficient to reprocess only a couple of stored examples to recover,
on average, the desired level of performance.
In the case of [[wp>Support Vector Machines]],
the obvious choice consists of summarizing
the examples using the Support Vectors.
We designed a sequence of online SVM algorithms: the [[:papers:bordes-bottou-2005|Huller]], [[:papers:bordes-ertekin-weston-bottou-2005|LaSVM]], and [[:papers:bordes-2007|LaRank]].
These algorithms reach generalization performance comparable to that of batch SVMs
after a single pass over the training examples.
Most training examples will only be seen once by the learning algorithm. This property considerably reduces the memory requirements of kernel algorithms. As a consequence these algorithms scale to datasets containing [[:papers:loosli-canu-bottou-2006|millions of examples]].
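The following sketch conveys the process/reprocess structure of these algorithms on a deliberately simplified kernel classifier. The class, the perceptron-style update, and the kernel parameters are illustrative assumptions, not the actual LaSVM update rules.
<code python>
# Toy sketch of the process/reprocess pattern (not the exact LaSVM updates).
import numpy as np

def rbf(x, z, gamma=0.5):
    """Gaussian RBF kernel."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

class OnlineKernelClassifier:
    def __init__(self, reprocess_steps=2, lr=0.2):
        self.stored = []            # stored (example, label) pairs
        self.alpha = []             # their coefficients
        self.reprocess_steps = reprocess_steps
        self.lr = lr

    def decision(self, x):
        return sum(a * rbf(z, x) for a, (z, _) in zip(self.alpha, self.stored))

    def _update(self, i):
        x, y = self.stored[i]
        if y * self.decision(x) < 1:            # example violates the margin
            self.alpha[i] += self.lr * y

    def process(self, x, y, rng=np.random):
        # PROCESS: insert the new example and adjust its coefficient.
        self.stored.append((x, y))
        self.alpha.append(0.0)
        self._update(len(self.stored) - 1)
        # REPROCESS: revisit a couple of previously stored examples.
        for _ in range(self.reprocess_steps):
            self._update(rng.randint(len(self.stored)))
        # A full implementation would also evict examples whose coefficient
        # stays at zero, which keeps the stored set small.
</code>
Feeding examples one at a time through ''process'' mimics the single sequential pass described above; evicting examples whose coefficient remains zero is what keeps the stored subset, and hence the memory footprint, small.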
===== Large-scale active learning =====
{{active.png?376 }}
Since we can summarize the past training examples using a well chosen subset,
we can also consider selecting each new training example.
Instead of performing a sequential pass over the training set,
we can actively pick informative training examples,
that is, examples located in the vicinity of the
decision boundary. This //active learning// strategy,
sketched after the list below,
solves two problems that usually plague
support vector machines.
  * Support vector machines trained on noisy datasets typically have many support vectors. Active learning support vector machines [[:papers:bordes-ertekin-weston-bottou-2005|remain sparse on noisy datasets]] and even enjoy a slight performance advantage because of the increased sparsity.
* Support vector machines trained on datasets with imbalanced classes have degraded performance. Active learning support vector machines [[:papers:ertekin-2007|maintain their performance]] because all datasets are balanced when one selects examples near the decision boundary.
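A minimal sketch of the selection rule discussed above, with illustrative names rather than the actual implementation: among a small random pool of unseen examples, pick the one whose score lies closest to the decision boundary.
<code python>
# Hedged sketch of an active selection rule: names and pool size are illustrative.
import numpy as np

def select_active_example(decision, pool, pool_size=50, rng=None):
    """Return the index of the pool example closest to the decision boundary."""
    rng = rng or np.random.default_rng()
    candidates = rng.choice(len(pool), size=min(pool_size, len(pool)), replace=False)
    scores = [abs(decision(pool[i])) for i in candidates]
    return int(candidates[int(np.argmin(scores))])
</code>
The ''decision'' argument can be any scoring function, for instance the ''decision'' method of the toy classifier sketched earlier.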
===== Collaborators =====
* [[http://www.pascal-network.org/Network/Researchers/847|Antoine Bordes]]
* [[http://www.cse.psu.edu/~sertekin/|Seyda Ertekin]]
* [[http://gaelle.loosli.fr|Gaëlle Loosli]]
* [[http://ml.nec-labs.com/person.php?person=jason|Jason Weston]]
===== NIPS Workshop: Large-Scale Kernel Machines =====
[[:papers:lskm-2007|{{:papers:lskm.jpg?95 }}]]
There is a tension between the challenges of large-scale datasets and
the relative slowness of standard kernel machine algorithms.
Making them fast requires new ideas.
This was the starting point of the
[[http://nips.cc/Conferences/2005/Workshops|NIPS 2005 workshop]] on
[[http://nips.cc/Conferences/2005/Workshops/WorkshopAbstracts.html#LargeScale|Large-Scale Kernel Machines]],
organized by [[http://research.yahoo.com/~chap|Olivier Chapelle]],
[[http://www.google.com/search?q=dennis+decoste|Dennis DeCoste]],
[[http://ml.nec-labs.com/person.php?person=jason|Jason Weston]],
and myself.
We were not disappointed.
The [[:papers:lskm-2007|Large-Scale Kernel Machines]] book
progresses from well-understood techniques to increasingly
controversial topics.
For instance, the last chapter considers problems that animals
perform effortlessly, such as perception and control, and argues that
local kernel representations are inefficient for these problems. However, this
argument might not apply to other relevant tasks, such as mining transaction logs.
Are these classes of problems different enough to justify distinct
scientific approaches?
===== Publications =====
Antoine Bordes and Léon Bottou: **The Huller: a simple and efficient online SVM**, //Machine Learning: ECML 2005//, 505-512, Lecture Notes in Artificial Intelligence, LNAI 3720, Springer Verlag, 2005.
[[:papers/bordes-bottou-2005|more...]]
Antoine Bordes, Seyda Ertekin, Jason Weston and Léon Bottou: **Fast Kernel Classifiers with Online and Active Learning**, //Journal of Machine Learning Research//, 6:1579-1619, September 2005.
[[:papers/bordes-ertekin-weston-bottou-2005|more...]]
Gaëlle Loosli, Stéphane Canu and Léon Bottou: **Training Invariant Support Vector Machines using Selective Sampling**, in //[[:papers/lskm-2007|Large Scale Kernel Machines]]//, Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston editors, 301--320, MIT Press, Cambridge, MA., 2007.
[[:papers/loosli-canu-bottou-2006|more...]]
Antoine Bordes, Léon Bottou, Patrick Gallinari and Jason Weston: **Solving MultiClass Support Vector Machines with LaRank**, //Proceedings of the 24th International Machine Learning Conference//, 89-97, Edited by Zoubin Ghahramani, OmniPress, Corvallis, Oregon, 2007.
[[:papers/bordes-2007|more...]]
//**Large Scale Kernel Machines**//,
Edited by Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston,
Neural Information Processing Series, MIT Press, Cambridge, MA., 2007.
[[:papers/lskm-2007|more...]]
Seyda Ertekin, Jian Huang, Léon Bottou and C. Lee Giles: **Learning on the Border: Active Learning in Imbalanced Data Classification**, //Proceedings of the 16th Conference on Information and Knowledge Management, CIKM2007//, ACM Press, Lisboa, November 2007.
[[:papers/ertekin-2007|more...]]