LASVM is an approximate SVM solver that uses online approximation. It reaches accuracies similar to those of a real SVM after performing a single sequential pass through the training examples. Further benefits can be achieved by using selective sampling techniques to choose which example should be considered next.
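The single online pass can be sketched in a few dozen lines. This is a simplified C++ sketch based on our reading of the PROCESS/REPROCESS formulation in the LaSVM paper. It assumes an RBF kernel, dense inputs, no kernel cache and no selective sampling (examples are visited in the given order). The class LasvmSketch and all of its members are hypothetical and are not the API of the distributed library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical simplified LASVM: one online pass of PROCESS + REPROCESS,
// then a finishing phase. RBF kernel, dense inputs, no kernel cache.
class LasvmSketch {
 public:
  LasvmSketch(double C, double gamma, double tau = 1e-3) : C_(C), gamma_(gamma), tau_(tau) {}

  void train(const std::vector<Vec>& X, const std::vector<int>& Y) {
    assert(X.size() == Y.size());
    for (std::size_t k = 0; k < X.size(); ++k) {  // single sequential pass
      process(X[k], Y[k]);
      reprocess();
    }
    // Finishing: repeat REPROCESS until no tau-violating pair remains (capped).
    for (int it = 0; it < 100000 && reprocess(); ++it) {}
  }
  // f(x) = sum_s alpha_s K(x_s, x) + b; its sign is the predicted class.
  double decision(const Vec& x) const {
    double f = b_;
    for (std::size_t s = 0; s < sx_.size(); ++s) f += alpha_[s] * kernel(sx_[s], x);
    return f;
  }
  std::size_t setSize() const { return sx_.size(); }

 private:
  double C_, gamma_, tau_, b_ = 0.0;
  std::vector<Vec> sx_;               // current set S of candidate support vectors
  std::vector<int> sy_;               // labels of S (+1 / -1)
  std::vector<double> alpha_, grad_;  // signed coefficients alpha_s and gradients g_s

  double kernel(const Vec& a, const Vec& c) const {
    double d2 = 0.0;
    for (std::size_t t = 0; t < a.size(); ++t) d2 += (a[t] - c[t]) * (a[t] - c[t]);
    return std::exp(-gamma_ * d2);
  }
  double lo(std::size_t s) const { return std::min(0.0, C_ * sy_[s]); }  // A_s
  double hi(std::size_t s) const { return std::max(0.0, C_ * sy_[s]); }  // B_s
  // argmax of g_s over alpha_s < B_s (-1 if there is no candidate).
  long argmaxUp() const {
    long best = -1;
    for (std::size_t s = 0; s < sx_.size(); ++s)
      if (alpha_[s] < hi(s) && (best < 0 || grad_[s] > grad_[best])) best = static_cast<long>(s);
    return best;
  }
  // argmin of g_s over alpha_s > A_s (-1 if there is no candidate).
  long argminDown() const {
    long best = -1;
    for (std::size_t s = 0; s < sx_.size(); ++s)
      if (alpha_[s] > lo(s) && (best < 0 || grad_[s] < grad_[best])) best = static_cast<long>(s);
    return best;
  }
  // Direction search along (i, j); returns false unless (i, j) is tau-violating.
  bool step(long i, long j) {
    if (i < 0 || j < 0 || grad_[i] - grad_[j] <= tau_) return false;
    const double curv = std::max(kernel(sx_[i], sx_[i]) + kernel(sx_[j], sx_[j])
                                 - 2.0 * kernel(sx_[i], sx_[j]), 1e-12);
    const double lambda = std::min({(grad_[i] - grad_[j]) / curv, hi(i) - alpha_[i], alpha_[j] - lo(j)});
    alpha_[i] += lambda;
    alpha_[j] -= lambda;
    for (std::size_t s = 0; s < sx_.size(); ++s)
      grad_[s] -= lambda * (kernel(sx_[i], sx_[s]) - kernel(sx_[j], sx_[s]));
    return true;
  }
  // PROCESS: insert a new example (assumed unseen) into S, try one step with it.
  void process(const Vec& x, int y) {
    double g = y;
    for (std::size_t s = 0; s < sx_.size(); ++s) g -= alpha_[s] * kernel(sx_[s], x);
    sx_.push_back(x); sy_.push_back(y); alpha_.push_back(0.0); grad_.push_back(g);
    const long k = static_cast<long>(sx_.size()) - 1;
    if (y > 0) step(k, argminDown()); else step(argmaxUp(), k);
  }
  // REPROCESS: one step on the most violating pair, prune S, update b.
  bool reprocess() {
    const bool moved = step(argmaxUp(), argminDown());
    long i = argmaxUp(), j = argminDown();
    if (i < 0 || j < 0) return moved;
    const double gi = grad_[i], gj = grad_[j];
    for (std::size_t s = sx_.size(); s-- > 0;) {  // drop blatant non-support vectors
      if (alpha_[s] != 0.0) continue;
      if ((sy_[s] < 0 && grad_[s] >= gi) || (sy_[s] > 0 && grad_[s] <= gj)) erase(s);
    }
    i = argmaxUp(); j = argminDown();
    if (i >= 0 && j >= 0) b_ = (grad_[i] + grad_[j]) / 2.0;
    return moved;
  }
  void erase(std::size_t s) {
    const auto d = static_cast<std::ptrdiff_t>(s);
    sx_.erase(sx_.begin() + d); sy_.erase(sy_.begin() + d);
    alpha_.erase(alpha_.begin() + d); grad_.erase(grad_.begin() + d);
  }
};
```

Each call to process only touches the new example and the current set S, which is why a single pass is cheap; a selective sampling variant would change only which example is handed to process next.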
As shown in the graph, LASVM requires considerably less memory than a regular SVM solver. This translates into a significant speed advantage for large training sets. In fact, LASVM has been used to train a 10-class SVM classifier with 8 million examples on a single processor.
See the LaSVM paper for the details.
We provide a complete implementation of LASVM under the well-known GNU General Public License.
| File | Contents | Version |
|---|---|---|
| lasvm-source-1.0.tar.gz | LaSVM Source code | Initial version (2005) |
| lasvm-source-1.1.tar.gz | LaSVM Source code | Minor fixes (7/2009) |
This source code contains a small C library implementing the kernel cache and the basic PROCESS and REPROCESS operations. Two additional C++ programs are included; one of them, la_test, can be used to run experiments. Calling these programs without arguments prints a summary of their command-line arguments and options.
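A kernel cache keeps recently used rows of the kernel matrix so that repeated PROCESS/REPROCESS steps do not recompute them. The sketch below shows one common design, a least-recently-used cache of whole rows. It is hypothetical C++ (KernelRowCache and its members are illustrative names); the C library's actual cache layout, eviction policy and memory accounting may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical LRU cache of kernel rows: row(i) returns K(i, 0..n-1),
// computing it on a miss and evicting the least recently used row when full.
class KernelRowCache {
 public:
  using KernelFn = std::function<double(std::size_t, std::size_t)>;

  KernelRowCache(std::size_t n, std::size_t maxRows, KernelFn k)
      : n_(n), maxRows_(maxRows), k_(std::move(k)) { assert(maxRows_ > 0); }

  // The returned reference stays valid until that row is evicted.
  const std::vector<double>& row(std::size_t i) {
    auto it = index_.find(i);
    if (it != index_.end()) {
      ++hits_;
      lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
      return it->second->second;
    }
    ++misses_;
    if (lru_.size() == maxRows_) {  // evict the least recently used row
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    std::vector<double> r(n_);
    for (std::size_t j = 0; j < n_; ++j) r[j] = k_(i, j);
    lru_.emplace_front(i, std::move(r));
    index_[i] = lru_.begin();
    return lru_.front().second;
  }
  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  using Entry = std::pair<std::size_t, std::vector<double>>;
  std::size_t n_, maxRows_, hits_ = 0, misses_ = 0;
  KernelFn k_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::size_t, std::list<Entry>::iterator> index_;
};
```

Caching whole rows suits this access pattern because each step needs K(i, s) and K(j, s) for every s in the current set.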
These programs can handle three data file formats: LIBSVM/SVMLight text files, binary files, and split files. Sample files can be downloaded from the publication page.
LIBSVM/SVMLight files represent examples using a simple text format. Each example is represented by a line in the following format:

    <line>    = <target> <feature>:<value> ... <feature>:<value>
    <target>  = +1 | -1 | <int>
    <feature> = <integer>
    <value>   = <float>
The target value and each of the feature/value pairs are separated by a space character. The target value denotes the class of the example, +1 or -1. For example, the line -1 1:0.43 3:0.12 9284:0.2 specifies a negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0.
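A reader for this format only needs to split each line on spaces and each pair on the colon. The following is a minimal C++ sketch that assumes well-formed lines (no comments or extra fields); SparseExample, parseLine and featureValue are hypothetical names, not functions from the LaSVM source.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One sparse example: target plus (feature, value) pairs; absent features are 0.
struct SparseExample {
  int target = 0;
  std::vector<std::pair<int, double>> features;
};

// Parses "<target> <feature>:<value> ..." (hypothetical helper).
SparseExample parseLine(const std::string& line) {
  std::istringstream in(line);
  SparseExample ex;
  if (!(in >> ex.target)) throw std::runtime_error("missing target");
  std::string tok;
  while (in >> tok) {  // tokens are separated by spaces
    const std::size_t colon = tok.find(':');
    if (colon == std::string::npos) throw std::runtime_error("bad pair: " + tok);
    ex.features.emplace_back(std::stoi(tok.substr(0, colon)), std::stod(tok.substr(colon + 1)));
  }
  return ex;
}

// Value of feature f, or 0 if the line does not mention it.
double featureValue(const SparseExample& ex, int f) {
  for (const auto& p : ex.features)
    if (p.first == f) return p.second;
  return 0.0;
}
```

For the example line above, featureValue returns 0.43 for feature 1 and 0 for any feature not listed, matching the sparse convention.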
Binary files take less space and load faster. One can convert between LIBSVM/SVMLight text files and binary files using the conversion programs included in the source, such as bin2libsvm.
Split files are a handy way to save disk space if you have copies of the same data with different training/test set splits and/or different target classes. They work by loading an original data file and then specifying a subset of the data to load (by index), as well as an optional relabeling of the data points. The split file format is best shown by an example:
    file_name: mnist.trn.bin
    binary_file: 1
    supply_indices: 1
    supply_new_labels: 1
    3 -1
    4 1
    7 -1
    ...
The first line specifies a data file, in either LIBSVM or binary format, to load. The second line indicates whether that file is binary (set to 1) or not (set to 0). The third line, supply_indices, specifies whether you wish to load a subset of the given file (set to 1), and the fourth line, supply_new_labels, indicates whether you wish to relabel the data differently from the original file. The first four lines are followed by a list in which each line holds either an <index> <label> pair (if supply_new_labels is set to 1) or only an <index>. These indices (starting from 1) specify examples in the original file.
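Reading a split file follows directly from this description. The sketch below is hypothetical C++ that assumes the header layout of the example above (one "key: value" pair per header line, file names without spaces) and supply_indices set to 1. SplitSpec, parseSplit and applySplit are illustrative names, and applySplit only selects labels to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed split file: the four header fields plus the index/label list.
struct SplitSpec {
  std::string fileName;
  bool binaryFile = false, supplyIndices = false, supplyNewLabels = false;
  std::vector<std::size_t> indices;  // 1-based, as written in the file
  std::vector<int> newLabels;        // filled only if supplyNewLabels
};

// Reads one "key: value" header line (hypothetical helper).
static std::string headerValue(std::istream& in, const std::string& key) {
  std::string k, v;
  if (!(in >> k >> v) || k != key + ":") throw std::runtime_error("expected " + key);
  return v;
}

SplitSpec parseSplit(std::istream& in) {
  SplitSpec spec;
  spec.fileName = headerValue(in, "file_name");
  spec.binaryFile = headerValue(in, "binary_file") == "1";
  spec.supplyIndices = headerValue(in, "supply_indices") == "1";
  spec.supplyNewLabels = headerValue(in, "supply_new_labels") == "1";
  std::size_t idx;
  while (in >> idx) {  // "<index> <label>" pairs or bare "<index>" lines
    if (idx == 0) throw std::runtime_error("indices start at 1");
    spec.indices.push_back(idx);
    if (spec.supplyNewLabels) {
      int label;
      if (!(in >> label)) throw std::runtime_error("missing label");
      spec.newLabels.push_back(label);
    }
  }
  return spec;
}

// Selects (and optionally relabels) labels loaded from the original file.
std::vector<int> applySplit(const SplitSpec& spec, const std::vector<int>& original) {
  std::vector<int> out;
  for (std::size_t n = 0; n < spec.indices.size(); ++n) {
    const std::size_t i = spec.indices[n] - 1;  // convert the 1-based index
    if (i >= original.size()) throw std::out_of_range("index past end of file");
    out.push_back(spec.supplyNewLabels ? spec.newLabels[n] : original[i]);
  }
  return out;
}
```

A real loader would select the feature vectors of the chosen examples in the same loop; only the labels are kept here.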