SVMTorch: A new SVM program for Large-Scale Regression and Classification Problems

Samy Bengio bengio at idiap.ch
Tue Jul 25 08:49:37 EDT 2000


I would like to inform you of the following new SVM software
for large-scale regression and classification problems, available
at http://www.idiap.ch/learning/SVMTorch.html.

Information about this new software follows:

                                  SVMTorch

   A Support Vector Machine for Large-Scale Regression and Classification
                                  Problems

                     Ronan Collobert (collober at idiap.ch)

         IDIAP, CP 592, rue du Simplon 4, 1920 Martigny, Switzerland

Description

SVMTorch is a new implementation of Vapnik's Support Vector Machine that
works both for classification and regression problems, and that has been
specifically tailored for large-scale problems (such as more than 20000
examples, even for input dimensions higher than 100).

Source Code

The source code is free for academic use. It must not be modified or
distributed without prior permission of the author. When using SVMTorch in
your scientific work, please cite the following article:

Ronan Collobert and Samy Bengio, Support Vector Machines for Large-Scale
Regression Problems, IDIAP-RR-00-17, 2000. (available at 
ftp://ftp.idiap.ch/pub/reports/2000/rr00-17.ps.gz).

The software has been successfully compiled on Sun/SOLARIS, Intel/LINUX and
Alpha/OSF operating systems. You can download it from
ftp://ftp.idiap.ch/pub/learning/SVMTorch.tgz.


Try It !!!

First, you should download the source code from
ftp://ftp.idiap.ch/pub/learning/SVMTorch.tgz and the examples from
ftp://ftp.idiap.ch/pub/learning/TrainData.tgz. Put these two archive files in
the same directory, and decompress them with

zcat SVMTorch.tgz | tar xf -
zcat TrainData.tgz | tar xf -

This creates two new directories : "SVMTorch" and "TrainData".

Now, go into the "SVMTorch" directory and edit the Makefile. You should only
have to change the following lines, depending on your specific platform :

# C-compiler
#CC=gcc
CC=cc
# C-Compiler flags
#CFLAGS=-Wall -W -O9 -funroll-all-loops -finline -fomit-frame-pointer -ffast-math
CFLAGS=-native -fast -xO5
# linker
#LD=gcc
LD=cc
# linker flags
#LFLAGS=-Wall -W -O9 -funroll-all-loops -finline -fomit-frame-pointer -ffast-math
LFLAGS=-native -fast -xO5
# libraries
LIBS=-lm

The default configuration is set for a machine using the Sun Workshop
compiler. An alternative (commented-out) configuration is provided for the
GNU gcc compiler.

Type "make all" and pray.

It should compile without any warnings.

On some platforms, you may have to change the include files needed for
"times", a non-standard function used by svm_torch. To do so, edit the file
"general.h" and change the lines

#ifdef I_WANT_TIME
#include <sys/times.h>
/*#include <limits.h>*/
#include <time.h>
#endif

If that doesn't work, or if you don't want to measure the running time of the
learning machine, just comment out the line :
#define I_WANT_TIME

Note that in "general.h" you can comment out the line
#define USEDOUBLE
in order to do the computations in float. IT'S A BAD IDEA : svm_torch needs
precision.

If everything went well, you should have two programs : "svm_torch" and
"svm_test". The first one is the learning machine and the second one is the
testing machine.
To list all the options, just run svm_torch or svm_test without any
argument.

To test the program in classification, try :
svm_torch -v -ae ../TrainData/classif_train.dat ../TrainData/model_dummy

It takes less than two minutes on a 300 MHz computer. You should get around
914 support vectors (this number may change slightly depending on the
precision of your machine).

To test the SVM on the train data, try :
svm_test -ae ../TrainData/model_dummy ../TrainData/classif_train.dat
You should get around 0.78% misclassified examples.

To test the program in regression, try :
svm_torch -v -ae -rm -st 900 -eps 20 ../TrainData/regress_train.dat ../TrainData/model_dummy
You should have around 597 support vectors.

Test the model with :
svm_test -ae ../TrainData/model_dummy ../TrainData/regress_train.dat
The mean squared error should be around 187.2.
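
The two figures quoted above (percentage of misclassified examples and mean
squared error) can be checked independently of svm_test. The Python sketch
below uses the usual textbook definitions. It is not taken from the svm_test
source, the function names are hypothetical, and a prediction of exactly 0 is
assumed to count as class +1.

```python
def misclassification_percent(targets, predictions):
    """Percentage of examples whose predicted sign differs from the +1/-1 target."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    # Assumption: a prediction of exactly 0 is treated as class +1.
    errors = sum(1 for t, p in zip(targets, predictions)
                 if (p >= 0) != (t >= 0))
    return 100.0 * errors / len(targets)


def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
```

For example, one sign error among four classification examples gives 25.0,
and regression targets [1.0, 2.0] predicted as [1.0, 4.0] give a mean
squared error of 2.0.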


Options

The general syntax of svm_torch and svm_test is
svm_torch [options] example_file model_file
svm_test [options] model_file test_file

Where "example_file" is your training set file, "test_file" is your testing
set file and "model_file" is the SVM-model created by svm_torch.

All options are described when you launch svm_torch or svm_test without any
argument.
By default, svm_torch is a classification machine. If you want the
regression machine, use option -rm.
You should always use option -v with svm_torch : it reports the current error
during learning. This error is only an indicator, and it can oscillate.
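
For scripted experiments, the command line can be assembled programmatically.
The following Python sketch is only an illustration: the helper name
svm_torch_command is hypothetical, it knows only the -v and -rm options
described above, any other option is passed through unchanged, and it assumes
that the order of the options does not matter.

```python
def svm_torch_command(example_file, model_file, regression=False,
                      verbose=True, extra_options=()):
    """Build an argument list for svm_torch (hypothetical helper).

    Follows the syntax: svm_torch [options] example_file model_file
    """
    cmd = ["svm_torch"]
    if verbose:
        cmd.append("-v")    # report the current error during learning
    if regression:
        cmd.append("-rm")   # regression machine instead of classification
    # Other options are passed through as given (assumption: order is free).
    cmd.extend(str(opt) for opt in extra_options)
    cmd.extend([example_file, model_file])
    return cmd


if __name__ == "__main__":
    # Same options as the regression example in "Try It", possibly reordered.
    print(" ".join(svm_torch_command(
        "../TrainData/regress_train.dat", "../TrainData/model_dummy",
        regression=True, extra_options=["-ae", "-st", 900, "-eps", 20])))
```

The returned list can be handed to subprocess.run(cmd, check=True) from the
directory that contains the svm_torch binary.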


File format

There are two main input formats for "example_file" and "test_file" in
SVMTorch : an ASCII format, and a binary one.

The ASCII format is the following:
<Number n of training/testing samples>  <Dimension d of each sample+1>
<a11> <a12> <a13> .... <a1d> <a1_out>
 .
 .
 .
<an1> <an2> <an3> .... <and> <an_out>

where <aij> is an ASCII floating point number corresponding to the j-th
value of the i-th example and <ai_out> is the i-th desired output (in
classification, it should be +1/-1).
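
As an illustration of the ASCII layout above, here is a minimal Python sketch
that writes such a file. The function name write_ascii_examples is
hypothetical, and the sketch assumes that svm_torch accepts values separated
by single spaces, one example per line.

```python
def write_ascii_examples(path, inputs, outputs):
    """Write examples in the ASCII layout described above (hypothetical helper).

    inputs  -- list of n rows, each a list of d numbers
    outputs -- list of n desired outputs (+1/-1 for classification)
    """
    n = len(inputs)
    if n == 0 or len(outputs) != n:
        raise ValueError("need n > 0 rows and exactly one output per row")
    d = len(inputs[0])
    if any(len(row) != d for row in inputs):
        raise ValueError("all rows must have the same dimension")
    with open(path, "w") as f:
        # Header: number of samples, then d + 1 (the d inputs plus the output).
        f.write(f"{n} {d + 1}\n")
        for row, out in zip(inputs, outputs):
            values = list(row) + [out]
            f.write(" ".join(repr(float(v)) for v in values) + "\n")
```

Calling write_ascii_examples("toy.dat", [[0.5, 1.5], [2.5, 3.5]], [1, -1])
produces a header line "2 3" followed by two lines of three numbers each.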

With the same notation, the binary format is:
<Number n of training/testing samples> <Dimension d of each sample>
<a11>...<a1d> ....... <an1>...<and> <a1_out>... <an_out>
(First save the input table, then the output table, all in binary)
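
The binary layout can be sketched in the same way. Note that this message
does not state the size or byte order of the stored numbers, so the Python
sketch below makes explicit assumptions: the header is two 32-bit integers,
the values are 64-bit doubles (matching the USEDOUBLE setting), and
everything is in the byte order of the machine writing the file. The header
follows the description above literally (d, not d+1). Check these choices
against the SVMTorch source before relying on them; the function name is
hypothetical.

```python
import struct


def write_binary_examples(path, inputs, outputs, int_fmt="i", real_fmt="d"):
    """Write n, d, the input table, then the output table, all in binary.

    ASSUMPTIONS (not confirmed by the announcement): 32-bit int header,
    64-bit double values, native byte order, no padding.
    """
    n = len(inputs)
    if n == 0 or len(outputs) != n:
        raise ValueError("need n > 0 rows and exactly one output per row")
    d = len(inputs[0])
    if any(len(row) != d for row in inputs):
        raise ValueError("all rows must have the same dimension")
    flat_inputs = [float(v) for row in inputs for v in row]
    with open(path, "wb") as f:
        # "=" selects native byte order with standard sizes and no padding.
        f.write(struct.pack(f"=2{int_fmt}", n, d))
        f.write(struct.pack(f"={n * d}{real_fmt}", *flat_inputs))
        f.write(struct.pack(f"={n}{real_fmt}", *(float(v) for v in outputs)))
```

With these assumptions, a file with n = 2 and d = 2 is 8 + 32 + 16 = 56 bytes
long.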

svm_test also accepts a special input format for the case where you don't
have the desired outputs (to be used with the -no option).
The ASCII version of this format is :
<Number n of training/testing samples>  <Dimension d of each sample>
<a11> <a12> <a13> .... <a1d>
 .
 .
 .
<an1> <an2> <an3> .... <and>

And the binary version is :
<Number n of training/testing samples> <Dimension d of each sample>
<a11>...<a1d> ....... <an1>...<and>
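
To make the difference between the two ASCII layouts concrete, the following
Python sketch converts a file with desired outputs into the output-free
layout used with the -no option: the last column is dropped and the header
goes from d+1 to d. The function name strip_outputs is hypothetical, and the
sketch assumes the file is whitespace-separated as shown above.

```python
def strip_outputs(labeled_path, unlabeled_path):
    """Copy an ASCII example file, dropping the desired-output column."""
    with open(labeled_path) as src:
        tokens = src.read().split()
    if len(tokens) < 2:
        raise ValueError("missing header")
    n, width = int(tokens[0]), int(tokens[1])   # width is d + 1
    values = tokens[2:]
    if width < 2 or len(values) != n * width:
        raise ValueError("file contents do not match the header")
    d = width - 1
    with open(unlabeled_path, "w") as dst:
        # Header of the output-free layout: n and d (no "+1").
        dst.write(f"{n} {d}\n")
        for i in range(n):
            row = values[i * width:i * width + d]   # keep only the d inputs
            dst.write(" ".join(row) + "\n")
```

The numbers are copied as text, so no precision is lost in the conversion.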

-----
Samy Bengio  
Research Director. Machine Learning Group Leader.
IDIAP, CP 592, rue du Simplon 4, 1920 Martigny, Switzerland.
tel: +41 27 721 77 39, fax: +41 27 721 77 12.
mailto:bengio at idiap.ch, http://www.idiap.ch/~bengio 




