NODElib Documentation

By Gary William Flake

NAME

svm.h - support vector machines and the SMORCH algorithm.

SYNOPSIS

This package defines support vector machines (SVMs) for both classification and regression problems. The SVMs can use a wide variety of kernel functions. Optimization of the SVMs is performed by a variation of John Platt's sequential minimal optimization (SMO) algorithm. This version of SMO is generalized for regression, uses kernel caching, and incorporates several heuristics; for these reasons, we refer to the optimization algorithm as SMORCH. SMORCH has been shown to be over an order magnitude faster than SMO, QP, and decomposition.

#include <svm.h>

SVM *smorch_train(SMORCH *smorch);

double smorch_objective(SMORCH *smorch);

double smorch_constraint(SMORCH *smorch);

double svm_output(SVM *svm, double *x);

void svm_write_fp(SVM *svm, FILE *fp);

int svm_write(SVM *svm, char *fname);

SVM *svm_read_fp(FILE *fp);

SVM *svm_read(char *fname);

void svm_destroy(SVM *svm);

double svm_kernel_linear(double *x1,
      double *
x2, double aux,
      int 
dim);

double svm_kernel_gauss(double *x1,
      double *
x2, double aux,
      int 
dim);

double svm_kernel_poly(double *x1,
     double *
x2, double aux,
     int 
dim);

double svm_kernel_tanh(double *x1,
     double *
x2, double aux,
     int 
dim);

double svm_kernel_coulomb(double *x1,
      double *
x2, double aux,
      int 
dim);

DESCRIPTION

Quick Start

If you want to use SVMs and SMORCH, but don't want to think about it too much, read this section, read about the mandatory fields of the SMORCH data type, skip ahead to the HEURISTICS section to learn about the runtime options, and use the demo program, smorch, which should be in the examples subdirectory of the NODElib distribution.

You only need to consult this entire document if you wish to modify the internals of this package for research purposes.

The quickest way to get going with this package is to just use the smorch program. Essentially, the smorch program is nothing but a convenient wrapper around these routines. It accepts a number of command-line options for selecting where your training data comes from, and for selecting how the optimization procedure should behave.

The optimization parameters are summarized under the SMORCH data type and in the HEURISTICS section. The remainder of this Quick Start subsection will only describe the command line options of smorch not covered elsewhere in this document.

In general, you can have smorch read data from an ASCII file or a binary file. ASCII files work just as you would think; Numbers can look like anything that scanf() can read, and they can be broken by whitespace in any manner you choose. The command line options -xdim and -ydim specify the input and output dimensionality, respectively. Lines that begin with "^ *#" are interpreted as comments and are skipped.

Binary files are a little more subtle and behave as follows:

In all cases for binary files, the -xdim and -ydim options behave just as they do for ASCII files. For all file types, you'll want to use -yindex UINT as an option if you are training on data sets that have multiple outputs.

If you are training on a regression file, and you are reading data from an ASCII file, you can use -xdelta UINT and -offset UINT to specify a delayed coordinate embedding of the data. This is A Good Thing because it preserves memory, but it forces the input vectors to be constructed at runtime each time they are used. -xdelta UINT specifies the space between successive inputs in the time series, and -offset UINT specifies the distance between the last input and the single target output value in the time series. See the SERIES type for more information on how to interpret these values for time series data.

Finally, you can use -dump to force the program to dump an output file named smorch.tst which contains the results of testing the trained SVM on the training set. Each line in the dump file will contain the input vector, the target output, and the SVM output.

When in doubt, run smorch -help for a list of command line options.

Type Declarations

The following types are defined in the header file svm.h.

SVM

typedef struct SVM {
  unsigned xdim;          /* The input dimensionality.        */
  unsigned sz;            /* The number of support vectors.   */
  unsigned regression;    /* Is this a regression SVM?        */
  double b;               /* The output bias value.           */
  double *alpha;          /* The Lagrange multipliers.        */
  double **x;             /* The input vectors.               */
  double *y;              /* The target output values.        */
  double aux;             /* Auxiliary parameter for kernels. */
  /*
   * The kernel function.
   */
  double (*kernel)(double *x1, double *x2,
		   double aux, int dim);
} SVM;

The SVM data type contains the minimal amount of information required to define a support vector machine. You probably do not need to know about the internals of this data type because most of what you may want to do with an SVM can be done with one of the functions in this package. Morevover, you never actually create an SVM yourself because it is the return type of smorch_train().

SMORCH_CACHE

typedef struct SMORCH_CACHE {
  double **vals;      /* The cached values. */  
  char **status;      /* Indicates if a value is already computed. */
  unsigned *index;    /* The cache index for some exemplar. */
  unsigned *invindex; /* The dataset index of a cache index. */
  unsigned size;      /* The size of the cache. */
  unsigned used;      /* The number of cache cells used. */
  unsigned limit;     /* The number of dataset exemplars. */
  unsigned notickle;  /* When set, prevents changes in the cache. */
  LIST *order;        /* The LRU cache elements. */
  LIST *free;         /* A list of unused cache nodes. */
  LIST_NODE *nodes;   /* The array of cache nodes. */
} SMORCH_CACHE;

The SMORCH_CACHE data type is an internal structure used to save kernel outputs for execution efficiency. If your cache is set up to have a size of N, then this data structure will hold an N x N matrix of kernel values. Again, you probably don't need to look at this data structure at all; the information here is only for the curious.

SMORCH

typedef struct SMORCH {
  /* 
   * The following fields define the basic structure of the problem
   * and the data set.
   */
/* The input dimensionality (automatically set). */ unsigned xdim; /* The number of exemplars (automatically set). */ unsigned sz; /* MANDATORY: the training dataset. */ DATASET *data; /* OPTIONAL: the kernel function. */ double (*kernel)(double *x1, double *x2, double aux, int dim);
/* * The next set of fields are basic optimization options. */
/* OPTIONAL: the size of the cache (remember, this is squared). */ unsigned cache_size; /* OPTIONAL: the output in the dataset that is the target. */ unsigned yindex; /* OPTIONAL: if greater than zero, the decomposition subset size. */ unsigned subset_size; /* OPTIONAL: the box constraint for Lagrange multipliers. */ double C; /* OPTIONAL: classification tolerance value. */ double tol; /* OPTIONAL: machine floating point epsilon. */ double eps; /* OPTIONAL: the auxiliary parameter for the kernel functions. */ double aux;
/* * Regression specific fields: */
/* OPTIONAL: is this a regression problem? */ unsigned regression; /* OPTIONAL: value for the epsilon-insensitive regression loss function. */ double regeps;
/* * The next four fields are special training options. */
/* OPTIONAL: use on-demand incremental SVM output calculation? */ unsigned ultra_clever; /* OPTIONAL: look for the best pairing in the cache? */ unsigned best_step; /* OPTIONAL: attempt to optimize exemplars that violate KKT conditions * by the greatest margin. This is a bad idea. */ unsigned worst_first; /* OPTIONAL: avoid thrashing problem. */ unsigned lazy_loop;
/* * The next set of fields are very experimental. Don't use them. */
/* OPTIONAL: use a shrinking tube for machine tolerance? */ unsigned tube; /* OPTIONAL: shrinking factor for each tube iteration. */ double tube_rate; /* OPTIONAL: value to halt shrinking tube. Don't ask. I can't explain it. */ double tube_final;
/* * The next set of fields are hooks that you can set to do something * periodically. */
/* OPTIONAL: a hook that is called on every SMORCH iteration. */ int (*hook)(struct SMORCH *smorch); /* OPTIONAL: the hook that is called when SMORCH is finished. */ int (*finalhook)(struct SMORCH *smorch); /* OPTIONAL: a field that you can pass your own data to. */ void *obj;
/* * The epoch number, the number of Lagrange multipliers changed at * the last SMORCH epoch, and the last known value of the objective functions. */ unsigned epoch, num_changed; double objective;
/* * The SVM output bias, the Lagrange multipliers, and the target * outputs. */ double b; double *alpha; double *y;
/* * The actual SVM outputs (if known), the kernel values of * K(x_i,x_i), and some temporary scratch space. */ double *out; double *kii; double *temp;
/* * The next two arrays allow us to index directly into the two lists * that follow. */ LIST_NODE **nonzero_node; LIST_NODE **nonbound_node;
/* * The list of nonzero Lagrange multipliers, the list of non-bounded * Lagrange multipliers, and a free list of nodes. */ LIST *nonzero; LIST *nonbound; LIST *freelist;
/* The cache structure used for this SMORCH instantiation. */ SMORCH_CACHE *cache;
/* Index arrays for random sampling and sub-sampling. */ unsigned *random_index; unsigned *sub_index;
/* * These fields are used for on-demand incremental SVM output * calculation. They store incremental changes to the Lagrange * multipliers so that SVM output evaluation can use the fastest * method possible. */ unsigned time; unsigned *alpha_index; unsigned *update_time; double *delta_alpha; double *store_b;
/* This is useless. it just stores the number of kernel evaluations * required for the last SVM output evaluation. */ unsigned total_output_kern;
/* This is used internally to force SMORCH to consider all exemplars * when looking for something to optimize. */ unsigned examine_all;
/* Some cache statistics. */ unsigned cache_hit, cache_miss, cache_avoid;
/* Just as the name says. */ unsigned prev_examine_all;
/* Internal field used to indicate if input vectors need to be * copied. */ unsigned safe_ds_pointers; } SMORCH;

The SMORCH data type is used to completely define how calls to smorch_train() behaves. If you want to use this package in a sophisticated manner, then you need to pay attention to the fields marked as MANDATORY or OPTIONAL. (If you are using the SMORCH data type at this level, then you are probably not using the included smorch example program.)

Since you are still reading this, you probably want to know how to use this data type in greater detail. Here's the scoop. The first thing you should do is to declare a SMORCH in your code. Do this so that you have sane initial values for the fields with:

SMORCH smorch = SMORCH_DEFAULT;

After this declaration, you can set the fields of smorch to your desired values. The only thing that you must set is the source of your training data:

smorch.data = my_data_set;

where my_data_set is a proper DATASET. After that you can call:

SVM *svm = smorch_train(&smorch);

And that's it! The returned result in svm contains all of your support vectors and optimal Lagrange multipliers. With svm, you can evaluate your SVM with respect to an arbitrary input, or save everything to disk for later use.

However, using a SMORCH in this manner will probably give you undesirable results. To really use a SMORCH, you need to set the OPTIONAL runtime parameters to reasonable values. A short list of things that you'll want to think about are:

There are a few other optimization options. These are covered in the HEURISTICS section.

Function Definitions

The following function prototypes are given in the header file svm.h.

SVM *smorch_train(SMORCH *smorch);

Call the SMORCH algorithm as specified by the smorch argument, and return an optmized SVM. See the documentation for the SMORCH data type, and the HEURISTICS section for more details.

double smorch_objective(SMORCH *smorch);

Evaluates the objective function of the SVM being optimized by this SMORCH instance. Note that this function sums over all non-zero Lagrange multipliers, so it can be very slow. Instead, you can peak at smorch->objective inside of a hook, which will give you the objective function value that has been incrementally updated. This latter method is preferred because it takes constant time.

double smorch_constraint(SMORCH *smorch);

Evaluates the linear constraints of the SVM being optimized by this SMORCH instance. At all times, the constraints should be obeyed within machine tolerance. You should only call this if you doubt that last statement because this function must sum over all non-zero Lagrange multiplier, which can take a considerable amount of time.

double svm_output(SVM *svm, double *x);

Evaluates svm with the input vector x. Make sure that x is of a dimensionality that is identical to the input dimensionality of the training set used to optimize svm.

void svm_write_fp(SVM *svm, FILE *fp);

Writes a description of svm to the FILE pointer referred to by fp. The input functions svm_read_fp() and svm_read() can be used to read in the SVM at a later time.

int svm_write(SVM *svm, char *fname);

Writes a description of svm to the file named by fname. The input functions svm_read_fp() and svm_read() can be used to read in the SVM at a later time.

SVM *svm_read_fp(FILE *fp);

Reads and returns an SVM that resides in the FILE pointer referred to by fp.

SVM *svm_read(char *fname);

Reads and returns an SVM that resides in the file named by fname.

void svm_destroy(SVM *svm);

Frees all memory associated with svm.

double svm_kernel_linear(double *x1,
      double *
x2, double aux,
      int 
dim);

A linear kernal defined as x1 . x2.

double svm_kernel_gauss(double *x1,
      double *
x2, double aux,
      int 
dim);

A Gaussian kernel defined as exp(-||x1-x2||^2 / aux^2).

double svm_kernel_poly(double *x1,
     double *
x2, double aux,
     int 
dim);

A polynomial kernal defined as (x1 . x2 + 1)^aux.

double svm_kernel_tanh(double *x1,
     double *
x2, double aux,
     int 
dim);

A hyperbolic tangent kernel defined as tanh(x1 . x2 - aux).

double svm_kernel_coulomb(double *x1,
      double *
x2, double aux,
      int 
dim);

A Coulomb kernel defined inspired from physics.

HEURISTICS

SMORCH uses several tricks to speed up convergence. Most of the heuristics are summarized in a recent paper:

The main trick is that kernel outputs are cached. However, a simple LRU policy fails miserably, so our caching policy works around this by effectively turning off cache "tickles" in certain situations. The other heuristics are designed to exploit the fact that kernel outputs are cached by reducing the cost of each SMORCH step, or by reducing the total number of steps.

The heuristics are summarized below. The item label in the list corresponds to the field in the SMORCH data structure that must be set in order to activate the heuristic. The end of each heuristic description also lists the command line option in the smorch program that activates the heuristic.

If you have looked closely at the source code, you may have seen references to some tube shrinking heuristics. This is essentially a poor man's homotopy. It's a work in progress, so ignore those options.

SMORCH EXAMPLES

Suppose you have an ASCII data file consisting of 1000 exemplars with each exemplar consisting of 100 input features followed by the target class. The suggested way of solving this problem would be to use:

The above command will use linear kernels. However, if you wanted to use Gaussian kernels with a variance of 0.5, you would use:

Now, imagine that everything is the same as above, but now you it is a regression problem that you are working on. Then, you would use:

Finally, if you had on the order of 10,000 exemplars instead, a better set of options would be:

When in doubt, use all of the recommended heuristics except the command line option -ssz UINT. If you are working with a huge dataset, then use -ssz UINT

TO DO

In an ideal world, the SVM and NN packages would be unified in such a way that an SVM is just another NN that is a linear combination of kernels. In this framework, SMORCH would just be another algorithm in the OPTIMIZER package. Some day, NODElib will be restructured in this way. But not very soon.

BUGS

This is highly experimental code and it is the subject of active research. As such, there are probably numerous bugs. Moreover, this package may be radically changed in the future.

AUTHOR

Gary William Flake (gary.flake@usa.net).