# ClassificationSVM class

Superclasses: CompactClassificationSVM

Support vector machine for binary classification

## Description

`ClassificationSVM` is a support vector machine classifier for one- or two-class learning. Use `fitcsvm` and the training data to train a `ClassificationSVM` classifier.

Trained `ClassificationSVM` classifiers store the training data, parameter values, prior probabilities, support vectors, and algorithmic implementation information. You can use these classifiers to:

• Estimate resubstitution predictions. For details, see `resubPredict`.

• Predict labels or estimate posterior probabilities for new data. For details, see `predict` and `fitPosterior`.

## Construction

`SVMModel = fitcsvm(X,Y)` returns a trained SVM classifier (`SVMModel`) based on the input variables (also known as predictors, features, or attributes) `X` and output variables (also known as responses or class labels) `Y`. For details, see `fitcsvm`.

`SVMModel = fitcsvm(X,Y,Name,Value)` returns a trained SVM classifier with additional options specified by one or more `Name,Value` pair arguments. For name-value pair argument details, see `fitcsvm`.

If you set one of the following five options, then `SVMModel` is a `ClassificationPartitionedModel` model: `'CrossVal'`, `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`. Otherwise, `SVMModel` is a `ClassificationSVM` classifier.
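For instance, here is a minimal sketch (assuming a numeric predictor matrix `X` and a label vector `Y` are already in the workspace) contrasting the two return types:

```
% Plain training returns a ClassificationSVM object.
SVMModel = fitcsvm(X,Y);

% Requesting cross-validation instead returns a ClassificationPartitionedModel.
CVModel = fitcsvm(X,Y,'KFold',5);

class(SVMModel) % ClassificationSVM
class(CVModel)  % classreg.learning.partition.ClassificationPartitionedModel
```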


### `X` — Predictor data (matrix of numeric values)

Predictor data to which the SVM classifier is trained, specified as a matrix of numeric values.

Each row of `X` corresponds to one observation (also known as an instance or example), and each column corresponds to one predictor.

The length of `Y` and the number of rows of `X` must be equal.

It is good practice to:

• Cross validate using the `KFold` name-value pair argument. The cross-validation results determine how well the SVM classifier generalizes.

• Standardize the predictor variables using the `Standardize` name-value pair argument.

To specify the names of the predictors in the order of their appearance in `X`, use the `PredictorNames` name-value pair argument.

Data Types: `double` | `single`
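As a sketch of these practices (the predictor names here are hypothetical and must match the number of columns of `X`):

```
% Standardize the predictors and name them in their column order.
% Assumes X is a numeric matrix with two columns and Y holds the labels.
SVMModel = fitcsvm(X,Y,'Standardize',true,...
    'PredictorNames',{'petalLength','petalWidth'});

% Cross validate to gauge generalization (10 folds).
CVSVMModel = fitcsvm(X,Y,'Standardize',true,'KFold',10);
```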

### `Y` — Class labels (categorical array | character array | logical vector | vector of numeric values | cell array of strings)

Class labels to which the SVM classifier is trained, specified as a categorical or character array, logical or numeric vector, or cell array of strings.

If `Y` is a character array, then each element must correspond to one row of the array.

The length of `Y` and the number of rows of `X` must be equal.

It is good practice to specify the order of the classes using the `ClassNames` name-value pair argument.

To specify the response variable name, use the `ResponseName` name-value pair argument.
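A minimal sketch of both options, using the `'b'`/`'g'` labels from the ionosphere example later on this page (the response name is illustrative):

```
% Fix the class order ('b' negative, 'g' positive) and name the response.
% 'RadarReturn' is a hypothetical response name.
SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},...
    'ResponseName','RadarReturn');
```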

Note: The software treats `NaN`, empty string (`''`), and `<undefined>` elements as missing values. If a row of `X` or an element of `Y` contains at least one `NaN`, then the software removes those rows and elements from both arguments. Such deletion decreases the effective training or cross-validation sample size.

## Properties

`Alpha`

Numeric vector of trained classifier coefficients from the dual problem (i.e., the estimated Lagrange multipliers). `Alpha` has length equal to the number of support vectors in the trained classifier (i.e., `sum(SVMModel.IsSupportVector)`).

`Beta`

Numeric vector of linear predictor coefficients. `Beta` has length equal to the number of predictors (i.e., `size(SVMModel.X,2)`).

If `KernelParameters.Function` is `'linear'`, then the software estimates the classification score for the observation x using

$f\left(x\right)=\left(x/s\right)\prime \beta +b.$

`SVMModel` stores β, b, and s in the properties `Beta`, `Bias`, and `KernelParameters.Scale`, respectively.

If `KernelParameters.Function` is not `'linear'`, then `Beta` is empty (`[]`).
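As a sketch, you can reproduce the linear-kernel scores from these stored parameters (this assumes a model trained with the default linear kernel and without `'Standardize'`):

```
% Manual linear scores: f(x) = (x/s)'*beta + b for each row x of X.
s = SVMModel.KernelParameters.Scale;
f = (SVMModel.X/s)*SVMModel.Beta + SVMModel.Bias;

% Compare with predict; column 2 of the scores is the positive class.
[~,scores] = predict(SVMModel,SVMModel.X);
max(abs(f - scores(:,2))) % should be near zero, up to floating-point error
```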

`Bias`

Scalar corresponding to the trained classifier bias term.

`BoxConstraints`

Numeric vector of box constraints.

`BoxConstraints` has length equal to the number of observations (i.e., `size(SVMModel.X,1)`).

`CacheInfo`

Structure array containing:

• The cache size (in MB) that the software reserves to train the SVM classifier (`SVMModel.CacheInfo.Size`). To set the cache size to `CacheSize` MB, specify the `fitcsvm` name-value pair argument `'CacheSize',CacheSize`.

• The caching algorithm that the software uses during optimization (`SVMModel.CacheInfo.Algorithm`). Currently, the only available caching algorithm is `Queue`. You cannot set the caching algorithm.

`CategoricalPredictors`

List of categorical predictors, which is always empty (`[]`) for SVM and discriminant analysis classifiers.

`ClassNames`

List of elements in `Y` with duplicates removed. `ClassNames` has the same data type as the data in the argument `Y`, and therefore can be a categorical or character array, logical or numeric vector, or cell array of strings.

`ConvergenceInfo`

Structure array containing convergence information.

| Field | Description |
| --- | --- |
| `Converged` | Logical flag indicating whether the algorithm converged (`1` indicates convergence) |
| `ReasonForConvergence` | String indicating the criterion the software uses to detect convergence |
| `Gap` | Scalar feasibility gap between the dual and primal objective functions |
| `GapTolerance` | Scalar feasibility gap tolerance. Set this tolerance to, e.g., `gt`, using the name-value pair argument `'GapTolerance',gt` of `fitcsvm`. |
| `DeltaGradient` | Scalar attained gradient difference between upper and lower violators |
| `DeltaGradientTolerance` | Scalar tolerance for the gradient difference between upper and lower violators. Set this tolerance to, e.g., `dgt`, using the name-value pair argument `'DeltaGradientTolerance',dgt` of `fitcsvm`. |
| `LargestKKTViolation` | Maximal, scalar Karush-Kuhn-Tucker (KKT) violation value |
| `KKTTolerance` | Scalar tolerance for the largest KKT violation. Set this tolerance to, e.g., `kktt`, using the name-value pair argument `'KKTTolerance',kktt` of `fitcsvm`. |
| `History` | Structure array containing convergence information at set optimization iterations. The fields are: `NumIterations`, a numeric vector of iteration indices for which the software records convergence information; `Gap`, a numeric vector of `Gap` values at the iterations; `DeltaGradient`, a numeric vector of `DeltaGradient` values at the iterations; `LargestKKTViolation`, a numeric vector of `LargestKKTViolation` values at the iterations; `NumSupportVectors`, a numeric vector indicating the number of support vectors at the iterations; and `Objective`, a numeric vector of `Objective` values at the iterations. |
| `Objective` | Scalar value of the dual objective function |

`Cost`

Square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`.

During training, the software updates the prior probabilities by incorporating the penalties described in the cost matrix. Therefore,

• For two-class learning, `Cost` always has this form: `Cost(i,j) = 1` if `i ~= j`, and `Cost(i,j) = 0` if `i = j` (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

• For one-class learning, `Cost = 0`.

This property is read-only. For more details, see Algorithms.

`Gradient`

Numeric vector of training data gradient values. `Gradient` has length equal to the number of observations (i.e., `size(SVMModel.X,1)`).

`IsSupportVector`

Logical vector indicating whether a corresponding row in the predictor data matrix is a support vector. `IsSupportVector` has length equal to the number of observations (i.e., `size(SVMModel.X,1)`).

`KernelParameters`

Structure array containing the kernel name and parameter values.

To display the values of `KernelParameters`, use dot notation, e.g., `SVMModel.KernelParameters.Scale` displays the scale parameter value.

The software accepts `KernelParameters` as inputs, and does not modify them. Alter `KernelParameters` by setting the appropriate name-value pair arguments when you train the SVM classifier using `fitcsvm`.

`ModelParameters`

Object containing parameter values, e.g., the name-value pair argument values, used to train the SVM classifier. `ModelParameters` does not contain estimated parameters.

Access fields of `ModelParameters` using dot notation. For example, access the initial values for estimating `Alpha` using `SVMModel.ModelParameters.Alpha`.

`Mu`

Numeric vector of predictor means.

If you specify `'Standardize',1` or `'Standardize',true` when you train an SVM classifier using `fitcsvm`, then `Mu` has length equal to the number of predictors (i.e., `size(SVMModel.X,2)`). Otherwise, `Mu` is an empty vector (`[]`).

`NumIterations`

Positive integer indicating the number of iterations required by the optimization routine to attain convergence.

To set a limit on the number of iterations to, e.g., `k`, specify the name-value pair argument `'IterationLimit',k` of `fitcsvm`.

`Nu`

Positive scalar representing the ν parameter for one-class learning.

`NumObservations`

Numeric scalar representing the number of observations in the training data. If the input arguments `X` or `Y` contain missing values, then `NumObservations` is less than the length of `Y`.

`OutlierFraction`

Scalar indicating the expected proportion of outliers in the training data.

`PredictorNames`

Cell array of strings containing the predictor names, in the order that they appear in `X`.

`Prior`

Numeric vector of prior probabilities for each class. The order of the elements of `Prior` corresponds to the elements of `SVMModel.ClassNames`.

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix.

This property is read-only. For more details, see Algorithms.

`ResponseName`

String describing the response variable `Y`.

`ScoreTransform`

String representing a built-in transformation function, or a function handle for transforming predicted classification scores.

To change the score transformation function to, e.g., `function`, use dot notation.

• For a built-in function, enter a string.

`SVMModel.ScoreTransform = 'function';`

This table contains the available, built-in functions.

| String | Formula |
| --- | --- |
| `'doublelogit'` | 1/(1 + e^(–2x)) |
| `'invlogit'` | log(x / (1 – x)) |
| `'ismax'` | Set the score for the class with the largest score to `1`, and the scores for all other classes to `0`. |
| `'logit'` | 1/(1 + e^(–x)) |
| `'none'` | x (no transformation) |
| `'sign'` | –1 for x < 0; 0 for x = 0; 1 for x > 0 |
| `'symmetric'` | 2x – 1 |
| `'symmetriclogit'` | 2/(1 + e^(–x)) – 1 |
| `'symmetricismax'` | Set the score for the class with the largest score to `1`, and the scores for all other classes to `-1`. |

• For a MATLAB® function, or a function that you define, enter its function handle.

`SVMModel.ScoreTransform = @function;`

`function` should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
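For example, a user-defined transform might squash the raw scores into (0,1) with a logistic function (a sketch; any function with the stated matrix-in, matrix-out signature works):

```
% Anonymous function: takes a matrix of scores, returns one of the same size.
SVMModel.ScoreTransform = @(s)1./(1 + exp(-s));
```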

`ShrinkagePeriod`

Nonnegative integer indicating the shrinkage period, i.e., number of iterations between reductions of the active set.

To set the shrinkage period to, e.g., `sp`, specify the name-value pair argument `'ShrinkagePeriod',sp` of `fitcsvm`.

`Sigma`

Numeric vector of predictor standard deviations.

If you specify `'Standardize',1` or `'Standardize',true` when you train the SVM classifier, then `Sigma` has length equal to the number of predictors (i.e., `size(SVMModel.X,2)`). Otherwise, `Sigma` is an empty vector (`[]`).

`Solver`

String indicating the solving routine that the software used to train the SVM classifier.

To set the solver to, e.g., `solver`, specify the name-value pair argument `'Solver',solver` of `fitcsvm`.

`SupportVectors`

Matrix containing rows of `X` that the software considers the support vectors.

If you specify `'Standardize',1` or `'Standardize',true`, then `SupportVectors` are the standardized rows of `X`.

`SupportVectorLabels`

Numeric vector of support vector class labels. `SupportVectorLabels` has length equal to the number of support vectors (i.e., `sum(SVMModel.IsSupportVector)`).

`+1` indicates that the corresponding support vector is in the positive class (`SVMModel.ClassNames{2}`). `-1` indicates that the corresponding support vector is in the negative class (`SVMModel.ClassNames{1}`).

`W`

Numeric vector of observation weights that the software used to train the SVM classifier.

The length of `W` is `SVMModel.NumObservations`.

`fitcsvm` normalizes `Weights` so that the elements of `W` within a particular class sum up to the prior probability of that class.
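A sketch that checks this normalization (assuming the labels are stored as a cell array of strings, so `strcmp` applies):

```
% For each class, the weights of its observations should sum to the prior.
for k = 1:numel(SVMModel.ClassNames)
    inClass = strcmp(SVMModel.Y,SVMModel.ClassNames{k});
    fprintf('Class %d: sum(W) = %.4f, Prior = %.4f\n',...
        k,sum(SVMModel.W(inClass)),SVMModel.Prior(k))
end
```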

`X`

Numeric matrix of unstandardized predictor values that the software used to train the SVM classifier.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

The software excludes predictor data rows removed due to `NaN`s from `X`.

`Y`

Categorical or character array, logical or numeric vector, or cell array of strings representing the observed class labels that the software used to train the SVM classifier. `Y` is the same data type as the input argument `Y` of `fitcsvm`.

Each row of `Y` represents the observed classification of the corresponding row of `X`.

The software excludes elements removed due to `NaN`s from `Y`.

## Methods

| Method | Description |
| --- | --- |
| `compact` | Compact support vector machine classifier |
| `crossval` | Cross-validated support vector machine classifier |
| `fitPosterior` | Fit posterior probabilities |
| `resubEdge` | Classification edge for support vector machine classifiers by resubstitution |
| `resubLoss` | Classification loss for support vector machine classifiers by resubstitution |
| `resubMargin` | Classification margins for support vector machine classifiers by resubstitution |
| `resubPredict` | Predict support vector machine classifier resubstitution responses |
| `resume` | Resume training support vector machine classifier |

### Inherited Methods

| Method | Description |
| --- | --- |
| `compareHoldout` | Compare accuracies of two classification models using new data |
| `discardSupportVectors` | Discard support vectors for linear support vector machine models |
| `edge` | Classification edge for support vector machine classifiers |
| `fitPosterior` | Fit posterior probabilities |
| `loss` | Classification error for support vector machine classifiers |
| `margin` | Classification margins for support vector machine classifiers |
| `predict` | Predict labels for support vector machine classifiers |

## Definitions

### Box Constraint

A parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization).

If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.

### Gram Matrix

The Gram matrix of a set of n vectors {x1,...,xn; xj ∈ Rp} is an n-by-n matrix with element (j,k) defined as G(xj,xk) = <ϕ(xj),ϕ(xk)>, an inner product of the transformed predictors using the kernel function ϕ.

For nonlinear SVM, the algorithm forms a Gram matrix using the predictor matrix columns. The dual formalization replaces the inner product of the predictors with corresponding elements of the resulting Gram matrix (called the "kernel trick"). Subsequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.

### Karush-Kuhn-Tucker Complementarity Conditions

KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions.

In SVM, the KKT complementarity conditions are

$\left\{\begin{array}{l}{\alpha }_{j}\left[{y}_{j}\left(w\prime \varphi \left({x}_{j}\right)+b\right)-1+{\xi }_{j}\right]=0\\ {\xi }_{j}\left(C-{\alpha }_{j}\right)=0\end{array}$

for all j = 1,...,n, where αj is a Lagrange multiplier, w is the vector of linear coefficients, ϕ is a kernel function (see Gram Matrix), ξj is a slack variable, and C is the box constraint. If the classes are perfectly separable, then ξj = 0 for all j = 1,...,n.

### One-Class Learning

One-class learning, or unsupervised SVM, aims to separate data from the origin in the high-dimensional predictor space (not the original predictor space), and is an algorithm used for outlier detection.

The algorithm resembles that of SVM for binary classification. The objective is to minimize dual expression

$0.5\sum _{jk}{\alpha }_{j}{\alpha }_{k}G\left({x}_{j},{x}_{k}\right)$

with respect to ${\alpha }_{1},...,{\alpha }_{n}$, subject to

$\sum {\alpha }_{j}=n\nu$

and $0\le {\alpha }_{j}\le 1$ for all j = 1,...,n. G(xj,xk) is element (j,k) of the Gram matrix.

A small value of ν leads to fewer support vectors, and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors, and therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1.

For more details, see [2].
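A minimal sketch of one-class training and outlier flagging (the data here are synthetic; negative scores mark outliers, as described under Algorithms):

```
rng(1); % For reproducibility
X = randn(200,2);                 % synthetic, unlabeled training data

% Single-class labels trigger one-class learning in fitcsvm.
OCSVMModel = fitcsvm(X,ones(200,1),'KernelScale','auto',...
    'OutlierFraction',0.05);

[~,score] = predict(OCSVMModel,X);
outliers = score < 0;             % roughly 5% of the observations
```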

### Support Vector

Support vectors are observations corresponding to strictly positive estimates of α1,...,αn.

SVM classifiers that yield fewer support vectors for a given training set are more desirable.

### Support Vector Machines for Binary Classification

The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.

The linear SVM score function is

$f\left(x\right)=x\prime \beta +{\beta }_{0},$

where:

• x is an observation (corresponding to a row of `X`).

• The vector β contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to `SVMModel.Beta`). For separable data, the optimal margin length is $2/‖\beta ‖.$

• β0 is the bias term (corresponding to `SVMModel.Bias`).

The set of points x for which f(x) = 0 defines the hyperplane. For a particular hyperplane, f(z) is the distance from point z to the hyperplane.

An SVM classifier searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate. Therefore:

• For separable classes, the objective is to minimize $‖\beta ‖$ with respect to β and β0, subject to yjf(xj) ≥ 1 for all j = 1,..,n. This is the primal formalization for separable classes.

• For inseparable classes, SVM uses slack variables (ξj) to penalize the objective function for observations that cross the margin boundary for their class. ξj = 0 for observations that do not cross the margin boundary for their class, otherwise ξj ≥ 0.

The objective is to minimize $0.5{‖\beta ‖}^{2}+C\sum {\xi }_{j}$ with respect to β, β0, and ξj, subject to ${y}_{j}f\left({x}_{j}\right)\ge 1-{\xi }_{j}$ and ${\xi }_{j}\ge 0$ for all j = 1,..,n, and for a positive scalar box constraint C. This is the primal formalization for inseparable classes.

SVM uses the Lagrange multipliers method to optimize the objective. This introduces n coefficients α1,...,αn (corresponding to `SVMModel.Alpha`). The dual formalizations for linear SVM are:

• For separable classes, minimize

$0.5\sum _{j=1}^{n}\sum _{k=1}^{n}{\alpha }_{j}{\alpha }_{k}{y}_{j}{y}_{k}{x}_{j}\prime {x}_{k}-\sum _{j=1}^{n}{\alpha }_{j}$

with respect to α1,...,αn, subject to $\sum {\alpha }_{j}{y}_{j}=0$, αj ≥ 0 for all j = 1,...,n, and Karush-Kuhn-Tucker (KKT) complementarity conditions.

• For inseparable classes, the objective is the same as for separable classes, except for the additional condition $0\le {\alpha }_{j}\le C$ for all j = 1,..,n.

The resulting score function is

$f\left(x\right)=\sum _{j=1}^{n}{\stackrel{^}{\alpha }}_{j}{y}_{j}x\prime {x}_{j}+\stackrel{^}{b}.$

The score function is free of the estimate of β as a result of the dual formalization.

In some cases, there is a nonlinear boundary separating the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane.

The dual formalization for nonlinear SVM is

$0.5\sum _{j=1}^{n}\sum _{k=1}^{n}{\alpha }_{j}{\alpha }_{k}{y}_{j}{y}_{k}G\left({x}_{j},{x}_{k}\right)-\sum _{j=1}^{n}{\alpha }_{j}$

with respect to α1,...,αn, subject to $\sum {\alpha }_{j}{y}_{j}=0$, $0\le {\alpha }_{j}\le C$ for all j = 1,..,n, and the KKT complementarity conditions. G(xj,xk) is element (j,k) of the Gram matrix. The resulting score function is

$f\left(x\right)=\sum _{j=1}^{n}{\alpha }_{j}{y}_{j}G\left(x,{x}_{j}\right)+b.$

For more details, see Understanding Support Vector Machines, [1], and [3].
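As a sketch, this score can be evaluated directly from the stored support vectors. The code below assumes a Gaussian (`'rbf'`) kernel with G(x,xj) = exp(–‖(x – xj)/s‖²) and a model trained without `'Standardize'`:

```
sv    = SVMModel.SupportVectors;       % support vectors (rows of X)
alpha = SVMModel.Alpha;                % estimated Lagrange multipliers
yl    = SVMModel.SupportVectorLabels;  % +1/-1 class labels
s     = SVMModel.KernelParameters.Scale;

x = SVMModel.X(1,:);                   % score one observation
G = exp(-sum((bsxfun(@minus,sv,x)/s).^2,2)); % G(x,x_j) per support vector
f = sum(alpha.*yl.*G) + SVMModel.Bias; % f(x) = sum(a_j*y_j*G(x,x_j)) + b
```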

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB documentation.

## Examples


### Train a Support Vector Machine Classifier

Load Fisher's iris data set. Remove the sepal lengths and widths, and all observed setosa irises.

```
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);
```

Train an SVM classifier using the processed data set.

```
SVMModel = fitcsvm(X,y)
```
```
SVMModel = 

  ClassificationSVM
      PredictorNames: {'x1'  'x2'}
        ResponseName: 'Y'
          ClassNames: {'versicolor'  'virginica'}
      ScoreTransform: 'none'
     NumObservations: 100
               Alpha: [24x1 double]
                Bias: -14.4149
    KernelParameters: [1x1 struct]
      BoxConstraints: [100x1 double]
     ConvergenceInfo: [1x1 struct]
     IsSupportVector: [100x1 logical]
              Solver: 'SMO'
```

The Command Window shows that `SVMModel` is a trained `ClassificationSVM` classifier, and displays a property list. Display the properties of `SVMModel`, for example, to determine the class order, by using dot notation.

```
classOrder = SVMModel.ClassNames
```
```
classOrder = 

    'versicolor'
    'virginica'
```

The first class (`'versicolor'`) is the negative class, and the second (`'virginica'`) is the positive class. You can change the class order during training by using the `'ClassNames'` name-value pair argument.

Plot a scatter diagram of the data and circle the support vectors.

```
sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off
```

The support vectors are observations that occur on or beyond their estimated class boundaries.

You can adjust the boundaries (and therefore the number of support vectors) by setting a box constraint during training using the `'BoxConstraint'` name-value pair argument.
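For instance, a sketch that retrains with a larger box constraint and compares support-vector counts (see Box Constraint under Definitions):

```
% A larger box constraint typically yields fewer support vectors.
SVMModelC10 = fitcsvm(X,y,'BoxConstraint',10);
[sum(SVMModel.IsSupportVector) sum(SVMModelC10.IsSupportVector)]
```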

### Train and Cross Validate Support Vector Machine Classifiers

Load the `ionosphere` data set.

```load ionosphere ```

Train and cross validate an SVM classifier. It is good practice to standardize the predictors and specify the order of the classes.

```
rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')
```
```
CVSVMModel = 

  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
```

`CVSVMModel` is not a `ClassificationSVM` classifier, but a `ClassificationPartitionedModel` cross-validated, SVM classifier. By default, the software implements 10-fold cross validation.

Alternatively, you can cross validate a trained `ClassificationSVM` classifier by passing it to `crossval`.

Inspect one of the trained folds using dot notation.

```
CVSVMModel.Trained{1}
```
```
ans = 

  classreg.learning.classif.CompactClassificationSVM
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
                  Alpha: [78x1 double]
                   Bias: -0.2209
       KernelParameters: [1x1 struct]
                     Mu: [1x34 double]
                  Sigma: [1x34 double]
         SupportVectors: [78x34 double]
    SupportVectorLabels: [78x1 double]
```

Each fold is a `CompactClassificationSVM` classifier trained on 90% of the data.

Estimate the generalization error.

```
genError = kfoldLoss(CVSVMModel)
```
```
genError =

    0.1168
```

On average, the generalization error is approximately 12%.

## Algorithms

• All solvers implement L1 soft-margin minimization.

• `fitcsvm` and `svmtrain` use, among other algorithms, SMO for optimization. The software implements SMO differently between the two functions, but numerical studies show that there is sensible agreement in the results.

• For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that

$\sum _{j=1}^{n}{\alpha }_{j}=n\nu .$

• For two-class learning, if you specify a cost matrix C, then the software updates the class prior probabilities (p) to pc by incorporating the penalties described in C. The formula for the updated prior probability vector is

${p}_{c}=\frac{p\prime C}{\sum p\prime C}.$

Subsequently, the software resets the cost matrix to the default:

$C=\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right].$
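A numeric sketch of this update, using a hypothetical prior vector and cost matrix:

```
% p_c = (p'*C)/sum(p'*C), written here with p as a row vector.
p  = [0.4 0.6];          % hypothetical class priors
C  = [0 2; 1 0];         % hypothetical user-specified cost matrix
pc = (p*C)/sum(p*C)      % updated priors: [0.4286 0.5714]
```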

• If you set `'Standardize',true` when you train the SVM classifier using `fitcsvm`, then the software trains the classifier using the standardized predictor matrix, but stores the unstandardized data in the classifier property `X`. However, if you standardize the data, then the data size in memory doubles until optimization ends.

• If you set `'Standardize',true` and any of `'Cost'`, `'Prior'`, or `'Weights'`, then the software standardizes the predictors using their corresponding weighted means and weighted standard deviations.

• Let `p` be the proportion of outliers you expect in the training data. If you use `'OutlierFraction',p` when you train the SVM classifier using `fitcsvm`, then:

• For one-class learning, the software trains the bias term such that 100`p`% of the observations in the training data have negative scores.

• The software implements robust learning for two-class learning. In other words, the software attempts to remove 100`p`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

## References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.

[3] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.