Support vector machines for classification. The basic support vector machine
is a binary linear classifier that chooses the hyperplane giving
the largest separation, or margin, between the two classes. If such a
hyperplane exists, it is known as the maximum-margin hyperplane, and the
linear classifier it defines is known as a maximum-margin classifier.
If no hyperplane can perfectly split the positive and
negative instances, the soft-margin method chooses a hyperplane
that splits the instances as cleanly as possible, while still maximizing
the distance to the nearest cleanly split instances.
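The soft-margin idea above can be sketched in code. The following is a minimal, illustrative trainer (not a production solver) that minimizes the hinge-loss objective (1/2)||w||^2 + C * sum(max(0, 1 - y(w.x + b))) by sub-gradient descent; the learning rate, epoch count, and toy data are assumptions chosen for the example.

```python
def train_linear_svm(data, labels, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM, trained by sub-gradient descent on the
    hinge loss (an illustrative sketch, not a full QP solver)."""
    dim = len(data[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin (or misclassified):
                # both the regularizer and the hinge term contribute.
                w = [wi - lr * (wi - C * y * xi) for wi, xi in zip(w, x)]
                b += lr * C * y
            else:
                # Cleanly split point: only the regularizer shrinks w.
                w = [wi - lr * wi for wi in w]
    return w, b

# Two linearly separable 2-D clusters (toy data for illustration).
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

When a separating hyperplane exists, the hinge term vanishes at the optimum and the regularizer alone drives the solution toward the maximum-margin hyperplane.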
Nonlinear SVMs are created by applying the kernel trick to
maximum-margin hyperplanes. The resulting algorithm is formally similar,
except that every dot product is replaced by a nonlinear kernel function.
This allows the algorithm to fit the maximum-margin hyperplane in a
transformed feature space. The transformation may be nonlinear and
the transformed space high-dimensional. For example, the feature space
corresponding to the Gaussian kernel is a Hilbert space of infinite dimension.
Thus, although the classifier is a hyperplane in the high-dimensional feature
space, it may be nonlinear in the original input space. Maximum-margin
classifiers are well regularized, so the infinite dimension does not spoil
the results.
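As a concrete sketch of "every dot product is replaced by a kernel function": the kernelized decision function evaluates the Gaussian (RBF) kernel between the query point and each support vector. The `gamma` value and the dual coefficients (`alphas`) below are assumptions; in practice the alphas come from a trained dual solver such as SMO.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, sv_labels, alphas, b,
                      kernel=rbf_kernel):
    """Kernelized SVM decision value:
        f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    The sign of f(x) gives the predicted class; the alphas are assumed
    to come from some trained dual solver."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, sv_labels, support_vectors)) + b
```

The hyperplane lives in the (here infinite-dimensional) feature space, but it is never built explicitly; only kernel evaluations against the support vectors are needed at prediction time.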
The effectiveness of an SVM depends on the selection of the kernel, the
kernel's parameters, and the soft-margin parameter C. Given a kernel, the
best combination of C and the kernel's parameters is often selected by a
grid search with cross-validation.
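The grid search itself is a simple exhaustive loop. In the sketch below, `score_fn` is a stand-in for whatever cross-validation routine is used (it should return, say, mean cross-validated accuracy for a given C and gamma); the geometric grids are a conventional choice, since the useful ranges of C and gamma span orders of magnitude.

```python
import itertools

def grid_search(score_fn, Cs, gammas):
    """Exhaustive grid search: evaluate every (C, gamma) pair with a
    cross-validation scoring callback and keep the best combination."""
    best_score, best_params = float('-inf'), None
    for C, gamma in itertools.product(Cs, gammas):
        score = score_fn(C, gamma)
        if score > best_score:
            best_score, best_params = score, (C, gamma)
    return best_params, best_score

# Geometric grids over several orders of magnitude (values are assumptions).
Cs = [2 ** k for k in range(-5, 6, 2)]
gammas = [2 ** k for k in range(-7, 2, 2)]
```

A common refinement is a second, finer grid search around the best coarse-grid point.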
The dominant approach for creating multi-class SVMs is to reduce the
single multi-class problem to multiple binary classification problems.
Common methods for such a reduction build binary classifiers that
distinguish between (i) one of the labels and the rest (one-versus-all)
or (ii) every pair of classes (one-versus-one). Classification
of new instances in the one-versus-all case is done by a winner-takes-all
strategy, in which the classifier with the highest output function assigns
the class. In the one-versus-one approach, classification
is done by a max-wins voting strategy: every classifier assigns
the instance to one of its two classes, the vote for the assigned
class is increased by one, and the class with the most votes
determines the instance's classification.
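Both reduction strategies can be sketched as follows. The callbacks `score` and `binary_predict` are stand-ins for trained one-versus-rest decision functions and trained pairwise classifiers, respectively (their names and signatures are assumptions for the example).

```python
from collections import Counter
from itertools import combinations

def one_vs_all_predict(x, classes, score):
    """Winner-takes-all: the binary classifier with the highest decision
    value assigns the class. score(c, x) is assumed to be the trained
    c-versus-rest output function."""
    return max(classes, key=lambda c: score(c, x))

def one_vs_one_predict(x, classes, binary_predict):
    """Max-wins voting: each pairwise classifier casts one vote.
    binary_predict(a, b, x) is assumed to be a trained a-versus-b
    classifier returning the winning label, a or b."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, x)] += 1
    # The class collecting the most pairwise votes wins.
    return votes.most_common(1)[0][0]
```

One-versus-all trains k classifiers for k classes, while one-versus-one trains k(k-1)/2, each on a smaller two-class subset of the data.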