These are exam preparation notes, subpar in quality and certainly not of divine quality.
If you are stuck, read Wikipedia in parallel.
The goal of SVMs is to divide two groups with a line that separates the data points as clearly as possible.
There are two cases:
- Data points can be cleanly split into their classes
- At least some data points overlap making it impossible to lay a line between datapoints.
With an SVM you want to make a binary classification of points and predict class membership.
From ZackWeinberg Wikipedia CC BY_SA 3.0
The label is assigned by calculating on which side of a hyperplane a point lies.
Where and are your set of parameters for a linear connectionist neuron.
Our goal is to find a line that cuts as “clean” as possible throught the two groups. Finding lines that separate the two groups
is ambiguous as can be seen in the graphic above (H2 and H3). In this graphic H3 is the optimal solution. How do we maximize the distance to the two point clouds?
What we thereby do is maximize the distance of the split line to the closest points. This is very prone to outliers, as the closest values determine the parameters of the line.
Dual: Provides lower bound to the solution of the primal problem
Unclean split - C-SVM
When we cannot put a straight line through the data points, we cannot only optimize for the above, but need to add a regularization term. (Kind of adding prior knowledge)
- C regularization parameter
- (squared???) error
Kernels are often used to identify more complex shapes. Typical kernel functions are:
polynomial of degree d
RBF kernel with range
neural network (excluded from exam)
plummer kernel (excluded from exam)
As SVMs were designed as binary classifiers, they cannot be directly used for multiclass classification.
What is usually done is combining multiple binary classifiers via a couple of perceptrons.