## Requirement

In this assignment, we use the scikit-learn package to train an SVM classifier. Doing so requires tuning two hyperparameters: the cost C and the precision γ (gamma) of the RBF kernel. We use K-fold cross-validation to select the best combination of values for this pair.

## Question 0

Have a look at the first three cells. In the third one, take note of how the SVC object is instantiated and trained, how labels are predicted, and how the fitting error is computed. In this assignment, the prediction error of a trained classifier is simply the number of misclassified labels.
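As a rough illustration of the steps listed above (instantiate, train, predict, count errors), here is a minimal sketch. The synthetic data and the values chosen for `cost` and `gamma` are assumptions made for the example; they are not the assignment's `features.npy` data or its cell contents.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the assignment data (assumption: 2-D features, binary labels).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
labels = (features[:, 0] ** 2 + features[:, 1] ** 2 > 1.0).astype(int)

# Arbitrary hyperparameter values, chosen only for illustration.
cost, gamma = 1.0, 1.0

# Instantiate and train an SVM classifier with an RBF kernel.
clf = SVC(C=cost, kernel="rbf", gamma=gamma)
clf.fit(features, labels)

# Predict labels on the training points and count the misclassified ones.
predicted = clf.predict(features)
n_errors = int(np.sum(predicted != labels))
print("Prediction error:", n_errors)
```

Because the error is measured on the same points used for training, it describes the quality of the fit rather than how well the classifier generalizes.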

## Question 1

Using an SVM classifier with an RBF kernel, use 10-fold cross-validation to find the best cost and precision parameters. The range of test values for each parameter is provided.

a. First compute the cross-validation error matrix: for each parameter combination, instantiate an SVM classifier; for each split provided by the KFold object, re-train this classifier and compute the prediction error; the cross-validation error is the average of these errors over all splits.

b. Use the error matrix to select the best parameter combination.

c. Visualize the error matrix using imshow and the 'hot' colormap.
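The loop structure described in step (a) could be sketched as follows. The data, the parameter ranges, and the names `cost_values`, `gamma_values` and `cv_error` are assumptions made for this example; the assignment provides its own test values.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption, not the assignment's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Hypothetical test values; the assignment supplies its own ranges.
cost_values = np.logspace(-1, 2, 4)
gamma_values = np.logspace(-2, 1, 4)

kf = KFold(n_splits=10)
cv_error = np.zeros((len(gamma_values), len(cost_values)))

for i, gamma in enumerate(gamma_values):
    for j, cost in enumerate(cost_values):
        clf = SVC(C=cost, kernel="rbf", gamma=gamma)
        fold_errors = []
        for train_idx, val_idx in kf.split(X):
            # Re-train on the training folds, then count errors on the held-out fold.
            clf.fit(X[train_idx], y[train_idx])
            predicted = clf.predict(X[val_idx])
            fold_errors.append(np.sum(predicted != y[val_idx]))
        # Cross-validation error: average misclassification count over all splits.
        cv_error[i, j] = np.mean(fold_errors)

print(cv_error)
```

Rows index gamma and columns index cost here; the opposite layout works equally well as long as the later lookup and the plot use the same convention.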

## Question 2

Plot the decision boundaries of this classifier, by appropriately modifying the code from the previous assignments. Display the support vectors on the same figure.
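The plotting code from the previous assignments is not reproduced here, so the following is only one possible sketch: evaluate `decision_function` on a grid, draw its zero level set as the boundary, and circle the points stored in `support_vectors_`. The data and hyperparameter values are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data and arbitrary hyperparameters (assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)
clf = SVC(C=10.0, kernel="rbf", gamma=1.0).fit(X, y)

# Sample points on a grid covering the data.
xs = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200)
ys = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200)
xx, yy = np.meshgrid(xs, ys)
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Signed score of each grid point; the decision boundary is the zero level set.
zz = clf.decision_function(grid).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, levels=20, cmap="coolwarm", alpha=0.6)
ax.contour(xx, yy, zz, levels=[0.0], colors="k")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=15)

# Highlight the support vectors with hollow markers.
sv = clf.support_vectors_
ax.scatter(sv[:, 0], sv[:, 1], s=60, facecolors="none", edgecolors="k",
           label="support vectors")
ax.legend()
```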

## Question 3

Evaluate and print the generalization error of this classifier, computed on the test set.

## Code

### Imports

    import matplotlib.pyplot as plt

### Load and display the training data

    features = np.load("features.npy")

features size: (500, 2)

labels size: (500,)

### Training the SVM classifier with arbitrary hyperparameters

    cost = 1

Prediction error: 98

## Training with K-fold cross-validation

### Define test values for the cost and precision parameters

    def logsample(start, end, num):
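The body of `logsample` is not shown above. One plausible reading, given its name and signature, is that it returns `num` values evenly spaced on a logarithmic scale between `start` and `end`. The helper below, named `logsample_sketch` to mark it as hypothetical, illustrates that assumption using `np.logspace`; the original may differ, for example by taking exponents rather than values.

```python
import numpy as np

def logsample_sketch(start, end, num):
    """Hypothetical helper: `num` values evenly spaced on a log scale.

    Assumes `start` and `end` are positive endpoint values, both included.
    """
    return np.logspace(np.log10(start), np.log10(end), num)

# Example: five cost values spanning four orders of magnitude.
cost_values = logsample_sketch(1e-1, 1e3, 5)
print(cost_values)
```

Log spacing is a common choice for C and gamma because useful values often span several orders of magnitude.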

### Compute the cross-validation error for each parameter combination

The KFold class from scikit-learn is a cross-validation splitter, initialized with a number of folds. It partitions the input data into K folds (consecutive by default, or randomly assigned when shuffle=True); each split then uses one fold as the validation set and the remaining K-1 folds as the training set. The documentation (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) provides a usage example.
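To make the behaviour of `KFold.split` concrete, here is a small sketch on a toy array (an assumption for illustration, not the assignment data). It yields one pair of index arrays per split. With default settings the validation folds are consecutive blocks; passing `shuffle=True` assigns samples to folds at random.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy array of 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)

# Default behaviour: consecutive folds, no shuffling.
kf = KFold(n_splits=5)
splits = list(kf.split(X))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(fold, "train:", train_idx, "validation:", val_idx)

# Shuffled variant: fold membership is random but reproducible via random_state.
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=0)
shuffled_splits = list(kf_shuffled.split(X))
```

In both variants every sample appears in exactly one validation fold.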

    K = 10 # number of folds for cross validation

### Train the classifier with the best parameter combination

    # Find gamma and cost giving the smallest error
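One way to locate the smallest entry of an error matrix is to combine `np.argmin` with `np.unravel_index`. The matrix and parameter values below are made-up numbers for illustration only, not results from the assignment; rows are assumed to index gamma and columns to index cost.

```python
import numpy as np

# Made-up parameter values and error matrix (illustration only).
gamma_values = np.array([0.01, 0.1, 1.0])
cost_values = np.array([1.0, 10.0, 100.0, 1000.0])
cv_error = np.array([
    [12.0, 9.5, 9.0, 9.2],
    [8.1, 6.4, 7.0, 7.3],
    [10.2, 9.9, 11.0, 12.5],
])

# argmin works on the flattened matrix; unravel_index maps it back to (row, column).
i, j = np.unravel_index(np.argmin(cv_error), cv_error.shape)
best_gamma, best_cost = gamma_values[i], cost_values[j]
print("best gamma:", best_gamma, "best cost:", best_cost)
```

If several combinations tie for the minimum, `np.argmin` returns the first one in row-major order.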

### Display cross-validation results and decision function

    # Sample points on a grid
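For the cross-validation results, a sketch of displaying an error matrix with `imshow` and the 'hot' colormap might look like the following. The matrix is filled with random placeholder values and the parameter ranges are assumptions; in the notebook the computed error matrix would take their place.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical parameter ranges and a placeholder error matrix (random values, not real results).
gamma_values = np.logspace(-2, 1, 4)
cost_values = np.logspace(-1, 3, 5)
rng = np.random.default_rng(0)
cv_error = rng.uniform(5, 15, size=(len(gamma_values), len(cost_values)))

fig, ax = plt.subplots()
im = ax.imshow(cv_error, cmap="hot", origin="lower", aspect="auto")

# Label the ticks with the parameter values rather than the matrix indices.
ax.set_xticks(range(len(cost_values)))
ax.set_xticklabels([f"{c:g}" for c in cost_values])
ax.set_yticks(range(len(gamma_values)))
ax.set_yticklabels([f"{g:g}" for g in gamma_values])
ax.set_xlabel("cost C")
ax.set_ylabel("gamma")
fig.colorbar(im, ax=ax, label="cross-validation error")
```

With the 'hot' colormap, dark cells correspond to low error, so the best combination shows up as the darkest cell.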

## Generalization error

### Load the test data

    # Load the test data

(500, 2)

(500,)

### Print the number of misclassified points in the test set

    # TODO (Question 3)

90