## Support Vector Machine classifiers

Use the LibSVM package to train an SVM classifier; a simple example of its use is provided in demo_libsvm.m. The numbered items below outline the steps followed in SVM.m.

1. Use a Gaussian kernel, and use 10-fold cross-validation to choose the cost and precision parameters (referred to as -c: cost / -g: gamma parameters in LibSVM documentation).
Visualize the cross-validation error (a 2D function) using imagesc.
Once you have picked the best combination of cost and precision, evaluate the classifier's generalization error on the test set and report the number of misclassified points.
2. Plot the decision boundaries by appropriately modifying the code from the previous exercises. Also superimpose the support vectors on your plot, using the code outlined in SVM.m.
If your code is working right, you should get results of the following form for the cross-validation error and the decision boundaries:
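The cross-validation grid search over the cost and precision parameters can be sketched as follows. The assignment uses LibSVM from MATLAB; this illustration instead uses Python with scikit-learn's SVC (which wraps LibSVM), where the C and gamma arguments correspond to the -c and -g flags. The toy data and grid ranges are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical two-class toy data standing in for the assignment's data.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

# Logarithmic grids over cost (C) and precision (gamma), as is customary.
C_grid = 2.0 ** np.arange(-2, 5)
gamma_grid = 2.0 ** np.arange(-4, 3)
cv_error = np.zeros((len(C_grid), len(gamma_grid)))
for i, C in enumerate(C_grid):
    for j, g in enumerate(gamma_grid):
        # 10-fold cross-validation error for this (C, gamma) pair.
        acc = cross_val_score(SVC(kernel="rbf", C=C, gamma=g), X, y, cv=10)
        cv_error[i, j] = 1.0 - acc.mean()

# Pick the combination with the lowest cross-validation error.
best_i, best_j = np.unravel_index(np.argmin(cv_error), cv_error.shape)
print("best C =", C_grid[best_i], "best gamma =", gamma_grid[best_j])
```

In MATLAB, `imagesc(cv_error)` then visualizes the same 2D error surface.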

In Adaboost.m your task is to train an Adaboost classifier with synthetic data. For reference, you are provided with the posterior P(y = 1|x), x ∈ [0, 1] × [0, 1], regularly sampled on the [0, 1] × [0, 1] domain, so that you can see how the output of the Adaboost classifier better approximates the posterior at each round.

• Implement the Adaboost algorithm described in class. This involves iterating over the following steps:
• Find the best decision stump at each round.
• Evaluate the weak learner's weighted error, and estimate the weak learner's weight α_t.
• Update the distribution over the training samples.
• Plot the exponential loss E = Σ_i exp(−y_i f(x_i)) as a function of the boosting round. Make sure that E is monotonically decreasing with time (otherwise your code is wrong). Verify that E provides an upper bound for the number of training errors.
• Show the response of your strong learner side-by-side with the posterior. Make sure that they look increasingly similar at larger iterations.
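The iteration above can be sketched in Python (the assignment itself uses MATLAB; the toy data, round count, and exhaustive stump search here are illustrative assumptions, not the provided code):

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustive search for the stump minimizing the weighted error."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)  # (error, dimension, threshold, polarity)
    for dim in range(d):
        for theta in X[:, dim]:          # feature values suffice as thresholds
            for s in (1, -1):
                pred = s * np.where(X[:, dim] > theta, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, dim, theta, s)
    return best

def adaboost(X, y, rounds):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)              # distribution over training samples
    F = np.zeros(n)                      # strong-learner scores f(x_i)
    losses = []
    for _ in range(rounds):
        err, dim, theta, s = best_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight alpha_t
        pred = s * np.where(X[:, dim] > theta, 1, -1)
        F += alpha * pred
        w *= np.exp(-alpha * y * pred)   # update the sample distribution
        w /= w.sum()
        losses.append(np.exp(-y * F).sum())     # exponential loss E
    return F, losses

rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 1, 1, -1)
F, losses = adaboost(X, y, 20)
errors = int((np.sign(F) != y).sum())
# E decreases monotonically and upper-bounds the number of training errors,
# since exp(-y f(x)) >= 1 whenever y disagrees with sign(f(x)).
print("training errors:", errors, "final loss:", losses[-1])
```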

If your code is working right, you should get results of the following form (strong learner shown at rounds 10, 50, 100, and 400):

Top row: strong learner scores, displayed within [-1, 1], for different numbers of boosting rounds.
Bottom row: left: loss function of Adaboost for increasing training rounds; middle: class posterior (ground truth); right: Adaboost-based approximation to the posterior, at round 400.

## Some clarifications

• A decision stump has the form h(x; d, θ, s) = s (2[x_d > θ] − 1), where d is the dimension along which the decision is taken, [·] is 1 if its argument is true and 0 otherwise, θ is the threshold applied along dimension d, and s ∈ {−1, 1} is the polarity of the decision stump.

• When searching for the best decision stump you in principle need to consider all possible combinations of d, θ, and s. As discussed in class, for a given dimension d the only values of θ that you actually need to consider are the values taken by the training set features along that dimension.
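This clarification can be checked numerically: the weighted error of a stump is piecewise constant in θ and can only change when θ crosses a training feature value. A small Python check, on hypothetical one-dimensional data (the assignment's data and code are in MATLAB):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=20)           # features along one hypothetical dimension d
y = rng.choice([-1, 1], size=20)   # labels
w = np.full(20, 1.0 / 20)          # uniform sample weights

def stump_error(theta, s=1):
    """Weighted error of the stump h(x) = s * (2[x > theta] - 1)."""
    pred = s * np.where(x > theta, 1, -1)
    return w[pred != y].sum()

# Any theta strictly between two consecutive sorted feature values yields the
# same predictions, hence the same error, as the smaller of the two values.
xs = np.sort(x)
for a, b in zip(xs[:-1], xs[1:]):
    assert np.isclose(stump_error(0.5 * (a + b)), stump_error(a))
print("error is constant between consecutive feature values")
```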

### SVM vs. Adaboost BONUS

Compare the performance of the classifier you obtained from the previous part of the exercise with that of Adaboost. Use the same training and test sets for both classifiers. As above, make sure your results are reproducible by appropriately setting the seed of rand and randn.
Report their average misclassification errors on the training set and on the test set. What do you observe?

### Fast weak learner selection BONUS

When picking the best weak learner at each round, a substantial speedup can be gained by a clever implementation of the threshold selection. Some hints are provided below; you can try to implement it based on them.
Consider that we are searching for the best weak learner that operates on the k-th feature dimension, f^(k). Our task is to choose the best threshold θ for a decision stump of the form given above.
The naive implementation is to consider all N possible thresholds (one per training feature value along dimension k, where N is the training sample size) and, for each of them, evaluate the associated weighted error.

The complexity of this computation is O(N^2), since for each of the N threshold values we need to perform N summations and multiplications.

Instead of this, a faster algorithm first sorts the N training samples according to their value of f^(k): consider a permutation π of the training set such that f^(k)_π(1) ≤ ... ≤ f^(k)_π(N), and denote the correspondingly permuted weights and labels by w_π(i) and y_π(i), respectively. Using these, we then have a recursion for C(i): moving the threshold past the i-th sorted sample changes the weighted error by a single signed term involving w_π(i) and y_π(i).

The first and second equations follow from the definition in Eq. 2 and the fact that the values f^(k) have been sorted. The third equation can be verified by substitution. Sorting has O(N log N) complexity, while the recursion can be implemented in O(N) time, so the overall complexity goes down from O(N^2) to O(N log N). In the code provided to you, this idea has been used to accelerate the weak learner selection.
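The speedup can be sketched and verified in Python (the provided code is in MATLAB; the data here are hypothetical, and the cumulative-sum recursion below is one standard way to realize the idea described in the text):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
f = rng.uniform(size=N)                 # feature values along the k-th dimension
y = rng.choice([-1, 1], size=N)         # labels
w = rng.uniform(size=N)
w /= w.sum()                            # current distribution over samples

# Naive O(N^2): recompute the weighted error from scratch for each threshold.
naive = np.array([w[np.where(f > t, 1, -1) != y].sum() for t in f])

# Fast O(N log N): sort once; moving the threshold past one sorted sample
# changes the error by a single signed weight, giving a cumulative-sum
# recursion of the kind described in the text.
order = np.argsort(f)
ws, ys = w[order], y[order]
base = w[y == -1].sum()                 # threshold below all samples: everyone predicts +1
fast_sorted = base + np.cumsum(ws * ys) # error after each sample changes sides
fast = np.empty(N)
fast[order] = fast_sorted               # map back to the original sample order

assert np.allclose(naive, fast)
print("naive and fast threshold errors agree")
```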

Inspect the code provided to you and reply to the following questions in no more than 100 words (the shorter, the better):

• What is the code in lines 22-26 doing?
• What is the code in lines 28-38 doing?

Note: if you do not understand the code by looking at it, this is perfectly normal. Before the solution was “leaked”, the assignment asked you to write all of this code yourself. Now that the solution is available, we ask you to spend time understanding what is going on in it. Running the code and inspecting the variables can help with this.

But, since this is a bonus exercise, you are expected to “go the extra mile” to get the bonus.