Instructions and submission guidelines:
- You must sign an assessment declaration coversheet to submit with your assignment. - Submit your assignment via the Canvas MyUni.
You are required to write code for performing PCA and kernel PCA (KPCA) on a given dataset. After performing PCA/KPCA, you will use various classifiers to perform classification. You will investigate the effect of applying PCA, KPCA and KPCA with different types of kernels. Specifically, the tasks you need to complete are:
- Write code for performing PCA
- Apply PCA to the given dataset and then perform classification with the nearest neighbour classifier. Analyse the performance change against different reduced dimensions. (suggestion: from 256 to 10)
- Write code for performing KPCA. Use three kernel functions, Linear Kernel (equivalent to PCA), Gaussian-RBF kernel and Hellinger kernel (See the lecture slides)
- Apply KPCA to the given dataset and then perform classification with nearest neighbour classifier. Analyse the performance change against different reduced dimensions. (suggestion: from 256 to 10)
- [Master student only] Append 256 noisy dimensions to the original data, that is, for each sample xi, appending a 256-dimensional random feature vector to xi to make it a 512-dimensional feature. Then repeat the KPCA experiments by using a linear SVM as the classifier. Test and analyse the results. The following lines give an example code (Matlab) for appending noise features
- % Let X be the original data
- [N,d] = size(X);
- R = randn(N,d); % using Gaussian noise in this experiment
- X_ = [X,R];
For Gaussian-RBF kernel, you need to set the bandwith hyper-parameter. An empirical approach is to choose it. You are encouraged to scale up and down this empirical chosen parameter a little bit to find out the best performed hyper-parameter.
You can choose either Matlab, Python, or C/C++. I would personally suggest Matlab or Python.
The PCA and KPCA part of your code should not rely on any 3rd-party toolbox. Only Matlab’s built-in API’s or Python/ C/C++’s standard libraries are allowed. party implementation of linear SVM for your experiments.
You are also required to submit a report ([10 pages in PDF format), which should have the following sections (report contributes 50% to the mark; code 50%):
- An algorithmic description of PCA and KPCA. (5%)
- Your understanding of PCA and KPCA (anything that you believe is relevant to this algorithm) (5%)
- Some analyses of your implementation. You should plot an error curve against the number of reduced dimensions. For KPCA, results from different kernels should be plotted on the same figure with different colours. (20% for master students and 40% for undergraduate students)
- You should present and analyse the results of adding noisy dimensions before performing PCA/KPCA. For example, you can compare the impact of noisy dimensions to different kernels in KPCA. (20% for master students only)
In summary, you need to submit (1) the code that implements PCA/KPCA and (2) a report in PDF.
You will use USPS dataset to test your model. It contains 10 classes with 7291 training samples and 2007 testing samples. The feature dimensionality is 256. The training data can be found in usps.libsvm and testing data can be found in usps.t.libsvm.