Implement the linear and logistic regression algorithms and run them on the Seoul Bike Sharing Demand dataset.
In this assignment, we will be implementing linear and logistic regression on a given dataset. In addition, we will experiment with design and feature choices.
We will be using the Seoul Bike Sharing Demand Data Set, which is available for download.
Implement a linear regression model on the dataset to predict the rented bike count. You are not allowed to use any available implementation (library/package, etc.) of the regression model. You should implement the gradient descent algorithm with batch update (all training examples used at once). Use the sum of squared errors normalized by 2m as your cost and error measure, where m is the number of samples. You should use all the features.
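As a rough illustration of the requirement above, here is a minimal NumPy sketch of batch gradient descent with the cost normalized by 2m. The function names, the zero initialization, the learning rate `alpha`, the iteration count, and the synthetic data are all assumptions made for the example; they are not prescribed by the assignment, and the real features would come from the dataset.

```python
import numpy as np

def cost(X, y, theta):
    """Sum of squared errors normalized by 2*m (m = number of samples)."""
    m = X.shape[0]
    residual = X @ theta - y
    return float(residual @ residual) / (2 * m)

def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch update: every step uses all m training examples at once."""
    m, n = X.shape
    theta = np.zeros(n)  # assumed initial parameter values (all zeros)
    history = []
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
        history.append(cost(X, y, theta))
    return theta, history

# Synthetic stand-in data (hypothetical); replace with the real features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
X = np.column_stack([np.ones(200), features])  # column of ones for the intercept
true_theta = np.array([2.0, 1.0, -3.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

theta, history = batch_gradient_descent(X, y)
```

The `history` list holds the cost after each update, which is one convenient way to produce the error-versus-iteration plots requested later.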
Also implement a logistic regression model as described in Part 4. Again, you are not allowed to use any available implementation of the logistic regression model. You should implement the gradient descent algorithm with batch update (all training examples used at once). You should use the logistic regression cost/error function from the class. In addition, you may also report accuracy, ROC, etc.
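A possible shape for the logistic regression part is sketched below. It assumes that the cost function from the class is the usual cross-entropy (log loss), which the assignment does not spell out. The function names, learning rate, iteration count, 0.5 decision cutoff, and synthetic labels are likewise illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X, y, theta, eps=1e-12):
    """Cross-entropy cost (assumed to match the cost used in class)."""
    p = np.clip(sigmoid(X @ theta), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fit_logistic(X, y, alpha=0.5, n_iters=2000):
    """Batch gradient descent: each step uses every training example."""
    m, n = X.shape
    theta = np.zeros(n)  # assumed initial parameter values
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m
        theta = theta - alpha * gradient
    return theta

def accuracy(X, y, theta, cutoff=0.5):
    """Fraction of examples whose predicted class matches the label."""
    predictions = (sigmoid(X @ theta) >= cutoff).astype(float)
    return float(np.mean(predictions == y))

# Synthetic stand-in data (hypothetical) with noisy binary labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 2))
X = np.column_stack([np.ones(300), features])  # intercept column
scores = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(scale=0.3, size=300)
y = (scores > 0).astype(float)

theta = fit_logistic(X, y)
```

Calling `log_loss` and `accuracy` on both the train and the test split gives the metrics that the report asks for.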
Download the dataset and partition it randomly into train and test set using a good train/test split percentage.
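One way to do the random partition is sketched below. The 80/20 split, the fixed seed, and the optional standardization step are assumptions chosen for illustration; the assignment leaves the split percentage to you, and feature scaling is not required by it (though it often helps gradient descent converge). The placeholder arrays stand in for the loaded dataset.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Randomly partition rows into train and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(X.shape[0])
    n_test = int(round(test_fraction * X.shape[0]))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def standardize(X_train, X_test):
    """Scale features using statistics from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X_train - mean) / std, (X_test - mean) / std

# Placeholder arrays (hypothetical) standing in for the loaded dataset.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train_s, X_test_s = standardize(X_train, X_test)
```

Computing the mean and standard deviation on the training rows only keeps information from the test set out of the fitted model.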
Design a linear regression model to model rented bike count. Include your regression model equation in the report.
Implement the gradient descent algorithm with batch update rule. Use the same cost function as in the class (sum of squared error). Report your initial parameter values.
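For reference, the model, cost, and batch update described in the two steps above can be written out as follows. The notation is an assumption: n features x_1, ..., x_n, parameters θ_0, ..., θ_n, learning rate α, m training examples indexed by i, and x_0^{(i)} = 1 for the intercept term.

```latex
% Linear model for the rented bike count
h_\theta(x) = \theta_0 + \sum_{j=1}^{n} \theta_j x_j

% Cost: sum of squared errors normalized by 2m
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Batch update, applied simultaneously for j = 0, ..., n
\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```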
Convert this problem into a binary classification problem. The target variable should have two categories. Implement logistic regression to carry out classification on this data set. Report accuracy/error metrics for train and test sets.
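The conversion to two classes could look like the sketch below. Using the median of the training targets as the cutoff is only one possible choice, assumed here because it yields roughly balanced classes; the assignment asks you to pick and justify your own value. The function names and the sample counts are hypothetical.

```python
import numpy as np

def binarize_target(y_train, y_test):
    """Label counts above the training-set median as 1, the rest as 0."""
    cutoff = float(np.median(y_train))  # assumed choice of cutoff
    return (y_train > cutoff).astype(int), (y_test > cutoff).astype(int), cutoff

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return float(np.mean(y_true == y_pred))

# Hypothetical rented bike counts, for illustration only.
y_train = np.array([10, 200, 350, 800, 1200])
y_test = np.array([100, 900])

labels_train, labels_test, cutoff = binarize_target(y_train, y_test)
```

Deriving the cutoff from the training targets alone, then applying it unchanged to the test targets, keeps the two splits consistent.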
- Experiment with various parameters for linear and logistic regression (e.g. learning rate) and report your findings on how the error/accuracy varies for train and test sets as these parameters vary. Plot the results. Report the best values of the parameters.
- Experiment with various thresholds for convergence for linear and logistic regression. Plot error results for train and test sets as a function of threshold and describe how varying the threshold affects error. Pick your best threshold and plot train and test error (in one figure) as a function of the number of gradient descent iterations.
- Pick eight features randomly and retrain your models (both linear and logistic) only on these eight features. Compare train and test error results for the case of using your original set of all features and eight random features. Report the eight randomly selected features.
- Now pick eight features that you think are best suited to predict the output, and retrain your models (both linear and logistic) using these eight features. Compare to the case of using your original set of features and to the random features case. Did your choice of features provide better results than picking random features? Why? Did your choice of features provide better results than using all features? Why?
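The experiments above could be organized along the lines of the following sketch for the linear model. It assumes that "threshold for convergence" means stopping once the cost decreases by less than a tolerance `tol` between iterations, which is one common reading but not the only one. The function names, tolerance values, seeds, and synthetic data with 12 features are hypothetical.

```python
import numpy as np

def gd_until_converged(X, y, alpha=0.1, tol=1e-6, max_iters=10000):
    """Batch gradient descent that stops when the cost improves by less than tol."""
    m, n = X.shape
    theta = np.zeros(n)
    previous = np.inf
    costs = []
    for _ in range(max_iters):
        residual = X @ theta - y
        current = float(residual @ residual) / (2 * m)
        costs.append(current)
        if previous - current < tol:  # also stops if the cost starts to rise
            break
        theta = theta - alpha * (X.T @ residual) / m
        previous = current
    return theta, costs

def pick_random_features(n_features, k=8, seed=0):
    """Choose k distinct feature indices uniformly at random."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_features, size=k, replace=False))

# Synthetic stand-in data (hypothetical) with 12 features plus an intercept.
rng = np.random.default_rng(1)
features = rng.normal(size=(300, 12))
X = np.column_stack([np.ones(300), features])
y = X @ rng.normal(size=13) + rng.normal(scale=0.5, size=300)

# Convergence-threshold experiment: iterations and final cost per tolerance.
tolerances = [1e-2, 1e-4, 1e-6]
iterations, final_costs = [], []
for tol in tolerances:
    _, costs = gd_until_converged(X, y, tol=tol)
    iterations.append(len(costs))
    final_costs.append(costs[-1])

# Eight random features: keep the intercept (column 0) plus the chosen columns.
chosen = pick_random_features(12, k=8)
cols = np.concatenate([[0], 1 + chosen])
theta_subset, costs_subset = gd_until_converged(X[:, cols], y)
```

The same loop structure applies to the learning-rate experiment and to the logistic model; evaluating the cost on the held-out test set at each setting gives the train/test curves to plot.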
You are required to turn in your code and a report. We should be able to run the code as is and get the results and plots that you have included in the report. You should include and describe results for all the experiments above. You should also mention how you constructed the classes for the classification problem (the value of the threshold and why you picked it). You can be creative and include other plots/results too. However, the report should not exceed 10 pages. Also describe your interpretation of the results. What do you think matters the most for predicting the value and category/class of rented bike count? What other steps could you have taken with regard to modeling to get better results?
Total weightage: 12.5% of final grade
Report: 100 points
If your code doesn’t run or doesn’t produce the same results then you get zero points. Points will be awarded based not only on how good your results are, but also on how well you describe them as well as underlying experimentation.
Experiment 1: 20 points
Experiment 2: 20 points
Experiment 3: 20 points
Experiment 4: 20 points
Discussion: 20 points
Describe your interpretation of the results. What do you think matters the most for predicting the rented bike count? What other steps could you have taken with regard to modeling to get better results?