Machine Learning Assignment: CS316 Logistic Regression

This assignment covers the logistic regression algorithm in machine learning.

Overview

This goes without saying at this point, but do not wait until the last minute for this assignment. You may discuss the assignment with others, but your code must be your own. I encourage the use of the Discussion board to ask questions, preferably prior to the night before the assignment is due.

For this homework, you’ll implement the update rule for logistic regression with stochastic gradient descent, and you’ll apply it to the task of determining whether documents are talking about hockey or baseball.
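
For reference, one common way to write the model and the per-example update, assuming labels y_i in {0, 1}, feature vector x_i, weights beta, and learning rate eta (the assignment's own notation may differ):

    \pi_i = \sigma(\beta \cdot x_i) = \frac{1}{1 + \exp(-\beta \cdot x_i)}
    \qquad
    \beta \leftarrow \beta + \eta\,(y_i - \pi_i)\,x_i

The second line is the stochastic gradient step on a single example's log-likelihood; the regularized variant adds a shrinkage term discussed in the linked LingPipe note below.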

You should not use any libraries that implement any of the functionality of logistic regression for this assignment. Logistic regression is implemented in scikit-learn, but for now you should do everything by hand; you'll be able to use library implementations of logistic regression in the future.

What you have to do

Coding

  1. Understand how the code creates feature vectors (this will help you code the solution and do the later analysis). You don't actually need to write any code for this step.
  2. (Optional) Store necessary data in the constructor so you can do classification later.
  3. You’ll likely need to write some code to get the best/worst features (see below).
  4. Modify the sg update function to perform unregularized updates.
  5. Modify the sg update function so that it performs regularized updates. NOTE: You should only update the non-zero dimensions of each example ("lazy" regularization; see https://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf). A rough sketch of both kinds of update follows this list.
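
To make the expected shape of these updates concrete, here is a rough, hypothetical sketch. The names (sg_update_sketch, beta, learn_rate, mu, last_update) are illustrative and are not taken from the provided logreg.py; it assumes labels in {0, 1} and an L2 penalty of the form mu * beta_j^2.

    import numpy as np

    def sigmoid(score):
        # Numerically safe logistic function.
        return 1.0 / (1.0 + np.exp(-np.clip(score, -20.0, 20.0)))

    def sg_update_sketch(beta, feats, y, learn_rate, mu, last_update, iteration):
        """One stochastic gradient step for a single example.

        beta        -- numpy array of weights
        feats       -- list of (feature_index, value) pairs, non-zero features only
        y           -- label, 0 or 1
        mu          -- regularization strength (0 gives the unregularized update)
        last_update -- dict mapping feature_index -> iteration of its last update
        """
        if mu > 0:
            # "Lazy" regularization: apply the shrinkage a weight missed while its
            # feature was absent, only now that the feature reappears.
            for j, _ in feats:
                skipped = iteration - last_update.get(j, 0)
                beta[j] *= (1.0 - 2.0 * learn_rate * mu) ** skipped
                last_update[j] = iteration

        # Unregularized gradient step on this example's log-likelihood.
        pi = sigmoid(sum(beta[j] * v for j, v in feats))
        for j, v in feats:
            beta[j] += learn_rate * (y - pi) * v
        return beta

Whether the shrinkage factor is (1 - 2*eta*mu) or something else depends on exactly how the penalty is defined in the assignment, so treat this only as a template.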

Analysis

Include the formulas you implemented and explain how and why you implemented them the way you did in your code. Sloppily written and/or excessively sparse reports will receive no credit.
Also address at least the following:

  1. What is the role of the learning rate?
  2. How many passes over the data do you need to complete?
  3. What words are the best predictors of each class? How (mathematically) did you find them? (One possible way to extract them is sketched after this list.)
  4. What words are the poorest predictors of either class? How (mathematically) did you find them?
  5. What happens to regularization if mu is 0?
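
For questions 3 and 4, here is a minimal sketch of one approach, assuming you end up with a dense weight vector beta whose indices line up with a vocabulary list vocab (neither name is guaranteed to match the provided code):

    import numpy as np

    def best_and_worst_features(beta, vocab, k=10):
        # Sort feature indices by weight: large positive weights pull predictions
        # toward one class, large negative weights toward the other, and weights
        # near zero are the poorest predictors.
        order = np.argsort(beta)
        most_negative = [(vocab[i], beta[i]) for i in order[:k]]
        most_positive = [(vocab[i], beta[i]) for i in order[-k:][::-1]]
        weakest = [(vocab[i], beta[i]) for i in np.argsort(np.abs(beta))[:k]]
        return most_positive, most_negative, weakest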

You can download the code and data for this assignment here: https://github.com/acgrissom/2016-ml-course/tree/master/assignments/programming2_code

Extra credit

  1. Use a schedule to update the learning rate (one possible schedule is sketched after this list).
    • Supply an appropriate argument to the step parameter
    • Support it in your sg update
    • Show the effect in your analysis document
  2. Compare your performance with Vowpal Wabbit’s logistic regression.
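
As an illustration only (the schedule's exact form and how it connects to the step parameter are up to you), an inverse-scaling schedule is a common choice:

    def scheduled_rate(initial_rate, iteration, step):
        # Hypothetical inverse-scaling schedule: the learning rate decays as
        # more examples are processed; `step` controls how quickly.
        return initial_rate / (1.0 + iteration / float(step))

You would then pass the scheduled rate into your sg update at each iteration and compare performance with and without the schedule in your analysis.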

Caution: When implementing the extra credit, make sure the behavior of your implementation of the required (non-extra-credit) algorithms doesn't change.

What to turn in

  1. Submit your logreg.py file (include your name at the top of the source)
  2. Submit your analysis.pdf file. Your analysis PDF should use the ICML template and follow the style of the research papers you have read thus far.
    • no more than one page
    • graphs are better than text
    • include your name at the top of the PDF

Hints

  1. Make sure you do the unregularized version first and get it working well. The numpy dot function may be helpful (a minimal example appears after these hints).
  2. Use numpy functions whenever you can to make the computation faster.
  3. The Example class has feature values for each feature in a given example.
  4. The mu term is used in regularization.
  5. For the "lazy" regularization equation, see Section 4 here.
  6. Also see: https://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf
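
For hint 1, a minimal illustration of numpy.dot (assuming dense numpy arrays; the provided code may instead store only the non-zero features of each example):

    import numpy as np

    beta = np.zeros(5)                         # weight vector
    x = np.array([0.0, 1.0, 0.0, 2.0, 0.0])    # one example's feature values
    score = np.dot(beta, x)                    # inner product fed to the sigmoid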