PPHA311 Problem Set

Requirements

Group work: You may work in groups, but each person must submit individual answers. These answers must reflect the individual’s own work and may not be copied from others.
Scratch work and code: Please show your work (where relevant) and append all code written for this assignment to the end of your submission. Please use brief but clear comments in the code to reference the applicable assignment section.

Mathematical Background

Given random variables X and Y and constants a and b, state whether the following expressions are correct. If the expression is incorrect, please give the appropriate formula. If the expression is correct only under certain conditions, state those conditions.

  1. E(aX + b) = aE(X)
  2. Var(aX) = a^2 Var(X)
  3. E(XY) = E(X)E(Y)
  4. Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)
  5. E[E(Y|X)] = E(Y)
  6. Cov(X, Y) = E(XY)
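One way to test these expressions is to evaluate both sides on a concrete distribution. The Python sketch below assumes a small, hypothetical joint distribution in which X and Y are dependent, and illustrative constants a = 2 and b = 3; it uses exact fractions so that equality checks are not affected by rounding. Keep in mind that a single counterexample shows an expression fails in general, while agreement on one distribution does not prove it always holds.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); X and Y are deliberately dependent.
PMF = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(3, 8),
}
assert sum(PMF.values()) == 1

A, B = 2, 3  # illustrative constants a and b (assumptions)


def E(g):
    """Expectation of g(x, y) under PMF."""
    return sum(p * g(x, y) for (x, y), p in PMF.items())


def Var(g):
    """Variance of g(X, Y), computed as E[(g - E[g])^2]."""
    mu = E(g)
    return E(lambda x, y: (g(x, y) - mu) ** 2)


def Cov(g, h):
    """Covariance from the definition E[(g - E[g])(h - E[h])]."""
    mg, mh = E(g), E(h)
    return E(lambda x, y: (g(x, y) - mg) * (h(x, y) - mh))


def X(x, y):
    return x


def Y(x, y):
    return y


def E_Y_given_X(x0):
    """Conditional mean E(Y | X = x0)."""
    px = sum(p for (x, _), p in PMF.items() if x == x0)
    return sum(p * y for (x, y), p in PMF.items() if x == x0) / px


def iterated_expectation():
    """E[E(Y | X)]: average the conditional means over the distribution of X."""
    return E(lambda x, y: E_Y_given_X(x))


# (expression as stated, left-hand side, right-hand side)
rows = [
    ("E(aX + b) = aE(X)", E(lambda x, y: A * x + B), A * E(X)),
    ("Var(aX) = a^2 Var(X)", Var(lambda x, y: A * x), A**2 * Var(X)),
    ("E(XY) = E(X)E(Y)", E(lambda x, y: x * y), E(X) * E(Y)),
    ("Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)",
     Var(lambda x, y: A * x + B * y), A**2 * Var(X) + B**2 * Var(Y)),
    ("E[E(Y|X)] = E(Y)", iterated_expectation(), E(Y)),
    ("Cov(X, Y) = E(XY)", Cov(X, Y), E(lambda x, y: x * y)),
]

for label, lhs, rhs in rows:
    print(f"{label:42} lhs={lhs}  rhs={rhs}  equal={lhs == rhs}")
```

Changing the probabilities in PMF (for example, making X and Y independent) or setting B = 0 is a quick way to explore the conditions under which an expression holds.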

Now consider a set of random variables X1, X2, …, Xn. Is each of the expressions below correct? If not, state what must be assumed for it to be correct.

Exam-Style Questions

  1. True or false: The OLS estimator is biased when the assumption of homoskedasticity is violated.
  2. A dataset based on the U.S. National Longitudinal Survey is used to investigate the returns to education. A linear regression of hourly wage on highest (educational) grade completed in this dataset (n = 2,244) yields the following fitted line: wage = -1.97 + 0.74 × grade
    • (a) What is the predicted wage for someone whose highest grade completed is 9th grade? What is the predicted wage for someone who has finished high school (12th grade)?
    • (b) Is this regression likely to capture a causal relationship between education and wages? Why or why not? What are some potential confounding factors?
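Both exam-style questions can be explored numerically. The Python sketch below first runs a small Monte Carlo experiment: the data-generating process (x uniform on [1, 5], error standard deviation equal to x), the sample size, the number of replications, and the seed are all illustrative assumptions. Homoskedasticity fails by construction, but E[u | x] = 0 still holds, so averaging the OLS slope across replications shows where the estimator centers. The second function simply evaluates the fitted wage equation at a given grade.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def mean_slope_heteroskedastic(reps=1000, n=200, beta0=1.0, beta1=2.0, seed=311):
    """Average OLS slope when Var(u | x) grows with x but E[u | x] = 0."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(1, 5) for _ in range(n)]
        # Error standard deviation equals x: heteroskedastic, mean zero given x.
        y = [beta0 + beta1 * xi + rng.gauss(0, xi) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.fmean(slopes)


def predicted_wage(grade, intercept=-1.97, slope=0.74):
    """Evaluate the fitted line wage = -1.97 + 0.74 * grade."""
    return intercept + slope * grade


if __name__ == "__main__":
    print(f"mean OLS slope: {mean_slope_heteroskedastic():.3f} (true slope: 2.0)")
    for g in (9, 12):
        print(f"grade {g}: predicted wage {predicted_wage(g):.2f}")
```

A simulation like this only illustrates behavior for one data-generating process; it is not a proof. Likewise, evaluating the fitted line says nothing by itself about whether the 0.74 coefficient is causal.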

Hypothetical Experiment

The table below describes a hypothetical experiment with 2,400 participants.

Category   # participants   D   T   Yc   Yt   Y
1          300              0   0   4    6
2          300              1   0   4    6
3          500              0   1   4    6
4          500              1   1   4    6
5          200              0   0   10   12
6          200              1   0   10   12
7          200              0   1   10   12
8          200              1   1   10   12

where D is a predetermined characteristic, T is the treatment status, and Yc and Yt are the potential outcomes under control and under treatment, respectively.
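For reference, the standard potential-outcomes relations are written out below, assuming T = 1 denotes treatment and that all participants in a category share that category's Yc and Yt, as the table indicates. Here i indexes participants, k indexes the eight categories, n_k is the number of participants in category k, and N = 2,400.

```latex
Y_i = T_i \, Y_{t,i} + (1 - T_i) \, Y_{c,i}
\qquad
\mathrm{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_{t,i} - Y_{c,i} \right)
            = \frac{\sum_{k=1}^{8} n_k \left( Y_{t,k} - Y_{c,k} \right)}{\sum_{k=1}^{8} n_k}
```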

  1. Complete the last column in the table (Y).
  2. What is the average treatment effect (ATE)?
  3. Is it plausible that these data come from a randomized controlled trial (RCT)?
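A minimal Python sketch of the bookkeeping for these three questions, assuming T = 1 means treated. It transcribes the table, fills in Y with the switching equation Y = T·Yt + (1 − T)·Yc, and computes the ATE as a participant-weighted mean of Yt − Yc. It also tabulates the share treated by D and by Yc and the raw treated-minus-control gap in observed Y, which are useful inputs when judging whether assignment looks random; the interpretation is left to the reader.

```python
from fractions import Fraction

# (category, participants, D, T, Yc, Yt) transcribed from the table above.
ROWS = [
    (1, 300, 0, 0, 4, 6),
    (2, 300, 1, 0, 4, 6),
    (3, 500, 0, 1, 4, 6),
    (4, 500, 1, 1, 4, 6),
    (5, 200, 0, 0, 10, 12),
    (6, 200, 1, 0, 10, 12),
    (7, 200, 0, 1, 10, 12),
    (8, 200, 1, 1, 10, 12),
]


def observed_outcome(t, yc, yt):
    """Switching equation: Y = T * Yt + (1 - T) * Yc."""
    return t * yt + (1 - t) * yc


def weighted_mean(pairs):
    """Participant-weighted mean of (weight, value) pairs, as an exact Fraction."""
    total = sum(n for n, _ in pairs)
    return Fraction(sum(n * v for n, v in pairs), total)


Y = {cat: observed_outcome(t, yc, yt) for cat, _, _, t, yc, yt in ROWS}
ate = weighted_mean([(n, yt - yc) for _, n, _, _, yc, yt in ROWS])

# Diagnostics relevant to the RCT question.
share_treated_by_d = {
    d0: weighted_mean([(n, t) for _, n, d, t, _, _ in ROWS if d == d0]) for d0 in (0, 1)
}
share_treated_by_yc = {
    yc0: weighted_mean([(n, t) for _, n, _, t, yc, _ in ROWS if yc == yc0]) for yc0 in (4, 10)
}
naive_diff = (
    weighted_mean([(n, Y[c]) for c, n, _, t, _, _ in ROWS if t == 1])
    - weighted_mean([(n, Y[c]) for c, n, _, t, _, _ in ROWS if t == 0])
)

print("Y column:", Y)
print("ATE:", ate)
print("Share treated by D:", share_treated_by_d)
print("Share treated by Yc:", share_treated_by_yc)
print("Treated-minus-control difference in observed Y:", naive_diff)
```

In real data Yc is never observed for treated participants, so the share-treated-by-Yc check is only possible in a hypothetical table like this one; D, by contrast, is observed for everyone.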

Data-Driven Question

For this problem, you will analyze data on voting behavior in Colombia's 2016 peace referendum. The data includes five variables: department (equivalent to U.S. states), total NO votes, total YES votes, number of registered voters, and number of rebel attacks during the height of the insurgency. The raw data is on our Canvas site.

  1. In Stata/R, import the raw data and generate two new variables. First, calculate the NO vote share: the number of NO votes as a share of all ballots cast. Call this variable NO_VS. Second, calculate departmental turnout: the sum of all ballots cast divided by the number of registered voters in the department. Name this variable DEPT_TO. Report the mean of each variable and produce a clearly labeled histogram of each.
  2. Report the bivariate correlation between NO_VS and the variable measuring exposure to rebel violence during the height of the insurgency (RV_EXPOS). Then report the bivariate correlation between DEPT_TO and RV_EXPOS. What, if anything, do you learn?
  3. Use the Stata collapse command or a comparable command in R to sum all ballots cast by type (YES, NO), as well as the total number of registered voters, for the entire country. Although the collapse command usually includes a by() argument (e.g., collapse (sum) X Y Z, by(unit, time)), we do not need one for this exercise. Recalculate the mean of the NO vote share and departmental turnout. Report these values. Why do these values differ from the reported means of NO_VS and DEPT_TO above?
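The computations in parts 1 to 3 are sketched below in Python for illustration only (the assignment asks for Stata or R, and histograms are omitted). The sketch assumes the raw data is a CSV file; the column names in DEFAULT_COLUMNS are hypothetical placeholders for whatever the Canvas file actually uses, and only RV_EXPOS comes from the assignment text. Ballots cast are taken to be YES + NO, since those are the only vote counts in the data.

```python
import csv
import math
import statistics

# Hypothetical CSV column names; replace with the names in the actual file.
DEFAULT_COLUMNS = {
    "no": "no_votes",
    "yes": "yes_votes",
    "registered": "registered_voters",
    "RV_EXPOS": "RV_EXPOS",
}


def load_rows(path, columns=DEFAULT_COLUMNS):
    """Read the department-level CSV into dicts with numeric fields."""
    with open(path, newline="") as f:
        return [
            {key: float(rec[col]) for key, col in columns.items()}
            for rec in csv.DictReader(f)
        ]


def add_shares(rows):
    """Part 1: NO_VS = NO / (YES + NO); DEPT_TO = (YES + NO) / registered."""
    for r in rows:
        cast = r["yes"] + r["no"]
        r["NO_VS"] = r["no"] / cast
        r["DEPT_TO"] = cast / r["registered"]
    return rows


def pearson(x, y):
    """Part 2: bivariate (Pearson) correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def national_shares(rows):
    """Part 3: collapse to national sums, then recompute the two ratios."""
    no = sum(r["no"] for r in rows)
    yes = sum(r["yes"] for r in rows)
    registered = sum(r["registered"] for r in rows)
    return no / (yes + no), (yes + no) / registered


def summarize(rows):
    """Collect the quantities requested in parts 1 to 3."""
    add_shares(rows)
    no_vs = [r["NO_VS"] for r in rows]
    dept_to = [r["DEPT_TO"] for r in rows]
    rv = [r["RV_EXPOS"] for r in rows]
    return {
        "mean NO_VS": statistics.fmean(no_vs),
        "mean DEPT_TO": statistics.fmean(dept_to),
        "corr(NO_VS, RV_EXPOS)": pearson(no_vs, rv),
        "corr(DEPT_TO, RV_EXPOS)": pearson(dept_to, rv),
        "national (NO share, turnout)": national_shares(rows),
    }
```

Usage would look like summarize(load_rows("referendum.csv")), where the file name is hypothetical. As a design note, national_shares computes a ratio of sums, whereas the mean of NO_VS or DEPT_TO gives every department equal weight; the two coincide only in special cases, for example when every department has the same denominator.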