R Assignment Help: CSC121 Small Assignment

Ghostwriting a small R assignment for learning basic R syntax.

Requirement

This assignment may be handed in late. Assignments will not usually be accepted after that. Contact the instructor as soon as possible if you have a legitimate excuse (such as documented illness) for handing in the assignment late (without penalty).

In this assignment, you will write and test an R function filling in missing observations in a vector of sequential observations, and apply it in an R script file for reading and modifying data in an R data frame.

As discussed in the week 7 lectures, a “missing” observation can be indicated in R by a special NA value, which can occur anywhere a number, string, or logical value would otherwise appear. Since missing observations are very common in practice, handling them in some way is an important part of statistical analysis. One approach is to replace each missing observation with some value found from the other observations. A simple approach, for example, is to replace them with the average of all the non-missing observations, but this will often be too simple, producing misleading conclusions.
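For comparison, the simple mean-replacement approach mentioned above can be sketched in a few lines of R. The vector `x` and the name `filled` are made up for illustration, and this is not the method the assignment asks for.

```r
# Made-up observations with two missing values
x <- c(4, NA, 5, 8, NA)

# is.na() flags the missing entries; mean(..., na.rm = TRUE) ignores them
filled <- x
filled[is.na(filled)] <- mean(x, na.rm = TRUE)

print(filled)
```

Every missing entry receives the same value (the mean of 4, 5, and 8), regardless of where it sits in the sequence, which is why this approach can mislead when order matters.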

In this assignment you will implement another approach, one that is applicable when the observations come in a sequence (e.g., a time series) in which order is meaningful. In this situation, one might decide to replace a missing observation by the average of the observation before and the observation after. However, either or both of those might be missing as well. So more generally, the method is to fill in a missing observation by linearly interpolating between the nearest non-missing observations that come before and after the missing observation. Missing observations at the beginning of the sequence (with no non-missing observations before) are filled in with the first non-missing observation, and similarly missing observations at the end of the sequence are filled in with the last non-missing observation. If all observations are missing, they remain missing, since there is no data at all to use to fill them in.

You should write a function called na_interpolate that takes as its only argument a numeric vector, and returns as its value the result of replacing missing values (if any) in this vector according to the method described above.
Here is an example call of this function:

> na_interpolate (c (4,NA,5,NA,NA,NA,8,10,NA,NA))
  [1] 4.00 4.50 5.00 5.75 6.50 7.25 8.00 10.00 10.00 10.00

The general formula for filling in a missing value by linear interpolation is as follows:

$$\frac{d_2 x_1 + d_1 x_2}{d_1 + d_2}$$

Here, $d_1$ is the distance to the closest non-missing observation before the one to be filled in, $x_1$ is the value of this observation, $d_2$ is the distance to the closest non-missing observation after the one to be filled in, and $x_2$ is the value of this observation. So, for example, the missing observation after the 5 in the example above is filled in as

$$\frac{3 \times 5 + 1 \times 8}{1 + 3} = 5.75$$
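As a check on these definitions, the arithmetic for the observation right after the 5 can be worked through in R. This sketch assumes the usual distance-weighted form of linear interpolation, which agrees with the 5.75 shown in the example output. The variable names `i`, `before`, `after`, and `value` are illustrative only.

```r
# Example vector from the call above
x <- c(4, NA, 5, NA, NA, NA, 8, 10, NA, NA)

i <- 4            # position of the missing observation right after the 5
before <- 3       # index of the closest non-missing observation before i
after <- 7        # index of the closest non-missing observation after i

d1 <- i - before  # distance back to x1
d2 <- after - i   # distance forward to x2
x1 <- x[before]
x2 <- x[after]

# Distance-weighted average: the nearer neighbour gets the larger weight
value <- (d2 * x1 + d1 * x2) / (d1 + d2)
print(value)      # 5.75
```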

You should create a test script with examples such as the one above in order to test your na_interpolate function.

Once your na_interpolate function is working, you should create a script to apply it to a made-up data set recording weather hour-by-hour on two days, which you can read (as a data frame) from the course web page.

This data frame has several variables (columns), including temperature and pressure. Your script should fill in missing values (separately) in these two variables using your na_interpolate function, and then print the modified data frame.

You should hand in three script files, one with only the definition of your na_interpolate function, one with your tests of this function, and one that uses this function to fill in missing values in the weather data described above. You should also hand in the output of the last two scripts, as two text files.

Here is a suggested approach to writing the na_interpolate function. Start by creating a vector the same length as the argument vector, which at position i contains the index in the argument vector of the last non-missing observation at or before i (or zero if there is none). Similarly, create a vector that at index i contains the index of the earliest non-missing observation at or after i. Then use these two vectors to modify elements of the argument vector by filling in interpolated missing values.
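The suggested approach could be sketched roughly as follows, using only loops and basic indexing. This is one possible reading, not a reference solution. The helper names `prev_idx` and `next_idx` are made up, and using zero to mean "none" in the second vector is an assumption, since the text only specifies zero for the first.

```r
na_interpolate <- function(x) {
  n <- length(x)
  if (n == 0) return(x)

  # prev_idx[i]: index of the last non-missing observation at or before i (0 if none)
  prev_idx <- numeric(n)
  last_seen <- 0
  for (i in 1:n) {
    if (!is.na(x[i])) last_seen <- i
    prev_idx[i] <- last_seen
  }

  # next_idx[i]: index of the earliest non-missing observation at or after i
  # (0 if none; this convention is an assumption)
  next_idx <- numeric(n)
  next_seen <- 0
  for (i in n:1) {
    if (!is.na(x[i])) next_seen <- i
    next_idx[i] <- next_seen
  }

  result <- x
  for (i in 1:n) {
    if (is.na(x[i])) {
      p <- prev_idx[i]
      q <- next_idx[i]
      if (p > 0 && q > 0) {
        # Neighbours on both sides: interpolate linearly
        d1 <- i - p
        d2 <- q - i
        result[i] <- (d2 * x[p] + d1 * x[q]) / (d1 + d2)
      } else if (q > 0) {
        # Nothing before: copy the first non-missing observation
        result[i] <- x[q]
      } else if (p > 0) {
        # Nothing after: copy the last non-missing observation
        result[i] <- x[p]
      }
      # If both are 0, every observation is missing and the NA is left as is
    }
  }
  result
}

print(na_interpolate(c(4, NA, 5, NA, NA, NA, 8, 10, NA, NA)))
```

Building the two index vectors first keeps the final loop simple: each missing position only needs to look up its two neighbours rather than search for them again.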

You should not use features that we have not covered yet when doing this assignment (except for minor features that don’t affect the overall method used). In particular, you should not try to use some R package for filling in missing values that may already implement this method!