R Assignment Help: STA248 Problems

After computing the results in R, answer the questions in the assignment.

Requirement

Remember to show all your work in the assignment. Solutions without justifications will not earn any marks. Your assignment should consist of a document or image of your answers/analysis/explanations, followed separately by a PDF file of your R code with appropriate comments for your work as an appendix item in each question that used R.

Do not submit just the R code with embedded outputs. All the R code needed has been provided for you either in the assignment or under ‘Course Materials’ on Blackboard.

Note that contextual problems should include some form of conclusion or interpretation, not just a numerical answer!

The goal of the assignments is to assess whether you can apply the concepts discussed in lecture. While there may be different paths to the solution, it is your responsibility to demonstrate in your solutions that you have learned and can apply the course material appropriately. This includes verifying any assumptions as necessary.

Problem 1

A local product manager is under the impression that the three local factories have the same production efficiency and hourly outputs. The regional product manager believes otherwise and has decided to study the hourly outputs of factories A, B, and C over ten days. The data are recorded below:

A B C
24 37 36
31 34 33
35 35 33
18 39 37
32 44 32
33 38 28
27 36 38
27 34 30
27 35 35
26 51 30

a) State the appropriate null and alternative hypotheses. Use side-by-side boxplots to verify the assumptions before proceeding. Comment on what the boxplots tell you about the assumptions. You may use R to construct your boxplots; be sure to label them appropriately. A plot should convey all necessary information without words!
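As a starting point for part a), the side-by-side boxplots could be drawn roughly as follows. This is only a sketch: the data are transcribed from the table above, while the object name `outputs`, the column names, and the plot labels are illustrative choices rather than anything prescribed by the assignment.

```r
# Hourly outputs transcribed from the Problem 1 table (ten days per factory).
# The names `outputs`, `factory`, and `output` are illustrative.
outputs <- data.frame(
  factory = factor(rep(c("A", "B", "C"), each = 10)),
  output = c(
    24, 31, 35, 18, 32, 33, 27, 27, 27, 26,  # factory A
    37, 34, 35, 39, 44, 38, 36, 34, 35, 51,  # factory B
    36, 33, 33, 37, 32, 28, 38, 30, 35, 30   # factory C
  )
)

# Side-by-side boxplots, one box per factory, with title and axis labels
boxplot(output ~ factory, data = outputs,
        main = "Hourly output by factory over ten days",
        xlab = "Factory", ylab = "Hourly output")

# Group standard deviations as a numerical companion to the visual
# comparison of spread between factories
tapply(outputs$output, outputs$factory, sd)
```

The boxplots let you compare centre, spread, symmetry, and possible outliers across the three factories; the written comment on the assumptions is still yours to make.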

b) Does the collected data support the claim that all three factories have the same hourly outputs? Show all your work, state and interpret your p-value. Describe one extraneous variable in this study. This problem should be done by hand.
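Part b) is to be done by hand, but the arithmetic can be cross-checked in R afterwards. The sketch below assumes that a one-way ANOVA is the intended procedure (the assignment does not name the test explicitly) and reproduces the hand-calculation steps. All variable names are illustrative.

```r
# Data transcribed from the Problem 1 table
A <- c(24, 31, 35, 18, 32, 33, 27, 27, 27, 26)
B <- c(37, 34, 35, 39, 44, 38, 36, 34, 35, 51)
C <- c(36, 33, 33, 37, 32, 28, 38, 30, 35, 30)

y <- c(A, B, C)
g <- factor(rep(c("A", "B", "C"), each = 10))

k <- nlevels(g)   # number of groups
n <- length(y)    # total number of observations

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)
grand_mean  <- mean(y)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)  # ave() repeats each group's mean per observation

# F statistic and its upper-tail p-value under the assumed one-way ANOVA
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table
summary(aov(y ~ g))
```

If the hand-computed sums of squares and F statistic match `ss_between`, `ss_within`, and `f_stat`, the arithmetic is consistent. The interpretation of the p-value and the discussion of an extraneous variable still need to be written out.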

Problem 2

Use R to answer this problem. Download the Olympics.csv data set from Blackboard. Dataset obtained from the Journal of Statistics Education (http://ww2.amstat.org/publications/jse). Used by permission of author.
You can read the data set into R by selecting the file from your downloads:

cdt <- read.csv(file.choose(), header = TRUE)

You can read more about the format of the data and variable descriptions here, scrolling down to London 2012 Olympics Data. We will be using this data to examine the possible relationship between a country’s GDP and the total number of medals won during the London 2012 Olympics.

a) In your own words, describe (or hypothesize) what association, if any, might exist between a country’s wealth and its performance in the Olympics. What would be the purpose of modeling this possible association?

b) Fit a simple linear regression in R to model the response Total Medals against the predictor GDP (measured in $10 trillion). What is the regression line? Interpret the estimates b0 and b1 in this context - what do they measure?
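A minimal sketch of the fit in part b) is shown below. The column names `Total.Medals` and `GDP` are assumptions about how the variables appear after `read.csv()`; check `names(cdt)` and adjust them. The helper `fit_medals_model` is a hypothetical convenience function, not part of the course code.

```r
# Fit a simple linear regression of a response on a single predictor.
# The default column names are assumptions; confirm them with names(cdt).
fit_medals_model <- function(data,
                             response = "Total.Medals",
                             predictor = "GDP") {
  stopifnot(response %in% names(data), predictor %in% names(data))
  # reformulate() builds the formula  response ~ predictor  from strings
  lm(reformulate(predictor, response = response), data = data)
}

# Usage with the data frame read in at the start of Problem 2:
# fit <- fit_medals_model(cdt)
# coef(fit)     # b0 (intercept) and b1 (slope)
# summary(fit)  # estimates, standard errors, R-squared
```

Here b0 is the fitted number of medals when the predictor equals zero, and b1 is the change in fitted medals per one-unit increase in the predictor, so the interpretation depends on the units in which GDP is stored.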

c) Produce the 4 diagnostic plots that we discussed in class and use them to comment on the assumptions and appropriateness of the model for this data.
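For part c), one common way to get four diagnostic plots is R's default `plot()` method for `lm` objects. This assumes the four plots discussed in class are those defaults (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage). `plot_diagnostics` is a hypothetical helper name.

```r
# Draw the four default lm diagnostic plots in a 2 x 2 grid.
# `fit` is any fitted lm object, e.g. the regression from part b).
plot_diagnostics <- function(fit) {
  old_par <- par(mfrow = c(2, 2))  # 2 x 2 plotting layout
  on.exit(par(old_par))            # restore the previous layout afterwards
  plot(fit)                        # residuals vs fitted, Q-Q, scale-location, leverage
  invisible(fit)
}

# Usage, assuming `fit` holds the model from part b):
# plot_diagnostics(fit)
```

Typical things to look for are curvature or funnelling in the residuals vs fitted plot, departures from the line in the Q-Q plot, and points with high leverage or influence.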

d) Assuming the model is appropriate, predict the number of medals a fictional country would earn, if their GDP is $9 billion for the year of 2012. Produce an appropriate 95% interval for your estimate, and interpret.
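For part d), `predict()` can return both the point prediction and an interval. The sketch below assumes a prediction interval is the appropriate choice for a single new country, and that the predictor column is named `GDP`. The value passed as `new_gdp` must be expressed in the same units as the GDP column in the data, so $9 billion has to be converted accordingly. `predict_medals` is a hypothetical helper name.

```r
# Point prediction and interval for one new predictor value.
# `new_gdp` must use the same units as the predictor column in the data.
predict_medals <- function(fit, new_gdp, predictor = "GDP", level = 0.95) {
  new_data <- setNames(data.frame(new_gdp), predictor)
  # interval = "prediction" covers a single new observation;
  # interval = "confidence" would instead cover the mean response
  predict(fit, newdata = new_data, interval = "prediction", level = level)
}

# Usage, assuming `fit` is the model from part b) and `gdp_value` is
# $9 billion converted to the units of the GDP column:
# predict_medals(fit, new_gdp = gdp_value)
```

The result has columns `fit`, `lwr`, and `upr`, giving the predicted medal count and the lower and upper interval limits.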

Problem 3

In this problem, you will be examining data to determine if an association exists between marijuana use and dance or party participation. You can read more about the format of the data and variable descriptions here. The study was conducted to determine if behaviour correlates with marijuana use among middle class youths and was published in 1976 (a bit outdated but interesting nonetheless!).

a) State the null and alternative hypotheses and perform an appropriate test for association. State and interpret your p-value and conclusions.
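One possible approach to part a) is a chi-square test of independence on the two-way table of counts. The sketch assumes the data arrive as two categorical vectors, one for marijuana use and one for dance or party participation. The argument names and the helper `test_association` are hypothetical, since the data set's actual variable names are not given here.

```r
# Chi-square test of independence between two categorical variables.
# `use` and `participation` are hypothetical names for the two columns.
test_association <- function(use, participation) {
  observed <- table(use, participation)
  # For 2 x 2 tables chisq.test() applies Yates' continuity correction by default
  result <- chisq.test(observed)
  list(
    observed  = observed,
    expected  = result$expected,  # check these are not too small
    statistic = unname(result$statistic),
    df        = unname(result$parameter),
    p_value   = result$p.value
  )
}

# Usage, with `dat` standing in for the data frame and hypothetical column names:
# test_association(dat$use, dat$participation)
```

If the data are supplied as a table of counts instead, the matrix of counts can be passed to `chisq.test()` directly. When some expected counts are small, `fisher.test()` is a commonly used alternative.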

b) Do you think this was an experimental study or an observational study? Explain your reasoning. Can the results of this study be generalized to the population? Can we conclude any causal relationships?