The purpose of this project is to let you carry out a statistical analysis from start to finish and apply the principles you learn to an actual problem. Since the focus of STAT 318 is the application of statistical methods, this project is the most important part of the class. You will be working independently on this project. The project must be related to Chapter 6: multiple linear regression with a normal response and at least 4 predictors.
As such, this project paper will serve as a substitute for the final exam and account for 25% of your final grade. The following is a basic outline of what will be expected from this project.
Rubric for paper:
- (1) Research question and data set approval: 5 points
- (2) Project report and analysis: 85 points
- (3) Formatting of paper (correct grammar, clear and concise writing, etc.; all math equations should be typeset with the equation editor in Word or with TeX math mode, for example in R Markdown): 10 points
Total: 100 points
- Decide on a research question
- As we learn different methods, you will hopefully have ideas about how to apply them to problems you may be interested in. While this is one way to come up with a good research question, you may find the project more enjoyable if you first think of a topic you are interested in and then a question that can be answered using multiple regression (with *at least* 4 predictors).
- I need to approve your research question to ensure that you will be able to carry out the project.
- You may submit your research question early. I will give feedback on questions as they come in.
- Collect data
- There are plenty of data sets available online, and you can use the links below to find some. However, I highly recommend that you spend time finding a data set you are interested in analyzing rather than limiting yourself to the data sets from those links. I also need the source for your data.
- Research paper
- (1) Introduction (1-2 paragraphs): State the research objectives. State any background information of the topic that is important to know.
- (2) Methodology (1-2 paragraphs): Describe in detail what the method is and why it is appropriate to use.
- (3) Results (multiple paragraphs): Answer your original research question by conducting a full analysis and discussing the results. The report should address model assumptions, model selection (are any additional forms or interactions needed?), model validation, the final regression model, interpretation of the coefficients, and confidence and prediction intervals. Provide any appropriate plots and tables to help summarize the results. Do NOT paste raw output from R; any tables you include should be formatted cleanly.
- (4) Discussion (~ 2 paragraphs): Summarize the study and the results. Were your results expected or surprising? Were there any problems with your data? Would a different method work better? Does your analysis raise additional questions that could be investigated? What are the strengths and weaknesses of your model?
- (5) References
- Your entire analysis needs to be written up as a report, in the style of an essay you would write for any other class. While I will mostly be grading your report on the correct application of statistical principles, I will take points off for sloppy writing. Make sure your equations are written in proper equation form (for example, using the equation editor in Word or TeX math mode). An important part of any statistical analysis is clearly communicating your results, and I want to see you put effort into that. There is no required page length, since the length will vary with the plots and tables you need to include, but the paper you submit should clearly answer and summarize the research question.
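As an illustration of the requirement that equations be written in proper equation form, a multiple linear regression model with four predictors and a normal response might be typeset in TeX math mode roughly as follows. The notation is a generic assumption ($Y_i$ for the response, $x_{ij}$ for the predictors, $\beta_j$ for the coefficients) and may differ from the notation used in Chapter 6; the errors are assumed independent.

```latex
\[
  Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, n.
\]
```

In your own report, replace the generic symbols with named variables from your data set and state what each one measures.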
- R Code
- You will need to upload your R code with your final report. The code should be easy to follow, well commented, and include all components of your analysis.
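To give a sense of what commented, easy-to-follow code for the analysis steps listed under Results could look like, here is a minimal sketch. It is only an illustration under stated assumptions: it uses the `mtcars` data set that ships with base R as a stand-in for your own data, and the choice of response, predictors, interaction term, and new observation is arbitrary. It is not a template for a complete analysis.

```r
# Illustrative stand-in data: mtcars ships with base R.
# Replace with your own approved data set.
data(mtcars)

# Fit a multiple linear regression with four predictors (arbitrary choice).
fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)

# Coefficient estimates, standard errors, t statistics, and p-values.
coef_table <- summary(fit)$coefficients

# Numeric check of the normality assumption on the residuals.
normality <- shapiro.test(resid(fit))

# Model selection example: does adding a wt:hp interaction help?
# The partial F-test compares the two nested models.
fit_int <- update(fit, . ~ . + wt:hp)
f_test <- anova(fit, fit_int)

# 95% confidence intervals for the regression coefficients.
coef_ci <- confint(fit, level = 0.95)

# Hypothetical new observation at which to estimate and predict.
new_car <- data.frame(wt = 3.0, hp = 120, disp = 200, qsec = 18)

# Confidence interval for the mean response at new_car.
ci_mean <- predict(fit, newdata = new_car,
                   interval = "confidence", level = 0.95)

# Prediction interval for a single new response at new_car.
pi_new <- predict(fit, newdata = new_car,
                  interval = "prediction", level = 0.95)

print(round(coef_table, 4))
print(f_test)
print(ci_mean)
print(pi_new)
```

Calling `plot(fit)` on the fitted model would additionally produce the standard residual diagnostic plots. The prediction interval is always wider than the confidence interval at the same point, because it accounts for the variability of a single new observation in addition to the uncertainty in the estimated mean.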