R Assignment Help: IB9CSB Your Data Science Project

Data-processing assignment help: choose any dataset, form hypotheses about its features, then write a program to extract and analyse them.

Overview

In this course, you have been learning how our everyday interactions with technology are creating huge amounts of data capturing human behaviour worldwide. You have learned how this sort of data can help data scientists measure what is going on in the world, and even make better predictions about how people might behave in the future.

In this final assessment, you are asked to pose an interesting question which can be answered using these new datasets and the data science skills you now possess. You then need to acquire the relevant data, process it into a form which you can analyse, carry out the statistical analysis, and produce relevant visualisations to illustrate your results. You also need to write up your results in a clear and engaging style.

The aim of this project is for you to have an opportunity to apply your skills to a question which you are interested in, and at the same time, produce a document which you can use to demonstrate your skills to future employers. Good luck and have fun!

What to submit

Please submit your final write-up as a PDF. Please also submit any datasets you have used in your analysis, and the R code which you have written to process this data.

You should provide clear comments in your code so that it is easy to understand what it does. Save your code in a script. Do not submit your R workspace, your command history or your RStudio project.

You should also provide a PDF document explaining what data is contained within the dataset files.

Combine all your files into a zip file for upload to my.wbs. If you expect your zip file to be greater than 20MB, please speak to us about this at least one week in advance.

Further guidance

Your question

You are strongly recommended to use the question you identified in Assignment 2, on which you will have received feedback from us, with any changes which we have suggested. This will help you avoid encountering difficulty in acquiring the data, analysing the data, or in motivating the relevance of your question.

If you wish to investigate a different question, please speak to us first. We may be less able to offer you support with new questions.

Your analysis

Your aim is to carry out your analysis in such a way that third parties could easily replicate it and verify your findings. You should therefore write your code in as clear a style as possible, with comments to help explain what your code does where necessary. You should also provide clear documentation of the datasets which you have used and which you submit: for example, what data is contained in each file and where this data was acquired from.

To practise and develop the skills you have acquired during this course, you should carry out your analysis in R.
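
As a rough sketch of what a clearly commented, replicable script might look like, consider the outline below. The script name, column names and values are invented for illustration and are not part of this assessment; a small inline data frame stands in for a real data file so that the sketch runs on its own.

```r
# analysis.R (hypothetical script name, for illustration only)
# Purpose: load the raw data, clean it, and compute a summary table.

# ---- 1. Load data ----
# In a real project this step would read one of the submitted files, e.g.
#   raw <- read.csv("searches.csv")   # hypothetical file name
# Here an invented data frame is used instead so the sketch is self-contained.
raw <- data.frame(
  week     = c(1, 2, 3, 4, 5, 6),
  searches = c(120, 135, NA, 160, 158, 171),
  region   = c("North", "North", "North", "South", "South", "South")
)

# ---- 2. Clean data ----
# Count the rows with a missing search count, so the write-up can state
# exactly how much data was excluded and why.
n_dropped <- sum(is.na(raw$searches))

# Keep only the rows where the search count was recorded.
clean <- raw[!is.na(raw$searches), ]

# ---- 3. Summarise ----
# Mean weekly search count for each region.
summary_table <- aggregate(searches ~ region, data = clean, FUN = mean)
print(summary_table)

# ---- 4. Record the environment ----
# sessionInfo() lists the R version and loaded packages, which helps a
# third party reproduce the analysis.
print(sessionInfo())
```

Numbered section comments and a recorded count of excluded rows are one possible convention among many; the point is that each processing step is explained where it happens.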

Your write-up

Remember that the goal of this assessment is to carry out the analysis to evaluate an interesting question. As long as your question is well motivated, do not worry if your results do not turn out as you hoped. Just make sure that, in your write-up, you provide a clear motivation for your question, a clear description of the analysis you carried out to evaluate it, and a clear evaluation of the results you found.

Your write-up should be no longer than 3,000 words, and should be structured as follows:

  • Title
    • Your title should convey the main thrust of your analysis and results, but crucially should also catch the reader’s attention.
    • Your title should have a maximum of 15 words (although good titles are normally shorter!)
  • Abstract
    • In your abstract, you should first briefly explain what your question is and why it is interesting.
    • You should then explain the results of your analysis.
    • You should finally describe the conclusions of your analysis - in other words, what your results mean.
    • Your abstract should be no longer than 150 words.
  • Introduction
    • The main goal of your introduction is to motivate your question and introduce your analysis.
    • You should therefore provide enough background to make the value of your analysis clear.
    • You should cite between 5 and 10 papers which are related to your analysis.
    • You should then clearly explain what your analysis sets out to do. What is your question? What do you expect to find?
    • You may wish to give an initial indication of the results, but this is a stylistic decision.
    • There is no word limit for your introduction, but make sure your writing style is concise.
  • Methods and Results
    • In the methods and results section, you should very clearly explain what analysis steps you carried out, and what the results were.
    • As a guide to the level of detail required, you should include enough information in this section to enable someone else to reproduce your analysis without access to your code or the data you downloaded.
    • To achieve this, you should make the source of your data clear, including providing references for websites from which you have downloaded the data. You should also clearly describe any calculations you carried out on the raw data you downloaded to reach your final results. You do not need to make reference to the specific R functions that you used to do this, however.
    • All statistical tests should be reported appropriately, including at least details of the sample size (or degrees of freedom), the value of the test statistic calculated and, where calculated, the p-value.
    • You should also describe any assumptions of the analyses you carried out (e.g., should your data be normally distributed?) and show how you checked that these assumptions hold.
    • You should provide at least one figure which visualises your findings. We will give you 20% of your marks for visualisation as detailed below. Figures should always have appropriately labelled axes, with the units of measurement specified.
    • If appropriate, you can provide up to four figures. You can also construct figures which contain multiple subfigures. However, only include important figures which help you tell your story. You need to be as concise with your figures as you are with your words.
    • Under each figure, provide a caption which clearly outlines to the reader what data the figure shows, and what patterns the reader should note in the data. Each caption should be no longer than 350 words.
    • To capture the attention of busy readers and to help them understand your analysis, you should aim to produce figures which, together with the figure captions, convey the basic story of your analysis on their own.
    • There is no word limit for your methods and results, but make sure your writing style is concise.
  • Discussion
    • The discussion should briefly summarise what you have done, and discuss what your findings mean.
    • To make your document as accessible as possible to busy readers, it is a good idea to ensure that your discussion would make sense if the reader had not read the rest of the document.
    • You may wish to begin by briefly summarising the motivation for your study once again. You can then restate your research question.
    • Next, give a brief indication of the nature of your analyses and summarise what your analyses found.
    • Indicate which answer to your research question your findings provide support for. Is this what you expected?
    • Try to offer a potential explanation for your findings. If you have found the pattern you expected, you may have already hinted towards this explanation in your introduction. If you did not find what you expected, why do you think this is?
    • It is not a problem if you are not sure why you found a particular pattern - simply suggest some possible ideas. It is very important that you are careful not to overstate your case. In particular, most investigations do not “prove” anything on their own, but you may have found new strong or weak support for a given idea.
    • Indicate what the implications of your investigation are. For example, have you highlighted a new opportunity to use a certain dataset to measure or forecast a certain type of behaviour? Have you provided evidence of an interesting behavioural pattern? Have you helped explain a previously observed behavioural pattern? Have you provided evidence that a particular line of enquiry may not be worth following further? What might people be able to do once they have read your results, which they might not have been able to do before?
  • References
    • You should provide full references for all papers you have cited. There are many styles of referencing. For this assessment, please use ONE of the following reference styles (do not mix two!)
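
As an illustration of the Methods and Results points above on checking assumptions and reporting statistical tests, the sketch below runs a normality check and a two-sample test on simulated data, then assembles a report containing the sample sizes, degrees of freedom, test statistic and p-value. The data, group names and choice of test are assumptions made for the example; the appropriate test depends on your own question and data.

```r
# Simulated measurements stand in for a real dataset (illustrative only).
set.seed(42)
group_a <- rnorm(30, mean = 50, sd = 10)
group_b <- rnorm(30, mean = 55, sd = 10)

# Check the normality assumption in each group with a Shapiro-Wilk test.
# A small p-value would suggest the data depart from a normal distribution,
# in which case a non-parametric test such as wilcox.test() may fit better.
shapiro_a <- shapiro.test(group_a)
shapiro_b <- shapiro.test(group_b)
cat(sprintf("Shapiro-Wilk, group A: W = %.3f, p = %.3f\n",
            shapiro_a$statistic, shapiro_a$p.value))
cat(sprintf("Shapiro-Wilk, group B: W = %.3f, p = %.3f\n",
            shapiro_b$statistic, shapiro_b$p.value))

# Welch two-sample t-test (R's default; does not assume equal variances).
result <- t.test(group_a, group_b)

# Assemble a report with sample sizes, degrees of freedom,
# the test statistic and the p-value.
report <- sprintf(
  "Welch t-test: n = %d and %d, t(%.1f) = %.2f, p = %.3f",
  length(group_a), length(group_b),
  result$parameter, result$statistic, result$p.value
)
cat(report, "\n")
```

Building the report string from the test object, rather than typing numbers by hand, reduces the risk of transcription errors between your code and your write-up.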

How marks will be allocated

You will receive marks for the following:

  • Quality of question
    • This area is worth 20% of your final mark for the module.
    • You will be awarded marks for choosing a question which was interesting and feasible to answer.
    • You can emphasise how interesting your question is by motivating it well in the introduction. Who would be interested in the answer, and why? You may be able to provide more evidence of the value of your question in the discussion as well.
    • Again, if you have provided a good motivation for why your question was worth investigating and why you believed you might find an interesting answer, do not worry if your results do not turn out as you hoped.
    • You can emphasise how feasible your question was to answer by completing an appropriate analysis in the methods and results, and crucially, not overstating your findings in the discussion. You need to show an answer could be provided to your question from available data and analysis methods without a leap of faith!
  • Quality of analysis
    • This area is worth 20% of your final mark for the module.
    • You will be awarded marks for choosing an analysis method appropriate for answering your question; verifying that assumptions made by this analysis method hold (e.g., should your data be normally distributed?); carrying out the analysis correctly; and correctly interpreting the results of the analysis.
    • You will also be assessed on whether you have motivated any pre-processing steps well (e.g., you have not left out half of your dataset without explaining why!)
    • You can make it easier for your analysis to be correctly assessed by providing a clear and concise description of your analysis in the methods and results, and by clearly documenting both your code and the datasets which you have analysed.
  • Quality of visualisation
    • This area is worth 20% of your final mark for the module.
    • Crucially, you should provide visualisations which tell the story of your analysis in a clear, concise and engaging fashion.
    • You will be awarded marks for providing appropriate and legible visualisations for your data and analysis, and labelling your visualisations well (e.g., all axes are labelled, including units of measurements, legends are provided to explain different colours or line types used, and font sizes are not too small).
    • You will also be awarded marks for choosing an appropriate selection of visualisations. Remember, you should only include the visualisations which help tell your story - one may be sufficient. Do not simply include every possible visualisation you can think of!
    • You will be awarded marks for creating an attractive visualisation. The base level of plots generated by the ggplot2 library is good, but it will also allow you to change many different aspects of your visualisation where you feel this is appropriate, from colours, to line thickness, to font used, and more.
    • For the purposes of this assessment, please make all changes to your figures by writing code in R, apart from assembly of multi-panel figures which you can do in an external program (e.g., Word).
    • You will also be awarded marks for good figure captions. Do your figure captions meet the specification outlined in the structure above? Do your figures and figure captions together successfully tell the main story of your analysis?
  • Quality of written description
    • This area is worth 20% of your final mark for the module.
    • You should provide a clear, concise and engaging written description of your investigation.
    • You will be awarded marks for using the structure described above and covering all the points highlighted in the structure description.
    • Within individual sections, you will be awarded marks for structuring your writing well, to make your arguments and descriptions easy to follow.
    • You will be awarded marks for the style of your writing. Is it clear, concise, and engaging? Have you kept your sentences short where possible? Have you used correct grammar and appropriate vocabulary? (Simple vocabulary is often easier to understand - do not use complicated words for the sake of it!)
    • You will also be assessed on whether you have correctly observed conventions for reporting statistical results and providing references to previous work.
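
To illustrate the labelling points under Quality of visualisation above, here is a minimal ggplot2 sketch with labelled axes including units, a titled legend and a larger base font size, all set in code. The data, variable names and output file name are invented for the example, and the styling choices are only one possibility.

```r
library(ggplot2)

# Illustrative data only: invented weekly search volumes for two regions.
plot_data <- data.frame(
  week     = rep(1:6, times = 2),
  searches = c(120, 135, 150, 160, 158, 171,
               90, 95, 110, 108, 125, 130),
  region   = rep(c("North", "South"), each = 6)
)

# Build the figure entirely in code: one line per region, with points
# marking the individual weekly observations.
fig <- ggplot(plot_data, aes(x = week, y = searches, colour = region)) +
  geom_line() +
  geom_point(size = 2) +
  labs(
    x      = "Time since start of study (weeks)",
    y      = "Search volume (queries per week)",
    colour = "Region"
  ) +
  theme_minimal(base_size = 14)  # larger base font for legibility

# Save to a temporary folder here; a real script would use a project path.
out_file <- file.path(tempdir(), "figure1.pdf")  # hypothetical file name
ggsave(out_file, fig, width = 16, height = 10, units = "cm")
```

Saving the figure with ggsave() at a fixed size keeps font sizes consistent between runs, which is harder to guarantee when exporting by hand from the plot window.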

A final note

Please make sure you observe the WBS plagiarism guidelines to ensure you do not needlessly lose marks. You can see these in full on the next page.

In particular, it is extremely important that you do not copy text from existing sources or your classmates. For this assessment, you are also strongly recommended to avoid including any quotes - this should not be necessary. Write everything in your own words and provide clear references where you refer to ideas and results you have read about elsewhere.

We have seen some great work and great questions on this course. We are looking forward to you submitting some excellent data science projects!