COMP20005: Sequentially Process a File

Implement, in C, a file-analysis program that computes statistics from daily temperature records.

Learning Outcomes

In this project you will demonstrate your understanding of loops and if statements by writing a program that sequentially processes a file of text data. You are also expected to make use of functions (Chapters 5 and 6) and arrays (Chapter 7, and covered in lectures in Weeks 6 and 7). The sample solution that will be provided to you will also make use of structures (Chapter 8), and you may do likewise if you wish, but you are not required to use struct types.

Sequential Data

Vast numbers of scientific and engineering datasets are stored in text files in comma-separated values (CSV) format, usually with a one-line header describing the contents of the columns. A key requirement is to be able to process such data, looking for trends, patterns, and insights.

The dataset used in this project was generated by the Bureau of Meteorology and was accessed on 27 March 2017 from http://www.bom.gov.au/climate/data/index.shtml (dataset IDCJAC0010, station 86282 “Melbourne Airport”). It was then edited to make the test files provided on the LMS: some of the columns were removed, 19 missing values were replaced with -999, and three different subsets were extracted into three test files, data00031.txt (Melbourne temperature data for one month, March 1990), data00365.txt (Melbourne temperature data for one year, 1971), and data16802.txt (Melbourne temperature data for 46 years, 1971 to 2016 inclusive). The first few lines and the last two lines of the file data00031.txt are:

Product code,BoM station,Year,Month,Day,Maximum (C),Minimum (C)
IDCJAC0010,86282,1990,3,1,24.9,18
IDCJAC0010,86282,1990,3,2,30.2,15.7
IDCJAC0010,86282,1990,3,3,28.2,17.2
IDCJAC0010,86282,1990,3,4,28.6,18
IDCJAC0010,86282,1990,3,5,25.2,17.5
            [etc]
IDCJAC0010,86282,1990,3,30,19.2,-999
IDCJAC0010,86282,1990,3,31,19.8,9

where, as already noted, -999 indicates a missing entry (perhaps caused by equipment failure or a similar problem).

Stage 1 - Control of Reading and Printing (marks up to 4/10)

Your program should read the entire input dataset into a collection of parallel arrays (or, if you are adventurous, an array of struct), counting the items as it goes. The first line of the input file should be discarded without being retained. When the entire dataset is in memory, your program should print the first and last of the input records. The output this stage produces when given data00031.txt is shown by this interaction:

Stage 1
------
Input has 31 records
First record in data file:
  date: 01/03/1990
  min : 18.0 degrees C
  max : 24.9 degrees C
Last record in data file:
  date: 31/03/1990
  min : 9.0 degrees C
  max : 19.8 degrees C

To read the comma-separated data lines from the rest of the input file, you should use this recipe:

scanf("IDCJAC0010,%d,%d,%d,%d,%lf,%lf\n", &location, &yy, &mm, &dd, &max, &min)

That is, you may assume that the “Product code” value is fixed, but not the “BoM station”. You will need a separate while loop that calls getchar() until it reads the first newline character, to consume the header line.
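The reading step can be sketched as a single function, assuming parallel arrays (rather than a struct) and a `FILE *` parameter so that the same code works on `stdin` or on any other stream; `fscanf` stands in for `scanf` for that reason. The names `read_records`, `maxn` and `MAXDAYS` are hypothetical. Note that in each data line the maximum comes before the minimum.

```c
#include <stdio.h>

#define MAXDAYS 50000   /* upper bound on input days given in the specification */

/* Hypothetical helper: discards the header line, then reads up to maxn
   data records into the parallel arrays. Returns the number read.
   In the real program, pass stdin as fp. */
int read_records(FILE *fp, int yy[], int mm[], int dd[],
                 double mins[], double maxs[], int maxn) {
    int c, location, n = 0;

    /* consume the header line, one character at a time */
    while ((c = getc(fp)) != EOF && c != '\n') {
        /* discard */
    }

    /* the recipe: station, year, month, day, max, min */
    while (n < maxn &&
           fscanf(fp, "IDCJAC0010,%d,%d,%d,%d,%lf,%lf\n", &location,
                  &yy[n], &mm[n], &dd[n], &maxs[n], &mins[n]) == 6) {
        n++;
    }
    return n;
}
```

In `main`, a call such as `n = read_records(stdin, yy, mm, dd, mins, maxs, MAXDAYS);` would follow, with the five arrays declared `static` (or global) at size `MAXDAYS` so that they do not occupy a large stack frame.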

You may (and should) assume that at most 50,000 days will be covered by the input data. Notice how the output formatting of the first record is the same as the output formatting for the last one. You need to be thinking about functions at every opportunity.
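Since the first and last records share one layout, one printing function can serve both. The sketch below assumes the parallel-array layout; `print_record` and its parameter list are hypothetical, and the heading lines ("First record in data file:" and so on) are left to the caller.

```c
#include <stdio.h>

/* Hypothetical helper: prints one record in the Stage 1 layout, e.g.
     date: 01/03/1990
     min : 18.0 degrees C
     max : 24.9 degrees C                                              */
void print_record(FILE *out, int dd, int mm, int yy, double min, double max) {
    fprintf(out, "  date: %02d/%02d/%04d\n", dd, mm, yy);
    fprintf(out, "  min : %.1f degrees C\n", min);
    fprintf(out, "  max : %.1f degrees C\n", max);
}
```

The caller would then use `print_record(stdout, dd[0], mm[0], yy[0], mins[0], maxs[0])` for the first record and the same call with index `n-1` for the last.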

Stage 2 - Computing Stuff (marks up to 6/10)

Of course, the goal is to compute average temperatures and see whether they have changed over the years. In this stage your program should compute, for each year represented in the input file, the average minimum temperature and the average maximum temperature. Because of recording errors, some temperatures appear as the value -999. These values must be excluded when computing the averages; the affected years are simply treated as having fewer observations.
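Excluding the -999 marker is simplest if every average is built from a running sum plus a count of valid readings. The sketch below is one way to package that; the type `accum_t`, the functions `accum_init`, `accum_add` and `accum_mean`, and the macro `MISSING` are hypothetical names, not part of the specification. The same accumulator could also serve the per-month averages needed later, which is the kind of code sharing the assignment asks for.

```c
#define MISSING (-999.0)   /* marker used in the edited data files */

/* Hypothetical type: running sum and count of valid readings. */
typedef struct {
    double sum;
    int count;
} accum_t;

void accum_init(accum_t *a) {
    a->sum = 0.0;
    a->count = 0;
}

/* Adds a reading unless it is the missing-value marker.
   -999 is exactly representable, so == comparison is safe here. */
void accum_add(accum_t *a, double value) {
    if (value != MISSING) {
        a->sum += value;
        a->count++;
    }
}

/* Average of the valid readings; returns 0.0 when there were none,
   so callers should check a->count before trusting the result. */
double accum_mean(const accum_t *a) {
    return a->count > 0 ? a->sum / a->count : 0.0;
}
```

The `count` field is also exactly the "(365 days)" figure that the Stage 2 output reports.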

For example, for the file data00365.txt, the required output (after the stage heading) is just two lines:

Stage 2
------
1971: average min: 9.37 degrees C (365 days)
      average max: 19.45 degrees C (365 days)

because all of the records in that file fall into a single year, 1971 (William McMahon was Prime Minister of Australia, Richard Nixon was President of the United States, Leonid Brezhnev ruled the Soviet Union [and the Cold War was active], Mao Zedong was paramount leader of China [and no foreigners could enter the country at all], Edward Heath was Prime Minister of the United Kingdom, and Alistair was a middle-school student). Multi-line output for the larger file data16802.txt is given on the assignment FAQ page.
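The two Stage 2 lines for one year can come from a single formatting function. This sketch writes into a caller-supplied buffer with `snprintf` so that the result is easy to check; `format_year_summary` is a hypothetical name, and the fixed six-space indent of the second line assumes four-digit years.

```c
#include <stdio.h>

/* Hypothetical helper: writes the two Stage 2 lines for one year, e.g.
   1971: average min: 9.37 degrees C (365 days)
         average max: 19.45 degrees C (365 days)
   Returns snprintf's result (the number of characters needed). */
int format_year_summary(char *buf, size_t size, int year,
                        double avg_min, int n_min,
                        double avg_max, int n_max) {
    return snprintf(buf, size,
                    "%d: average min: %.2f degrees C (%d days)\n"
                    "      average max: %.2f degrees C (%d days)\n",
                    year, avg_min, n_min, avg_max, n_max);
}
```

Printing the buffer with `fputs(buf, stdout)` once per year then gives the multi-line output for larger files.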

Wherever appropriate, code should be shared between the stages through the use of functions. In particular, there shouldn’t be long (or even short) stretches of repeated or similar code appearing in different places in your program.

You may assume that the data records are presented in strictly increasing date order, so you are not required to sort them. You must not assume that any particular range of years is present in the input data, nor that the months and days will be exhaustive (there may be missing days, missing months, and even whole missing years).
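Because the records are in increasing date order, all records for one year are contiguous, so yearly averages can be found by scanning runs of equal year values. Years absent from the data never form a run, so nothing needs to be assumed about the year range. A sketch under those assumptions, with hypothetical names `year_run_end` and `yearly_averages`; it repeats the -999 test inline so that it compiles on its own.

```c
/* Hypothetical helper: index one past the end of the run of records
   whose year equals yy[start]. Relies on the sorted-order guarantee. */
int year_run_end(const int yy[], int n, int start) {
    int i = start;
    while (i < n && yy[i] == yy[start]) {
        i++;
    }
    return i;
}

/* Hypothetical helper: one average per year present in the data,
   skipping -999 readings. Writes at most maxyears entries into the
   parallel output arrays and returns how many were written. */
int yearly_averages(const int yy[], const double vals[], int n,
                    int years[], double avgs[], int counts[], int maxyears) {
    int nyears = 0, start = 0;
    while (start < n && nyears < maxyears) {
        int end = year_run_end(yy, n, start);
        double sum = 0.0;
        int cnt = 0;
        for (int i = start; i < end; i++) {
            if (vals[i] != -999.0) {
                sum += vals[i];
                cnt++;
            }
        }
        years[nyears] = yy[start];
        counts[nyears] = cnt;
        avgs[nyears] = cnt > 0 ? sum / cnt : 0.0;
        nyears++;
        start = end;
    }
    return nyears;
}
```

The function would be called once with the minima and once with the maxima; the two count arrays can differ when a day has only one of its two readings missing.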

Stage 3 - Make A Picture (marks up to 8/10)

Modify your program so that it also generates a “by the month” horizontal graph with (always) twelve rows showing the range between the average observed minimum and the average observed maximum temperatures, where each month's averages are computed over every line in the input file for that month, regardless of year. Leave a row blank if there are no readings for that month in the input data file.
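The long-term monthly averages differ from the yearly ones only in how records are grouped: every record whose month is m contributes to slot m-1, whatever its year. A sketch, assuming month numbers 1 to 12 and the -999 marker; `monthly_averages` and `NMONTHS` are hypothetical names, and a zero entry in `counts` marks a month with no readings (a row to leave blank).

```c
#define NMONTHS 12

/* Hypothetical helper: long-term average per calendar month for one
   column of readings (call once for minima, once for maxima).
   Slot 0 is January. Missing (-999) readings are skipped. */
void monthly_averages(const int mm[], const double vals[], int n,
                      double avgs[NMONTHS], int counts[NMONTHS]) {
    double sums[NMONTHS] = {0.0};
    for (int m = 0; m < NMONTHS; m++) {
        counts[m] = 0;
    }
    for (int i = 0; i < n; i++) {
        if (vals[i] != -999.0 && mm[i] >= 1 && mm[i] <= NMONTHS) {
            sums[mm[i] - 1] += vals[i];
            counts[mm[i] - 1]++;
        }
    }
    for (int m = 0; m < NMONTHS; m++) {
        avgs[m] = counts[m] > 0 ? sums[m] / counts[m] : 0.0;
    }
}
```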

The plotted values are the long-term average monthly minimum and the long-term average monthly maximum. For example, on the input file data00365.txt, the following graph should be generated, where the two numbers in parentheses show how many min and max observations went into each of the plotted averages:

Stage 3
------
Jan ( 31, 31) |                              ************************
Feb ( 28, 28) |                                ************************
Mar ( 31, 31) |                             ************************
Apr ( 30, 30) |                          *********************
May ( 31, 31) |                     ******************
Jun ( 30, 30) |            **************
Jul ( 31, 31) |        ******************
Aug ( 31, 31) |        ********************
Sep ( 30, 30) |            ********************
Oct ( 31, 31) |              **********************
Nov ( 30, 30) |                 **********************
Dec ( 31, 31) |                      ***************************
              +---------+---------+---------+---------+---------+---------+
              0         5        10        15        20        25        30
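One way to draw a row is sketched below. The scale of two columns per degree is inferred from the axis (each `+---------+` segment spans 10 columns and 5 degrees); the rounding rule (nearest column), the clamping of negative averages to 0, and the handling of an empty month (label and bar only) are assumptions, not requirements stated above. `format_row`, `print_axis`, and the macros are hypothetical names.

```c
#include <stdio.h>

#define COLS_PER_DEGREE 2    /* inferred: 10 columns per 5 degrees */
#define AXIS_MAX_DEGREES 30  /* axis runs from 0 to 30 degrees */
#define LABEL_WIDTH 14       /* width of "Jan ( 31, 31) " before the '|' */

/* Temperature to column offset after the '|', rounded to the nearest
   column. Assumption: negative values are clamped to 0. */
static int temp_to_col(double t) {
    if (t < 0.0) {
        t = 0.0;
    }
    return (int)(t * COLS_PER_DEGREE + 0.5);
}

/* Writes one row such as "Jan ( 31, 31) |    ****" into buf: blanks
   before the min column, stars up to (not including) the max column.
   If either count is zero, only the label and '|' are written.
   Returns the string length, or snprintf's result if the label did
   not fit. */
int format_row(char *buf, size_t size, const char *name,
               int n_min, int n_max, double avg_min, double avg_max) {
    int len = snprintf(buf, size, "%s (%3d,%3d) |", name, n_min, n_max);
    if (len < 0 || (size_t)len >= size) {
        return len;
    }
    if (n_min > 0 && n_max > 0) {
        int lo = temp_to_col(avg_min);
        int hi = temp_to_col(avg_max);
        for (int c = 0; c < hi && (size_t)len + 1 < size; c++) {
            buf[len++] = (c < lo) ? ' ' : '*';
        }
        buf[len] = '\0';
    }
    return len;
}

/* Prints the fixed axis and its labels, aligned with the '|' column. */
void print_axis(FILE *out) {
    fprintf(out, "%*s+", LABEL_WIDTH, "");
    for (int d = 5; d <= AXIS_MAX_DEGREES; d += 5) {
        fputs("---------+", out);   /* 10 columns per 5 degrees */
    }
    fprintf(out, "\n%*s0", LABEL_WIDTH, "");
    for (int d = 5; d <= AXIS_MAX_DEGREES; d += 5) {
        fprintf(out, "%10d", d);
    }
    fputc('\n', out);
}
```

Under these assumptions, a month with averages of 15.0 and 27.0 degrees would produce 30 blanks followed by 24 stars after the `|`.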

Further examples showing the full output that is required for the three different test files are provided on the LMS. You should also make your own test files, by editing out different subsets of the data that is provided, and/or creating them by hand.

Stage 4 - Climate Science In Action (marks up to 10/10)

Now for some climate science. Suppose we suspect that there is a general trend for temperatures to be rising with time. To get evidence of that, we decide to count, for each year, the number of months in which the average minimum for that month is greater than the long-term average minimum for the same month, plus the number of months in which the average maximum for that month is greater than the long-term average maximum. That is, each year is given a score between zero (meaning that in every month of that year both the average minimum and the average maximum were at or below the corresponding long-term averages) and 24 (meaning that in every month of that year both were above the corresponding long-term averages). Of course, most years will be somewhere between these extremes.
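The scoring rule can be sketched as follows, assuming the long-term monthly averages `lt_min` and `lt_max` (slot 0 is January) have already been computed as for Stage 3, and that records `lo` to `hi-1` form the run for one year. `year_score` is a hypothetical name. A monthly average exactly equal to the long-term value does not score, since the rule says "greater than", and a month with no valid readings contributes nothing.

```c
#define NMONTHS 12

/* Hypothetical helper: months whose average min beats the long-term
   average min, plus months whose average max beats the long-term
   average max, for the records lo..hi-1 of one year (0..24). */
int year_score(const int mm[], const double mins[], const double maxs[],
               int lo, int hi,
               const double lt_min[NMONTHS], const double lt_max[NMONTHS]) {
    double smin[NMONTHS] = {0.0}, smax[NMONTHS] = {0.0};
    int nmin[NMONTHS] = {0}, nmax[NMONTHS] = {0};
    int score = 0;

    /* per-month sums for this year only, skipping -999 readings */
    for (int i = lo; i < hi; i++) {
        int m = mm[i] - 1;
        if (m < 0 || m >= NMONTHS) {
            continue;
        }
        if (mins[i] != -999.0) { smin[m] += mins[i]; nmin[m]++; }
        if (maxs[i] != -999.0) { smax[m] += maxs[i]; nmax[m]++; }
    }
    /* one point per month for min, one per month for max */
    for (int m = 0; m < NMONTHS; m++) {
        if (nmin[m] > 0 && smin[m] / nmin[m] > lt_min[m]) score++;
        if (nmax[m] > 0 && smax[m] / nmax[m] > lt_max[m]) score++;
    }
    return score;
}
```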

Modify your program so that it prints out the score associated with each of the first five years and each of the last five years in the period covered by the data file (or all of the years, if fewer than ten years are covered in total). It only really makes sense to apply this computation to the biggest data file, data16802.txt, and for this stage you may assume that each year that is represented in the input will have data for all twelve months (so that the scores are always out of 24). You should also ensure (as a minimum requirement) that your program does not generate a runtime error on the other two data files that are provided. Here is the required output for the file data16802.txt:

Stage 4
------
1971: score is  8/24
1972: score is 14/24
1973: score is 10/24
1974: score is  7/24
1975: score is 13/24
--
2012: score is 14/24
2013: score is 17/24
2014: score is 22/24
2015: score is 14/24
2016: score is 18/24
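Selecting the first and last five years, with the `--` separator between them, needs only index arithmetic over the list of years. A sketch assuming parallel arrays `years` and `scores`; `print_scores` and `SHOW` are hypothetical names. When there are ten or fewer years, every year is printed and no separator appears, which is one reading of the "(or all of the years)" case.

```c
#include <stdio.h>

#define SHOW 5   /* years shown at each end of the period */

/* Hypothetical helper: prints "YYYY: score is SS/24" for the first SHOW
   and last SHOW years, with a "--" line between the two groups. */
void print_scores(FILE *out, const int years[], const int scores[], int nyears) {
    for (int i = 0; i < nyears; i++) {
        if (nyears > 2 * SHOW && i == SHOW) {
            fprintf(out, "--\n");
        }
        if (i < SHOW || i >= nyears - SHOW) {
            fprintf(out, "%d: score is %2d/24\n", years[i], scores[i]);
        }
    }
}
```

The `%2d` field reproduces the extra space in lines such as `1971: score is  8/24`.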

If there were no upward or downward trend in the data, these scores could be expected to average around 12 out of 24, as typical fluctuations around the mean. What this data suggests is that Melbourne is hotter now than it was in the 1970s. Still sceptical about climate change? Ready to convince a politician?