- Your code must successfully compile and run on Submitty, which uses Ubuntu v14.04.5 LTS.
- You must use C for this homework assignment, and your code must successfully compile via gcc with absolutely no warning messages when the -Wall (i.e., warn all) compiler option is used. We will also use -Werror, which will treat all warnings as critical errors.
- Note that the gcc compiler is version 4.8.5 (Ubuntu 4.8.5-2ubuntu1~14.04.1).
In this first homework, you will use C to implement a relatively simple text file manipulation program. The goal is to become more comfortable programming in C on Linux, in particular handling strings and dynamically allocating (and re-allocating) memory. Overall, your program will open and read a text file, parse all words from the file, store each parsed word into a dynamically allocated array, then display the words and their respective number of occurrences. You will use a second array to keep track of the word occurrence counts.
The input file is specified on the command-line as the first argument. The second command-line argument, which is optional, is an integer specifying the number of words to display in the output. Note that words are displayed in the order that they are first encountered in the given text file.
As an example, you could execute your program and have it read hw1-input01.txt and display word counts for all of the words in the file as follows:
bash$ ./a.out hw1-input01.txt
As another example, you could execute your program and have it read the same input file, but display word counts for only the first ten words in the file as follows:
bash$ ./a.out hw1-input01.txt 10
Words are defined as containing one or more alphanumeric characters. Further, words are case sensitive (e.g., “Lion” is different than “lion”). Consider using fopen(), fscanf(), fgetc(), isalnum(), and other such string and character functions. Check out the details of each function by reviewing the corresponding man pages from the terminal.
If improper command-line arguments are given, report an error message to stderr and abort further program execution. In general, if an error is encountered, display a meaningful error message on stderr and abort further program execution.
Error messages must have the following format:
For your data structures, you must use two parallel arrays, meaning that the elements stored in each array at common index j correspond to one another. In this case, each word is tied to its corresponding count. More specifically, the first array contains words encountered in the given input file. The second array contains integers that represent the number of occurrences of the corresponding word in the file.
Your program must store words in the first array as you discover them in the file, initially setting the corresponding integer in the second array to 1. If a word already exists in the data structure (use linear search here), do not add it again; instead, incremement its corresponding count in the second array.
Note that you must store all words in memory, not just the words to be displayed. Further, both arrays must be dynamically allocated on the runtime heap. To do so, dynamically create the first array as an array of character pointers (char*) using calloc(). Next, dynamically create the second array as an array of integers, also via calloc(). Start with an array size of 8. If the size of the arrays needs to be increased, use realloc() to do so, doubling the size each time.
You must also dynamically allocate the memory to store each word (since a char* is simply a pointer). To do so, use malloc() or calloc(); be sure to calculate the number of bytes to allocate as the length of the given word plus one, since strings in C are implemented as char arrays that end with a ‘\0’ character.
To read in each word from the file, you may use a statically allocated character array of size 80. In other words, you can assume that each word is no more than 79 characters long.
Finally, be sure to use free() to ensure all dynamically allocated memory is properly deallocated. Use valgrind to verify that there are no memory leaks.
When you execute your program, you must display a line of output each time you allocate or re-allocate memory for the two parallel arrays.
As an example, if you are given the following hw1-input01.txt file:
Once when a Lion was asleep a little Mouse began running up and down upon him; this soon wakened the Lion, who placed his huge paw upon him, and opened his big jaws to swallow him. "Pardon, O King," cried the little Mouse: "forgive me this time, I shall never forget it: who knows but what I may be able to do you a turn some of these days?" The Lion was so tickled at the idea of the Mouse being able to help him, that he lifted up his paw and let him go. Some time after the Lion was caught in a trap, and the hunters, who desired to carry him alive to the King, tied him to a tree while they went in search of a wagon to carry him on. Just then the little Mouse happened to pass by, and seeing the sad plight in which the Lion was, sent up to him and soon gnawed away the ropes that bound the King of the Beasts. "Was I not right?" said the little Mouse. "LITTLE FRIENDS MAY PROVE GREAT FRIENDS."
Executing the code as follows should display the output shown below:
bash$ ./a.out hw1-input01.txt 10 Allocated initial parallel arrays of size 8. Re-allocated parallel arrays to be size 16. Re-allocated parallel arrays to be size 32. Re-allocated parallel arrays to be size 64. Re-allocated parallel arrays to be size 128. All done (successfully read 184 words; 107 unique words). First 10 words (and corresponding counts) are: Once -- 1 when -- 1 a -- 6 Lion -- 5 was -- 4 asleep -- 1 little -- 4 Mouse -- 5 began -- 1 running -- 1
Match the above output format exactly as shown above.
If the second command-line argument is omitted, display the output shown below:
bash$ ./a.out hw1-input01.txt Allocated initial parallel arrays of size 8. Re-allocated parallel arrays to be size 16. Re-allocated parallel arrays to be size 32. Re-allocated parallel arrays to be size 64. Re-allocated parallel arrays to be size 128. All done (successfully read 184 words; 107 unique words). All words (and corresponding counts) are: Once -- 1 when -- 1 a -- 6 ...
As with the previous example, match the above output format exactly as shown above.
To submit your assignment (and also perform final testing of your code), please use Submitty, the homework submission server. The URL is on the course website.
To make sure that your program executes properly on Submitty, use the techniques below.
First, as discussed in class, output to standard output (stdout) is buffered. To ensure buffered output is properly flushed to a file for grading on Submitty, use fflush() after every set of printf() statements as follows:
printf( ... ); /* print something out to stdout */
fflush( stdout ); /* make sure that the output is sent to the */
/* terminal or redirected output file */
Second, also discussed in class, use the DEBUG_MODE technique to make sure you do not submit any debugging code. Here is an example:
printf("the value of x is %d\n", x);
printf("the value of q is %d\n", q);
And to compile this code in “debug” mode, use the -D flag as follows:
bash$ gcc -Wall -Werror -D DEBUG_MODE homework1.c