In this assignment you are tasked with updating an old piece of software, making sure it compiles, and that it works properly in your VM environment.
Maintaining old code is a chore and an often hated part of software engineering. It is definitely one of the aspects which are seldom discussed or thought about by aspiring computer science students. However, it is prevalent throughout industry and a worthwhile skill to learn. Of course, this homework will not give you a remotely realistic experience in maintaining legacy code or code left behind by previous engineers but it still provides a small taste of what the experience may be like. You are to take on the role of an engineer whose supervisor has asked you to correct all the errors in the program, plus add additional functionality.
By completing this homework you should become more familiar with the C programming language and develop an understanding of:
- How to use tools such as gdb and valgrind for debugging C code.
- Modifying existing C code.
- C memory management and pointers.
- Working with files and the C standard I/O library.
Your goal will be to debug and extend an old program called notation , which was written by Henry Thomas in 1990-1992 and posted to Usenet in early 1992. The version I am handing out is very close to the original version, except that I have made a few changes for this assignment. First of all, I rearranged the source tree and re-wrote the Makefile to conform to what we are using for the other assignments in this course. I also introduced a few bugs here and there to make things more interesting and educational for you . Aside from these changes and the introduced bugs, which only involve a few lines, the code is identical to the original, functioning version.
The purpose of the notation program is to read an input file containing the transcript of a chess game in “algebraic” notation, and to write an output file that consists of that game transcript, but possibly annotated with diagrams showing the board at specified positions in the game. The program is also able to translate the transcript between various European languages (basically what this means is that the letters that are used to denote the various chess pieces in the input transcript are substituted with versions appropriate for the target language).
What you have to do is to first get the program to compile (for the most part, I did not modify the original code, which requires some changes for it to compile cleanly with the compiler and settings we are using). Then, you need to test the program and find and fix the bugs that prevent it from functioning properly. Finally, you will make some modifications to the program.
As you work on the program, limit the changes you make to the minimum necessary to achieve the specified objectives. Don’t rewrite the program; assume that it is essentially correct and just fix a few compilation errors and bugs as described below. You will likely find it helpful to use git for this (I did). Make exploratory changes first on a side branch (i.e. not the master branch), then when you think you have understood the proper changes that need to be made, go back and apply those changes to the master branch. Using git will help you to back up if you make changes that mess something up.
Fetch base code for hw2 as you did for the previous assignments.
Once again, to avoid a merge conflict with respect to the file .gitlab-ci.yml , use the following command to merge the commits:
git merge -m "Merging HW2_CODE" HW2_CODE/master --strategy-option=theirs
The doc directory included with the assignment basecode contains various files that were distributed with the program. One of these is a Unixstyle man page ( notation.n ). You can format and read the man page using the following command:
nroff -man doc/notation.n | less
I corrected some errors in the original formatting commands to allow the document to be processed more or less correctly with groff (the GNU re-engineered version of nroff ). The file notation.doc contains what is presumably the original result of running the then-current version of nroff on notation.n .
Other files in the doc directory are Makefile.orig (the original Makefile ) and various other files that were in the original distribution. They should not be necessary for doing the assignment.
The source files for the notation program are in the include and src directories. The include directory contains the following:
- chesssymb.def – Defines macros used in src/drivers.c and src/notation.c in a clever way to construct tables of standard annotations that are used in chess literature.
- chesstype.h – Defines various macros and types that are mostly related to the representation of chess boards, moves, and pieces in the program.
- debug.h – Defines macros for debugging printout. This was not in the original distribution; it is the one we are using in this course.
- drivers.h – Defines the interface to the output driver code found in src/drivers.c , in a kind of object-oriented style.
- lexer.h – extern declarations used by the lexical analyzer in src/lexer.c to interface with the flex lexical analyzer generator package (more about this later).
- notation.h – Defines constants and function prototypes used by the code in src/notation.c (the main program).
The src directory contains the following:
- drivers.c – Various “output drivers” that are called to emit output in various formats, such as plain text, LaTeX, Postscript, etc.
- lexer.c – Generated lexical analyzer that is the result of processing src/lexer.l with the flex lexical analyzer generator.
- lexer.l – flex source code for the lexical analyzer. Basically gives regular expressions for each kind of input “token” that the program will recognize, and specifies actions to be performed each time a token is seen.
- main.c – A file I added which contains only the main() function. It calls notation_main() (the original main() function) in src/notation.c . This was done so that we can use Criterion tests with the program (recall that Criterion uses its own main() ).
- notation.c – Contains the main body of the program.
The lib directory contains:
- notation.hlp – Contains the help message that is printed out when the program is invoked with the -h option.
- Header.ps , Footer.ps , notation.tex – “boilerplate” header and footer files inserted by the output drivers for the TeX and Postscript output formats.
The rsrc directory contains some files that can be used as test input to the program:
- algebric.ntn – Game using “algebraic” notation, no text.
- boudy.ntn – Game using “shortened algebraic” notation, with variations, no text.
- keywords.ntn – “Nonsense” file (no game), just demonstrating the use of keyword commands to show the board and reconfigure the position.
- shortened.ntn – Game using “shortened algebraic” notation, with interspersed text.
The tests directory contains C source code (in file hw2_tests.c ) for some Criterion tests that can help guide you toward bugs in the program. These are not necessarily complete or exhaustive. The test_common.c and test_common.h contain auxiliary code used by the tests. The subdirectory tests/rsrc contains files that are used by the tests.
Before you begin work on this assignment, you should read the rest of this document. In addition, we additionally advise you to read the Debugging Document.
The core function of the program is to read a transcript of a chess game, interpret the moves, and keep track of the current board position. To simplify this process, the program makes use of a lexical analyzer generator tool called flex . A lexical analyzer is used to identify “tokens” in its input. The flex tool is a GNU-re-engineered version of a traditional Unix tool that was originally called lex . The flex program reads an input file that contains a set of regular expressions that define the various forms that tokens can take, and it generates transition tables for a finite automaton that can scan for occurrences of these patterns in the input. The core lexical analyzer code is always the same: it just uses the table to simulate the finite automaton. Each time a token is identified in the input, an associated action is taken. For example, if a token is identified that matches the form of a chess move, then the associated action would be to update the current board position.
In the notation program, the flex input file is src/lexer.l . The file src/lexer.c is the output of flex , which contains the generated lexical analyzer. The flex tool is not installed by default on your Linux Mint VM, and though it would be possible to install it, it will not be necessary, because you will not need to make any changes to lexer.l that would require regenerating the lexer.c file. This information is being provided so you understand better the structure of the program and you know you don’t have to worry about what is going on in lexer.c .
Actually, the regular expression given for a chess move in src/lexer.l only defines things that might be a move. Once a token matching this pattern has been identified, the lexical analyzer calls the function parse_move() in src/notation.c . This function also uses a table-driven scheme (see transit and action ) to determine if the token really does legitimately represent a chess move. You might have to look at the parse_move() function, but you do not need to understand the contents of the tables that it uses.
The command line arguments and expected operation of the program are described by the following “Usage” message, which is printed within notation_main() in notation.c (the actual text of the message is read from the file lib/notation.hlp ):
There are a few options that the program understands:
- If -a is specified, then the output of the program will use “long” algebraic notation;
- If -s is specified, then the output of the program will use “short” algebraic notation;
- If -f is specified, followed by an argument [language] , then the program will use the specified language to interpret the letters used in the input to represent the chess pieces;
- If -t is specified, followed by an argument [language] , then the program will use the specified language to determine the letters used in the output to represent the chess pieces;
- If -o is specified, followed by a filename, then the program will print output to the specified file instead of the default stdout .
- If -c is specified, then the next argument gives a comma-separated list of moves after which the board should be displayed.
- If -e is specified, then the next argument gives a move number after which to display the board and exit.
- If -d is specified, then the next argument gives the name of the output driver to be used in generating the output.
- If -i is specified, then header and footer “boilerplate” that would normally be inserted at the beginning and end of output files is omitted.
- If -h or an unknown option is specified, then a help message is printed;
- If -v is specified, then the program exits after printing the version number;
You are to complete the following steps:
- Clean up the code; fixing any compilation issues, so that it compiles without error using the compiler options that have been set for you in the Makefile . Use git to keep track of the changes you make and the reasons for them, so that you can later review what you have done and also so that you can revert any changes you made that don’t turn out to be a good idea in the end.
- Fix bugs.
Run the program, exercising the various options, and look for cases in which the program crashes or otherwise misbehaves in an obvious way. We are only interested in obvious misbehavior here; don’t agonize over program behavior that might just have been the choice of the original author. You should use the provided Criterion tests to help point the way, though they are not guaranteed to be exhaustive.
- Use valgrind to identify any memory leaks or other memory access errors. Fix any errors you find.
Note that the bugs that are present will all manifest themselves in some way either as incorrect output, program crashes or as memory errors that can be detected by valgrind . It is not necessary to go hunting for obscure issues with the program output. Also, do not make gratuitous changes to the program output, as this will interfere with our ability to test your code.
The base code performs options processing as part of the function notation_main() (which was the main() function in the original program). It uses ad hoc techniques to process the arguments, which are typical of what many C programs might do. However, as options processing is a common function that is performed by most programs, and it is desirable for programs on the same system to be consistent in how they interpret their arguments, there have been more elaborate standardized libraries that have been written for this purpose. In particular, the POSIX standard specifies a getopt() function, which you can read about by typing man 3 getopt . A significant advantage to using a standard library function like getopt() for processing command-line arguments, rather than implementing ad hoc code to do it, is that all programs that uses the standard function will perform argument processing in the same way rather than having each program implement its own quirks that the user has to remember.
For this part of the assignment, you are to replace the original argument-processing code in notation_main() by code that uses the GNU getopt library package. In addition to the POSIX standard getopt() function, the GNU getopt package provides a function getopt_long() that understands “long forms” of option arguments in addition to the traditional single-letter options. In your revised program, notation_main() should use getopt_long() to traverse the command-line arguments, and it should understand the following alternative forms for the various options (as well as the original short forms):
- –long-algebraic as equivalent to -a
- –short-algebraic as equivalent to -s
- –output-language as equivalent to -f
- –input-language as equivalent to -t
- –output-file as equivalent to -o
- –show-after as equivalent to -c
- –end-after as equivalent to -e
- –no-headers as equivalent to -i
- –help as equivalent to -h
- –version as equivalent to -v
You will probably need to read the Linux “man page” on the getopt package. This can be accessed via the command man 3 getopt . If you need further information, search for “GNU getopt documentation” on the Web.
The original program uses malloc() to allocate memory for things like the game board, the list of moves in the main line of play, and lists of moves in sub-variations. However, the program does not always free this memory. Your job is to modify the program so that all the memory it allocates is freed before the program terminates. This should be true whether or not the program terminates normally or with an error. In order to accomplish this, you will have to study how the program uses the memory that it allocates, and determine where to introduce calls to free() in order to free this memory. You will probably not find it very easy to get this right, so consider it a challenge. You will also need to call the function yylex_destroy() before exiting, in order to free memory that was allocated by flex . If you are successful with these modifications, when you run the program with valgrind it will not report any memory that is “lost” or any memory that is “still reachable” at the end of execution.
For this assignment, you have been provided with a basic set of Criterion tests to help you debug the program.
In the tests/hw2_tests.c file, there are five test examples. You can run these with the following command:
To obtain more information about each test run, you can supply the additional option –verbose=1 .
The tests have been constructed so that they will point you at most of the problems with the program. Each test has one or more assertions to make sure that the code functions properly. If there was a problem before an assertion, such as a “segfault”, the test will print the error to the screen and continue to run the rest of the tests. Two of the tests use valgrind to verify that no memory errors are found. If errors are found, then you can look at the log file that is left behind (in the tests.out directory) by the test code. Alternatively, you can better control the information that valgrind provides if you run it manually.
The tests included in the base code are not true “unit tests”, because they all run the program as a black box using system() . You should be able to follow the pattern to construct some additional tests of your own, and you might find this helpful while working on the program. You are encouraged to try to write some of these tests so that you learn how to do it. Note that in the next homework assignment unit tests will likely be very helpful to you and you will be required to write some of your own. Criterion documentation for writing your own tests can be found here.
Ensure that all files you expect to be on your remote repository are committed and pushed prior to submission.
This homework’s tag is: hw2
$ git submit hw2