Compiler代写:COMP2932 Compiler Project


The Brief

In this project, you will develop a compiler for the JACK programming language described in our textbook. This involves implementing the lexical analyser, the syntactic analyser, the symbol table, the semantic analyser, and the code generation phases of the compiler. The target machine for the compiler is the virtual machine described in. The compiler should produce VM code that successfully run on the VM emulator provided with the textbook (see this link . The specifications of both the JACK language and the target VM code are clearly laid out in. For further information, resources, software tools and many other useful things see this site .

Compiler Details

The JACK source files

A JACK program is a collection of one or more classes. Each class is defined in its own file. JACK source files (you can call them class files) must have the .jack extension. All the source files of a single JACK program should be stored in the same directory.

You will develop a working compiler that accepts as input a directory containing one or more JACK source files. For each source file the compiler should produce an equivalent VM code file having the same name as the source file but with the .vm extension (vm=virtual machine). The target code files should be created in the same directory as the source files. After compilation, the directory will contain the .jack source files and the corresponding .vm files. To run the program using the provided VM emulator, copy the provided OS (library) files into the same program directory, then load the whole thing into the emulator.

Invoking the compiler

It should be possible to invoke your compiler at the command line. The compiler should accept one command line argument representing the name of the directory that contains the JACK source files (the JACK program). For example, if your compiler’s executable file is called myjc and the JACK program directory is called myprog, the compiler would be invoked by typing this at the command line:

myjc myprog

Error Reporting

At any phase of the compilation process, if an error is encountered, the compiler should print out an informative error message, citing the line number where the error was encountered. The message should also cite the token near which the error occurred. Here is an example of a typical error message:

Error, line 103, close to ";", an identifier is expected at this position

Uninformative generic messages such as “syntax error” must be avoided. Upon encountering an error, the compiler should report this error and immediately stop (yes, it is a short-tempered compiler!). The compiler is not required to attempt error recovery or report more than one error at a time.

The Lexical Analyser

You will write a lexical analyser (lexer) module that reads a JACK source file (having a .jack extension) and extracts all the tokens from this file. The lexer should reveal itself to other modules (i.e. the parser) through two main functions:

  • Token GetNextToken(). Whenever this function is called it will return the next available token from the input stream, and the token is removed from the input (i.e. consumed).
  • Token PeekNextToken(). When this function is called it will return the next available token in the input stream, but the token is not consumed (i.e. it will stay in the input). So, the next time the parser calls GetNextToken, or PeekNextToken, it gets this same token.

The lexer should successfully remove white space and comments from the input file and correctly extract all the tokens of the source code. It should not crash or become unstable if the source file contains any kind of lexical errors. One particular type of error to watch for is the unexpected end of file while scanning a string literal, or a multi-line comment.

Once you have finished developing the lexer, you must test its operation using a set of JACK source files which will be provided in due course.

The Grammar

The grammar of the JACK language is given in. However, we will write another version of the grammar. The main purpose of writing a new version is to account for operator precedence in arithmetic expressions which is not currently accounted for in the original grammar described in.

The Parser

You will implement a recursive descent parser. A recursive descent parser is a collection of recursive functions, and can be very easily developed from the grammar of the language. Once you have finished implementing the parser it should be thoroughly tested using the set of JACK source files. All kinds of syntax errors should be reported properly. However, as stated earlier, the compiler should stop on encountering the first error in a source file. It should not crash or become unstable when a syntax error is encountered.

The Symbol Table

You will implement a symbol table to store all program identifiers and their properties. A symbol in this context is any identifier defined in the program such as a variable or method name. We will explain symbol tables and their implementation in due course.

The Semantic Analyser

You will implement a semantic analyser to find and report possible semantic errors in the JACK program. However, this analyser will not be a standalone module. You will insert semantic analysis statements at suitable points in the parser functions. Once the semantic analyser is complete, it should be thoroughly tested.

Code Generation

You will implement code generation in your compiler. However, this will not require a separate module. Code generation statements will be inserted to the parser functions at appropriate points.

Implementation and Planning

You can write your compiler in any of the following languages:

  • C
  • C++
  • Java
  • Python

You are allowed to use any standard library that comes with the programming language. But you are not allowed to use libraries that can perform entire compilation tasks (e.g. tokenization or parsing). You are also not allowed to use complier generator tools. You must write your compiler from scratch following the methods and guidelines explained in this module.

You will adhere to the specifications of the JACK language and the virtual machine code described in. However, you will not follow the development guidelines or the plan of the book. Instead, you will follow the plan detailed in the Teaching and Project Plan spreadsheet handed out at the outset of this module. A copy of this spreadsheet is available on Minerva. It is very important to keep to the plan and make sure that you reach each milestone of the compiler project at the specified date. Each milestone represents the completion of one phase of the compiler. Except for the final project submission deadline, there are three major milestones, corresponding to the lexical, syntactic, and semantic analysis phases respectively.

Submissions and Version Control

You must create a GitLab repository for your project and give me full access to it. My GitLab username is drsalka. You must regularly push commits to the repository particularly at project milestones.

At each milestone date, you must submit your current compiler code base (without executables) to Minerva with a brief description (200 - 300 words) of what you have achieved so far. It does not matter if your code at a certain milestone still contains some bugs or errors, as long as the main functionality required at this stage has been achieved. 10 marks have been assigned for achieving the milestones in due time, as detailed in the marking scheme below.

You may not be able to work ahead of the plan, mainly because we will be explaining the techniques and the implementation guidelines as we proceed with the teaching in lectures and tutorials.

Before submitting your compiler, you must make sure that you can compile your compiler on our school’s Linux machines (e.g. DEC-10 lab machines). You must also thoroughly test your compiler on these machines. Submissions that do not compile and run on our Linux machines will score a zero mark, even if they work perfectly well on another platform. So, if you develop your compiler on a Windows or Mac PC, you must not submit your project until you have made sure that it does compile and run on our Linux computers without any problems.

The final project code base should be committed to GitLab and submitted to Minerva. Before submitting to Minerva, clean your project directory of any object or executable files, compress it using ZIP, and upload to Minerva.

You are also required to write and submit a brief report (5-10 pages) at the end of the project. A template for the report will be provided in due course.

Marking Scheme

  • The lexical analysis phase (the lexer) works correctly
  • The syntactic analysis phase (the parser) works correctly
  • The semantic analysis phase (including the symbol table) works correctly
  • The code generation phase works correctly
  • The compiler code is well structured, efficient, clear, and properly commented Adherence to project milestones (the plan):
    • First milestone, the lexer
    • Second milestone, the parser
    • Third milestone, the symbol table
  • Final project report