Java代写:CSE143 HTML Manager

这次需要代写一个HTML的解析器,主要考察Data Structure中Stack和Queue的应用。

Requirement

You will need the support files HTMLTag.java, HTMLTagType.java, HTMLParser.java, and HTML Main.java from the resources button on the course website; place these in the same folder as your program or project. If you are using EZclipse, the files will be automatically downloaded for you. You should not modify the provided files. The code you submit must work properly with the unmodified versions.

Although this assignment relates to websites and HTML, you will not need to know how to write HTML to complete the assignment.

What is HTML?

Web pages are written in a language called “Hypertext Markup Language”, or HTML. An HTML file consists of text surrounded by markings called tags. Tags give information to the text, such as formatting (bold, italic, etc.) or layout (paragraph, table, list). Some tags specify comments or information about the document (header, title, document type).

A tag consists of a named element between less-than < and greater-than > symbols. For example, the tag for making text bold uses the element b and is written as <b>.

Many tags apply to a range of text, in which case a pair of tags is used:

  • An opening tag (indicating the start of the range), which is written as: <name>
  • A closing tag (indicating the end of the range), which is written as: </name>

So to make some text bold on a page, one would surround the text with opening and closing b tags:

<b>like this</b>

Tags can be nested to combine effects. For example:

<b><i>bold italic</i></b>

Some tags, such as the br tag (which inserts a line break) or the img (which inserts an image), do not cover a range of text and are considered to be “self-closing.” Self-closing tags do not need a closing tag; for a line break, only <br> is needed. Some web developers write self-closing tags with an optional / before the >, such as <br />.

Some tags can have attributes. For example, the tag <img src="cat.jpg"> specifies an image from the file cat.jpg. The element is img, and the rest of the text such as src are attributes. In this assignment, you will not have to worry about attributes. The parser will store them for you, and you can just ignore them entirely.

One problem on the web is that many developers make mistakes in their HTML code. All tags that cover a range must eventually be closed, but some developers forget to close their tags. Also, whenever a tag is nested inside another tag, <b><i>like this</i></b>, the inner tag (i for italic, here) must be closed before the outer tag is closed. So the following tags are not valid HTML, because the </i> should appear first: <b><i>this is invalid</b></i>

In this assignment you will write a class that stores the contents of an HTML page and is able to fix any invalid HTML. Your HTMLManager will use stacks and queues to figure out whether the tags match and fix any mistakes it finds. Instructor-provided code will read HTML pages from files and break them apart into tags for you; it’s your job to store the tags and fix the tags if there is a mismatch.

Implementation Details

Write one class called HTMLManager.java that can handle pages with all types of HTMLTags (including self-closing tags). Your class must have the constructors/methods listed in the next section.
When using the Stack and Queue classes, you may only use the methods discussed in lecture, namely: add/push, remove/pop, peek, isEmpty, size and toString.

public HTMLManager(Queue page)

The constructor takes in a Queue of HTMLTags that make up an HTML page. If the queue passed is null, the constructor throws an IllegalArgumentException. An empty queue (size 0) is allowed. You should store a Queue as a field to manage your tags, however, the constructor should not directly store a reference to the given Queue to avoid exposing internal state to the client.

public void add(HTMLTag tag)

Appends the given HTMLTag to the end of the queue of HTMLTags being managed. If the HTMLTag passed in is null, the method should throw an IllegalArgumentException.

public void removeAll(HTMLTag tag)

Removes all occurrences of the given HTMLTag from the queue of HTMLTags being managed. If the HTMLTag passed in is null, the method should throw an IllegalArgumentException.

public List getTags()

Returns a list of HTMLTags being managed. To avoid exposing internal state to your client, make sure to make a copy rather than just returning a reference to any internal field.

public void fixHTML()

This method will try to fix the page’s HTML if there are any missing or extra tags. The algorithm that you will use is a simplified version of what Chrome, Firefox, and other browsers do when they try to display a broken webpage. This method should build up a correct version of the HTMLTags being managed, and update the instance to store the corrected version of the HTML when it is finished. To fix the HTML you will analyze the tags stored in the HTMLManager using a Stack.

The basic idea of the algorithm is to process the page tag by tag. For each tag, you will check to see if it has a matching tag later in the page in the correct place. Since self-closing tags don’t have to match anything, whenever you see one, you can simply add it directly to the correct version of the tags. For opening tags, we assume that the writer of the HTML page intended to actually include the tag; so, like with the self- closing tag, we add it to the result. However, we need to keep track of if we have found its match; so, it should also be added to a Stack. If we find a closing tag, we must figure out if it is in the right place or not. In particular:

  • If the opening tag at the top of the stack matches the closing tag we found, then it matches, and you should update the state according. (Hint: You probably want to edit the stack and the result.)

  • If the opening tag at the top of the stack does not match the closing tag we found, then the writer of the HTML page made a mistake. To fix the mistake, you should add a new closing tag that matches the opening one at the top of the stack (so that it remains balanced). You should keep on adding closing tags to your result as long as the opening tag on top of the stack does not match. If you close all unclosed open tags on the stack, then the closing tag we found doesn’t match any open tags and should be discarded.

  • If, at any point, you find a closing tag that has no matching open tag, just discard it.

Your Test Case (HTMLManagerTest.java)

In addition to HTMLManager.java, you will create tests to verify your class’s removeAll method. Create a file HTMLManagerTest.java to thoroughly test the removeAll method (a starter file is available on the website). Your tests should call removeAll at least three times, removing three different elements. You should compare the tags stored in your HTMLManager after each call with the expected Queue of tags.
Your testing program must produce at least three lines of output indicating whether each of your tests passed. You will be evaluated partly on how complete your tests are (whether they try every possible input and state configuration).

Development Strategy

The best way to write code is in stages. If you attempt to write everything at once, it will be significantly more difficult to debug, because any bugs are likely not isolated to a single place. For this assignment we will provide you with a development strategy. We will also provide you with some correct output that you can check using the Output Comparison Tool on the website.

We suggest that you develop the program in the follow four stages:

  1. In this stage, we want to decide what field(s) belong in the HTMLManager class. These field(s) should be initialized in the constructor.
  2. In this stage, we want to make sure that the constructor is working by printing out the page we constructed. The best way to do this is to write getTags() and print out the returned List.
  3. In this stage, we want to implement the method that edit the page we currently have. You should implement add(HTMLTag tag) and removeAll(HTMLTag tag).
  4. Finally, we want to write the difficult method: fixHTML(). You can test that things are working by using the Output Comparison Tool on the course website.