Java代写:CS214 Linked List

练习Linked List的用法,代写一个Markov chain的应用。

Overview

The purpose of this project is to get you to use linked lists, generics, and JUnit in building a real program.
A common use of computer science in literature is to analyze the way authors use words. This technique has been used to determine the authorship of works of literature as well as to examine how some authors change in writing style over time. In this program, you will create a histogram that shows the frequency with which words appear in a certain context. For fun, we will use this data to have the program write out gibberish that is in the same style as the text analyzed. While writing gibberish may seem to be a poor use of computing cycles, the process used to give the gibberish a sense of style is known as a “Markov chain”. Markov chains are used in artificial intelligence and other fields as a way of modelling behavior and processes.

In this process, your program will look at every word or collection of words in the writing, and the program will count which words tend to follow that word or collection of words. For example, consider the opening to the famous solliloquy of Hamlet:

To be, or not to be: that is the question. 
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them?

If we ignore punctuation and capitalization, then the word “to” occurs 4 times. Twice it is followed by the word “be”, and once each by “suffer” and take”. So, everytime the author writes a “to”, 50% of the time it is followed by “be”, 25% of the time it is followed by “suffer”, and 25% of the time it is followed by “take”. For terminology, we will call “to” a context of one word and “be”, “suffer” and “take” the following words of the context.

Using this anaylsis, we can write gibberish. If the program decides to write the word “to”, then it chooses a random number so that half of the time the next word will be “be” and half of the time it will be either “take” or “suffer”. Ignoring cases and punctuation, we can end up with something like this:

to suffer the question whether 'tis nobler in the mind
to be or to take arms against a sea of outrageous fortune
or not to be that is the slings and arrows of troubles
and by opposing end them

We can make the style closer match the author by examining pairs of words. For example, the pair “to be” only occurs twice and is followed by either “or” or “that”. In this case, if the computer generates the words “to be”, half the time the next word should be “or” and half the time it should be “that”. Here, we consider “to be” a context of two words and “or” and “that” the following words of the context.

Code Readability

To receive the full readability marks, your code must follow the following guideline:

  • All variables (fields, parameters, local variables) must be given appropriate and descriptive names.
  • All variable and method names must start with a lowercase letter. All class names must start with an uppercase letter.
  • The class body should be organized so that all the fields are at the top of the file, the constructors are next, the non-static methods next, and the static methods at the bottom with the main method last.
  • There should not be two statements on the same line.
  • All code must be properly indented (see page 689 of the Lewis book for an example of good style). The amount of indentation is up to you, but it should be at least 2 spaces, and it must be used consistently throughout the code.
  • You must be consistent in your use of {, }. The closing } must be on its own line and indented the same amount as the line containing the opening {.
  • There must be an empty line between each method.
  • There must be a space separating each operator from its operands as well as a space after each comma.
  • There must be a comment at the top of the file that is in proper JavaDoc format and includes both your name and a description of what the class represents. The comment should include tags for the author.
  • There must be a comment directly above each method (including constructors) that is in proper JavaDoc format and states what task the method is doing, not how it is doing it. The comment should include tags for any parameters, return values and exceptions, and the tags should include appropriate comments that indicate the purpose of the inputs, the value returned, and the meaning of the exceptions.
  • There must be a comment directly above each field that, in one line, states what the field is storing.
  • There must be a comment either above or to the right of each non-field variable indicating what the variable is storing. Any comments placed to the right should be aligned so they start on the same column.
  • There must be a comment above each loop that indicates the purpose of the loop. Ideally, the comment would consist of any preconditions (if they exist) and the subgoal for the loop iteration.
  • Any code that is complicated should have a short comment either above it or aligned to the right that explains the logic of the code.

Program Testing

You are to create two JUnit test class. All your tests should be coded as unit tests in these classes. The first required class is LinkedListTester.java that tests the new routines you added to the linked list. The other test class is GibberishWriterTester.java and is to test the rest of your program. Your report should be a short document that explains why the JUnit tests you chose to do thoroughly test each method.

Java Programming

Remember that in the homework, you will be writing JUnit test cases. A strong recommendation is to write your JUnit test cases at the same time as complete each part of the program.
You are welcome to add additional methods to help you with the program, but you program should contain the following classes and methods. In fact, breaking up the GibberishWriter constructor and next methods using helper methods will probably make the coding easier.

  1. Create an interface LLIterator that extends the Iterator interface. The LLIterator should containt the following methods in addition to those of Iterator:
    • addBefore takes an element as input and adds the element to the list being iterated so that the element occurs before the element that was just returned by the most recent call to next. If the list is empty or if next has not been called, this method should throw a NoSuchElementException.
    • addAfter takes an element as input and adds the element to the list being iterated so that the element occurs immediately after the element that was just returned by the most recent call to next. If the list is empty or if next has not been called, this method should place the element at the beginning of the list.
  2. Update the LinkedList class that you created in lab/lecture so that:
    • The LinkedList class implements the Iterable interface and the Iterator returned by iterator is a LLIterator with all methods except remove properly implemented.
    • LinkedList should contain the method toArrayList() that returns an ArrayList that contains the same elements of the linked list and in the same order.
  3. Create a class WordData that should be an inner class of GibberishWriter (either static or non-static). The purpose of the WordData class is to keep track of how many times a particular word occurs after a given context. The class should contain a String word and a count and the class should have the methods:
    • A constructor that takes a single String word as input and initializes the WordData setting the initial count to 1.
    • A method incrementCount takes no input, returns nothing, and increases the count by one
    • A method getWord takes no input and returns the stored word
    • A method getCount takes no input returns the stored count
  4. Create a class Context that implements the Comparable interface. The class should be an inner class of GibberishWriter and may be static or non-static. A context is a sequence of 1 or more words. You should store the sequence as an array of String. The class should contain the following methods:
    • A constructor that takes an array of String as input and creates a new Context from that array. Make sure to copy the array instead of simply assigning it to the field.
    • A method length that takes not input returns the number of words of the context (i.e. the length of the array field)
    • A method toString that takes no input and returns a String representing the context. The string should be the concatenation of the words of the context with each pair separated by a single space.
    • A method getWord that takes an int as input and the String at the given index of the context.
    • A method equals that overrides the Object equals method. Two Context objects are equal if the have the same length and contain the same Strings in the same order. (Do not use the toString method or anything else that creates extra objects. Instead use a loop that compares each string.)
    • The method compareTo required by the Comparable interface. Two Contexts should be ordered alphabetically by their Strings. Do not use the toString method or anything else that creates extra objects. Instead use a loop that compares each string. (Hint: String is comparable so you can use that to test each String in the arrays.)
  5. Create a class ContextData that implements the Comparable interface. The class shouldbe an inner class of GibberishWriter and may be static or non-static. Context data consists of a context, the number of times that context appears in the sample text where it is not the end of the text, and a list of all words that immediately follow this context plus, for each such word, the number of times that the word appears after this context. To implement this behavior, the class should contain three fields: a Context, a int that indicates the number of occurrences, and a LinkedList that stores WordData. The class should have the following methods:
    • A constructor that takes a Context as input and initializes the ContextData with this Context, sets the number of occurrences to 0, and creates an initially empty LinkedList.
    • A method getContext that takes no input and returns the stored Context.
    • A method numOccurrences that takes no input returns the number of occurrences of the context.
    • The method compareTo required by the Comparable interface. Two ContextData objects should be ordered based on their Contexts.
    • A method addFollowingWord that takes a String word as input. Use the iterator you wrote above for the LinkedList to see if word is already in the linked list. We assume that the LinkedList of the ContextData is already sorted according to the words stored in the WordData and you should stop the iterator at the moment you pass the spot where word should be in the list. If a WordData for the word is found, call that WordData’s incrementCount method. Otherwise, create a new WordData object containing word and to add it to the LinkedList in the appropriate location so that the list stays sorted. If you stopped the iterator at the appropriate spot when searching for the word, then one of the two methods of LLIterator above will place the WordData in the appropriate spot. (Note that if this method is correct, the list of WordData should be stored in order and the number of occurrences of the context should equal the sum of all the counts of the WordData objects in the list.)
    • A method getFollowingWord) that takes an int value as input and returns a String. The method returns one of the words of the linked list depending on the value. For example, if the linked lists contains the following WordData word/count pairs: (“be”, 2), (“suffer”, 1), (“take”, 1), (“the”, 3) in that order, then if value is 1 or 2, “be” is returned. If value is 3, “suffer” is returned. If value is 4, “take” is returned. If value is 5, 6, or 7, “the” is returned, and if value is larger than 7 or smaller than 1, an appropriate exception is thrown.
  6. Create a class GibberishWriter that contains an ArrayList storing ContextData and an int storing the context size. (The class will need a few additional fields to implement the methods below.) Your class should have the following:
    • A constructor that takes an int contextSize that sets the context size to be the input value and does any other processing needed to initialize the instance.
    • A method getContextSize that takes no input and returns the context size for this gibberish writer.
    • A method getContextData that takes an int index as input and returns the ContextData value stored in the ArrayList field at the given index value.
    • A static method addContextData takes a Context and a LinkedList storing ContextData as input and returns a ContextData. We assume the LinkedList is sorted. Use the iterator for the linked list to see if the list contains a ContextData instance for the input Context value. If so, return it. If not, stop the iterator at the point where you realize the value is not in the list, create a new ContextData instance for the Context, use the appropriate method from the LLIterator to place the ContextData into the list maintaining the sorted order, and return the new ContextData.
    • A method addDataFile that takes a String fileName as input and does the following (you may want to create helper methods).
      • a) Create a LinkedList that stores the ContextData of the ArrayList, in the same order.
      • Open the file with the name fileName and prepare to read the contents of the file word by word. (We define a word to be a consecutive sequence of non-whitespace characters.) The Scanner class of the API is very useful for this. If you do not want to use Scanner, you can use the BufferedFileReader as you did in one of the labs.
      • Create a Context that we will call the current context from the first context size words of the file. (If using Scanner, the next method will get the next word.)
      • Keep reading words from the file until there are no more words. With each word read, call the addContextData method with the current context and the LinkedList as inputs to get a ContextData for the current context. Add the just read word to the ContextData as a following word. Finally, adjust the current context by shifting the words of the current context down one index (thus dropping the first word of the context) and making the just read word the last work of the current context. (You may want to maintain an array of String in your method to help create and adjust the contexts.)
      • When all the words have been read, call the toArrayList method of the LinkedList and save the returned list to the ArrayList field.
    • Have the GibberishWriter class implement the Iterator interface. The class will need to maintain a last context for the iterator.
      • The hasNext method should return true as long as there is an ArrayList containing ContextData values.
      • The next method should look up the last context from the ArrayList containing ContextData. Since the ContextData should be in sorted order, use binary search. You can write this yourself or you can use Collectons.binarySearch. (If this is the first time next is called, you will not have a last context so choose one at random from the ArrayList.) Choose a random word to follow that context by getting the number of occurrences of the context, and choosing a random number between 1 and that value. Then call the getFollowingWord method to get the appropriate word that can follow the current context. That word will be returned by the method, but before it does that, it changes the current context by dropping the first word, shifting each of the remaining words over by 1 and adding the new word as the last word of the current context.
  7. A main method. The main method should take three command line arguments. A file name for the input data, a integer to represent the context size and an integer to represent the number of words in the output. The method should create a GibberishWriter instance with the appropriate context size, call the addDataFile method on the file name (and print an appropriate message if there is a problem reading from the file). Then the method should have a loop that uses the GibberishWriter iterator to output the desired number of gibberish words. On any errors, the method should not throw an exception but should print an appropriate error message.

If you get this working, have some fun. Find text from your favorite source with a distinctive voice: Shakespeare, the Bible, Dr. Seuss. Get a large enough sample, input it into the GibberishWriter and see what you get out.

Extra Credit

There are two tasks for the extra credit. First, change the definition of a word to either be a sequence of letters, a sequence of digits, or a sequence of non-letter/non-digit/non-whitespace characters. Then, create a method iterationStart for GibberishWriter that takes a String as input and the input is how we want to start the GibberishWriter iterator. You should change the next and hasNext methods of GibberishWriter so that the iterator first outputs the words of the initial string input by iterationStart, and once those words are output, it uses the last context size words of the input string to form the last context for the rest of the iterator. If this context is not in the ArrayList, the iteration should stop. Finally, the main method should be adjusted so that any input after the first three is used to form the initial string of the iterator.