Java Assignment Help: CSCI2110 Huffman Coding

Implement the Huffman coding algorithm using a binary tree.

Requirement

The objective of this assignment is to implement the Huffman coding algorithm using the binary tree data structure.

Download the files BinaryTree.java, Frequency.txt and Pokemon.txt provided next to the assignment link.

Problem Summary

You are given a table of the letters of the English alphabet and their frequencies. Build a Huffman tree from the letters and their probabilities, and derive the Huffman codes. Use the codes to encode a given text file, then decode the encoded file and show that the result is identical to the input text file.

Problem in Detail

To help you with the assignment, here is the step-by-step procedure for the Huffman algorithm (as discussed in the lectures).

Step 1

Read the text file Frequency.txt; its link is given next to this lab document. It contains the frequencies of the letters of the English alphabet, based on a sample of 40,000 words. (In the file, each letter and its frequency appear on two separate lines.)

To do this step, you will find it useful to create a class called Pair (in Pair.java) that represents a letter and its probability as an object.

public class Pair {
    private char letter;
    private double prob;

    //constructor
    //get and set methods
    //toString method
}

You can create an ArrayList of Pair objects

ArrayList<Pair> freqs = new ArrayList<Pair>();

and store the items in the ArrayList as you read them. Of course, you will need other variables and methods to total the frequencies and convert them into probabilities.
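As a rough illustration of Step 1, the sketch below parses alternating letter/frequency lines and normalizes the frequencies into probabilities. The class name FrequencyParser, the getter names, and the assumption that the file has no blank lines are all hypothetical; the nested Pair is only a minimal stand-in for the class you are asked to write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the assignment files.
class FrequencyParser {

    // Minimal stand-in for the Pair class described above.
    static class Pair {
        private final char letter;
        private final double prob;

        Pair(char letter, double prob) {
            this.letter = letter;
            this.prob = prob;
        }

        char getLetter() { return letter; }
        double getProb() { return prob; }

        @Override
        public String toString() { return "('" + letter + "', " + prob + ")"; }
    }

    // Assumes the lines alternate: a letter, then its frequency on the next line.
    static ArrayList<Pair> parse(List<String> lines) {
        ArrayList<Character> letters = new ArrayList<>();
        ArrayList<Double> counts = new ArrayList<>();
        double total = 0.0;
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            letters.add(lines.get(i).trim().charAt(0));
            double count = Double.parseDouble(lines.get(i + 1).trim());
            counts.add(count);
            total += count;
        }
        // Second pass: divide each frequency by the total to get a probability.
        ArrayList<Pair> freqs = new ArrayList<>();
        for (int i = 0; i < letters.size(); i++) {
            freqs.add(new Pair(letters.get(i), counts.get(i) / total));
        }
        return freqs;
    }

    public static void main(String[] args) {
        // In the assignment, the lines would be read from Frequency.txt instead.
        List<String> sample = Arrays.asList("A", "30", "B", "10");
        System.out.println(parse(sample)); // prints [('A', 0.75), ('B', 0.25)]
    }
}
```

Taking a list of lines rather than a file name keeps the parsing logic easy to test; the file reading itself can be done with a Scanner or java.nio.file.Files.readAllLines.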

Step 2

Using this set of letters and frequencies, build the Huffman tree.

Step 3.1

Create a queue of binary tree nodes, where each node holds a Pair. The queue can be implemented as another simple ArrayList, where enqueue means adding an item to the end of the ArrayList and dequeue means removing the item at index 0. That is, the queue is an ArrayList<BinaryTree<Pair>>. The nodes in the queue are sorted in increasing order of frequency. This is your queue S. Build it by taking the values from the ArrayList freqs in increasing order, creating a binary tree node for each one, and enqueueing it.

If you enumerate the Queue S, it should have the Pair objects in increasing order of their frequencies, something like this:

('Z', 0.07) ('J', 0.10), etc.
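One possible way to build queue S is sketched below. It uses the library sort on a copy of freqs rather than repeatedly scanning freqs for the next smallest value, which gives the same ordering. The Tree class is a hypothetical stand-in for the provided BinaryTree.java, whose exact methods are not shown here, and the 'E' probability in the example is made up.

```java
import java.util.ArrayList;
import java.util.Comparator;

// Hypothetical helper, not part of the assignment files.
class QueueSBuilder {

    // Minimal stand-in for Pair.
    static class Pair {
        final char letter;
        final double prob;

        Pair(char letter, double prob) {
            this.letter = letter;
            this.prob = prob;
        }
    }

    // Minimal stand-in for BinaryTree<T>: a node with data and two children.
    static class Tree<T> {
        final T data;
        Tree<T> left;
        Tree<T> right;

        Tree(T data) { this.data = data; }
    }

    // Returns single-node trees ordered by increasing probability.
    static ArrayList<Tree<Pair>> buildQueueS(ArrayList<Pair> freqs) {
        ArrayList<Pair> sorted = new ArrayList<>(freqs); // leave freqs untouched
        sorted.sort(Comparator.comparingDouble(p -> p.prob));
        ArrayList<Tree<Pair>> s = new ArrayList<>();
        for (Pair p : sorted) {
            s.add(new Tree<>(p)); // enqueue = add at the end
        }
        return s;
    }

    public static void main(String[] args) {
        ArrayList<Pair> freqs = new ArrayList<>();
        freqs.add(new Pair('E', 12.0)); // illustrative value only
        freqs.add(new Pair('Z', 0.07));
        freqs.add(new Pair('J', 0.10));
        for (Tree<Pair> t : buildQueueS(freqs)) {
            System.out.println(t.data.letter + " " + t.data.prob);
        }
    }
}
```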

Step 3.2

Now initialize another queue T (another ArrayList) of type ArrayList<BinaryTree<Pair>>.

Step 3.3

Build the Huffman tree according to the algorithm discussed in the lectures.

For instance, in the above example, (‘Z’, 0.07) and (‘J’, 0.10) will be dequeued from S first. Create a node with the combined frequency. What do you put as the character for the combined node? You can use a dummy character, say ‘&’. So (‘&’, 0.17) will be the parent node, and (‘Z’, 0.07) and (‘J’, 0.10) will be its left and right children. This tree is then enqueued to queue T.

Keep repeating this procedure to build the Huffman tree, following the algorithm given below:

  1. Pick the two smallest-weight trees, say A and B, from S and T, as follows:
    • a) If T is empty, A and B are respectively the front and next-to-front entries of S. Dequeue them from S.
    • b) If T is not empty,
      • i) Find the smaller-weight tree of the trees at the front of S and at the front of T. This is A. Dequeue it.
      • ii) Again find the smaller-weight tree of the trees now at the front of S and at the front of T. This is B. Dequeue it. (If one of the queues has become empty, B is simply the front of the other queue.)
  2. Construct a new tree P by creating a root and attaching A and B as the subtrees of this root. The weight of the root is the sum of the weights of the roots of A and B.
  3. Enqueue P to T.
  4. Repeat steps 1 to 3 until S is empty.
  5. After step 4, if T’s size is > 1, dequeue two nodes at a time, combine them and enqueue the combined tree until T’s size is 1. The last node remaining in queue T will be the final Huffman tree.
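The two-queue procedure above can be sketched roughly as follows. This is not the lecture code: the Node class is a hypothetical stand-in that folds BinaryTree and Pair into one type, and the final "combine what is left in T" phase is merged into the same loop, since taking the smaller front simply falls back to T once S is empty. The weights in main are illustrative.

```java
import java.util.ArrayList;

// Hypothetical sketch, not part of the assignment files.
class TwoQueueHuffman {

    // Stand-in for BinaryTree<Pair>: a letter, a weight and two subtrees.
    static class Node {
        final char letter;
        final double weight;
        final Node left;
        final Node right;

        Node(char letter, double weight) { this(letter, weight, null, null); }

        Node(char letter, double weight, Node left, Node right) {
            this.letter = letter;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    // Dequeues the lighter of the two front trees; if one queue is empty,
    // dequeues from the other. At least one queue must be non-empty.
    static Node dequeueSmaller(ArrayList<Node> s, ArrayList<Node> t) {
        if (t.isEmpty()) return s.remove(0);
        if (s.isEmpty()) return t.remove(0);
        return (s.get(0).weight <= t.get(0).weight) ? s.remove(0) : t.remove(0);
    }

    // 'sorted' must be non-empty and ordered by increasing weight.
    static Node build(ArrayList<Node> sorted) {
        ArrayList<Node> s = new ArrayList<>(sorted); // queue S (copy)
        ArrayList<Node> t = new ArrayList<>();       // queue T
        while (s.size() + t.size() > 1) {
            Node a = dequeueSmaller(s, t);
            Node b = dequeueSmaller(s, t);
            // '&' is the dummy character for combined nodes.
            t.add(new Node('&', a.weight + b.weight, a, b));
        }
        return t.isEmpty() ? s.get(0) : t.get(0);
    }

    public static void main(String[] args) {
        ArrayList<Node> s = new ArrayList<>();
        s.add(new Node('A', 1.0));
        s.add(new Node('B', 2.0));
        s.add(new Node('C', 4.0));
        Node root = build(s);
        System.out.println(root.weight); // prints 7.0
    }
}
```

Because S starts sorted and every new tree is at least as heavy as the previous one enqueued to T, both queues stay sorted, so only the two fronts ever need to be compared.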

Step 4

Derive the Huffman codes.

The following methods can be used to find the encoding. They use a String array of size 26, with one slot for each letter, and assume that your Pair class provides a getValue() method that returns the letter.

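// Walks the tree recursively, appending "0" for a left branch and "1" for a right branch.
public static void findEncoding(BinaryTree<Pair> t, String[] a, String prefix)
{
    if (t.getLeft() == null && t.getRight() == null)
    {
        // Leaf: store the code at the letter's index ('A' -> 0, ..., 'Z' -> 25).
        a[t.getData().getValue() - 'A'] = prefix;
    }
    else
    {
        findEncoding(t.getLeft(), a, prefix + "0");
        findEncoding(t.getRight(), a, prefix + "1");
    }
}

// Returns the codes for 'A' to 'Z' in a 26-element array.
public static String[] findEncoding(BinaryTree<Pair> t) {
    String[] result = new String[26];
    findEncoding(t, result, "");
    return result;
}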
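Huffman codes are prefix-free: no code is a prefix of another, which is what later allows the encoded text to be decoded without separators between letters. As an optional sanity check on the array returned by findEncoding, something like the following could be used. The class and method names are hypothetical.

```java
// Hypothetical sanity check, not part of the assignment files.
class PrefixCheck {

    // Returns true if no non-null code is a prefix of (or equal to)
    // another non-null code in the array.
    static boolean isPrefixFree(String[] codes) {
        for (int i = 0; i < codes.length; i++) {
            for (int j = 0; j < codes.length; j++) {
                if (i != j && codes[i] != null && codes[j] != null
                        && codes[j].startsWith(codes[i])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(new String[] {"0", "10", "11"})); // prints true
        System.out.println(isPrefixFree(new String[] {"0", "01"}));       // prints false
    }
}
```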

Step 5

Read the sample text file Pokemon.txt, which is shown below:

POKEMON TOWER DEFENSE
YOUR MISSION IN THIS FUN STRATEGY TOWER DEFENSE GAME IS TO
HELP PROFESSOR OAK TO STOP ATTACKS OF WILD RATTATA
SET OUT ON YOUR OWN POKEMON JOURNEY TO CATCH AND TRAIN ALL
POKEMON AND TRY TO SOLVE THE MYSTERY BEHIND THESE ATTACKS
YOU MUST PLACE POKEMON CHARACTERS STRATEGICALLY ON THE
BATTLEFIELD SO THAT THEY STOP ALL WAVES OF ENEMY ATTACKER
DURING THE BATTLE YOU WILL LEVEL UP AND EVOLVE YOUR POKEMON
YOU CAN ALSO CAPTURE OTHER POKEMON DURING THE BATTLE AND ADD
THEM TO YOUR TEAM
USE YOUR MOUSE TO PLAY THE GAME
GOOD LUCK

Encode each letter using the Huffman codes that you have determined. Do not encode spaces and newline characters. Leave them as they are. Write the encoded file into another text file, Encoded.txt.
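A minimal sketch of the encoding step is shown below, assuming the codes are held in a 26-element array indexed by letter ('A' at index 0), as produced by findEncoding. The class name and the three-letter code table in main are hypothetical; reading Pokemon.txt and writing Encoded.txt are left out so that the logic stays testable on plain strings.

```java
// Hypothetical sketch, not part of the assignment files.
class HuffmanEncoder {

    // Replaces each letter 'A'..'Z' by its code; every other character
    // (spaces, newlines) is copied through unchanged.
    static String encode(String text, String[] codes) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append(codes[c - 'A']);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] codes = new String[26];
        codes[0] = "0";   // A (illustrative codes only)
        codes[1] = "10";  // B
        codes[2] = "11";  // C
        System.out.println(encode("AB C", codes)); // prints 010 11
    }
}
```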

Step 6

Read the encoded text file and decode it. Write the decoded file into yet another text file, Decoded.txt. If you have done everything correctly, then Decoded.txt must be the same as Pokemon.txt.
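One way to decode is to walk the Huffman tree bit by bit: go left on '0', right on '1', and emit the letter whenever a leaf is reached. The sketch below assumes the tree has at least two letters and that the encoded text contains only '0', '1', spaces and newlines. Node is a hypothetical stand-in for BinaryTree<Pair>, and the small tree in main is illustrative.

```java
// Hypothetical sketch, not part of the assignment files.
class HuffmanDecoder {

    // Stand-in for BinaryTree<Pair>; leaves carry the letter.
    static class Node {
        final char letter;
        final Node left;
        final Node right;

        Node(char letter) { this(letter, null, null); }

        Node(char letter, Node left, Node right) {
            this.letter = letter;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Assumes root is an internal node (the tree has at least two letters).
    static String decode(String encoded, Node root) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char c : encoded.toCharArray()) {
            if (c == '0' || c == '1') {
                cur = (c == '0') ? cur.left : cur.right;
                if (cur.isLeaf()) {       // a full code has been read
                    out.append(cur.letter);
                    cur = root;           // start the next code at the root
                }
            } else {
                out.append(c);            // spaces and newlines were not encoded
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative tree: A = 0, B = 10, C = 11.
        Node root = new Node('&', new Node('A'),
                new Node('&', new Node('B'), new Node('C')));
        System.out.println(decode("010 11", root)); // prints AB C
    }
}
```

Comparing the decoded string with the original text using String.equals is then enough to confirm the round trip.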

Submit a zip file containing all the source code (.java files), Frequency.txt, Pokemon.txt, Encoded.txt and Decoded.txt.