Java代写:CS201 Hash Table, Hash Function Comparison and Optional BST in C

代写数据结构中的Hash Table,通过不同的实现方式,对Hash碰撞进行分析。

Overview

For this assignment, you will implement a hash table and experiment using various hashing methods to try and minimize collisions. To simplify implementation, you will be using separate chaining to resolve collisions. Additionally, you will generate statistics on the performance of your hash table, which will be written to a file.

Part 1 - Hash Table

In this part, you will implement a basic Hash Table. We have provided you with an interface and starter code that implements that interface. Fill in the blanks to complete given methods.

Implementation Details

You will be implementing the IHashTable interface, as additional required private helper methods. See method descriptions for details. The benefits of a hash table are a result of the ability to hash a value and then immediately locate its position if there are no collisions, or vastly narrow down the search space if there are. As such, you shouldn’t need to search through the entire table to find an element for your contains() and delete() methods.

Your hash table will be used exclusively to store strings, as this is the type of hash table that would be used to store a dictionary of words for uses such as spell check.

To resolve collisions, use will be using separate chaining. One way of implementing this would be to have your backing storage be an array of LinkedLists. Then you would simply add to the LinkedList at the specified index when inserting elements into your hash table. The performance of hash tables degrades if there are too many collisions. As such, you will keep track of the load factor of the table (the ratio of the size of the table to the number of elements) and once this value exceeds , you must double the size of the table and rehash all of the values. You will be implementing a private helper method rehash() to accomplish this.

You will be implementing three different hash functions. You must implement a rolling hash using Horner’s Method, as well as two other hash functions of your choosing. A list of possible hash functions are provided in the starter files, but feel free to use other hash functions as well. Remember, at the minimum your hash functions must be deterministic, although your goal is for them to be uniform and fast. You will later compare the performance of the hash functions and record your results in HW5.txt.

You must actually implement the algorithms in your HashTable class. If you use String.hashCode(), you not receive any credit for this assignment.

Hash Table Methods

public boolean insert(String value)

Insert the string value into the hash table, returning true if the value was inserted and false if the value was already present.
Throws:
NullPointerException if value is null

public boolean delete(String value)

Delete the given value from the hash table, returning true if the value was deleted and false if the value could not be found.
Throws:
NullPointerException if value is null

public boolean contains(String value)

Check if the given value is present in the hash table, returning true if it is and false otherwise.
Throws:
NullPointerException if value is null

public void printTable()

Print the contents of the hash table, printing nothing if the table is empty. See appendix for example output.

public int getSize()

Accessor for the number of elements currently stored in the hash table.

private void rehash()

Expand the size of the hash table and rehash the elements. Called when the load factor passes a threshold.

Keeping Statistics

You should keep track and report the following information for the hash table

  • The number of times you had to expand the table.
  • The load factor in the table (before resizing), trimmed to 2 decimal places.
  • The number of insertions that encountered a collision (before resizing).
  • The length of the longest known collision chain (before resizing).

Note that all but the first of these statistics will need to be reinitialized (reset) whenever you expand the table.

Write it out to a text file in the following format r, c, n are integers, alpha is floating point number), each time you do a resize and rehash. Append a new set of values for each rehash.

r resizes, load factor alpha, c collisions, n longest chain

Notice the two constructors in HashTable class in the starter code provided. One takes in the initial size(any value > 0), while the other constructor takes in a filename as well as a size. The boolean ‘printStats’ is set to false by default, and is only changed to true in the constructor that receives a filename. Each time you expand and rehash your table, you must check if this boolean is set to true, and if yes, you must write the current statistics to the file before rehashing.

For example, after, say, 4 rehashes, the file will have the following lines:

  • 1 resizes, load factor 0.67, 1 collisions, 2 longest chain
  • 2 resizes, load factor 0.67, 7 collisions, 2 longest chain
  • 3 resizes, load factor 0.75, 55 collisions, 6 longest chain
  • 4 resizes, load factor 0.68, 221 collisions, 14 longest chain

There is no ‘correct statistics’ and your numbers will change depending on what hash function and initial table size you chose. These values will only indicate how good your chosen hash function is.

HashTable Tester

Your tester must test the insert(), contains(), delete() and getSize() methods thoroughly. Note that printTable() is only present for you to test your hash table during implementation, and as such does not need to be tested for an exact output string. However, this method must still be implemented. See the appendix for sample formatting.

You can test the 2-arg constructor by passing a local file path and making sure that the stats are printed to the file. However, when you submit your tester, make sure you remove/comment this test as your submitted file must not have any local file path.

Hash Analysis

Remember that a good hash function is fast, deterministic, and uniform. For the final part of this assignment, you will be evaluating your hash functions on these categories.

A good way to test the relative performance of your hashing methods is to build a dictionary with the provided files. To do this, you would read the provided longDictionary.txt or simpleDictionary.txt and treat each line read as a single word to insert into your table. Make sure that you specify a file to record your statistics to so that you have concrete data to reference with your explanations.

For each Hash function, describe the performance that it displayed. To assess whether or not it is uniform, consider the following questions:

  • How effective was it at preventing collisions?
  • When collisions occurred, did it have have many short chains or a few long ones?

Keep in mind that for a hash function to be useable, it must be deterministic. Mention briefly why each function satisfies this property.

Finally, describe the runtime performance for calculating the hash. A good way to do this would be to use the same dictionary file as before, but instead of building the complete table, just record the time it takes to hash the values. To do this, call the protected hash() function with each of the enum values.

Record your findings in HW5.txt

Extra Credit - BST in C

You will be implementing a Binary Search Tree in C. You have been provided with started code in BinaryTree.c. Input/Output for this assignment will be handled through the command line. The values inserted into your BST will be strings of max length 100. Your BST will have the ability to insert, search, and print the entire tree in order. See the appendix for specific help with C.

Function Descriptions

int compare(char a, char b)

Mimics the functionality of strcmp()
Returns:
0 if both strings are equal
Negative if the first non-matching character has a lower ascii value in a than b
Positive if the first non-matching character has a higher ascii value in a than b
DO NOT use strcmp() in the implementation of compare

void insert()

Adds a node with the input value to the tree. Smaller strings are inserted on the left and larger on the right. If the element is a duplicate, do nothing.

You will need to use malloc to allocate memory for each new node that is inserted. The parameter to malloc() is the size in bytes; just passing the number of elements is not correct.

void find()

Search the tree for a node containing the string the user inputs as its value. Prints “Found: “ node value value (using printNode()) if the node is present. Otherwise prints “Not found: “ followed by user input.

See sample output for examples

void traverse()

Traverse will output an inorder traversal of the values in the tree nodes. Use printNode to print the values of the nodes. If the tree is empty nothing will be printed.
See sample output for examples.

You may choose to implement the functionalities of the binary tree in any order but the following is recommended:

  1. compare():
    You will need a strong understanding of how strings are represented in C to complete this part of the assignment. Consult your notes/online resources. You cannot use strcmp() in your implementation, however you can check your output against strcmp().
  2. insert():
    You should be familiar with how nodes are inserted into a BST. The tricky thing about this portion is allocating memory for new nodes using malloc. Make sure you understand how malloc works before you attempt to implement insert!
  3. find():
    Again something you all should be familiar with. No tricks here.
  4. traverse():
    This is a simple in order traversal of your BST. This can be accomplished iteratively or recursively. You are free to use helper methods here.

Your output must match the Sample Output!

Important Information

Do not edit any code marked “DO NOT EDIT”
You may edit main for the purposes of testing but when you turn in your code it must match the starter code. You may use helper methods if you see fit, but be aware - in C methods must be declared before they are used.