Java代写:CSC460 Dynamic Hashing

代写数据库中的动态Hash的实现,为后续的数据库设计做准备。

Overview

In our first programming assignment, you created a binary file of uniformly- sized records. Querying the content of such files can be made more efficient by using an index.

In class we talked about Dynamic Hashing indices under the assumption of a binary (2-way) tree. If your situation permits - and ours does - the use of an n-way tree can produce a shallower in-memory tree index and thus faster searches. Further, we can create indices by reading the content of the field being indexed in reverse order. Doing so can further improve querying performance.

Assignment

Write a complete, well-documented Java program that creates a Dynamic Hash index for the binary file of taxi trips that your PartA.java program created, and uses it to satisfy a simple type of query. In particular, you are to index on the Trip Total field, using the digits in reversed (right to left) order, ignoring the decimal points and the dollar signs.

Recall that a dynamic hash structure consists of a bucket directory tree and a collection of buckets. In this assignment, the key field of the data consists of just numeric characters. Your directory tree should be structured with up to one level per digit in the key, and 10 tree pointers and 10 hash file “pointers” (really just byte offsets into the binary file of hash buckets) per node. The following picture should help:

At the beginning, your tree should consist of just the root node with pointers to 10 empty buckets in the hash bucket file. As buckets are filled based on the right-most digit of the key of the record being inserted, new leaf nodes will be added to the tree and buckets will need to be split. To add new buckets to the hash file, all you need to do is append the new buckets to the end of the file (and, optionally, reuse the old bucket, to save space). For this program, use hash buckets that can hold 1,000 index records each. I recommend that each bucket be based on a data structure that consists of (a) a count of how many of the 1,000 slots are filled, and (b) an array of slots. The hash file will be just a binary file of these bucket structures.

Once the dynamic hashing structures (the in-memory tree and the on-disk buck file) are created, the program is to process a simple variety of query using your dynamic hashing index. Prompt the user to enter a sequence of a Trip Total suffixes (e.g., a Trip Total of $6.70 has potential suffixes of 0, 70, 670, 0670, etc.). Your program should display the Trip ID and Trip Total fields associated with the given suffix and the total number of records that were printed. Sequentially scanning the binary file to find the matching records is not acceptable the querying must use your index. Display the Trip ID and Trip Total fields using the same output format you used for Program #1.

Data

The binary file to be indexed is to be the one your PartA.java program from Program #1 creates. If you need to make changes to PartA.java to create a correct binary file for this program to index, be sure to submit your updated PartA.java in addition to this program.

The queries will consist of suffixes as described above. Suffixes of no digits and of a ridiculously large quantity of digits should all be handled reasonably.

As for dreaming up queries for testing, that’s up to you. We’ll test with a variety of query strings, of course, to make sure that your code is operating correctly and robustly. Thus, so should you.

Output

Apart from the various prompts to the user, the only displayed output from your program should be the query results. After the list of records, display a count of the number of records that match each query: “# records matched your query.”

Hand In

You are required to submit your completed program file(s) using the turnin facility on lectura.
The submission folder is cs460p2. Instructions are available from the document of submission instructions linked to the class web page. Because we will be grading your program on lectura, it needs to run on lectura. This means that you need to test it on lectura. Name your main program source file Prog2.java so that we don’t have to guess which files to compile, but feel free to split up your code over additional files if you feel that doing so is appropriate. Submit each file as-is; that is, do not ‘package’ them into ZIP or TAR files.

Want to Learn More?

  • There aren’t many examples of our version of Dynamic Hashing available. I refer to this type of external hashing as “Dynamic Hashing” to distinguish it from Extendible Hashing, which is more well-known. But, really, Extendible Hashing is an example of the general concept of dynamic hashing, too.

Other Requirements and Hints:

  • It is possible that a bucket size of 1,000 records might prove to be insufficient, given that the data is far from uniformly distributed (many Trip Totals end in “5”, for example). If so, use Piazza to work with your classmates to determine an adequate (but not wastefully large!) bucket capacity, and use it.
  • Work on just a part of the program at a time; don’t try to code it all before you test any of it. Don’t be afraid to do things a little bit backwards; for example, it’s nice to have a crude query algorithm in place to help you test the construction of your index.
  • You can make debugging easier by using only a small amount of carefully selected data - and very small buckets - as you develop the code. Switch to the complete data file and full-size buckets when everything seems to be working.
  • Repeating one of my standard pieces of advice: Comment your code according to the style guidelines as you write the code (not in the last few hours before the due date and time!). Doing this makes writing documentation far less tedious.
  • As always: Start early! There’s plenty to do here.