Sorting is of enormous importance in the practice of computing. Rare is the application that does not include sorting in its implementation. Everything from word processing (spell checking, indexing) and image processing (color histograms), to operating systems (page tables) and computer graphics (depth culling); indeed, the greater challenge is coming up with real-world computing problem that do not require sorting in an efficient solution. Even non-computing applications rely on sorted data; imagine a business with unsorted filing cabinets. If you can’t, that’s because that business probably wouldn’t operate for very long.
Owing to its ubiquity, sorting is among the most studied and well-understood problems in all of computing. Despite this extensive analysis, we cannot definitively claim to have found a “best sort”! Quick sort is usually the go-to algorithm, but it is unstableand sometimes we want stabilityand it has highly undesirable O(n^2) worst case behavior, which, while easy to avoid in the degenerate case (through randomization), is impossible to avoid in all cases. Merge sort is the defacto second choice. It avoids Quick sort’s problems, being Θ(n lg n) (best- and worst-case asymptotic running times are proportional to n lg n), and it is stable, but it is not in-place (a big problem when either resources are highly limited or data is very large), and it has larger constants, such that it will lose to quick sort in the average case. Adding to this confusion is that both of these sorts tend to lose to well written O(n^2) sorts like insertion sort and selection sort on small data sets! This is because with small n the constants that big-O brushes aside tend to dominate.
The problem of writing a general sorting algorithm is further complicated by the fact that precisely how to compare two items cannot be known by the library developer. Java gets around the latter problem with the Comparable and Comparator interfaces. All other popular (and reasonable) languages have ways of dealing with this, too. And Java uses a hybrid approach in its choice of algorithm. With certain datatypes it uses a modified quick sort, and with others a variation of merge sort.
You are an API developer at Oracle in charge of choosing the sorting algorithm(s) for Java 10. Your analysis has already eliminated many sorts from the competition. You’re left with quick sort, merge sort, and selection sort. You decide to build a front-end to these algorithms which compares their performance head-to-head. The current testing is concerned only with sorting text, but even text can be sorted in many different ways: Is is case sensitive or insensitive? Do we want lexicographical order? Alphabetical? Something else entirely? Does it include characters from different alphabets? You decide to pass a configuration file listing character order to your program. You build a lookup table from the configuration alphabet. Your sorts will use a Comparitor to sort according to the order in the configuration. Finding the relative positions of alphabet characters within your comparitor will require binary search, which you must also implement.
You will be implementing three sorts - Quick Sort, Merge Sort, and Selection Sort and Binary Search, as well as a framework for testing and timing them, using the classes and methods as described below. You will time their performance using the supplied timer class and compare their actual performance over a range of input sizes compared with their expected performance as predicted by big-O analysis.
All classes, interfaces, and methods appearing in the template are required and described therein.
main() takes two file names as arguments.
The first file, the configuration file, contains a list of characters, one character per line, defining sorted order. There are no invalid characters, except that newlines are used as the seperator and thus cannot be a character in the sorted order.
The second file, the wordlist, is a list of words to be sorted according to the order defined in the configuration file. The words are newline seperated and may contain only characters that appear in the configuration file. In particular, note that whitespace characters, like space and tab, are valid letters in our alphabets and do not delimit words!
main() will read both files, process the configuration file as needed, then proceed to sort the word list with each of the three sorts. Each list may be sorted multiple times, until such time as each sort algorithm has sorted at least 1 million words (class SorterWithStatistics will keep track of the total words sorted for you).
It is difficult to accurately time very short activities on a computer, because the operating system often does other things to the detriment of your performance analysis. Over long periods of time, these other activities get averaged out, but short programs that happen to run when OS activity spikes do not get the benefit of that implicit filter. Sorting inputs multiple times will reduce the noise in this processes by allowing you to measure an average over many runs.
After all three sorts have finished, report the following statistics to the terminal for each sort:
- Length of the word list
- Total number of words sorted
- Total time spent sorting
- Average time required to sort the word list
- Words sorted per second
We will supply word lists of lengths 10, 100, 1000, 10000, 100000, and 1000000. You will run your sorts on all lengths and analyze their performance over these varied ns. Write a discussion analyzing the measured performance compared with expected performance as per big-O. Your discussion should be no longer than 500 words and in plain text format.
You are required to include, in your submission, the source code for each of the classes in the code template, as well as any additional classes or methods you may have written to complete the assignment (you shouldn’t need any). You need to write proper documentation with JavaDoc for each method in each class. Write your class so that its package name. Your source files (.java files) will be placed in the directory, as defined in the template code. Be sure to put down your name after the @author tag in each class source file. Your zip file should be named Firstname_Lastname_HW2.zip. Your discussion should be in a file named analysis.txt and stored in the edu directory. You may submit a draft version of your code early to see if you have any submission problem with Blackboard Learn. We will grade only your latest submission.