In this programming assignment you will use Thrust, a template header library (like the C++ STL), to create one program that will encrypt a text using the Vigenère cipher and another that will crack a text that has been encrypted with the Vigenère cipher.
Thrust provides a number of parallel primitives such as scans, reductions, compactions, sorts, etc. that have high performance implementations for GPUs using CUDA and CPUs using OpenMP. These primitives can be chained together to do some very interesting things (quickly)!
There is a general introduction at https://github.com/thrust/thrust/wiki/Quick-Start-Guide. A more detailed list of the documentation relevant for the assignment is given at the end of the handout.
You might be familiar with the Caesar cipher. The idea behind this cipher is to use a constant shift (with modulo arithmetic), as suggested in Table 1. To solve this kind of cipher, we can use letter frequencies; by looking at the most frequent letter in the cipher text, we can find the mapping of the letter ‘e’, and then use this mapping to deduce the shift amount. Once we have the shift, it is easy to compute the plain text from the cipher text.
|Orginal Alphabet||Cipher Alphabet|
In a poly-alphabetic cipher the shift is not constant for the entire text, it changes depending on the position of the characters within the text (see Figure below). A key is chosen which determines the shift of each letter (note that if the key has length one, it reduces to a Caesar cipher). To be practical, the key should be shorter than the message (usually much shorter). It is then repeated for the length of the message. The length of the key is called the period of the cipher. For a long time the key was chosen to be a word or phrase, but this was eventually found to be a weakness that could be exploited. It is best to choose the shifts at random. Although the shifts can be represented by numbers 0–25, the convention is to represent it by the letter in the alphabet at that position.
This new cipher, called Vigenère cipher, defeats straightforward frequency analysis because each character in the plain text can become different characters in the cipher text. In the example below I → V and I → B. Conversely, in the cipher text X appears twice and corresponds to K the first time and E the second time. How can we break this new cipher?
Plain text: ILIKEMYTEACHER KEY: NOTNOTNOTNOTNO ============================= Cipher text: VZBXSFLHXNQARF
In this part, you should modify create_cipher.cu and fill in all the places marked with TODO. To do so, you will mainly write calls to thrust functions and fill in functor bodies. make create_cipher will only make for this part.
The create cipher program takes two arguments from the command line: the name of a text file and the period that the cipher will use to encrypt this text. It will generate a random key of the specified period, encrypt the plaintext with the Vigenère cipher and output the encrypted text to a file cipher text.txt.
Here are the steps you will need to complete:
- Sanitize the input text to contain only lower case ASCII letters. Uppercase letters must be converted to lowercase and all other symbols must be removed.
- Compute the frequency of each letter in the clean text.
- Apply the cipher.
- Compute the frequency of each letter in the cipher text.
You will mainly write calls to thrust functions and fill in functor bodies. You should modify create_cipher.cu and fill in all the places marked with TODO.
(10 points) Implement the functor isnot_lowercase_alpha, which returns true if and only if the character is not a lower case alphabetic character.
(10 points) Implement the functor upper_to_lower, which converts an uppercase character to a lowercase one.
(10 points) Implement the functor apply_shift, which shifts the input character by the shift amount. The way this functor works is as follows:
- The constructor will take two arguments: a pointer to the beginning of the array containing the shift amounts, and the period (i.e., the length of the shift amounts array).
- The parentheses operator will take as argument a char (the input character to be shifted) and an integer (the position of this char in the input array). You may assume that the input char is in the range a–z.
(20 points) Implement getLetterFrequencyGpu, which calculates the letter frequency in the plain text and cipher text, prints the top 5 letters along with their frequencies (out of 1.0, not as a percentage), and returns an std::vector with the frequency values.
Note: make sure that the function can handle the case when there are less than 5 distinct letters in the text, in which case you should print out however many letters there happen to be.
The output should look similar to this:
Before ciphering! Top 5 Letter Frequencies ------------- ? 0.XXXXXXX ... After ciphering! Top 5 Letter Frequencies ------------- ? 0.XXXXXXX ...
(15 points) Implement the main function. You only need to fill in the parts marked TODO. For this question, please use a seed of 123 for the random number generator. To test your code, you can use mobydick.txt.