## Learning Outcomes

In this assignment you will demonstrate your abilities functions, pointers, arrays, loops, and conditionals. The assignment covers the following components of the textbook: Chapters 1 to 7 inclusive, along with material from pages 230-235.

The assignment is intended to be very challenging and push your programming skills significantly further than Grok, as well as testing your algorithmic thinking. However, it is also intended to be fun, and a very close analogue to a real world task. The first time I coded the full version of the attack for the research paper took me a long time! You can anticipate that this assignment might also take quite a while, but the payoff will be in your new skills and learning.

THE FOLLOWING IS FOR INFORMATIONAL PURPOSES, YOU DO NOT HAVE TO READ THE RESEARCH PAPER:

The inspiration for this assignment is the research paper “Practical state recovery attacks against legacy RNG implementations” by Shaanan N. Cohney, Matthew D. Green, and Nadia Heninger, with more details at duhkattack.com.

Read all the instructions carefully and make sure you understand what is expected of you

## Full Story (just for fun)

Watch the video here and read the accompanying description!

## The (Short) Story

You have intercepted an encrypted message, and know some details about how it was generated. The message was produced by taking the xor of the original message with some key k1 .

This key, k1 , was produced using a pseudorandom number generator at times T11 to T19 .

However, while you know all the times at which the generator was ever used (T0 to T19 ), you only have access to outputs from T9 and T10 (O9 and O10 ).

You reverse engineered the generator and found it used a second key, k2 .

Your goal is to reconstruct k1 , and use it to decrypt the original message.

## Background Material

The following additional material provides context for the assignment itself. You can also watch the videos uploaded to the LMS for a walkthrough of the assignment and the background material.

### Generating Random Numbers

Computers don’t know how to be truly random. While one can purchase a device to measure very unpredictable phenomena (like radioactive decays) and convert them into random numbers, most computers have no such hardware. Instead, computers rely on algorithms that produce random looking output that is very hard for an outsider to distinguish from output that is truly random. How hard? We say that no algorithm that runs in polynomial time (Big O of any polynomial) should be able to tell if the output is truly random or not with anymore than an minute probability. We call these pseudorandom number generators.

Importantly, the sequence of random numbers should also not be guessable. If you see prior output from the algorithm, they should be so “random” that you can’t guess what will come next. This is important because we typically use a random generator to make random keys for encryption.

One well known algorithm to generate random numbers is the the following pseudorandom number generator (known as the ‘ANSI X9.31 DRBG’).

The generator itself relies on having a cryptographic function, normally one called AES that we discuss in a bit more detail in Section 4.5. Because of this choice, our assignment will process 128-bits at a time, the same amount as AES. We represent the encryption of message m under AES with k key as AESk (m) = c.

The X9.31 generator operates as follows:

• Step 1: Initialize the state of the algorithm with some pre-generated random array V0 of size 128-bits (think of it as an array of 16 characters). This can be done by taking a value such as the time the computer was turned on, xor’d with other timing values from the startup process. Then loop starting with i = 1 as many times as you need to generate sufficient output
• Step 2: Generate an intermediate value `Ii = AESk(Ti)` where Ti is the current time
• Step 3: Generate 128 bits of output by computing `Oi = AESk(Ii ⊕ Vi−1)`
• Step 4: Calculate the value that will be used in the next iteration `Vi = AESk(Oi ⊕ Ii)`

Eventually you’ll end up with an array O that has as many blocks of random numbers as you need.

Note that the value Vi is the only variable from the previous iteration that is used in the next iteration. We therefore call Vi the state of the pseudorandom number generator. In this design, if an attacker can figure out the state, they can predict all future outputs.

The values we provide already incorporate the value of V0 , so you will not need to worry about it in the assignment.

### The DUHK Attack

Unfortunately the ANSI X9.31 random number generator has a deadly flaw. If the following conditions hold, an attacker can guess the sequence of random numbers, and therefore know any key that’s made up of those numbers:

The attack works as follows for i = 1 without loss of generality, assuming the following notation for AES decryption `AES k-1 (c) = m`

Now assume that our clever attacker:

1. Has a small number of good guesses for the key k
2. Has intercepted Oi and Oi+1
3. Knows the values of Ti and Ti+1 the times when the two outputs were generated

The attacker now makes lots of guesses for the key k. When they guess the right one, the left side and the right side of the equations will be equal! They now know the key.2
Because the attacker now knows the key they can compute the secret state of the generator using: an equation in which they know the value of all the variables on the right hand side.

And, once the attacker knows the state of the generator, they can generate all the subsequent outputs Oi+2 , Oi+3 etc., assuming they also know the times at which those are generated.

### Bits and Bytes

To carry out this attack you might want to learn a bit more about how we can store and manipulate all the values used for this sort of cryptography.

Remember from lecture that all values are ultimately represented inside a computer’s memory as numbers of some sort. Every number can be written in different ‘bases’, like decimal (base-10) or hexadecimal (base-16) which we use for addresses, or binary (base-2).

If we write a number in binary, it will consist of only zeroes and ones, with the rightmost digit corresponding to the one-place, the second-rightmost digit corresponding to the twos-place, then fours-place, eights-place, etc. ascending in powers of two.

This is directly analogous to the places in a base-10 number: the ones place, the tens place, the hundreds place.

Thinking back to what we know about the char type, we know it can store up to 256 values (including zero). To write all these numbers in binary we would need eight ones and zeroes, each of which we call a ‘bit’, with eight bits making up a ‘byte’. So, when we represent a char in binary, we will typically add zeroes to the front until all eight bits are represented.

This assignment will not require you to manipulate the bits of any individual number other than using the XOR operation on two bytes at a time.

We can do the same thing, just with base-16, instead of base-2, where along with the digits 0-9 we add the letters A-F to our repertoire. We call this the hexadecimal system, or hex for short.

This means after 9, we count A, B, C, D, E, F and then we’re out of symbols, so we add a 1 to the next column (the 161 s), and put a zero in the original column (the 160 s).

Likewise we could write3 204 as 12 161 + 12 160 = 0xC 161 + 0xC 160 = 0xCC. Note that here I prefix hexadecimal numbers with 0x to indicate to the reader, by convention, that the number is in base-16.

### XOR

One of the operations that we care about more in cryptography is called exclusive-or.

Given two eight-bit numbers represented as a collection of eight bits `i0...7` and `j0...7` , we can define operations that change the bits in different ways. One of these operations is XORshort for exclusive-orwhich we write using the symbol . We can define XOR in the following manner:
Let l0…7 be the output of the operation.

### One-time pad and Basic Encryption

For the very last part of the assignment you’ll need to know just a tiny bit more cryptography which we present here.

Symmetric encryption (where both sides start sharing a secret key) obeys the following property: `Dk (Ek (m)) = m`.

That is with a message m, encryption algorithm E and decryption algorithm D, both of which use a key k, the decryptionof-the-encryption should be the original message.

A one-time pad is a way of encrypting a message. Given a key that is as many bits in length as the message to encipher, this provides the best theoretical security guarantee of any cipher4 . To encrypt a message m into a ciphertext c with key k the process is simple: for the i-th byte of the message, XOR it with the i-th byte of the key.

The decryption process is similar: for the i-th byte of the ciphertext, XOR it with the i-th byte of the key.

Why don’t we use this technique everywhere? For the size of the messages that we transmit daily, we’d need to have already sent and stored keys just as long as the messages! This quickly gets unwieldy and infeasible.

Instead we typically use ciphers that aren’t as strong but that can use a small key to secure messages that are for all practical purposes, secure. The most widely used (symmetric) cipher is called AES (the Advanced Encryption System).5

Your solutions must be contained within a C file, program1.c.
Your code must compile as submitted or you may receive a zero on the assignment.

• Input Line 1: Length of Encrypted Message in bytes, an integer l with a maximum value of 1024.
• Input Line 2: Encrypted Message encrypted using a one time pad under key k1 , which is l characters in length. Each character is represented as a two-digit hexadecimal number in the file (see Lecture 10, minute l15). This corresponds to b = 16 groups of 16-characters (blocks), where x is the ceiling function. The numbers will not be prefixed with 0x.
• Input Line 3: Outputs O9 and O10 from the random number generator, in groups of 16 characters, where each character is represented as a two-digit hexadecimal number in the file.
• Input Line 4: Timesteps T0 through T19 used to generate O0 to O19 , in groups of 16-characters (blocks), where each character is represented as a two-digit hexadecimal number in the file.
• Input Line 5: The first 1284 characters in the text of Sherlock Holmes (or some other novel!), the cipher book.

There is no trailing newline at the end of your input.

Your job is to read in the text of the book, and find which 16-character block has been used as the key k2 for the random number generator.

You will then generate enough output from the random number generator to produce k1 , which is composed entirely of this random output.

Finally, you will decrypt the original message using the key k1 .

Along with this spec we are providing you with skeleton code to fill in, a set of libraries to accompany the scaffold and files containing the input data. See Section 6 for more details.

We now provide information about the stages in greater detail.

### Stage 0: Reading the Input File

Your first task is to read in the input file (from stdin, the standard input), and parse each line correctly, as described above.

This is typically done (outside Grok) with a command similar to this one:

``````./program1 < assignment1-input1.txt
``````

When the program tries to read input, it receives the contents of the file. You can always open up a file in your text editor to see what’s inside, and we encourage you to do so for this assignment.6

Do not use the facilities in C to open the input file within your code. While this is often good practice, we have not learned it yet in class. Therefore, your submission must take in input as described above.

You must store your results in the following variables, available as arguments in the function stage0:

• int *ciphertext_length which should be set by pointer to the length of the ciphertext (Line 1),
• msg_t ciphertext which should contain the ciphertext (Line 2),
• block_t outputs[] which should contains the two outputs from the random number we provide (Line 3),
• block_t timesteps[] which should contains all the timesteps (Line 4), and
• book_t cipherbook which should contain the text of the cipher book. (Line 5).

These values are passed to the submit_stage0 function called in main. You will not be able to pass the tests on Grok if you leave extraneous printf statements in your code. This will present you with debugging output in the following format (truncated for brevity):

### Stage 1: Stripping Punctuation

Produce an array that consists of only the alphanumeric values from the book’s text on Line 5.

You may not use any functions from string.h or ctype.h.

You must store your results in the following variables, available as arguments in the function stage1:

• book_t cipherbook[] which should (after your completed implementation) contain only the alphanumeric values from the book, and
• int *book_len which should be set by pointer to be the length of the cipher book after you have removed all non-alphanumeric values.

These values are passed to the submit_stage1 function called in main. This will submit your result for stage 1 for grading, as well as giving you a debug output with the format (truncated for brevity):

### Stage 2: Guessing the key k2

You will need to implement a loop to iterate through the stripped book you generated in stage 1, in blocks of 16-bytes. For each block, generate both sides of eq. (1). above and check if both sides match. When both sides match, you have discovered the correct set of 16 bytes that comprises k2

You must store your result in the following variable, available as arguments in the function stage2:

• block_t key2 which should contain the 16 characters that make up k2 .

This value is passed to the submit_stage2 function called in main. This will submit your result for stage 2 for grading, as well as giving you a debug output with the format:

### Stage 3: Generating the key k1

You will need to generate outputs O11 , O12 , …, Ok until you have generated n bytes of output where n is the length in characters of the encrypted message on Line 2. Remember each O is 16-bytes in length.

Write a function that implements the random number generator, and use it to generate output of length n. This output is k1 . To do so you will need to write code that performs Steps 1 to 4 listed in section 4.1.

This will require you to first calculate V10 , the initial state of the generator, which you should be able to compute from

### Stage 4: Decrypting the original message

Iterate over both k1 and c the encrypted message taking the xor to produce the original unencrypted message and solve the caper!
You must store your result in the following variable, available as arguments in the function stage4:

• byte_t plaintext[] which should contains the decrypted message.

These values are passed to the submit_stage4 function called in main. This will submit your result for stage 4 for grading, as well as giving you a debug output with the format:

## Provided Code

We are providing you with a file that contains some pre-existing code to both aid in grading, and also to help you structure your assignment.

You must use the scaffold code for the assignment, else you will have differences in your output and will face severe mark deductions.

The scaffold begins with an authorship declaration. You must fill this in with your full name, student number and date.

### Type definitions

The scaffold provides a set of type definitions: book_t, byte_t, block_t and msg_t. These are aliases for their defined types. For example, block_t, representing a block of 16 bytes of data is defined with:

block_t can be used anywhere where you would use a unsigned char[16]. For the rest of the implementations, see the scaffold code.

### The aes.h library

The scaffold includes the library aes.h with the line #include “aes.h”. It provides the simple AES implementation functions:

Clearly, a variable of type block_t could be used in the arguments of the AES functions.
Do not modify any code in aes.h or aes.c. It will be replaced with a fresh version in grading.

These are called automatically in the scaffold code at the end of each stage of the assignment, for printing the outputs you saw in 5, as well as submitting those stages for assessment during grading. Make sure your values for submission are stored in the variables that are passed to these functions.

The library also provides the helper function void hexdump(byte_t a[], int len), a debug function that outputs a byte array in the format you saw in each of the stages in Section 5.

On each line, the bit number of the first byte of each block is printed in hexadecimal. Then follows two sets of eight bytes (a block of 16 bytes), with each byte printed as pairs of hexadecimal characters (00 to FF represents all values 0 to 255 that can be stored in a byte). Finally, an ASCII representation of these bytes is printed, with all non-printable ASCII characters (control characters, newlines, invalid ASCII) being printed as (centerdot). In the example above, the byte 0f is at address 0x0010 and is not a printable ASCII character, so is printed as a .

You are free to use this function in debugging your code, but we recommend removing those debug printouts when you submit your final work.

We have provided you an implementation for this library, in a1grader.c for the development version of these functions.

Do not modify any code in a1grader.h or a1grader.c. It will be replaced with a fresh, but modified version for grading.
The scaffold code includes the functions from these files by referencing their header files:

If you wish to see how any of these functions are implemented you may open and view aes.c and a1grader.c.

It reads input from the command line, repeatedly converting pairs of hexadecimal characters (‘0’ to ‘F’), representing bytes of data to byte_t values until either a newline ‘\n’ is read, or max_count bytes have been read in. The function returns the number of bytes that have been successfully read in. If you would like to know the last character that was read in by the function after reading in values (for example, to see if it read in a newline), pass a character by pointer in the last_char argument to retrieve that value. You may also pass NULL to that argument if you do not care about that value.

## Developing and testing your program

If you are developing on Grok, all of the files including the scaffold (where you will complete the assignment code), an implementation for aes.c and the sample grader a1grader.c, and their header files (aes.h, aes.h) will be automatically included.

You may compile and run as normal there.

If you plan to develop on your own machine, you will need to make sure you have all of these files downloaded, with their correct names, including the sample test files named assignment1-input1.txt and assignment1-input2.txt.

They should all be within the same directory. You can then compile your program with the following command:

``````clang -Wall -std=c11 -pedantic program1.c aes.c a1grader.c -o program1
``````

If you are using the gcc compiler, replace clang with gcc in the above line.

Your code will be compiled with the same command as above for grading, so make sure it compiles without warnings and behaves as expected! Programs that do not compile will receive serious mark deductions!

You must not remove or modify any of the provided code, as doing so will confuse our grading tools.

Note: The test files we’re using on Grok do not contain the same secret message contained in the provided input files. You can get full points either way, but it’s more fun to actually decrypt the real messages. We do this because Grok shows you the desired output, and we want to hide the actual secrets!

## Style

Style will be graded as part of each section. While there is no official style guide, we require that your code be readable, appropriately documented, and use abstraction where appropriate (functions and typedefs).

We also provide the following set of indicators that we will use when evaluating your code for style, along with explanations for why we care about some of them.

## Submission Instructions

This assignment is worth 15% of your final mark for the subject.
You need to submit your code in Grok Learning (https://groklearning.com) for assessment.
Submission will NOT be done via Canvas. Instead, you will need to:

2. Navigate to the Assignment 1 module of our subject COMP10002 2021 S1: Foundations of Algorithms.
3. Place your code in the program1.c tab window.
4. Compile your code by clicking on the Compile button.
5. Once the compilation is successful, click on the Mark button to submit your code. (You can submit as many times as you want to. Only the last submission made before the deadline will be marked.)
6. Two sample tests will be run automatically after you make a submission. Make sure that your submission passes these sample tests.
7. Two hidden tests will be run for marking purpose. Results of these tests will not be available until after the submissions are marked.

You can (and should) submit both early and often - to check that your program compiles correctly on our test system, which may have some different characteristics to the lab machines and your own machines.

To receive any points for the coded stages your code must compile.

We will test your code with inputs (and novels) other than that provided as a sample. Please ensure you don’t hard-code any values from the input file.

You will be given sample input files:

• assignment1-input.txt
• assignment2-input.txt
• grok-input1.txt
• grok-input2.txt

and sample output for the two Grok inputs: grok-output1.txt and grok-output2.txt. You can test your code on your own machine with any one of the files using following command and compare the output with the matching file:

``````./program1 < assignment1-input.txt /* < feeds data from the file into the program */
``````

For the input files without matching output, you will know you have solved the task when the result of Stage 4 is a message written in English. The is no grade penalty for not testing on the two files without output, but doing so will allow you to read the secret messages within!

Note that we are using the following command to compile your code on the submission system (we name the source code file program.c). clang -Wall -std=c11 -pedantic program1.c aes.c a1grader.c -o program1

The flag “-std=c11” enables the compiler to use a more modern standard of the C languageC11. To ensure that your submission works properly on the submission system, you should use this command to compile your code on your local machine as well.

We have added a feature to the scaffold allowing you to produce just the output from a particular stage, and save the output to a file.

Grok automatically uses this feature to test each stage, and will show you both the output from your program given the stage being tested, and compare it to the correct output.

To do this on your own computer run the program as follows:

``````./program1 n < assignment1-input.txt /* < feeds data from the file into the program */
``````

Where you must replace n with the number of the stage you wish to test.

The output is then stored in a file program1_output.txt which you can open with a text editor.

To compare this file with the sample output we have provided a utility that works on Linux/Mac or Linux on Windows, called compare.sh. To use it, first run your program with the argument for the stage you wish to check. Then run the following command where n is the stage number, and correctfile is the name of the output file against which you wish to compare your own output.