C代写:CS136 Encode

完成C语言基础练习,根据规则实现Encode和Decode方法。

Encode

Requirement

  • Your coding style will NOT be graded for this question.
  • Your testing methodology will NOT be graded for this question.
  • All assignment rules and policies are in place for this question.
  • There is no public test. Marmoset only ensures that your code “runs”.
  • For this and EVERY question, you may define helper functions

For your convenience, the same README.TXT file appears in q1a and q1b.

When we transmit information we often want to try and minimise the size of the data that we have to send using data compression.

A popular method of data compression is to identify common patterns and then replace (substitute) those patterns with shorter codewords.

For this question, we compress a string that contains only lower case letters and whitespace (no punctuation, upper case, numbers, etc.).

We replace a pattern of lower case letters with a single non-lower case character.

For example, we could replace the pattern “the” with the char ‘T’.

If the original message was “the cat in the hat”, the compressed (or “encoded”) message would be “T cat in T hat”.

There could be multiple patterns, so we could also encode “at” as ‘@’ and then the message would be “T c@ in T h@”.

A “mapping” struct map contains:

  • the single character [small] and
  • the pattern [big].
1
2
3
4
struct map {
char small;
char *big;
};

For the above example, we would have:

1
struct map example[2] = { {'@', "at"}, {'T', "the"}};

There are a few rules for a [struct map] that must be followed:

  • [small] must be in the ASCII range ‘!’ … ‘_‘ (inclusive)
  • [big] must be a valid and non-empty string that contains only lower case letters (‘a’ … ‘z’) and no other characters (e.g., no whitespace)

When working with an array of maps, there are additional rules:

  • the [small] fields must all be unique (no duplicates)
  • the [big] fields must not only be unique, but no [big] field can be a prefix of another [big] field (see below for an example)
  • the elements of the array must be sorted so that the [big] strings are in lexicographical order (see above: “at” before “the”)

Rule 5 exists so you can use binary search on an array of maps to find a pattern match, so if the longest [big] in the array is “m” and there are “k” entries, then it would take O(m*logk) to search the array.

Rule 4 exists to make this assignment easier by removing any ambiguity.

No [big] field can be the prefix of another big field, so you could not have both “the” and “theatre” as [big] fields because “the” is a prefix of “theatre” (“the” appears at the start of “theatre”).

Even despite rule 4, there could be some ambiguity because some [big] fields could overlap. Consider this overlap example:

1
struct map overlap_example[3] = { {'@', "at"}, {'E', "eat"}, {'T', "the"}};

This is a valid array because no [big] field is the prefix of any other.

However, for the string “theatre” it could be encoded as “T@re” or “thEre”.

To avoid this, we have our final rule:

  • the “first” match is always selected, moving from the start of the message string to end (in other words, moving left to right)

So in our previous example, the encoding would be “T@re” because the first match moving left-to-right would be “the”.

In part A, you use an array of struct map to encode (compress) a string

In part B, you use an array of struct map to decode (decompress) a string

For both part A and B we have provided a test client that uses the same file format.

The file format should be straightforward for you to understand:

  • make sure you provide your mappings in alphabetical order by [big] field
  • a tilde (~) followed by a newline is used to separate the mappings and the message

Additional Notes

  • In both parts, you must provide a new, dynamically allocated string that has been realloc’d to the correct length.
  • If you were to encode a string and then decode it, the result should be the same as the string you had prior to encoding.
  • In Part A, the encoded string must be as short as possible following the above rules, meaning that if a substitution is possible it must be done.