完成Python基础算法和数据结构练习。

## Brangelina

1 | def brangelina(first, second): |

The task of combining the first names of celebrity couples into a catchy shorthand for media consumption turns out to be surprisingly simple to automate. Start by counting how many groups of consecutive vowels (aeiou, since to keep this problem simple, the letter y is always a consonant) there are inside the first name. For example, ‘brad’ and ‘ben’ have one group, ‘sheldon’ and ‘britain’ have two, and ‘angelina’ and ‘alexander’ have four. Note that a vowel group can contain more than one vowel, as in ‘juan’ or ‘queueing’.

If the first name has only one vowel group, keep only the consonants before that group and throw away everything else. For example, ‘ben’ becomes ‘b’, and ‘brad’ becomes ‘br’. Otherwise, if the first word has `n > 1`

vowel groups, keep everything before the second last vowel group n - 1. For example, ‘angelina’ becomes ‘angel’ and ‘alexander’ becomes ‘alex’. Concatenate that with the string you get by removing all consonants at the beginning of the second name.

All names given to this function are guaranteed to consist of the 26 lowercase English letters only, and each name will have at least one vowel and one consonant somewhere in it.

first | second | Expected result |
---|---|---|

‘brad’ | ‘angelina’ | ‘brangelina’ |

‘angelina’ | ‘brad’ | ‘angelad’ |

‘sheldon’ | ‘amy’ | ‘shamy’ |

‘amy’ | ‘sheldon’ | ‘eldon’ |

‘frank’ | ‘ava’ | ‘frava’ |

‘britain’ | ‘exit’ | ‘brexit’ |

(These simple rules do not always produce the best possible result. For example, ‘ross’ and ‘rachel’ meld into ‘rachel’ instead of ‘rochel’, and ‘joey’ and ‘phoebe’ meld into ‘joebe’ instead of ‘joeybe’ . The reader is invited to think up more advanced rules that would cover a wider variety of name combinations and special cases. Some truly advanced set of rules that recognizes the semantic content of words would even combine ‘donald’ and ‘hillary’ into ‘dollary’ instead of ‘dillary’ …)

## Line with most points

1 | def line_with_most_points(points): |

A point on the two-dimensional grid of integers is given as a two-element tuple of x - and y - coordinates, for example (2, 5) or (10, 1). As originally postulated by Euclid, for any two distinct points on the plane, there exists exactly one straight line that goes through both points. (In differently shaped spaces, different rules and their consequences apply.) Of course, this same line, being infinite in both directions, will also go through an infinity of other points on the plane.

Given a list of points on the integer grid, find the line that contains the largest number of points from this list. To facilitate automated testing and guarantee that the answer for each test case is unambiguous, this function should not return the line itself, but merely the count of how many of these points lie on that line. The list of points is guaranteed to contain at least two points and all points in the list are distinct, but the points are otherwise not given in any sorted order.

points | Expected result |
---|---|

[(42, 1), (7, 5)] | 2 |

[(1, 4), (2, 6), (3, 2), (4, 10)] | 3 |

[(x, y) for x in range(10) for y in range(10)] | 10 |

[(3, 5), (1, 4), (2, 6), (7, 7), (3, 8)] | 3 |

[(5, 6), (7, 3), (7, 1), (2, 1), (7, 4), (2, 6), (7, 7)] | 4 |

This problem can be brute forced with three nested loops, but the point (heh) of this problem is not to do too much more work than you really need to. To simplify your logic, consult the example program geometry.py for the cross product function that can be used to quickly determine whether three points are collinear.

## Wythoff array

1 | def wythoff_array(n): |

Wythoff array (see the Wikipedia article for illustration) is an infinite two-dimensional grid of integers that is seeded with one and two to start the first row. In each row, each element equals the sum of the previous two elements, so the first row contains precisely the Fibonacci numbers.

The first element of each later row is the smallest integer c that does not appear anywhere in the previous rows. (Since each row is strictly ascending, you can find this out without having to compute an infinite number of elements.) To compute the second element of that row, let (a, b) be the first two elements of the previous row. If the difference c-a equals two, the second element of that row equals b+3, and otherwise that element equals b+5. This construction all by itself guarantees the Wythoff array to be an interspersion of positive integers; every positive integer will appear exactly once in this infinite grid, with no gaps or duplicates! (This result nicely highlights the deeper combinatorial importance of the deceptively simple Fibonacci numbers as building blocks of other integers and integer sequences.)

The difficulty in this problem is determining the first two elements in each row after the first row, since once you know those first two values, the rest of the row is utterly trivial to generate as far as you need. This function should return the position of n inside the Wythoff array as a tuple of form (row, column), both the row and column numbers starting from zero.

n | Expected result |
---|---|

21 | (0, 6) |

47 | (1, 5) |

1042 | (8, 8) |

424242 | (9030, 6) |

39088169 | (0, 36) |

39088170 | (14930352, 0) |

## Sevens rule, zeros drool

1 | def seven_zero(n): |

Seven is considered a lucky number in Western cultures, whereas zero is what nobody wants to be. To bring these two opposites briefly together, let us look at positive integers that start with some solid sequence of sevens, followed by some (possibly empty) solid sequence of zeros. For example, 7, 77777, 7700000, 77777700, or 70000000000000000. A surprising theorem proves that for any positive integer n, there exist infinitely many integers of such seven-zero form that are divisible by n. This function should find and return the smallest such seven-zero integer.

This exercise is about efficiently generating and iterating through the numbers of the constrained form of sevens and zeros, done in strictly ascending order to guarantee finding the smallest possible working number. This is perhaps best done with two nested loops. The outer loop should iterate through the number of digits d in the current number. The inner loop should iterate through all possible ways for k from one to d to create a number that begins with a solid block of k sevens, followed by a solid block of d-k zeroes.

n | Expected result |
---|---|

17 | 7777777777777777 |

42 | 7770 |

101 | 7777 |

103 | 7777777777777777777777777777777777 |

2**50 | 700000000000000000000000000000000000000000000000000 |

This problem is adapted from the excellent MIT Open Courseware online textbook “Mathematics for Computer Science” (PDF link to the 2018 version for anybody interested) that, like so many other non-constructive combinatorial proofs, uses the pigeonhole principle to prove that some solution must exist for any integer n, but provides no clue about the exact value of such solution. Also as proven in that same textbook, whenever n is not divisible by either 2 or 5, the smallest such number will always consist of some solid sequence of sevens with no zero digits after them. Knowing this will speed up your search by an order of magnitude for such friendly values of n.

## Om nom nom

1 | def cookie(piles): |

The beloved Sesame Street character Cookie Monster has stumbled upon a table with piles of cookies, each pile a positive integer. However, the monomaniacal obsessiveness of The Count who has set up this feast has recently escalated to a whole new level of severity. To allow the Cookie Monster to enjoy this crumbly fiesta, The Count insists that these cookies must be eaten in the smallest possible number of moves. Each move consists of the Cookie Monster choosing one of the remaining pile sizes p . This move removes p cookies from every pile that contains at least p cookies (thereby eradicating all piles whose size is exactly p ), and leaves all smaller piles untouched as they were. This function should compute and return the smallest number of moves that allows Cookie Monster to scarf down these cookies.

piles | Expected result | (optimal moves) |
---|---|---|

[1, 2, 3, 4, 5, 6] | 3 | [4, 2, 1] |

[2, 3, 5, 8, 13, 21, 34, 55, 89] | 5 | [55, 21, 8, 3, 2] |

[1, 10, 17, 34, 43, 46] | 5 | [34, 9, 8, 3, 1] |

[11, 26, 37, 44, 49, 52, 68, 75, 87, 102] | 6 | [37, 31, 15, 12, 7, 4] |

This problem has a curious property that the moves in the optimal solution can be performed in any order, which is rare for combinatorial problems of this spirit. Without loss of generality, it suffices to examine only strictly descending move sequences. Even then, this problem is far deeper than it would initially seem . Various greedy strategies that would seem natural for the addiction-riddled brain of the Cookie Monster such as “choose p to maximize the number of cookies eaten at the current move” or “choose p to eliminate the largest possible number of piles” (yep, think about that one) do not always lead to the shortest possible sequence of moves.

## Autocorrect for stubby fingers

1 | def autocorrect_word(word, words, df): |

In this day and age all you whippersnappers are surely familiar with autocorrect that replaces a non-word with the “closest” real word in the dictionary. Many techniques exist to guess more accurately what the user intended to write. Since many people, such as your instructor, have stubby fingers, many typos are caused by the user pressing a virtual key next to the intended key. For example, when the non-existent word is “cst”, the intended word is far more likely to be “cat” than “cut”, assuming that the text was entered with an ordinary QWERTY keyboard.

Given a word, a list of words, and a distance function df that tells how “far” the first argument is from the second (for example, df(‘a’,’s’) == 1 and df(‘a’, ‘a’) == 0 ), find and return the word in the list of words that have the same number of letters and whose distance, measured as the sum of the distances of the characters in the same position, is the smallest. If there exist several equidistant words, return the first word in the dictionary order.

word | Expected result |
---|---|

‘qrc’ | ‘arc’ |

‘jqmbo’ | ‘jambo’ |

‘hello’ | ‘hello’ |

‘interokay’ | ‘interplay’ |

Advanced autocorrect algorithms use statistical techniques based on the surrounding context to suggest the best replacement, since not all words are equally likely to be the intended word, given the rest of the sentence. For example, the misspelling ‘urc’ should probably be corrected very differently in the sentence “The gateway consists of an urc of curved blocks” than in “The brave little hobbit swung his sword at the smelly urc.)

## Uambcsrlne the wrod

1 | def unscramble(words, word): |

Smdboeoy nteoicd a few yreas ago taht the lretets isndie Eisgnlh wdors can be ronmaldy slmecbrad wouhtit antifecfg tiehr rlaibdiatey too mcuh, piovredd that you keep the frsit and the last lteters as tehy were. Gevin a lsit of words gtuaraened to be soterd, and one serlcmbad wrod, tihs fctounin shulod rterun the list of wrdos taht cloud hvae been the orgiianl word taht got seambclrd, and of csorue retrun taht lsit wtih its wdros sterod in apcabihaetll oerdr to enrsue the uaigntbmuiy of atematoud testing. In the vast maitjory of ceass, this list wlil catnoin only one wrod.

word | Expected result (using the wordlist words_sorted.txt) |
---|---|

‘tartnaas’ | [‘tantaras’, ‘tarantas’, ‘tartanas’] |

‘aierotsd’ | [‘asteroid’] |

‘ksmrseah’ | [‘kersmash’] |

‘bttilele’ | [‘belittle’, ‘billette’] |

(Writing the filter to transform the given plaintext to the above scrambled form is also a good little programming exercise in Python. This leads us to a useful estimation exercise: how long would the plaintext have to be for you to write such a filter yourself instead of scrambling the plaintext by hand as you would if you needed to scramble just one word? In general, how many times do you think you need to solve some particular problem until it becomes more efficient to design, write and debug a Python script to do it? Would your answer change if it turned out that millions of people around the world also have that same problem?)

## Substitution words

1 | def substitution_words(pattern, words): |

Given a list of words (once again, guaranteed to be in sorted order and each consist of the 26 lowercase English letters only) and a pattern that consists of uppercase English characters, this function should find and return a list of precisely those words that match the pattern in the sense that there exists a substitution from uppercase letters to lowercase letters that turns the pattern into the word. Furthermore, this substitution must be injective, meaning that no two different uppercase letters in the pattern are mapped to the same lowercase letter in that word.

For example, the word ‘akin’ matches the pattern ‘ABCD’ with the substitutions A → a, B → k, C → i and D → n . However, the word ‘area’ would not match that same pattern, since the pattern characters A and D would both have to be mapped to the same letter a, against the rules.

pattern | Expected result (using the wordlist words_sorted.txt ) |
---|---|

‘ABBA’ | [‘abba’, ‘acca’, ‘adda’, ‘affa’, ‘akka’, ‘amma’, ‘anna’, ‘atta’, ‘boob’, ‘deed’, ‘ecce’, ‘elle’, ‘esse’, ‘goog’, ‘immi’, ‘keek’, ‘kook’, ‘maam’, ‘noon’, ‘otto’, ‘peep’, ‘poop’, ‘sees’, ‘teet’, ‘toot’] |

‘ABABA’ | [‘ajaja’, ‘alala’, ‘anana’, ‘arara’, ‘ululu’] |

‘CEECEE’ | [‘beebee’, ‘booboo’, ‘muumuu’, ‘seesee’, ‘soosoo’, ‘teetee’, ‘weewee’, ‘zoozoo’] |

‘ABCDCBA’ | [‘adinida’, ‘deified’, ‘hagigah’, ‘murdrum’, ‘reifier’, ‘repaper’, ‘reviver’, ‘rotator’, ‘senones’] |

‘DEFDEF’ | 76 words, first three of which are [‘akeake’, ‘atlatl’, ‘barbar’] |

‘ABCDEFGIJKLM’ | 243 words, first three of which are [‘absorptively’, ‘adjunctively’, ‘adsorptively’] |

(Extra math problem for the readers who are into combinatorics and discrete math: can you construct an encoding function from strings to natural numbers so that two strings map into the same number if and only if these two strings both match the same pattern ?)

## Manhattan skyline

1 | def manhattan_skyline(towers): |

This classic problem in computational geometry (essentially, geometry that can be done using only integer arithmetic; yes, that is an actual thing) is best illustrated by pictures and animations such as those on the page “The Skyline problem”, so you can first check that it out to get an idea of what is going on. Given a list of rectangular towers as tuples (s, e, h) where s and e are the start and end x - coordinates ( `e > s`

) and h is the height of that tower, compute and return the total visible area of the towers, being careful not to double count two or more towers that are partially overlapping. All towers share the same flat ground baseline at height zero.

The classic solution illustrates the important sweep line technique that starts by creating a list of precisely those x -coordinate values where something relevant to the problem takes place. In this problem, the relevant x -coordinates are those where some tower either starts or ends. Next, loop through this list in ascending order, updating your computation for the interval between the current relevant x -coordinate and the previous one. In this particular problem, you need to maintain a list of active towers so that tower (s, e, h) becomes active when x == s, and becomes inactive again when x == e. During each interval, only the tallest active tower has any effect on the computation.

towers | Expected result |
---|---|

[(2, 3, 39)] | 39 |

[(6, 8, 56), (5, 14, 81), (3, 13, 71)] | 871 |

[(6, 18, 95), (3, 20, 95), (14, 31, 22), (5, 12, 93)] | 1857 |

(The more complex versions of this classic chestnut ask for the silhouette outline as a list of polygons, as on the linked page, and also toss the restriction that all towers must lie on the same ground level line, or even be axis-aligned.)

## Count overlapping disks

1 | def count_overlapping_disks(disks): |

Right on the heels of the previous Manhattan skyline problem, here is another classic of similar spirit for us to solve efficiently with a sweep line algorithm . Given a list of disks on the two-dimensional plane as tuples (x, y, r) so that (x, y) is the center point and r is the radius of that disk, count how many pairs of disks intersect each other in that their areas have at least one point in common. To test whether disks (x1, y1, r1) and (x2, y2, r2) intersect, use the Pythagorean formula `(x2-x1)**2 + (y2-y1)**2 <= (r1+r2)**2`

. (Note again how this precise formula uses only integer arithmetic whenever all individual components are integers. And no square roots or some other nasty irrational numbers.)

For this problem, crudely looping through all possible pairs of disks would work but be horrendously inefficient as the list grows larger. However, a sweep line algorithm can solve this problem by looking at a far fewer pairs of disks. Again, sweep through the space from left to right for all relevant x - coordinate values and maintain the set of active disks at the moment. Each individual disk (x, y, r) enters the active set when the sweep line reaches the x -coordinate x-r, and leaves the active set when the sweep line reaches x+r . When a disk enters the active set, check for an intersection between that disk and only the disks presently in the active set.

disks | Expected result |
---|---|

[(0, 0, 3), (6, 0, 3), (6, 6, 3), (0, 6, 3)] | 4 |

[(4, -1, 3), (-3, 3, 2), (-3, 4, 2), (3, 1, 4)] | 2 |

[(-10, 6, 2), (6, -4, 5), (6, 3, 5), (-9, -8, 1), (1, -5, 3)] | 2 |

[(x, x, x // 2) for x in range(2, 101)] | 2563 |