Python Ghostwriting: COMSW3101 Introduction to Python

Introduction

The Python assignment to be ghostwritten this time contains five problems to solve.

Problem 1 - Decrypting Government Data

Your job is to summarize this government data about oil consumption.

  • The format of the file is rather bizarre: note that each line has data for two months, in two different years! (Plus, I had to hand-edit the file to make it parseable.)
  • Fortunately, Python is great for untangling and manipulating data.
  • Write a generator that reads from the given URL over the network and produces a summary line for one year’s data on each ‘next’ call
  • remember that urllib.request returns bytes, not strings, so each line has to be decoded
  • The generator should read the lines of the oil2.txt file lazily: it should only read 13 lines for every two years of output. Note that a loop can have any number of ‘yield’ calls in it.
  • Ignore the monthly data; just extract the yearly info
  • Drop the month column
  • In addition to the ‘oil’ generator function, my solution had a separate helper function, ‘def makeCSVLine(year, data):’
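The lazy-reading requirement can be sketched with itertools.islice, which pulls at most n items from an iterator per call. This is only a sketch under assumptions: the oil2.txt layout is not reproduced here, so it stops at handing back decoded 13-line blocks rather than parsing them, and the makeCSVLine body (comma-joining the year and its fields) is a guess at what such a helper might do. The name chunks_of_lines is hypothetical.

```python
import io
from itertools import islice


def makeCSVLine(year, data):
    """Guess at the helper: join a year and its yearly fields into one CSV line."""
    return ','.join(str(v) for v in [year, *data])


def chunks_of_lines(byte_lines, n=13):
    """Lazily yield lists of n decoded lines from an iterable of bytes lines.

    islice pulls at most n lines per block, so the source (for example a
    urllib.request response) is never read further ahead than needed.
    """
    it = iter(byte_lines)
    while True:
        block = [line.decode('utf-8').rstrip('\r\n') for line in islice(it, n)]
        if not block:          # source exhausted
            return
        yield block


# In-memory stand-in for a network response: 30 lines of bytes.
fake_response = io.BytesIO(b''.join(b'line %d\n' % i for i in range(30)))
print([len(block) for block in chunks_of_lines(fake_response)])  # [13, 13, 4]
print(makeCSVLine(2014, [2700903, -112867, 91.23]))  # 2014,2700903,-112867,91.23
```

With the real source, the object returned by urllib.request.urlopen(url) could be passed in directly, since it iterates line by line as bytes; a generator built on this would then split each block into its two years and yield once per year.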

Here is the expected output, beginning with the first two years of data, 2014 and 2013:

Year,Quantity,QuantityChange,Unknown,Unknown2,Price,PriceChange
2014,2700903,-112867,246409332,-26397845,91.23,-5.72
2013,2813770,-283638,272807177,-40367786,96.95,-4.15
2012,3097408,-224509,313174963,-18407090,101.11,1.29
2011,3321917,-55160,331582053,79421544,99.82,25.15
2010,3377077,62290,252160509,63448733,74.67,17.74
2009,3314787,-275841,188711776,-153200712,56.93,-38.29
2008,3590628,-99940,341912488,104700835,95.22,30.95
2007,3690568,-43658,237211653,20584322,64.28,6.26
2006,3734226,-20445,216627331,40871990,58.01,11.20
2005,3754671,-66308,175755341,44012676,46.81,12.33
2004,3820979,144974,131742665,32575492,34.48,7.50
2003,3676005,257983,99167173,21883842,26.98,4.37
2002,3418022,-53045,77283331,2990437,22.61,1.21
2001,3471067,71827,74292894,-15583539,21.40,-5.04
2000,3399240,171148,89876433,38986812,26.44,10.68
1999,3228092,-14620,50889621,13637399,15.76,4.28
1998,3242712,173281,37252222,-16973685,11.49,-6.18
1997,3069431,175785,54225907,-704950,17.67,-1.32
1996,2893646,126333,54930857,11181204,18.98,3.17

Now that we have something that looks like a CSV file, we can do all kinds of things:

  • we could save it to a file; then
    • Excel or OpenOffice could read it
    • Python’s csv module could read it
  • with a little juggling, we can easily pump the data into a pandas DataFrame
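For the csv-module route, here is a small sketch using csv.DictReader on the header and the first two data lines from the listing above. Opening /tmp/oil.csv (with newline='' as the csv documentation recommends) and passing the file object instead of io.StringIO should work the same way. Note that every field comes back as a string, so numeric columns need an explicit conversion.

```python
import csv
import io

# Header plus the 2014 and 2013 lines from the listing above.
text = """Year,Quantity,QuantityChange,Unknown,Unknown2,Price,PriceChange
2014,2700903,-112867,246409332,-26397845,91.23,-5.72
2013,2813770,-283638,272807177,-40367786,96.95,-4.15
"""

# DictReader uses the first row as keys; every value is a string.
rows = list(csv.DictReader(io.StringIO(text)))
prices = [float(row['Price']) for row in rows]
print(rows[0]['Year'], prices)   # 2014 [91.23, 96.95]
```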

Input:

with open('/tmp/oil.csv', 'w') as f:
    for l in oil(url):
        f.write(l + '\n')

o = oil(url)
ls = list(o)
s = '\n'.join(ls)
import pandas as pd
import io
# we will cover StringIO next week - kind of an 'in-memory' file
df = pd.read_csv(io.StringIO(s))
df

Output:

    Year  Quantity  QuantityChange    Unknown    Unknown2   Price  PriceChange
0   2014   2700903         -112867  246409332   -26397845   91.23        -5.72
1   2013   2813770         -283638  272807177   -40367786   96.95        -4.15
2   2012   3097408         -224509  313174963   -18407090  101.11         1.29
3   2011   3321917          -55160  331582053    79421544   99.82        25.15
4   2010   3377077           62290  252160509    63448733   74.67        17.74
5   2009   3314787         -275841  188711776  -153200712   56.93       -38.29
6   2008   3590628          -99940  341912488   104700835   95.22        30.95
7   2007   3690568          -43658  237211653    20584322   64.28         6.26
8   2006   3734226          -20445  216627331    40871990   58.01        11.20
9   2005   3754671          -66308  175755341    44012676   46.81        12.33
10  2004   3820979          144974  131742665    32575492   34.48         7.50
11  2003   3676005          257983   99167173    21883842   26.98         4.37
12  2002   3418022          -53045   77283331     2990437   22.61         1.21
13  2001   3471067           71827   74292894   -15583539   21.40        -5.04
14  2000   3399240          171148   89876433    38986812   26.44        10.68
15  1999   3228092          -14620   50889621    13637399   15.76         4.28
16  1998   3242712          173281   37252222   -16973685   11.49        -6.18
17  1997   3069431          175785   54225907     -704950   17.67        -1.32
18  1996   2893646          126333   54930857    11181204   18.98         3.17
19  1995   2767313           63116   43749653     5270236   15.81         1.58
20  1994   2704197          160822   38479417       10041   14.23        -0.90
21  1993   2543375          248805   38469376      -83679   15.13        -1.68

Input:

[df['Price'].mean(), df['Price'].min(), df['Price'].max()]

Output:

[46.63681818181818, 11.49, 101.11]

Problem 2

  • suppose we want to convert between C (Celsius) and F (Fahrenheit), using the equation 9C = 5(F - 32)
  • we could write functions ‘c2f’ and ‘f2c’
  • do all computation in floating point for this problem

Input:

def c2f(c):
    return (9. * c + 5. * 32.) / 5.

def f2c(f):
    return 5. * (f - 32) / 9.

[c2f(0), c2f(100), f2c(32), f2c(212)]

Output:

[32.0, 212.0, 0.0, 100.0]

  • to write f2c, we solved the equation for C and made a function out of the other side of the equation
  • to write c2f, we solved for F, . . .
  • there is another way to think about this
  • rearrange the equation into the symmetric form 9 * C - 5 * F = -32 * 5
  • you can think of the equation above as a “constraint” between F and C: if you specify one variable, the other’s value is determined by the equation. In general, we have c0 * x0 + c1 * x1 + … + cN * xN = total, where
    • the cI are fixed coefficients
    • specifying any N of the (N + 1) x’s will determine the remaining x variable
  • define a class, ‘Constraint’, that will do ‘constraint satisfaction’
  • you may find ‘dotnone’ to be helpful
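Solving the general constraint for whichever variable x_k is still unknown is a single rearrangement, shown below with the Celsius/Fahrenheit numbers as a check. If the unknown is represented as None, the sum over i ≠ k is exactly what dotnone computes.

```latex
\sum_{i=0}^{N} c_i x_i = \mathrm{total}
\;\Longrightarrow\;
x_k = \frac{\mathrm{total} - \sum_{i \ne k} c_i x_i}{c_k}, \qquad c_k \ne 0

% Example: 9C - 5F = -160 with C = 100
F = \frac{-160 - 9 \cdot 100}{-5} = \frac{-1060}{-5} = 212
```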

Input:

# regular dot product, except that if either or both values in a pair is None,
# that term is defined to contribute 0 to the sum
def dotnone(l1, l2):
    '''another dot product variant'''
    total = 0  # avoid shadowing the built-in sum()
    for e1, e2 in zip(l1, l2):
        if not (e1 is None or e2 is None):
            total += e1 * e2
    return total

[dotnone([1,2,3], [4,5,6]), dotnone([1,None,3], [4,5,6]), dotnone([None,1], [2,None])]

Output:

[32, 22, 0]

Input:

# set up a constraint between C and F
# 1st arg is the variable names,
# 2nd arg is the coefficients,
# 3rd arg is the total
c = Constraint('C F', [9, -5], -5 * 32)
# 1st arg - variable index or name
# 2nd arg - variable value
# setvar will fire when there is only one unset variable remaining;
# it will print the variable values, return them in a list, and
# clear all variable values
c.setvar(0, 100)

Output:

C = 100.0
F = 212.0
[100.0, 212.0]
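One possible way to build Constraint on top of dotnone is sketched below. The assignment only fixes the behaviour seen in the example (variable names given as one space-separated string, setvar taking an index or a name, and printing, returning, and clearing the values once a single variable is left), so the internals here, including the attribute names names, coeffs, and values, are assumptions. Unset variables are kept as None, so dotnone skips exactly the unknown term; a copy of dotnone is included to keep the block self-contained.

```python
def dotnone(l1, l2):
    """Dot product in which any pair containing None contributes 0."""
    return sum(a * b for a, b in zip(l1, l2) if a is not None and b is not None)


class Constraint:
    """Sketch of c0*x0 + c1*x1 + ... + cN*xN = total over named variables."""

    def __init__(self, names, coeffs, total):
        self.names = names.split()            # 'C F' -> ['C', 'F']
        self.coeffs = [float(c) for c in coeffs]
        self.total = float(total)
        self.values = [None] * len(self.names)

    def setvar(self, var, value):
        # Accept either a positional index or a variable name.
        i = var if isinstance(var, int) else self.names.index(var)
        self.values[i] = float(value)
        unset = [j for j, v in enumerate(self.values) if v is None]
        if len(unset) != 1:
            return None                       # not determined yet
        k = unset[0]
        # The unknown is None, so dotnone sums only the known terms.
        # (No guard here for a zero coefficient on the unknown.)
        self.values[k] = (self.total - dotnone(self.coeffs, self.values)) / self.coeffs[k]
        result = self.values
        for name, v in zip(self.names, result):
            print(f'{name} = {v}')
        self.values = [None] * len(self.names)  # clear for the next query
        return result


c = Constraint('C F', [9, -5], -5 * 32)
c.setvar(0, 100)     # prints C = 100.0 and F = 212.0, returns [100.0, 212.0]
c.setvar('F', 32)    # prints C = 0.0 and F = 32.0, returns [0.0, 32.0]
```

Because nothing in the class is specific to two variables, the same code handles the general case: with three variables, the first setvar call returns None and the second one solves for the third.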

Problem 3 - Hamlet

  • Python is very popular in ‘digital humanities’
  • MIT has the complete works of Shakespeare in a simple HTML format
  • You will do a simple analysis of Hamlet by reading the HTML file one line at a time (the usual iteration scheme) and doing pattern matching
  • The goal is to return a list containing the line count (linecnt), the total number of ‘speeches’ (look at the file format), and a dict showing the number of ‘speeches’ each character gives
  • Your program should read directly from the URL given, but you may want to download a copy to examine the structure of the file.
  • remember that urllib.request returns bytes, not strings
  • here’s a short sample of the file:

<A NAME=speech25><b>HORATIO</b></a>
<blockquote>
<A NAME=1.1.37>Tush, tush, 'twill not appear.</A><br>
</blockquote>

<A NAME=speech26><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.38>Sit down awhile;</A><br>
<A NAME=1.1.39>And let us once again assail your ears,</A><br>
<A NAME=1.1.40>That are so fortified against our story</A><br>
<A NAME=1.1.41>What we have two nights seen.</A><br>
</blockquote>

<A NAME=speech27><b>HORATIO</b></a>
<blockquote>
<A NAME=1.1.42>Well, sit we down,</A><br>
<A NAME=1.1.43>And let us hear Bernardo speak of this.</A><br>
</blockquote>

<A NAME=speech28><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.44>Last night of all,</A><br>
<A NAME=1.1.45>When yond same star that's westward from the pole</A><br>
<A NAME=1.1.46>Had made his course to illume that part of heaven</A><br>
<A NAME=1.1.47>Where now it burns, Marcellus and myself,</A><br>
<A NAME=1.1.48>The bell then beating one,--</A><br>
<p><i>Enter Ghost</i></p>
</blockquote>

<A NAME=speech29><b>MARCELLUS</b></a>
<blockquote>
<A NAME=1.1.49>Peace, break thee off; look, where it comes again!</A><br>
</blockquote>

<A NAME=speech30><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.50>In the same figure, like the king that's dead.</A><br>
</blockquote>

Input:

hamlet(url)

Output:

[8881,
 1150,
 defaultdict(int,
             {'All': 4,
              'BERNARDO': 23,
              'CORNELIUS': 1,
              'Captain': 7,
              'Danes': 3,
              'FRANCISCO': 8,
              'First Ambassador': 1,
              'First Clown': 33,
              'First Player': 8,
              'First Priest': 2,
              'First Sailor': 2,
              'GUILDENSTERN': 33,
              'Gentleman': 3,
              'Ghost': 14,
              'HAMLET': 359,
              'HORATIO': 112,
              'KING CLAUDIUS': 102,
              'LAERTES': 62,
              'LORD POLONIUS': 86,
              'LUCIANUS': 1,
              'Lord': 3,
              'MARCELLUS': 36,
              'Messenger': 2,
              'OPHELIA': 58,
              'OSRIC': 25,
              'PRINCE FORTINBRAS': 6,
              'Player King': 4,
              'Player Queen': 5,
              'Prologue': 1,
              'QUEEN GERTRUDE': 69,
              'REYNALDO': 13,
              'ROSENCRANTZ': 49,
              'Second Clown': 12,
              'Servant': 1,
              'VOLTIMAND': 2})]
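The per-line pattern matching can be sketched with a regular expression keyed to the speaker headings in the sample file excerpt. This assumes every speech starts with a line of the form <A NAME=speechNN><b>NAME</b>, and it takes ‘linecnt’ to mean the number of lines read from the file; both are inferences from the sample, not confirmed by the assignment. count_speeches and SPEECH_RE are hypothetical names; with a real download you would pass in the object returned by urllib.request.urlopen(url).

```python
import io
import re
from collections import defaultdict

# A speaker heading, as in the sample: <A NAME=speech25><b>HORATIO</b></a>
SPEECH_RE = re.compile(r'<A NAME=speech\d+><b>(.*?)</b>', re.IGNORECASE)


def count_speeches(byte_lines):
    """Return [line count, speech count, speeches per speaker].

    byte_lines is any iterable of bytes lines, such as a urllib.request
    response or an io.BytesIO.
    """
    linecnt = 0
    speakers = defaultdict(int)
    for raw in byte_lines:
        linecnt += 1
        match = SPEECH_RE.search(raw.decode('utf-8', errors='replace'))
        if match:
            speakers[match.group(1)] += 1
    return [linecnt, sum(speakers.values()), speakers]


sample = b"""<A NAME=speech25><b>HORATIO</b></a>
<blockquote>
<A NAME=1.1.37>Tush, tush, 'twill not appear.</A><br>
</blockquote>
<A NAME=speech26><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.38>Sit down awhile;</A><br>
</blockquote>
<A NAME=speech27><b>HORATIO</b></a>
"""
print(count_speeches(io.BytesIO(sample)))
```

Using a defaultdict(int) means a speaker’s count can be incremented without first checking whether the name has been seen.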

Problem 4

  • in class, we discussed two different ways to represent a polynomial
    • polylist, a ‘dense’ representation, that holds the coefficients in a list
    • polydict, a ‘sparse’ representation, that holds (exponent, coefficient) pairs in a dict
  • add a method, ‘topolydict()’ to class ‘polylist’, that converts the polylist into a polydict
  • add a method, ‘topolylist()’ to class ‘polydict’, that converts the polydict into a polylist
  • note that polylist->polydict will always work, but polydict->polylist can fail, because a polylist cannot represent negative exponents. In this case, raise a ValueError
  • just to tell them apart, polylist prints with a leading ‘+’

Input:

pl1 = polylist([1, 2, 3])
pl2 = polylist([0, 10, 5])
pd1 = polydict({2:3, 1:2, 0:1})
pd2 = polydict({1:10, 2:5})
pd3 = polydict({-1:10, 2:5})
[pl1, pl2, pd1, pd2, pd3]

Output:

[+ 3 * X ** 2 + 2 * X + 1,
 + 5 * X ** 2 + 10 * X,
 3 * X ** 2 + 2 * X + 1,
 5 * X ** 2 + 10 * X,
 5 * X ** 2 + 10 * X ** -1]

Input:

[pl1.topolydict(), pl2.topolydict(), pd1.topolylist(), pd2.topolylist()]

Output:

[3 * X ** 2 + 2 * X + 1, 5 * X ** 2 + 10 * X, + 3 * X ** 2 + 2 * X + 1, + 5 * X ** 2 + 10 * X]
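The two conversions can be sketched as plain functions on the underlying data. This assumes, based on the printed examples, that a polylist stores coefficients constant term first (polylist([1, 2, 3]) prints as + 3 * X ** 2 + 2 * X + 1) and that a polydict maps exponent to coefficient. list_to_dict and dict_to_list are hypothetical names; inside the classes they would become the bodies of topolydict and topolylist.

```python
def list_to_dict(coeffs):
    """Dense -> sparse: {exponent: coefficient}, dropping zero coefficients.

    coeffs[i] is taken to be the coefficient of X ** i (constant term first).
    """
    return {exp: c for exp, c in enumerate(coeffs) if c != 0}


def dict_to_list(terms):
    """Sparse -> dense; raises ValueError if any exponent is negative."""
    if not terms:
        return []
    if min(terms) < 0:
        raise ValueError('a polylist cannot represent negative exponents')
    coeffs = [0] * (max(terms) + 1)   # fill gaps in the exponents with 0
    for exp, c in terms.items():
        coeffs[exp] = c
    return coeffs


print(list_to_dict([0, 10, 5]))          # {1: 10, 2: 5}
print(dict_to_list({2: 3, 1: 2, 0: 1}))  # [1, 2, 3]
```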

Problem 5

Define the __mul__ method for polydict, so that an expression such as pd1 * pd2 works.
Input:

[pl1, pl2, pd1, pd2, pd3, pd1 * pd2, pd1 * pd3, pd2 * pd3]

Output:

[+ 3 * X ** 2 + 2 * X + 1,
 + 5 * X ** 2 + 10 * X,
 3 * X ** 2 + 2 * X + 1,
 5 * X ** 2 + 10 * X,
 5 * X ** 2 + 10 * X ** -1,
 15 * X ** 4 + 40 * X ** 3 + 25 * X ** 2 + 10 * X,
 15 * X ** 4 + 10 * X ** 3 + 5 * X ** 2 + 30 * X + 20 + 10 * X ** -1,
 25 * X ** 4 + 50 * X ** 3 + 50 * X + 100]
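Multiplying two sparse polynomials means multiplying every term of one by every term of the other, adding the exponents, and combining like powers. A sketch on plain {exponent: coefficient} dicts follows; mul_terms and the SparsePoly wrapper (a stand-in for polydict, with its terms assumed to live in a terms attribute) are hypothetical.

```python
from collections import defaultdict


def mul_terms(a, b):
    """Multiply two sparse polynomials given as {exponent: coefficient} dicts."""
    product = defaultdict(int)
    for ea, ca in a.items():
        for eb, cb in b.items():
            product[ea + eb] += ca * cb   # exponents add, coefficients multiply
    # Drop terms whose contributions cancelled out.
    return {e: c for e, c in product.items() if c != 0}


class SparsePoly:
    """Hypothetical stand-in for polydict; its terms are assumed to be in self.terms."""

    def __init__(self, terms):
        self.terms = dict(terms)

    def __mul__(self, other):
        return SparsePoly(mul_terms(self.terms, other.terms))


p = SparsePoly({2: 3, 1: 2, 0: 1}) * SparsePoly({1: 10, 2: 5})
print(sorted(p.terms.items(), reverse=True))  # [(4, 15), (3, 40), (2, 25), (1, 10)]
```

Negative exponents need no special handling here, since exponents simply add; for example, X ** 2 times X ** -1 gives X ** 1.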