C Assignment: COMP10002 Text Query

A basic C assignment: implementing text queries.

Submission Instructions

A complete submission is shown in the lecture recording, starting at around the 10-minute mark of the class on 5 September.

Information and Resources

Measuring speed: Note that when you have lots of output coming to the terminal screen, the scrolling and video-update processes can become time-consuming, making it seem like the program is executing slowly. If you want to measure the speed of your program, send the output into a file:

myass1 word nerd bird slurred < pg11.txt > temp.txt

Your program shouldn’t show an extended lag when you do this; the next prompt should appear within one second (or, if you have a really slow computer, maybe two seconds).

What should Stage 1 output?: When I ran my program in class on Sept 8th, I didn’t have the right messages being generated for the “incorrect query” lines: I was missing the initial S1:. Your submitted programs should comply with the specification.

Tricksy wicksy input: Hey, will we ever get command-line input strings like

ass1-soln "" < alice-eg.txt

or

ass1-soln "string with blanks" < alice-eg.txt

that are not just single words? Answer: The latter of these two should be reported as a Stage 1 error, since the string you get via argv contains non-alphanumeric characters (the blanks). The former won’t appear in any testing that I do, but you are welcome to also report strings of length zero as Stage 1 errors if you wish.
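Reading this answer together with the handout example further down (where loNg is rejected), a query word appears to be valid only if it is made up entirely of lowercase letters and digits. Here is a minimal sketch of that check, assuming exactly that rule; the name valid_query_word() is hypothetical, and rejecting the empty string is the optional behaviour mentioned above.

```c
#include <ctype.h>

/* Hypothetical helper (name and rule are assumptions): returns 1 if
   every character of word is a digit or a lowercase letter, else 0. */
int
valid_query_word(const char *word) {
    if (*word == '\0') {
        return 0;   /* optional: treat "" as a Stage 1 error as well */
    }
    for (; *word != '\0'; word++) {
        /* cast to unsigned char before passing to ctype functions */
        if (!isdigit((unsigned char)*word) &&
                !islower((unsigned char)*word)) {
            return 0;   /* blank, punctuation, or uppercase letter */
        }
    }
    return 1;
}
```

With this rule, "lat" and "66" pass, while "loNg" and "string with blanks" are both reported.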

Compilation: Dear Dr Moffat, (1) I have noticed that I cannot compile my program on dimefox without the flag “-std=c99” or higher. Is this ok? (2) Is it ok to submit multiple files, and if so, how would I do this? I have broken my program up into multiple header files. Answer: the compilation line I’ll be using on dimefox when testing your programs is

gcc -Wall -o ass1 filename.c -lm

The -lm is to get the maths library.

Your program should be written so that it compiles cleanly (no warnings) on dimefox with this command, which has no -std flag. This also means that you should be submitting a single file.

Example Program: As an example of the standard of work that is expected (including the type and degree of commenting), here are:
The 2013 Assignment 1 specification
The skeleton program that was the starting point for the 2013 Assignment 1
A sample solution to the 2013 Assignment 1.
Note that you are not required to use structs in your Assignment 1 submission, but may do so if you wish to.
Blank lines: Sir, what about blank lines? What if the whole input file is empty? Answer: If you look at the outputs linked below you’ll see that blank lines are presumed to have line numbers, but don’t generate any output. A completely empty input file will generate the Stage 1 output, but no Stage 2, Stage 3, or Stage 4 output.
Apostrophes, etc.: Dear Sir, your output has the lines

many miles I've fallen by this time?' she said aloud. 'I must be getting
S2: line = 2, bytes = 72, words = 15

but it seems you are counting “I’ve” as two words rather than one. Shouldn’t that be just one word? Answer: The definition of “word” we are working to makes it two words. (And, in any case, isn’t “I’ve” short for “I have”?)
But Sir, if we count “I’ve” as two words, and one of the query terms is “v”, then won’t that mean there is a match that has to be counted? Answer: Correct, and that is what the specification intends and that is what my program does.
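The answer above is consistent with treating a word as a maximal run of alphanumeric characters: on the example line, that rule gives exactly the 15 words reported, and the 72 bytes are the characters of the line excluding the newline. Here is a sketch under those assumptions (count_words() and stage2_report() are hypothetical names, and “blank” is taken to mean zero length). It also follows the blank-lines answer: the caller counts every line, but a blank one produces no S2 output.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed definition: a word is a maximal run of alphanumeric
   characters, so "I've" counts as the two words "I" and "ve". */
int
count_words(const char *line) {
    int words = 0, inword = 0;
    for (; *line != '\0'; line++) {
        if (isalnum((unsigned char)*line)) {
            if (!inword) {
                words++;        /* first character of a new word */
                inword = 1;
            }
        } else {
            inword = 0;         /* any other character ends a word */
        }
    }
    return words;
}

/* Hypothetical Stage 2 report in the format shown above. lineno is
   advanced by the caller for every line; blank lines print nothing. */
void
stage2_report(int lineno, const char *line) {
    if (strlen(line) == 0) {
        return;
    }
    printf("S2: line = %d, bytes = %d, words = %d\n",
        lineno, (int)strlen(line), count_words(line));
}
```

For the example line (as line 2), stage2_report() prints S2: line = 2, bytes = 72, words = 15.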

Value of argc: Dear Sir, you say in the handout that argc of zero represents no query. But don’t you mean argc of 1? Answer: Yes, I guess I do…

Access Denied on dimefox: Note that you won’t have an account on dimefox yet if you have never logged in to a lab machine so far this semester, for example, if you have been using your own computer for all the workshops, or if you simply haven’t attended any workshops. The symptom of this will be an “access denied” message when you try to connect with scp/puttyscp or ssh/putty to copy or submit your program. You will need to log in to a lab machine, get your account initialized, and then wait a few hours for everything to percolate through the various processes involved in transferring those account details to dimefox. Then you’ll be in a position to use ssh/scp and eventually submit.

If you still have problems after you have taken this step, and are sure you are using the right password (and can log in to other University services using it), send me an email confirming that you have taken all of these steps and still can’t get access to dimefox. Don’t leave this until the last minute. It is a problem that can’t be fixed in a minute!

What Can Be Stored?: Dear Sir, you say “You can only retain five lines and their scores at any given time, plus the current line that is being processed”. Do you mean that exactly, or do you really mean “You can only retain five data structures representing five lines (possibly, for example, in original as well as processed/parsed format) and their scores at any given time, plus the current line that is being processed”? Answer: Yes, good question, and I mean that you can retain at most five instances of the data structure(s) that represent a single line.
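One way to respect the five-line limit is to keep the best lines so far in a fixed-size array sorted by decreasing score, and to use an insertion step to place each new line. This is only a sketch: MAXKEEP, MAXCHARS, line_t, keep_top() and the tie-breaking rule (an earlier line stays ahead of a later line with the same score) are all assumptions, not taken from the specification, and structs are optional, as noted above.

```c
#include <string.h>

#define MAXKEEP 5       /* lines that may be retained at once */
#define MAXCHARS 999    /* assumed maximum line length */

typedef struct {
    int lineno;
    double score;
    char text[MAXCHARS + 1];
} line_t;

/* Hypothetical: keeps top[0..*ntop-1] in decreasing score order and
   never holds more than MAXKEEP lines. A new line enters only if there
   is a free slot or it beats the current lowest score. */
void
keep_top(line_t top[], int *ntop, int lineno, double score,
        const char *text) {
    int i;
    if (*ntop == MAXKEEP) {
        if (score <= top[MAXKEEP - 1].score) {
            return;     /* not good enough to displace anything */
        }
        *ntop -= 1;     /* drop the lowest-scoring line */
    }
    /* shift lower-scoring lines down to open a slot (insertion step);
       strict < keeps earlier lines ahead on equal scores */
    for (i = *ntop; i > 0 && top[i - 1].score < score; i--) {
        top[i] = top[i - 1];
    }
    top[i].lineno = lineno;
    top[i].score = score;
    strncpy(top[i].text, text, MAXCHARS);
    top[i].text[MAXCHARS] = '\0';
    *ntop += 1;
}
```

The line currently being read lives in the caller’s own buffer (the “plus the current line” allowance), and keep_top() copies it only when it earns a place.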

Error in Printed Handout: There was a small typo in the printed version of the handout that was circulated in class on Friday 1 September; it has now been corrected in the online version here. The very first example in the handout shows

mac: ./ass1 < alice-eg.txt
S1: No query specified, must provide at least one word
mac: ./ass1 Lat 66 loNg 32 words < alice-eg.txt
S1: query = lat 66 loNg 32 words
S1: loNg: invalid character(s) in query
mac:

and it should be showing

mac: ./ass1 < alice-eg.txt
S1: No query specified, must provide at least one word
mac: ./ass1 lat 66 loNg 32 words < alice-eg.txt
S1: query = lat 66 loNg 32 words
S1: loNg: invalid character(s) in query
mac:

with a lowercase “ell” at the start of lat in the second example command line (./ass1 lat 66 loNg 32 words).
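The first line of each transcript takes only a few lines of code. Here is a sketch, assuming the message formats shown above and, per the “Value of argc” answer below, that argc == 1 means no query; stage1_echo() is a hypothetical name, and the per-word validity check would follow it.

```c
#include <stdio.h>

/* Hypothetical Stage 1 front end. argc == 1 means only the program
   name was given, i.e. no query: print the error and return 0.
   Otherwise echo the query words and return how many there are. */
int
stage1_echo(int argc, char *argv[]) {
    int i;
    if (argc == 1) {
        printf("S1: No query specified, must provide at least one word\n");
        return 0;
    }
    printf("S1: query =");
    for (i = 1; i < argc; i++) {
        printf(" %s", argv[i]);    /* argv[0] is the program name */
    }
    printf("\n");
    return argc - 1;
}
```

Called as stage1_echo(argc, argv) with the arguments of the second example, it prints S1: query = lat 66 loNg 32 words and returns 5.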

Marking Rubric: The marking rubric is linked here. Lines that do not apply to your program will be removed during the marking process; your mark will then be the sum of the lines that remain, positives and negatives. Marks won’t go below zero in each section, and won’t go below zero overall either.

Attribution for Re-Used Code: It is ok to make use of code (for example, insertionsort() and/or getword(), etc.) from the book, from the lecture slides, or from other published/public sources, but you should remember to add an attribution as a comment to each relevant function, saying where you got it from, what modifications you made to suit your purpose, and so on, exactly as you would when quoting another author in an essay.
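As an illustration of what such an attribution comment might look like, here is an insertion sort adapted to sort into decreasing order. The source named in the comment is a placeholder to be replaced with wherever your code actually came from, and sort_decreasing() is a hypothetical name.

```c
/* Sort A[0..n-1] into decreasing order.
 *
 * Attribution (template, fill in the real source): adapted from
 * insertionsort() in <source, e.g. the textbook or lecture slides>.
 * Modifications: sorts into decreasing rather than increasing order,
 * and the swap is written inline rather than via a helper function.
 */
void
sort_decreasing(int A[], int n) {
    int i, j, tmp;
    /* invariant: A[0..i-1] is already in decreasing order */
    for (i = 1; i < n; i++) {
        /* move A[i] left past every smaller element */
        for (j = i - 1; j >= 0 && A[j + 1] > A[j]; j--) {
            tmp = A[j];
            A[j] = A[j + 1];
            A[j + 1] = tmp;
        }
    }
}
```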

Of course, the expectation is that the assembly and “glueing together” of these bits to make a final program will all be your own work, and that the “quoted” bits will be a relatively small fraction of the “new” output you are being asked to generate. So it is not ok to take a whole solution from somewhere else, even if it appears on the web; and it is not ok to solicit or commission a solution by posting the specification to a forum or web site and asking for “assistance” or “guidance” or “suggestions”, just as it wouldn’t be ok to submit something you found online in response to an assignment that involved writing an essay.

And a reminder of what it says in the specification: we will be using similarity-checking software across all the submissions, and we will be referring cases of suspected academic misconduct for disciplinary hearings run by the School of Engineering, and in the past those hearings have resulted (including multiple times in this subject) in students being awarded penalties including final marks of zero for the subject, regardless of their other components of assessment.

Debugging (more): If a C program encounters a run-time error and exits, there might still be pending output that has not been written. This is a particular problem in the submit environment, because it can look like the program is failing before it generates any output at all. If in doubt, add fflush(stdout) calls after each of your debugging printf()s (or even add it to the macro), to force all pending output to be written immediately. You’ll then be able to get a much clearer idea of how far the program is getting before it fails.
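A minimal sketch of the idea: a hypothetical checkpoint() helper that prints a progress message and then immediately calls fflush(stdout), so the message cannot be left sitting in the output buffer if the program dies a moment later.

```c
#include <stdio.h>

/* Hypothetical debugging helper: report progress, then flush stdout
   so the message survives even if the program crashes right after.
   Returns fflush()'s result (0 on success). */
int
checkpoint(const char *where, int lineno) {
    printf("DEBUG: reached %s (input line %d)\n", where, lineno);
    return fflush(stdout);      /* force pending output out now */
}
```

For example, checkpoint("end of stage 2", lineno); inside the main loop shows how far processing got. Writing debug messages to stderr would also avoid the problem, since the C standard does not allow stderr to be fully buffered, but the advice above is stated in terms of printf() and stdout.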
Debugging: Try putting this at the top of your program:

#define DEBUG 1
#if DEBUG
#define DUMP_DBL(x) printf("line %d: %s = %.5f\n", __LINE__, #x, x)
#else
#define DUMP_DBL(x)
#endif

and then, later in your code, where you have a double variable (say) score, try

DUMP_DBL(score);

Then change DEBUG to 0 at the top of the program, compile it again, and then run it again. Get it?

You can then add DUMP_INT and DUMP_STR, and turn the extra output on whenever you need it to understand what your program is doing. Then turn it all off again with one simple edit.
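Following that suggestion, DUMP_INT and DUMP_STR might look like this, next to DUMP_DBL. As a variation (an assumption, not part of the original macro), each one also calls fflush(stdout) through the comma operator, which is the “add it to the macro” idea from the Debugging (more) entry.

```c
#include <stdio.h>

#define DEBUG 1
#if DEBUG
/* print source line, variable name and value, then flush stdout */
#define DUMP_INT(x) (printf("line %d: %s = %d\n", __LINE__, #x, x), \
                     fflush(stdout))
#define DUMP_STR(x) (printf("line %d: %s = \"%s\"\n", __LINE__, #x, x), \
                     fflush(stdout))
#else
/* with DEBUG set to 0, every dump compiles to an empty statement */
#define DUMP_INT(x)
#define DUMP_STR(x)
#endif
```

With int count and char *word in scope, DUMP_INT(count); and DUMP_STR(word); print lines like line 42: count = 3, where the number is whatever source line the call is on.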

Trouble with newline characters: Text files that are created on a PC, or copied to a PC, edited, and then saved again on the PC, may end up with PC-format two-character (CR+LF) newline sequences; see the Wiki page for details.

If you have compiled your program on a PC, and it receives a CR+LF sequence, then getchar() will consume them both and hand a single ‘\n’ newline character to your program. So in that sense, everything works as expected. Likewise, on a PC, when you write a ‘\n’ to stdout, a CR+LF pair will be placed into the output file (or passed through the pipe to the next program in the chain).

The problems arise when you copy your program and a PC-format test file to a Unix system and then try compiling and executing your program there. Now the CR characters get in the way and arrive via getchar() into your program as stand-alone ‘\r’ characters.

The easiest way to defend against these confusions is to write your program so that it looks at every character that it reads, and if it ever sees a CR come through, it throws it away. That way, if you do accidentally get CR characters in your test files on the Unix server (or on your Mac) your program won’t be disrupted by them. Here is a function that you should use to do this:

/* read one character from stdin, discarding any '\r' (CR) characters */
int
mygetchar() {
    int c;
    while ((c=getchar())=='\r') {
    }
    return c;
}

Then just call mygetchar() whenever you would ordinarily call getchar(), on both PC and Mac.
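For example, a line-reading helper built on mygetchar() gets CR removal for free. This sketch assumes a hypothetical read_line() that stores at most maxlen characters (discarding the rest of an over-long line, an assumed policy), drops the newline, and still returns a final line that lacks a trailing newline.

```c
#include <stdio.h>

/* as in the handout: read a character, discarding any '\r' */
int
mygetchar() {
    int c;
    while ((c=getchar())=='\r') {
    }
    return c;
}

/* Hypothetical: read one line into line[] (which must have room for
   maxlen+1 chars), without its newline. Returns the number of chars
   stored, or EOF if input ended before any character was read. */
int
read_line(char line[], int maxlen) {
    int c, n = 0;
    while ((c = mygetchar()) != EOF && c != '\n') {
        if (n < maxlen) {
            line[n++] = c;      /* excess characters are dropped */
        }
    }
    line[n] = '\0';
    if (c == EOF && n == 0) {
        return EOF;             /* nothing left to read */
    }
    return n;                   /* may be 0 for a blank line */
}
```

A typical loop is while (read_line(line, MAXCHARS) != EOF) { ... }, counting a line number on every iteration so that blank lines still get numbers.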

Because most of you work on PCs (including in the labs), the test files that are provided have been created with PC-style CR+LF newlines, and should work correctly when copied (use right-click->“Save as”) to a PC. With mygetchar() they can also be used on a Mac, but they won’t behave sensibly with the standard getchar() function.

To be consistent, the final post-submission testing will also be done using PC-style input files but will be executed on a Unix machine, meaning that all submitted programs will need to make use of mygetchar().

You can use the “Preferences->Encodings” menu (“screwdriver/hammer Options->Encodings” in the PC version) in jEdit to select whether to use Unix (LF) or DOS/Windows (CR+LF) newlines in any test files that you create with jEdit. Note that this only applies to newly created files; by default, jEdit will respect the formatting of existing files.

Note also that jEdit doesn’t automatically add a newline after the last line of text files, you need to put it there explicitly yourself (just press enter one more time, so that jEdit thinks there is an empty line at the end of the file). Watch out for this problem if you are creating your own test files on Mac or PC. All the test files I supply will have a newline at the end of the last line of the file, including during the post-submission re-testing.

If in any doubt, use od -a <file> in a Unix shell to look at the byte-by-byte contents of a file, and check which format is being used, and whether there is a final newline character (or final CR+LF pair). You can do this on a PC by starting the MinGW shell and then using cd to reach the right directory. There is an od version available within the MinGW shell on the PCs in the labs. On a Mac, Terminal is a Unix shell.
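If you would rather use a program than read od output, here is a rough C equivalent of the check: it counts CR+LF and bare-LF line endings and reports whether the file ends with a newline. It reads with plain getc() rather than mygetchar(), because here the ‘\r’ bytes are exactly what we want to see. check_newlines() is a hypothetical name; on a PC, open the file with mode "rb" so that the C library does not translate CR+LF for you.

```c
#include <stdio.h>

/* Hypothetical checker: counts lines ending in CR+LF and in a bare LF,
   and sets *final_newline to 1 if the last byte read is '\n'. */
void
check_newlines(FILE *fp, long *crlf, long *lf, int *final_newline) {
    int c, prev = EOF;
    *crlf = *lf = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r') {
                *crlf += 1;     /* PC-style line ending */
            } else {
                *lf += 1;       /* Unix-style line ending */
            }
        }
        prev = c;
    }
    /* an empty file has no final newline either */
    *final_newline = (prev == '\n');
}
```

A file with both counts non-zero has mixed line endings, which is worth fixing before testing.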

Tabs: The default in jEdit is for tabs to be aligned every 8 character positions. Some of you have altered that to four (Preferences->Editing->Tab width), to reflect the layout of the programs in the book. Then, on submission, the tabs have “appeared” in the output as being 8 again, which can make your program spill past the 80-character right-hand boundary. When I run the programs for marking, they’ll all get formatted with tabs reflecting 4 character positions, not 8. But don’t use any fewer than 4 in your jEdit (or other editor) settings.
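The arithmetic behind the spill is easy to check: a tab advances to the next multiple of the tab width. Here is a small hypothetical helper, display_width(), that measures how many columns a line occupies under a given tab width.

```c
/* Hypothetical helper: screen columns used by line when each tab
   advances to the next multiple of tabwidth. */
int
display_width(const char *line, int tabwidth) {
    int col = 0;
    for (; *line != '\0'; line++) {
        if (*line == '\t') {
            col += tabwidth - col % tabwidth;   /* next tab stop */
        } else {
            col++;
        }
    }
    return col;
}
```

A line with three leading tabs followed by 66 other characters measures 78 columns with 4-wide tabs but 90 with 8-wide tabs, which is past the 80-character boundary.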

Magic numbers: Here is a summary of the rules about magic numbers:

Where a number is totally self-defining, I’m happy for it to be used any number of times without a hash-define, provided the code is commented each time and/or explicitly sensible variable names are used. For example, in

/* compute percentage */
pcent = 100.0 * count / totcount;

I wouldn’t expect 100 to have been hash-defined, since the comment explains the role of the 100, and it isn’t going to change, ever, even if other percentages are calculated in the program using 100 too.

This rule also allows 0 and 1, of course, unless they represent something other than the additive and multiplicative identities, in which case they should be hash-defined.

This rule also allows while (scanf("%lf%lf", &x, &y)==2), since the 2 is immediately obvious from the adjacent context (two variables to be read).

Where a constant is a fact that is in no way ever going to vary, then provided it appears only once in the program and is explained with a comment, it need not be hash-defined. The example here is the 32.0 in the temperature conversion computation, assuming that it is entirely within a function called Cels2Fahr or similar and that it isn’t used in other places scattered through the program. Anything of this type that appears even twice should be hash-defined.

Where a factual constant is used more than once in a program, even if all occurrences are in a single function, it should be hash-defined.

Where a constant is clearly an artifact of the problem description or of the program that implements the solution (for example, numbers like MAXINT, or the number of variables), it must be hash-defined, even if it is used only once in the program.
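Putting these rules together, here is a short illustration (MAXLINES, cels2fahr() and percentage() are hypothetical names): the problem-specific limit is hash-defined even though it would be used only once, while the conversion factors and the 100 appear once each and are explained by comments.

```c
/* artifact of the problem: number of lines that may be retained */
#define MAXLINES 5

/* Celsius to Fahrenheit: the 9/5 scale factor and 32-degree offset
   are physical facts, used only here, so a comment suffices */
double
cels2fahr(double c) {
    return c * 9.0 / 5.0 + 32.0;
}

/* compute percentage: the 100 is self-defining */
double
percentage(int count, int totcount) {
    return 100.0 * count / totcount;
}
```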