C代写:CMPS101 DFS and SCC

代写Graph相关的算法作业,需要编写DFS(Depth-first search)和SCC(strongly connected component)两个算法。

Reading

Read the text, sections 7.1 - 7.4 for explanations of what the dfs skeleton and dfsTrace functions are are doing. This has been covered in lectures and will be covered further in sections.

Depth-first search (DFS) is a major topic for this course.

Read the text, section 7.5 for explanations of how the the SCC algorithm is based on DFS. The SCC algorithm will be covered in lectures before the assignment is due, and further in sections.

Depth-first search (DFS) and the strongly connected component algorithm (SCC) are major topics for this course.

Several online Unix tutorials are mentioned on the class web page.

Information Sources

Same as ho04.txt and ho07.txt, the 1st and 2nd program assignments.

There are usually quite a few questions and clarifications about programs on the “C101” mailing list, so keep up. Consider the source when evaluating a message, because there is no screening of the postings.

Program Requirements

You will implement Depth First Search as explained in the text, Ch. 7. This will be the DFS skeleton for directed graphs, with some added code to carry accomplish the functions described. The procedure will correspond closely to DFS Trace in the text.

The added work after DFS Trace works is to find the strongly connected components (SCCs) for unweighted graphs.

dfsTrace1.c

Implement dfsTrace (Algorithm 7.4, which is a refinement of Algorithm 7.3) on a directed graph represented as an array of adjacency IntVecs. (Note that graph02.c and the associated files in pa02, your second program assignment, read a directed graph from input, create the array of IntVecs, and print the graph, so you can simply copy this code from your WORKING pa02. If pa02 was working correctly, this allows you to concentrate on DFS and SCC parts of the program.)

intVec.h should be commented as found in the class locker. It is almost the same as in pa02, but has one new constructor.

Although intList.h and intList.c are not actually used in this assignment, they should be programmed and working, as assigned in pa02. They will be part of the grading for pa03.

In this assignment, your intVec.o will be tested with other students’ programs and their intVec.o will be tested with yours. So use the standard names in this class. A starter intVec.h is in the course locker, but you need to copy it and put in your comments, as indicated by the questions in the file. The starter for pa02 is there too, named intVec.h-pa02, so you can see what changed with “diff”.

Reuse the main c file from pa02

Right from pa02, graph02.c can be renamed to scc03.c and compiled into scc03, for pa03; it contains main() and probably other functions. Several other C and .h files can be used right from pa02.

Of course, for pa03, the “main” procedure will be changed to do different things after the graph is loaded. Also, the loadGraph functionality will be separated, as assigned in pa02.

High-level flow

main() should read and interpret the command-line parameters, which are different from in pa02, then open the input file and interact with loadGraph.c functions to input the graph, as in pa02. main() should call a function to print the graph, as in pa02.

To compute SCCs, main calls a function with a name like findSCCs() that is also in scc03.c to manage the SCC computation. findSCCs() should call the main subroutines dfsTrace1(), transposeGraph(), dfsPhase2(), as well as calling the printing functions. In particular, dfsTrace1(), transposeGraph(), dfsPhase2() will not print anything. Their job is to compute data structures.

findSCCs() in scc03.c will allocate the arrays filled by its subroutines. Because there are two runs of dfsTrace now, name the arrays filled by dfsTrace1(): discoverTime1, finishTime1, parent1, and finishStk1, or recognizable abbreviations, such as dtime1, etc.

Also findSCCs() allocates adjVerticesT for the transpose graph. Finally, it will allocate the arrays filled by the second dfs,
which works on the transpose graph and finds the SCCs: discoverTime2, finishTime2, parent2, dfstRoot2 (or abbreviations).
Other procedures will output the contents of the arrays after each dfsTrace has completed.

Allocating all the arrays in scc03.c makes it easier to pass them as arguments to functions in other files, without making any global arrays. However, if you already have working code that allocates and returns an array from a different file (such as loadGraph.c) it is okay to use and build upon that code.

A more sophisticated way to pass several run-time parameters is that main() or findSCCs() creates a struct with fields for all of them, and then a pointer to that struct is passed around. For this technique you should declare the struct AS A TYPE in scc03.h. Then scc03.c and loadGraph.c and maybe others #include scc03.h.

Write dfsTrace1.h to declare the function prototypes in dfsTrace1.c that are called from a different C file. State the pre- and post-conditions in dfsTrace1.h.

In pa03 dfsTrace1.c should be a separate file and you should have it working before copying it into dfsPhase2.c as a starter.
It is better to copy working code than debug both versions.

Aside from intVec.h, intVec.c, scc03.c and scc03, the names of files, functions and arrays should be understandable but do NOT need to be strictly the same as this handout.

Command-line arguments

If your program receives no command-line arguments, it should print a usage message and exit. If your program receives incorrect command-line arguments, such as an unknown flag or no file name, it also should print a usage message and exit with a positive exit code, such as 1 or 2 or 3.

Command-line arguments of the form “-something” are often called “flags”, and provide the means for varying the way the program runs, or varying the form of the output. (“-“ by itself denotes stdin.)

The final command-line argument is the name of the input file, and this name will not begin with a “-“ unless it is “-“ by itself.

As in pa02, the “-V” flag denotes that IntVec should be used for lists and stacks.

The new command-line flag “-u” instructs the program to build an undirected graph from the input. That is, if the input contains “3 5” on a line, enter 5 as an adjacency of 3 and enter 3 as an adjacency of 5. Do not worry about duplicate edges.

DFS Trace will expect the graph not to be weighted. Therefore weights in the input should be parsed if they are present, but they do not become part of the data structure for the graph.

In pa03 IntList will not be used (although intList.h and intList.c are still required to be submitted and working). Therefore if a flag “-L” is present on the command line, that is an error, and should be treated as specified in the first paragraph of this subsection..

The flags -u and -V may be in either order before the file name. The -V is required for compatibility; -u is optional.

When running the program on a large input file remember to use > to redirect the stdout to a file. Do not submit very large test files or their outputs. We will supply some large files.

Example command lines for doing SCCs:

scc03
(user wants usage message.)

scc03 -V -
(user wants type in the file, directed.)

scc03 -V test1.in
(test1.in is treated as directed.)

scc03 -u -V test2.in
(test2.in is treated as undirected.)

Input format

This is the same as pa01 and pa02, repeated for self-containment.

Input consists of a sequence of lines read from a file that was given as the LAST command-line argument. The string “-“ as a filename stands for standard input, as usual. We will use stdin to denote either case.

End-of-file signals the end of input, and is typed on the keyboard as cntl-D in Unix (maybe cntl-Z in DOS, Windows, etc.).

Lines will have the format expected by graph01.c with or without without weights.

One int on the first line to tell us the value of “n”. Two ints per line for each edge after that, or two ints and a double per line if the graph is weighted. For pa03, weights are parsed but ignored, even if they are present in the input.

Print an informative error message if the input contains a bad vertex number, outside 1,…,n, or has the wrong number of words on a line. This is for your own protection. It is not a grading issue and submitting tests to check bad input is not asked and will not count as testing credit.

Example 1 input, unweighted

6
1 4
5 4
1 3
2 3
3 3
5 6
6 5
4 3
1 2

Remember that graphs want to start on index 1, not index 0. So allocate 1 extra space in the array and start loading at 1.

Example 2, with weights:

6
1 4 2.7
5 4 -3
1 3 0.0

Undirected input

Whether the input is treated as undirected or directed depends on a command-line argument “-u”, as described before. If the graph is considered undirected, create an edge in each direction from one edge in the input file. In the above example, when “-u” is specified, “1 4 2.7” should result in edges (1,4) and (4,1). If “-w” was also specified, both edges have weight 2.7.

Do not worry if edges are not unique. The algorithms will work anyway.

OUTPUT

Print information on stdout. You should know how to structure your unix command line to redirect stdout to a file or another process, so your program is not concerned about these details.

For a preliminary version, print the input graph as in pa01, ignoring edge weights even if “-w” was specified.
Example 2 would look something like this:

1 [3, 4]
2 []
3 []
4 []
5 [4]
6 []

Using “[]” instead of “null” for an empty list is preferred. Details like punctuation may vary but the numbers should be in the order shown.

It would be quite complicated to try to show that structure of the DFS forest graphically, so for phase 1, write a procedure that just prints a table of 5 columns for vertex, color, dtime, ftime, and parent. In that order (spacing is flexible). Do not use numbers for colors! Include column titles (exact spelling not required) and print an empty line after the table.

Use “-1” for the parent if the vertex is the root of a DFS tree.

This first table should present the arrays filled by dfsTrace1. Also print the arrays filled by dfsTrace1 to show whether the graph is being visited correctly.
Example 2 after dfsTrace1 is finished would look something like this:

V color dTime fTime parent
1 B 1 6 -1
2 B 7 8 -1
3 B 2 3 1
4 B 4 5 1
5 B 9 10 -1
6 B 11 12 -1

Color should be one of W,G,B and other table entries are ints.

The reason for making this a procedure is so you can call it at any time (in gdb, most likely, or in debugging code that you remove later).

If you call this print function BEFORE dfsTrace1 is finished, every vertex with color W should show 0 for the last 3 columns, and every vertex with color G should show 0 for fTime. This might be helpful for debugging and for midterm studying, but should not occur in the submitted program.

After the table for dfsTrace1, print finishStk1 ON A SINGLE LINE with the bottom element to the left, top to the right.
E.g.,

FSTK: 3 4 1 2 5 6

There should not be any other numbers on this line. The title and spacing are flexible, but don’t make the mistake of using “FSTK1”.

Print the transpose graph in the same format as the original graph. The same print function should work for both (the title can be printed before calling the function or passed as a parameter).

The last part of the output shows the result of dfsTrace2. This shows values for the tranpose graph. Write a separate procedure for this.

Example 2 after dfsTrace2 is finished would look something like this:

V color2 dTime2 fTime2 parent2 dfstRoot2
1 B 7 8 -1 1
2 B 5 6 -1 2
3 B 2 3 -1 3
4 B 9 10 -1 4
5 B 3 4 -1 5
6 B 1 2 -1 6

This is a pretty boring example, and just shows the format, not an interesting test. Every dfs tree has one vertex and no edges. Every SCC has one vertex and is its own dfsRoot.

How to preceed

Make a new directory to work in. Do this first.

If you wrote a weighted edge ADT for pa01, save it for last. (It was not assigned in pa01.)

You need to use the standard names given in this section so your ADTs can interface with other students’ clients and vice versa.

The function names in the intList ADT must be as given in the intList.h starter file for pa01 and pa02. The function names in the intVec ADT must be as given in the intVec.h starter file for pa02.

These names must be spelled and capitalized as shown in that file.

For this program, loadGraph.c should be a separate file, although that was not required in pa01. loadGraph.h will provide the interface. Functions in loadGraph.c that are only called within loadGraph.c should not appear in loadGraph.h.

loadGraph is not required to be an ADT, so the functions and prototypes do not need to agree with other students. Similarly, dfsTrace1 and dfsTrace2 are not ADTs.

dfsTrace1 needs to build a finishing-time stack, call it finishStk1. Starting with an empty stack at the beginning of dfsTrace1(), simply push v (the vertex whose visit is finishing) on to the stack at its finishing time, as explained in Section 7.5. finishStk1 should be implemented as an IntVec with the new constructor intMakeEmptyVecN(). What should the parameter be for this constructor so you never need to do a realloc on it?

NEW FUNCTION: transposeGraph()

Write a function to transpose a graph, after it is built. Make your function as simple as you can to be sure it is correct. There is a fast way to do this, but if you do not see it, think of a simpler, possibly slower, way.

For C the function prototype is

1
IntVec* transposeGraph(IntVec* origGraph, int n);

It may be implemented in scc03.c; a separate file is unnecessary.

The transpose graph has edges in the opposite direction of the original graph, in one-to-one correspondence. Suppose the original graph prints like this

1 [ 3, 2 ]
2 [ 3, 4 ]
3 [ ]
4 [ 1 ]

The transpose graph should have edges (2,1), (3,1), (4,2), (3,2), (1,4), and print accordingly. Most likely it would appear as

1 [ 4 ]
2 [ 1 ]
3 [ 2, 1 ]
4 [ 1 ]

but other orders are possible. See the text for more details. if needed.

NEW C FILE: dfsPhase2.c

dfsPhase2() in dfsPhase2.c will implement phase 2 of the SCC algorithm, and also will compute discoverTime2, finishTime2 for insurance, although the SCC algorithm does not need them.

After you get dfsTrace1.c working, you should be able to use almost all the code here, with some name modifications.

Keep in mind that dfsPhase2() uses finishStk1 for dfsSweepT() and operates on the transpose graph. The recursive function is named something like dfsT() or dfsTrace2(). Read the text for details.

An additional array, dfstRoot2, will be filled to keep track of the separate DFS trees in a second dfs. Some function prototypes will be altered to pass this value down from dfsTsweep() through dfsTrace2() calls.

The purpose of dfstRoot2 is to tell every vertex who is its SCC leader. dfsSweepT() passes the number of the root into the recursive function dfsT. (Where does dfsSweepT get this number from?) Then dfsT(v, …) puts this number into dfstRoot[v] and passes this number down into its own recursive calls of dfsT().

Write dfsTrace1.h and dfsPhase2.h to declare the function prototypes in the respective C files, and state the pre- and post-conditions.