C Assignment Help: CMPSC355 Explore A Big File

Write a multi-process I/O program that reads a very large file quickly.

Assignment Information

This program must be written in ANSI C and compiled with the options -Wall -Wextra -std=c99. It must produce no errors and no warnings. Each program shall consist of one C file. No header files are required. Do not submit executable files (only C source files).

The program shall count and display the number of lines (the number of '\n' characters) in a large text file. The program shall take one or two command line parameters: the name of the file (required) and the number of subprocesses, N (optional). If the number of subprocesses is not specified, it shall be assumed to be 1. If the second parameter is not a positive integer, the program shall display an error message and terminate.
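
A minimal sketch of the check on the optional second parameter. The helper name parse_subprocess_count and the -1 error value are illustrative assumptions, not part of the assignment. It uses strtol() so that trailing junk such as "4x" is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns N (> 0) when text is a positive integer,
 * or -1 when it is empty, non-numeric, non-positive, or out of range. */
static int parse_subprocess_count(const char *text)
{
    char *end = NULL;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;              /* overflow, no digits, or trailing junk */
    if (value <= 0 || value > INT_MAX)
        return -1;              /* zero, negative, or too large for int */
    return (int)value;
}
```

In main(), one would call this only when argc is 3, default N to 1 when argc is 2, and print an error with fprintf(stderr, ...) when the helper returns -1.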

Each subprocess shall be responsible for counting lines in the part of the file assigned to it by the master process.

The master process shall start by getting the file size via stat() or fstat() (you can use either function) and calculating the file fragment size FS as the file size divided by N. The first subprocess shall count lines in the first FS bytes of the file, the second subprocess shall count lines in the second FS bytes, and so on. The master process shall then:

  1. Create 2N pipes (half of them going from the master to the subprocesses and the other half going in the opposite direction),
  2. fork() N subprocesses,
  3. Wait for all of them to report the number of lines in their fragment, and
  4. Notify each reporting subprocess of the successful read from their pipe.
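
The fragment arithmetic that precedes these steps can be sketched as follows. The helper names file_size_of and fragment_bounds are hypothetical. The sketch also assumes that the last subprocess takes any bytes left over when the file size is not a multiple of N; the assignment text does not state this, but without some such rule the trailing bytes would go uncounted.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: file size in bytes via stat(), or -1 on failure. */
static off_t file_size_of(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return (off_t)-1;
    return st.st_size;
}

/* Hypothetical helper: byte offset and length of fragment `index`
 * (0-based) when the file is split among n subprocesses.
 * Assumption: the last fragment absorbs the remainder of file_size / n. */
static void fragment_bounds(off_t file_size, int n, int index,
                            off_t *offset, off_t *length)
{
    off_t fs;

    assert(n > 0 && index >= 0 && index < n);
    fs = file_size / n;                     /* FS from the assignment */
    *offset = fs * index;
    *length = (index == n - 1) ? file_size - *offset : fs;
}
```

For a 10-byte file and N = 3, this gives fragments of 3, 3, and 4 bytes starting at offsets 0, 3, and 6.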

You can optimize steps 3 and 4 by using select(), if you want.
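
Steps 1 to 4 might look roughly like the sketch below, which does not use select() and simply reads the reports in subprocess order. The names run_master, child_main, and demo_count are hypothetical, the callback stands in for the real per-fragment counting, and the one-byte acknowledgement is an assumed notification format. Error handling is deliberately thin.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in worker for illustration: pretends fragment i holds i + 1 lines. */
static long demo_count(int index) { return (long)index + 1; }

/* Child side: send the count as raw bytes, wait for the ack, then exit. */
static void child_main(int index, int up_wr, int down_rd,
                       long (*count_fragment)(int))
{
    long count = count_fragment(index);
    char ack;

    if (write(up_wr, &count, sizeof count) != (ssize_t)sizeof count)
        _exit(EXIT_FAILURE);
    if (read(down_rd, &ack, 1) != 1)
        _exit(EXIT_FAILURE);
    _exit(EXIT_SUCCESS);
}

/* Master side: returns the sum of all reported counts, or -1 on failure. */
static long run_master(int n, long (*count_fragment)(int))
{
    int (*up)[2];    /* up[i]:   child i -> master */
    int (*down)[2];  /* down[i]: master -> child i */
    long total = 0, count;
    int i, j, started = 0, ok;

    assert(n > 0);
    up = malloc((size_t)n * sizeof *up);
    down = malloc((size_t)n * sizeof *down);
    if (up == NULL || down == NULL) {
        free(up);
        free(down);
        return -1;
    }
    /* Step 1: create 2N pipes. */
    for (i = 0; i < n; i++) {
        if (pipe(up[i]) != 0 || pipe(down[i]) != 0) {
            /* Sketch only: a full program would close pipes already opened. */
            free(up);
            free(down);
            return -1;
        }
    }
    /* Step 2: fork N subprocesses; each keeps only its own two pipe ends. */
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (j = 0; j < n; j++) {
                close(up[j][0]);
                close(down[j][1]);
                if (j != i) {
                    close(up[j][1]);
                    close(down[j][0]);
                }
            }
            child_main(i, up[i][1], down[i][0], count_fragment);
        }
        started++;
    }
    ok = (started == n);
    for (i = 0; i < n; i++) {       /* master drops the child-side ends */
        close(up[i][1]);
        close(down[i][0]);
    }
    /* Steps 3 and 4: read each report, then acknowledge it. */
    for (i = 0; i < started; i++) {
        if (read(up[i][0], &count, sizeof count) == (ssize_t)sizeof count
            && write(down[i][1], "k", 1) == 1)
            total += count;
        else
            ok = 0;
    }
    for (i = 0; i < n; i++) {
        close(up[i][0]);
        close(down[i][1]);
    }
    for (i = 0; i < started; i++)
        wait(NULL);                 /* reap every child that was started */
    free(up);
    free(down);
    return ok ? total : -1;
}
```

Calling run_master(4, demo_count) exercises the full pipe round trip with four children. Reading in fixed order means a slow first child delays the acknowledgements of the others, which is the cost that select() would remove.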

Once all subprocesses have reported their results, the master process shall print the total number of lines to stdout and the total running time to stderr. (Use the f*() functions for reporting and gettimeofday() to measure the total running time.)
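
A small sketch of the timing and reporting, assuming the running time is printed in seconds (the assignment does not fix the unit or the output format). The helper names elapsed_seconds and report are hypothetical.

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: seconds between two gettimeofday() samples. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Hypothetical helper: line total goes to stdout, running time to stderr. */
static void report(long total_lines, double seconds)
{
    fprintf(stdout, "%ld\n", total_lines);
    fprintf(stderr, "%.6f\n", seconds);
}
```

The master would call gettimeofday(&start, NULL) before creating the pipes and gettimeofday(&end, NULL) after the last subprocess has reported, then pass both samples to elapsed_seconds().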

Each subprocess shall find its portion of the large file with lseek(). It shall then:

  1. Allocate enough memory to read the file fragment,
  2. Do the actual read(),
  3. Count the number of '\n' characters (use a for loop),
  4. Deallocate the memory,
  5. Report the count to the master process via the pipe (do not convert the integer to ASCII!),
  6. Wait for a notification from the master via the other pipe, and
  7. Terminate.
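
Steps 1 to 4 of the subprocess might be sketched as below; the pipe steps are omitted here. The function names are hypothetical. Two choices are assumptions rather than requirements. First, each subprocess opens the file itself, because a descriptor inherited across fork() shares its file offset with the other processes, so their lseek() calls would interfere. Second, read() is called in a loop, since a single read() may return fewer bytes than requested.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 3: count '\n' characters in a buffer with a plain for loop. */
static long count_newlines(const char *buf, size_t len)
{
    long count = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            count++;
    return count;
}

/* Hypothetical helper: count lines in `length` bytes starting at `offset`.
 * Returns -1 if the file cannot be opened, sought, or fully read. */
static long count_fragment(const char *path, off_t offset, size_t length)
{
    long count = -1;
    size_t got = 0;
    char *buf;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    buf = malloc(length > 0 ? length : 1);          /* step 1: allocate */
    if (buf != NULL && lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        while (got < length) {                      /* step 2: read */
            ssize_t r = read(fd, buf + got, length - got);
            if (r <= 0)
                break;                              /* error or early EOF */
            got += (size_t)r;
        }
        if (got == length)
            count = count_newlines(buf, got);       /* step 3: count */
    }
    free(buf);                                      /* step 4: deallocate */
    close(fd);
    return count;
}
```

The value returned by count_fragment() is what the subprocess would then write to its pipe as raw bytes, for example with write(fd, &count, sizeof count).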

Use the following fragment of a major social network to test your program (you must unzip the file before using it). Let me know ASAP if you cannot access the file. The file has 133 MB of data and 9,105,518 lines.

Run your program with the values of N=1,2,4,8,16,32. (The number of reported lines must be the same in all 6 cases!) Plot the total running time against N and submit the plot together with the program code.