Network Programming: EE450 MapReduce Model

Implement the MapReduce model of distributed computing; the project tests network programming skills under Linux.

Problem Statement

In this project you will implement a simple model of computational offloading, where a single client offloads some computation to a server, which in turn distributes the load over 3 backend servers. The server facing the client then collects the results from the backend servers and returns them to the client in the required format. This is an example of how a cloud-computing service such as Amazon Web Services might implement MapReduce to speed up a large computation task offloaded by the client.

The server communicating with the client is called AWS (Amazon Web Server), and the three backend servers are named Back-Server A, Back-Server B and Back-Server C. The client and AWS communicate over a TCP connection, while AWS and Back-Servers A, B and C communicate over UDP.

Input Files Used

The files specified below will be used as inputs in your programs in order to dynamically configure the state of the system. The contents of the files should NOT be “hardcoded” in your source code, because during grading, the input files will be different, but the formats of the files will remain the same.

If you are working in an environment other than UNIX, pay particular attention to line endings (newlines). For this project, all files are assumed to follow the UNIX line-ending convention (a single newline character, with no carriage return). This is particularly important when handling the input file(s).

nums.csv: An ASCII file that contains a single column of integers. Each row consists of a single integer and ends with a newline. You may assume that each integer fits in a signed long integer type. The number of rows in the file will be a multiple of 3. This file will always reside in the same directory as the client.
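A minimal sketch of how the client might read nums.csv in C, assuming one integer per line with UNIX line endings. The function name read_nums, the 64-byte line buffer and the use of strtol() for parsing are illustrative choices, not part of the specification; the caller supplies an array large enough for the file.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads at most `cap` integers, one per line, from `path` into `out`.
 * Returns the number of integers read, or -1 if the file cannot be
 * opened or a value does not fit in a signed long. */
long read_nums(const char *path, long *out, size_t cap)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }

    char line[64];                 /* ample for a 64-bit long plus '\n' */
    size_t count = 0;
    while (count < cap && fgets(line, sizeof line, fp) != NULL) {
        char *end;
        errno = 0;
        long value = strtol(line, &end, 10);
        if (end == line)           /* blank or non-numeric line: skip it */
            continue;
        if (errno == ERANGE) {     /* value outside the range of long */
            fclose(fp);
            return -1;
        }
        out[count++] = value;
    }

    fclose(fp);
    return (long)count;
}
```

For a valid input file, read_nums("nums.csv", nums, cap) returns N, which should be a multiple of 3; the capacity cap is whatever array size the caller chose.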

Source Code Files

Your implementation should include the source code files described below, for each component of the system.

  1. AWS: You must name your code file aws.c, aws.cc or aws.cpp (all small letters). The corresponding header file (if you have one; it is not mandatory) must be called aws.h (all small letters).

  2. Back-Server A, B and C: You must use one of these names for this piece of code: server#.c, server#.cc or server#.cpp (all small letters except for #). The corresponding header file (if you have one; it is not mandatory) must be called server#.h (all small letters, except for #). The “#” character must be replaced by the server identifier (i.e. A, B or C), depending on the server it corresponds to.
    Note: If you use one executable for all four servers (i.e. if you choose a fork-based implementation), you should call the file servers.c, servers.cc or servers.cpp, and the corresponding header file (if you have one; it is not mandatory) servers.h (all small letters). To create four servers in your system using one executable, you can call fork() inside your server code to create 4 child processes. You must follow this naming convention! This piece of code handles all of the server functionality.

  3. Client: The name of this piece of code must be client.c or client.cc or client.cpp (all small letters) and the header file (if you have one; it is not mandatory) must be called client.h (all small letters).

More Detailed Explanations

Phase 1

All four server programs (AWS, Back-Server A, B and C) boot up in this phase. While booting up, each server must display a boot message on the terminal. The format of the boot message for each server is given in the on-screen message tables at the end of the document. As the boot message indicates, each server must listen on the appropriate port for incoming packets/connections.

Once the server programs have booted up, the client program is run. The client displays a boot message as indicated in the on-screen messages table. Note that the client takes a command-line argument that specifies the computation to be run. The format for running the client is:

./client <function_name>

where function_name can take a value from {min, max, sum, sos}. For example, to find the sum of all the numbers in the input file, the client should be run as follows:

./client sum
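One way to validate the command-line argument, sketched in C. The enum, the two name tables and parse_function() are hypothetical; the uppercase labels match the MIN, MAX, SUM and SOS names used in the on-screen messages and in Phase 2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the four supported functions. */
enum reduce_fn { FN_MIN, FN_MAX, FN_SUM, FN_SOS };

/* Names accepted on the command line, in enum order ... */
static const char *const FN_ARGS[] = { "min", "max", "sum", "sos" };
/* ... and the uppercase labels printed on screen and sent to the servers. */
static const char *const FN_LABELS[] = { "MIN", "MAX", "SUM", "SOS" };

/* Returns the enum value for a name such as "sum", or -1 if the name
 * is not one of {min, max, sum, sos}. */
int parse_function(const char *arg)
{
    for (size_t i = 0; i < sizeof FN_ARGS / sizeof FN_ARGS[0]; i++) {
        if (strcmp(arg, FN_ARGS[i]) == 0)
            return (int)i;
    }
    return -1;
}

/* Typical use at the top of the client's main():
 *   int fn = (argc == 2) ? parse_function(argv[1]) : -1;
 *   if (fn < 0)
 *       return 1;                       // not min, max, sum or sos
 *   const char *label = FN_LABELS[fn];  // e.g. "SUM"
 */
```

Matching is case-sensitive here because the specification lists the lowercase names only.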

After booting up, the client establishes a TCP connection with AWS. After successfully establishing the connection, the client first sends the function_name to AWS. Once the function_name is sent, the client should print a message in the format given in the table. The client then reads all integers from nums.csv and sends them to AWS over the same TCP connection. After successfully sending the integers, the client should print the number of integers sent to AWS. This ends Phase 1, and we now proceed to Phase 2.
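The specification does not fix a wire format, so the sketch below assumes a hypothetical text protocol: the function name on the first line, then one integer per line. It also shows why a loop around send() is needed: a single send() on a TCP socket may transmit fewer bytes than requested (Beej's guide uses the same sendall() idea). send_all() and send_request() are hypothetical names, and the socket is assumed to be already connected to AWS.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Calls send() until all `len` bytes are written.
 * Returns 0 on success, -1 on error. */
int send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n == -1) {
            perror("send");
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

/* Sends "fn_name\n" followed by one "%ld\n" line per number, then
 * half-closes the connection. Returns 0 on success, -1 on error. */
int send_request(int sock, const char *fn_name, const long *nums, size_t n)
{
    char line[32];   /* fits "-9223372036854775808\n" and short names */
    int len = snprintf(line, sizeof line, "%s\n", fn_name);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                         /* name too long */
    if (send_all(sock, line, (size_t)len) == -1)
        return -1;

    for (size_t i = 0; i < n; i++) {
        len = snprintf(line, sizeof line, "%ld\n", nums[i]);
        if (send_all(sock, line, (size_t)len) == -1)
            return -1;
    }

    /* Half-close: AWS sees end-of-file, but we can still recv() the result. */
    return shutdown(sock, SHUT_WR);
}
```

Calling shutdown(sock, SHUT_WR) after the last number lets AWS detect the end of the list (its recv() returns 0) while the client can still read the final result on the same connection. A binary format would also work, as long as the client and AWS agree on it.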

Phase 2

In Phase 1, the client read the numbers from the file and sent them to the AWS server over a TCP connection. Now, in Phase 2, the AWS server divides the data into 3 non-overlapping parts and sends them to the 3 back-servers. If there are N numbers in the file, then the first N/3 numbers must be sent to back-server A, the next N/3 to back-server B, and the last N/3 to back-server C. The TAs will make sure that N is divisible by 3. The function to be performed must also be communicated to the back-servers.
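The split itself is simple index arithmetic. A sketch in C, where partition_slice() is a hypothetical helper and back-servers A, B and C are numbered 0, 1 and 2; it relies on N being a multiple of 3, as stated above.

```c
#include <stddef.h>

/* Computes the slice of the n numbers that goes to back-server `part`
 * (0 = A, 1 = B, 2 = C): `*len` numbers starting at index `*start`.
 * Assumes n is a multiple of 3, so the three slices have equal size,
 * do not overlap, and together cover all n numbers. */
void partition_slice(size_t n, int part, size_t *start, size_t *len)
{
    size_t chunk = n / 3;
    *start = (size_t)part * chunk;
    *len = chunk;
}
```

For N = 90, as in the example output, A receives indices 0 to 29, B receives 30 to 59 and C receives 60 to 89.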

The communication between the AWS server and the back-servers happens over UDP. The AWS server sends the function_name along with the actual numbers. Note that the function_name can be MIN, MAX, SUM or SOS (sum of squares). The port numbers for back-servers A, B and C are specified in Table 2. Since all the servers run on the same machine in this project, they all have the same IP address (the IP address of localhost is usually 127.0.0.1).
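A sketch of the AWS side of this exchange in C, assuming a hypothetical datagram layout (the uppercase function name on the first line, then one number per line) and that each back-server's slice fits in a single datagram. format_job() and send_job() are illustrative names; dst would hold 127.0.0.1 and the back-server's static port from Table 2.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Encodes e.g. "SUM\n1\n2\n3\n" into buf. Returns the encoded length,
 * or -1 if `cap` is too small. */
int format_job(char *buf, size_t cap, const char *fn_label,
               const long *nums, size_t n)
{
    int used = snprintf(buf, cap, "%s\n", fn_label);
    if (used < 0 || (size_t)used >= cap)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, cap - (size_t)used, "%ld\n", nums[i]);
        if (w < 0 || (size_t)w >= cap - (size_t)used)
            return -1;
        used += w;
    }
    return used;
}

/* Sends one slice to a back-server as a single UDP datagram.
 * Assumes the encoded slice fits in the 8 KB buffer (a few hundred
 * numbers); larger slices would need several datagrams. */
int send_job(int udp_sock, const struct sockaddr_in *dst,
             const char *fn_label, const long *nums, size_t n)
{
    char buf[8192];
    int len = format_job(buf, sizeof buf, fn_label, nums, n);
    if (len < 0)
        return -1;
    ssize_t sent = sendto(udp_sock, buf, (size_t)len, 0,
                          (const struct sockaddr *)dst, sizeof *dst);
    return sent == len ? 0 : -1;
}
```

UDP neither retransmits lost datagrams nor splits large messages for you, so this sketch relies on localhost delivery and on small slices.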

Once a back-server receives the actual numbers (a total of N/3 numbers) and the function to be performed, it computes the function value. Let this value for server i be X(i). This step is called the map step in MapReduce. If the numbers received by back-server i are n(1), n(2), ..., n(N/3), then X(i) is their minimum for MIN, their maximum for MAX, n(1) + n(2) + ... + n(N/3) for SUM, and n(1)^2 + n(2)^2 + ... + n(N/3)^2 for SOS.
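The map step performed by each back-server, sketched in C. map_apply() is a hypothetical name; it assumes the label was already validated to be MIN, MAX, SUM or SOS, that at least one number was received, and that the result (in particular the sum of squares) fits in a signed long.

```c
#include <stddef.h>
#include <string.h>

/* Map step on one back-server: reduces its n >= 1 numbers to X(i).
 * fn_label must already be validated as "MIN", "MAX", "SUM" or "SOS". */
long map_apply(const char *fn_label, const long *nums, size_t n)
{
    int is_min = strcmp(fn_label, "MIN") == 0;
    int is_max = strcmp(fn_label, "MAX") == 0;

    if (is_min || is_max) {
        long best = nums[0];
        for (size_t i = 1; i < n; i++) {
            if ((is_min && nums[i] < best) || (is_max && nums[i] > best))
                best = nums[i];
        }
        return best;
    }

    /* SUM adds the numbers; SOS adds their squares.
     * No overflow check: assumes the total fits in a signed long. */
    int squares = strcmp(fn_label, "SOS") == 0;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += squares ? nums[i] * nums[i] : nums[i];
    return total;
}
```

For the numbers 3, -1 and 4, map_apply() returns -1 for MIN, 4 for MAX, 6 for SUM and 26 for SOS.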

Phase 3

At the end of Phase 2, all backend-servers have their answers ready. As before, X(i) denotes the value calculated by backend-server i; each backend-server sends it to the AWS server using UDP. The final answer is calculated by the frontend-server (AWS) in the reduce step and then handed over to the user.
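A sketch of both ends of the result exchange in C, under a hypothetical format where X(i) is sent as a decimal string in one datagram. send_result() runs on a back-server and recv_result() on AWS; both names are illustrative. The aws address would hold 127.0.0.1 and AWS's static UDP port (24319 in the example output).

```c
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Back-server side: sends X(i) to AWS as a decimal string. */
int send_result(int udp_sock, const struct sockaddr_in *aws, long value)
{
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld", value);
    ssize_t sent = sendto(udp_sock, buf, (size_t)len, 0,
                          (const struct sockaddr *)aws, sizeof *aws);
    return sent == len ? 0 : -1;
}

/* AWS side: blocks until one result datagram arrives, parses it into
 * *value and stores the sender's address in *from. Returns 0 or -1. */
int recv_result(int udp_sock, long *value, struct sockaddr_in *from)
{
    char buf[32];
    socklen_t from_len = sizeof *from;
    ssize_t n = recvfrom(udp_sock, buf, sizeof buf - 1, 0,
                         (struct sockaddr *)from, &from_len);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    char *end;
    *value = strtol(buf, &end, 10);
    return end == buf ? -1 : 0;   /* -1 if no digits were found */
}
```

Because recv_result() records the sender's address, AWS can tell which back-server a result came from by its source port, provided each back-server replies from the socket bound to its static port (21319, 22319 and 23319 in the example output).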

The frontend-server (AWS, also called server D) looks at the type of reduction operation and calculates the final answer, X_final, from the answers it receives from back-servers A, B and C. This step is called the reduce step in MapReduce. Depending on the operation requested by the user: for MIN, X_final is the minimum of X(A), X(B) and X(C); for MAX, it is their maximum; for SUM and SOS, X_final = X(A) + X(B) + X(C).
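The reduce step in AWS, sketched in C with the hypothetical name reduce_apply(). MIN and MAX take the minimum or maximum of the three partial results; SUM and SOS both add them, because the sum of squares over all N numbers equals the sum of the three per-server sums of squares.

```c
#include <string.h>

/* Combines the partial results X(A), X(B), X(C) into X_final.
 * Assumes fn_label was already validated. */
long reduce_apply(const char *fn_label, const long partial[3])
{
    long result = partial[0];
    for (int i = 1; i < 3; i++) {
        if (strcmp(fn_label, "MIN") == 0) {
            if (partial[i] < result)
                result = partial[i];
        } else if (strcmp(fn_label, "MAX") == 0) {
            if (partial[i] > result)
                result = partial[i];
        } else {                      /* "SUM" or "SOS" */
            result += partial[i];
        }
    }
    return result;
}
```

With the partial results from the example output (1000, 1001 and 1002), reduce_apply("SUM", ...) returns 3003.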

Example Output

Backend-Server A Terminal:
The Server A is up and running using UDP on port 21319.
The Server A has received 30 numbers
The Server A has successfully finished the reduction SUM: 1000
The Server A has successfully finished sending the reduction value to AWS server.

Backend-Server B Terminal:
The Server B is up and running using UDP on port 22319.
The Server B has received 30 numbers
The Server B has successfully finished the reduction SUM: 1001
The Server B has successfully finished sending the reduction value to AWS server.

Backend-Server C Terminal:
The Server C is up and running using UDP on port 23319.
The Server C has received 30 numbers
The Server C has successfully finished the reduction SUM: 1002
The Server C has successfully finished sending the reduction value to AWS server.

AWS Terminal:
The AWS is up and running.
The AWS has received 90 numbers from the client using TCP over port 25319
The AWS has sent 30 numbers to Backend-Server A
The AWS has sent 30 numbers to Backend-Server B
The AWS has sent 30 numbers to Backend-Server C
The AWS received reduction result of SUM from Backend-Server A using UDP over port 24319 and it is 1000
The AWS received reduction result of SUM from Backend-Server B using UDP over port 24319 and it is 1001
The AWS received reduction result of SUM from Backend-Server C using UDP over port 24319 and it is 1002
The AWS has successfully finished the reduction SUM: 3003
The AWS has successfully finished sending the reduction value to client.

Client Terminal:
The client is up and running.
The client has sent the reduction type SUM to AWS.
The client has sent 90 numbers to AWS
The client has received reduction SUM: 3003

Assumptions

  1. It is recommended to start the processes in this order: backend-server (A), backend-server (B), backend-server (C), AWS (D), Client.

  2. If you need to have more code files than the ones that are mentioned here, please use meaningful names and all small letters and mention them all in your README file.

  3. You are allowed to use blocks of code from Beej’s socket programming tutorial (Beej’s guide to network programming) in your project. However, you need to mark the copied part in your code.

  4. When you run your code, if you get the message “port already in use” or “address already in use”, first check whether you have a zombie process (a process from a past login or a previous run of your code that has not terminated and still holds the port). If you do not have such processes, or if you still get this message after terminating them, try changing the static UDP or TCP port number corresponding to the error message (all port numbers below 1024 are reserved and must not be used). If you have to change a port number, please mention it in your README file. Zombie processes can be killed with the UNIX commands kill or killall.
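Besides killing leftover processes, a common complement (also used in Beej's guide) is to set the SO_REUSEADDR socket option before bind(). On Linux this lets a restarted server rebind a port whose old connections are still in the TIME_WAIT state; it does not let two running processes listen on the same TCP port. A minimal sketch in C, where make_tcp_listener() is a hypothetical name and the socket is bound to the loopback address to match the localhost host name (bind to INADDR_ANY instead if clients connect through nunki.usc.edu).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket listening on 127.0.0.1:port.
 * Returns the socket descriptor, or -1 on error. */
int make_tcp_listener(unsigned short port)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        perror("socket");
        return -1;
    }

    /* Allow rebinding while old connections on this port sit in TIME_WAIT. */
    int yes = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) == -1) {
        perror("setsockopt");
        close(sock);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) == -1) {
        perror("bind");               /* e.g. "Address already in use" */
        close(sock);
        return -1;
    }
    if (listen(sock, 5) == -1) {
        perror("listen");
        close(sock);
        return -1;
    }
    return sock;
}
```

For example, AWS could call make_tcp_listener(25319) for the TCP port shown in the example output.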

Requirements

  1. Do not hardcode the TCP or UDP port numbers that are to be obtained dynamically. Refer to Table 1 to see which ports are statically defined and which ones are dynamically assigned. Use the getsockname() function to retrieve the locally bound port number wherever ports are assigned dynamically.

  2. Use gethostbyname() to obtain the IP address of the local host; the host name must be hardcoded as nunki.usc.edu or localhost in all pieces of code.

  3. You can either terminate all processes after completion of phase 3 or assume that the user will terminate them at the end by pressing ctrl-c.

  4. All naming conventions and on-screen messages must conform to the rules given above.

  5. You are not allowed to pass any parameter, value, string or character as a command-line argument, except when running the client in Phase 1.

  6. All on-screen messages must conform exactly to the project description. You should not add any more on-screen messages. If you need extra messages for debugging purposes, you must comment them all out before you submit your project.

  7. Using fork() or similar system calls is not mandatory if you do not feel comfortable using them to create concurrent processes.

  8. Please do remember to close the socket and tear down the connection once you are done using that socket.
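Requirements 1 and 2 can be sketched together in C. make_addr() and local_port() are hypothetical helper names. make_addr() uses gethostbyname() as required (POSIX.1-2008 removed gethostbyname() in favour of getaddrinfo(), but glibc on Linux still provides it), and local_port() uses getsockname() to report a dynamically assigned port.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fills `out` with the IPv4 address of `host` (e.g. "localhost") and
 * `port`, resolved with gethostbyname(). Returns 0 on success, -1 on error. */
int make_addr(const char *host, unsigned short port, struct sockaddr_in *out)
{
    struct hostent *he = gethostbyname(host);
    if (he == NULL || he->h_addrtype != AF_INET)
        return -1;
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    memcpy(&out->sin_addr, he->h_addr_list[0], sizeof out->sin_addr);
    return 0;
}

/* Returns the local port bound to `sock` in host byte order, or 0 on
 * error. Works for ports the kernel picked, e.g. after connect(). */
unsigned short local_port(int sock)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(sock, (struct sockaddr *)&addr, &len) == -1)
        return 0;
    return ntohs(addr.sin_port);
}
```

For example, after the client calls connect() on a socket to the address from make_addr("localhost", 25319, &aws_addr), local_port(sock) gives the dynamic TCP port the kernel assigned to the client.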