## Introduction

Your task is to write a MapReduce program in Java that computes, for each node in the graph, the maximum weight among its outgoing edges.
You should have already loaded two graph files into HDFS. Each file stores a list of edges as tab-separated values (TSV).
Each line represents a single edge and consists of three tab-separated (\t) columns: source node ID, target node ID, and edge weight. Node IDs and weights are positive integers. Edges are in random order.

``````
src tgt weight
15  127 2
15  134 3
15  599 3
511 330 51
511 694 79
230 15  11
``````
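The required program is a Java MapReduce job. Purely as a language-neutral illustration, the Python sketch below simulates the same logic locally on the sample edges. The map step emits (source node, weight) for each line, the simulated shuffle groups values by source node, and the reduce step keeps the maximum. The names `map_edge`, `reduce_max` and `max_outgoing_weights` are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_edge(line: str) -> Iterator[Tuple[int, int]]:
    """Hypothetical map step: emit (source node, edge weight) for one TSV line."""
    src, _tgt, weight = line.split("\t")
    # int() ignores surrounding whitespace, so a trailing newline is harmless.
    yield int(src), int(weight)


def reduce_max(node: int, weights: Iterable[int]) -> Tuple[int, int]:
    """Hypothetical reduce step: keep the largest outgoing weight of a node."""
    return node, max(weights)


def max_outgoing_weights(lines: Iterable[str]) -> Dict[int, int]:
    # Simulate the shuffle: group every mapped weight under its source node.
    groups: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        for node, weight in map_edge(line):
            groups[node].append(weight)
    # Apply the reduce step once per source node.
    return dict(reduce_max(node, ws) for node, ws in groups.items())


if __name__ == "__main__":
    sample = [
        "15\t127\t2",
        "15\t134\t3",
        "15\t599\t3",
        "511\t330\t51",
        "511\t694\t79",
        "230\t15\t11",
    ]
    for node, best in sorted(max_outgoing_weights(sample).items()):
        print(f"{node}\t{best}")
```

Because `max` is associative and commutative, the same reduce logic could also be registered as a combiner in the Java job to cut shuffle traffic. Nodes that appear only as targets (such as 127) produce no output in this sketch.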

## Task 2: Analyzing a Large Graph with Spark/Scala

Your task is to use Spark, in Scala, to cascade the edge weights in graph1.tsv and graph2.tsv to node weights and then compute the accumulated weight of each node. Assume that 80% of each edge's weight is attributed to its source node and 20% to its target node. When loading the edges, parse the edge weights with the toInt method and, before cascading, filter out (ignore) all edges whose weight equals 1. That is, only consider edges whose weights do not equal 1.
Consider the following example:
Input:

``````
src tgt weight
1   2   40
2   3   100
1   3   60
3   4   1
``````

Output:

``````
1 80.0 = 0.8*40 + 0.8*60
2 88.0 = 0.2*40 + 0.8*100
3 32.0 = 0.2*100 + 0.2*60
``````
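As a local illustration of the cascading rule only (not the Spark/Scala solution the task asks for), the Python sketch below credits 0.8 * weight to the source and 0.2 * weight to the target of every edge whose weight is not 1, then sums the credits per node. The function name `node_weights` and the in-memory tuple input are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SOURCE_SHARE = 0.8  # fraction of each edge weight credited to the source node
TARGET_SHARE = 0.2  # fraction of each edge weight credited to the target node


def node_weights(edges: Iterable[Tuple[int, int, int]]) -> Dict[int, float]:
    """Accumulate cascaded node weights from (src, tgt, weight) edges."""
    totals: Dict[int, float] = defaultdict(float)
    for src, tgt, weight in edges:
        if weight == 1:  # ignore edges whose weight equals 1
            continue
        totals[src] += SOURCE_SHARE * weight
        totals[tgt] += TARGET_SHARE * weight
    return dict(totals)


if __name__ == "__main__":
    example = [(1, 2, 40), (2, 3, 100), (1, 3, 60), (3, 4, 1)]
    for node, total in sorted(node_weights(example).items()):
        print(node, total)
```

On the example input this yields 80.0, 88.0 and 32.0 for nodes 1, 2 and 3. Node 4 does not appear because its only edge has weight 1. In Spark, one plausible (not prescribed) mapping is a `flatMap` that emits `(src, 0.8 * w)` and `(tgt, 0.2 * w)` for each kept edge, followed by `reduceByKey(_ + _)`.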

## Task 3: Analyzing a Large Amount of Data with Pig on AWS

For each unique bigram, compute its average number of appearances per book, that is, its total number of appearances divided by its total number of books. For the above example, the results will be:

``````
I am (342 + 211) / (90 + 10) = 5.53
very cool (500 + 3210 + 9994) / (10 + 1000 + 3020) = 3.40049628
``````
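The example shows that the average is a ratio of totals: a bigram's appearances are summed and divided by its summed book counts, rather than averaging per-row ratios (for "I am", averaging 342/90 and 211/10 would give 12.45, not 5.53). The Python sketch below assumes a hypothetical record layout of `(bigram, occurrences, books)` tuples, since the input table itself is not reproduced here, and `average_per_book` is an illustrative name. The pairing of occurrence and book counts in the sample rows is also assumed; only the per-bigram sums affect the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_per_book(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """rows: hypothetical (bigram, occurrences, books) records."""
    occurrences: Dict[str, int] = defaultdict(int)
    books: Dict[str, int] = defaultdict(int)
    for bigram, occ, bks in rows:
        occurrences[bigram] += occ
        books[bigram] += bks
    # Ratio of totals, not the mean of per-row ratios.
    return {b: occurrences[b] / books[b] for b in occurrences}


if __name__ == "__main__":
    rows = [
        ("I am", 342, 90),
        ("I am", 211, 10),
        ("very cool", 500, 10),
        ("very cool", 3210, 1000),
        ("very cool", 9994, 3020),
    ]
    for bigram, avg in average_per_book(rows).items():
        print(bigram, avg)
```

In Pig, the analogous step would group by bigram and divide the two `SUM`s. Since Pig, like Java, performs integer division when both operands are integers, a cast to `double` is probably needed.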

Output the 10 bigrams with the highest average number of appearances per book, along with their corresponding averages, in tab-separated format, sorted in descending order of average. If multiple bigrams have the same average, order them alphabetically. For the example above, the output will be:

``````
I am 5.53
very cool 3.40049628
``````
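The ordering rule (average descending, ties broken alphabetically by bigram, first 10 kept) can be expressed as a single composite sort key. This Python sketch takes precomputed averages as input; the helper names `top_bigrams` and `to_tsv` are hypothetical.

```python
from typing import Dict, List, Tuple


def top_bigrams(averages: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    # Negate the average for descending order; the bigram breaks ties ascending.
    ranked = sorted(averages.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def to_tsv(rows: List[Tuple[str, float]]) -> str:
    # One "bigram<TAB>average" line per result.
    return "\n".join(f"{bigram}\t{avg}" for bigram, avg in rows)


if __name__ == "__main__":
    averages = {"I am": 5.53, "very cool": 3.40049628}
    print(to_tsv(top_bigrams(averages)))
```

Python's `sorted` compares strings by Unicode code point, so uppercase letters sort before lowercase ones; whether that matches the intended "alphabetical" order is an assumption worth checking. In Pig, the corresponding steps would be an `ORDER ... BY` on the average `DESC` and the bigram `ASC`, followed by `LIMIT`.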

You will solve this problem by writing a Pig script on Amazon EC2 and saving the output.

## Task 4: Explore and Analyze Data with Pandas

``````
Number_of_unique_movies