Instrument the function “func” of the program prof.c to gather the node profile, the edge profile, and the path profile of the function. Treat each “for” loop in “func” as a single node, i.e., you do not need to go inside the for loops to generate paths. The function “func” takes a number of arguments, and the corresponding main function provides two possible input sets for those arguments. You should do the profiling for both input sets. You need to submit three versions of the program, each version measuring one type of profile. Draw the CFG of the function “func” and include it in the report.
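As a rough illustration of the three kinds of instrumentation, here is a minimal sketch on a hypothetical function toy_func whose CFG is two back-to-back if/else diamonds (A -> B or C -> D -> E or F -> G). This is not the CFG of “func” in prof.c, and every name in it is made up for the example. For brevity it keeps the node, edge, and path counters in one function, whereas the assignment asks for three separate versions. The path identifier is built with Ball-Larus-style increments (an assumed technique, not one required by the assignment), so each of the four acyclic paths gets a unique id from 0 to 3.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical CFG, NOT the one in prof.c:
 *   A -> B | C,  B -> D,  C -> D,  D -> E | F,  E -> G,  F -> G  */
enum { A, B, C, D, E, F, G, NUM_NODES };
enum { AB, AC, BD, CD, DE, DF, EG, FG, NUM_EDGES };
enum { NUM_PATHS = 4 };

static unsigned long node_count[NUM_NODES];  /* node profile */
static unsigned long edge_count[NUM_EDGES];  /* edge profile */
static unsigned long path_count[NUM_PATHS];  /* path profile */

void toy_func(int x, int y)
{
    int path = 0;  /* path register: sum of the increments on taken edges */

    node_count[A]++;
    if (x > 0) {
        edge_count[AB]++; node_count[B]++; edge_count[BD]++;
    } else {
        edge_count[AC]++; node_count[C]++; edge_count[CD]++;
        path += 2;  /* the A->C edge contributes 2 to the path id */
    }

    node_count[D]++;
    if (y > 0) {
        edge_count[DE]++; node_count[E]++; edge_count[EG]++;
    } else {
        edge_count[DF]++; node_count[F]++; edge_count[FG]++;
        path += 1;  /* the D->F edge contributes 1 to the path id */
    }

    node_count[G]++;
    assert(path >= 0 && path < NUM_PATHS);
    path_count[path]++;  /* one counter per complete entry-to-exit path */
}

/* Print the path profile; ids 0..3 are ABDEG, ABDFG, ACDEG, ACDFG. */
void print_path_profile(void)
{
    for (int p = 0; p < NUM_PATHS; p++)
        printf("path %d: %lu\n", p, path_count[p]);
}
```

Calling toy_func once per input and then print_path_profile() gives the path frequencies; the node and edge arrays can be printed the same way. Note that the edge profile alone cannot always recover the path profile, which is why the path register is kept separately.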
This problem should be completed on a machine with PAPI installed. PAPI is freely available and can be installed on Linux computers. If you don’t have access to such a machine, you may use cpeg655.ece.udel.edu.
The PAPI Hardware Counter Library: The PAPI library has been installed under the directory /usr. The library binaries are in /usr/lib, and the library header files are in /usr/include.
When you need to compile your program with PAPI, you can use the command line:
gcc -I/usr/include your_program_file -L/usr/lib -lpapi
In this problem you are required to use the PAPI hardware counter library to measure the memory hierarchy performance of all the paths in the function “func”. In other words, you should report the L1 cache misses, L2 cache misses, and TLB misses for each path in the function “func”. You don’t need to measure or report the counters for other parts of the program. Furthermore, you should optimize/de-optimize the two programs for the memory hierarchy and again use PAPI to verify their memory hierarchy performance.
(1) Use the PAPI library to measure the L1 cache misses, the L2 cache misses, and the TLB misses of the function “func” of the two programs. The “main” function of the two programs has already set up the initialization of the PAPI library. You only need to provide the event names in the line that is labeled with “Please add your event here.” Submit your code and measurements in your report. You should measure for both input sets.
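For orientation, the following is a minimal sketch of how a region of code can be wrapped with PAPI's low-level event-set calls. It is not the setup used in the provided “main”, which already initializes PAPI; the helper measure_misses and the stand-in sample_region are hypothetical names. The preset events PAPI_L1_DCM, PAPI_L2_DCM, and PAPI_TLB_DM are one plausible choice, but which presets exist, and whether all three can be counted at the same time, depends on the processor. The header check at the top is only there so the file still compiles on a machine without PAPI, in which case the helper runs the region and reports failure.

```c
#include <assert.h>
#include <stddef.h>

/* Use PAPI when its header is visible; otherwise build a stub. */
#ifdef __has_include
#  if __has_include(<papi.h>)
#    include <papi.h>
#    define HAVE_PAPI 1
#  endif
#endif
#ifndef HAVE_PAPI
#  define HAVE_PAPI 0
#endif

/* Hypothetical stand-in for the code being measured. */
static volatile int sink[1024];
static int region_calls;
static void sample_region(void)
{
    region_calls++;
    for (int i = 0; i < 1024; i++)
        sink[i] = i;
}

/* Runs region() exactly once. Returns 0 and stores the L1 data-cache,
 * L2 data-cache, and data-TLB miss counts in misses[0..2] on success;
 * returns -1 and leaves all three entries at -1 otherwise. */
int measure_misses(void (*region)(void), long long misses[3])
{
    assert(region != NULL);
    misses[0] = misses[1] = misses[2] = -1;
#if HAVE_PAPI
    /* Assumed preset choice; availability depends on the CPU. */
    const int events[3] = { PAPI_L1_DCM, PAPI_L2_DCM, PAPI_TLB_DM };
    long long counts[3] = { 0, 0, 0 };
    int set = PAPI_NULL;
    int ok = 1;
    int rc = -1;

    if (PAPI_is_initialized() == PAPI_NOT_INITED)
        ok = (PAPI_library_init(PAPI_VER_CURRENT) == PAPI_VER_CURRENT);
    ok = ok && PAPI_create_eventset(&set) == PAPI_OK;
    for (int i = 0; ok && i < 3; i++)
        ok = (PAPI_add_event(set, events[i]) == PAPI_OK);
    ok = ok && PAPI_start(set) == PAPI_OK;

    region();  /* the code under measurement */

    if (ok && PAPI_stop(set, counts) == PAPI_OK) {
        for (int i = 0; i < 3; i++)
            misses[i] = counts[i];
        rc = 0;
    }
    if (set != PAPI_NULL) {  /* release the event set if it was created */
        PAPI_cleanup_eventset(set);
        PAPI_destroy_eventset(&set);
    }
    return rc;
#else
    region();  /* no PAPI available: run the region, report failure */
    return -1;
#endif
}
```

A call such as measure_misses(sample_region, m) fills m with the three counts when it returns 0. To report misses per path rather than per call, one option is to record the path id inside the measured region and accumulate the counts into a per-path table after each call.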
(2) Transform the “func” function so that, for one input set of your choice, the most frequently executed path in “func” achieves, in turn, the minimum and the maximum number of L2 cache misses. The function “func” is basically a sequence of memory accesses. You can change the order of the memory accesses, but you cannot add memory accesses to or remove them from the sequence. You may also change the data structure declarations for the transformation. Note that you are required to work only on the most frequently executed path, implying that you may sacrifice other paths to achieve the goal. The most important grading criterion is WHY you do what you do. Submit your code and measurements, together with your explanation of the transformations, i.e., why they work. (Hint: Optimize/de-optimize by avoiding/creating cache conflicts.)
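To make the hint about cache conflicts concrete, here is a small sketch that computes which cache set an address maps to and counts how many distinct sets a strided access sequence touches. The helper names are hypothetical, and the geometry used in the usage note (64-byte lines, 512 sets) is an assumption for illustration; look up the real L2 parameters of the machine you measure on. An L2 cache is usually indexed by physical address, so arithmetic on virtual addresses like this is only an approximation of what the hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_SETS = 4096 };  /* upper bound supported by this sketch */

/* Set index of an address in a set-associative cache:
 * (address / line size) modulo number of sets. */
size_t cache_set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    assert(line_size > 0 && num_sets > 0);
    return (size_t)((addr / line_size) % num_sets);
}

/* Number of distinct cache sets touched by n accesses at
 * base, base + stride, base + 2*stride, ... */
size_t distinct_sets(uintptr_t base, size_t stride, size_t n,
                     size_t line_size, size_t num_sets)
{
    unsigned char seen[MAX_SETS] = { 0 };
    size_t distinct = 0;

    assert(num_sets <= MAX_SETS);
    for (size_t i = 0; i < n; i++) {
        size_t set = cache_set_index(base + i * stride, line_size, num_sets);
        if (!seen[set]) {
            seen[set] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With the assumed geometry, distinct_sets(0, 512 * 64, 16, 64, 512) is 1: sixteen accesses whose stride equals the number of sets times the line size all land in the same set, so they evict one another whenever the associativity is below 16 (the de-optimized case). Padding the stride by one line, distinct_sets(0, 512 * 64 + 64, 16, 64, 512), spreads the same sixteen accesses over 16 different sets, which removes those conflicts.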