OpenMP代写:CME213 Radix Sort


利用OpenMP对Radix Sort进行并行化处理,然而并不是随便写一个Radix Sort就可以,作业给了一个64位的二进制Radix Sort框架,而且参数非常灵活,导致难度直线上升。


In this programming assignment you will implement Radix Sort, and will learn about OpenMP, an API which simplifies parallel programming on shared memory CPUs. OpenMP is an API which enables simple yet powerful multi-threaded computing on shared memory systems. To link the OpenMP libraries to your C++ code, you simply need to add -fopenmp to your compiler flags. You can then specify the number of threads to run with from within the program, or set environment variables:

setenv OMP_NUM_THREADS 4 (for csh shell)
export OMP_NUM_THREADS=4 (for sh/ksh/bash shell, e.g. on icme-gpu1)

If you find yourself struggling, there are many excellent examples at:
We will cover OpenMP in class. You can learn more about OpenMP at the official website:
Please do not modify the filenames, Makefile or any of the test files. Only files you need to modify are main q1.cpp and main q2.cpp. Since this homework does not use GPUs, you can do all the testing and submission on Corn. Do not forget to set the number of threads before running your program. However, if you would like to get acquainted with the ICME GPU cluster, instructions to run the codes on it are provided in B.
Typing make will make all the files; typing make main_q1 will only make the first problem etc.


In this problem, you will implement Radix Sort in parallel. If you need a refresh on the details of Radix Sort, you should refer to the accompanying Radix Sort Tutorial.
Radix Sort sorts an array of elements in several passes. To do so, it examines, starting from the least significant bit, a group of numBits bits, sorts the elements according to this group of bits, and proceeds to the next group of bits.
More precisely:

  1. Select the number of bits numBits you want to compare per pass.
  2. Fill a histogram with numBuckets = 2numBits buckets, i.e. make a pass over the data and count the number of elements in each bucket.
  3. Reorder the array to take into account the bucket to which an element belongs.
  4. Process the next group of bits and repeat until you have dealt with all the bits of the elements (in our case 32 bits).