CS475 CUDA Monte Carlo

Implement a Monte Carlo simulation algorithm using CUDA.

Note

The flip machines do not have GPU cards in them, so CUDA will not run there. If your own system has a GPU, you can use that. You can also use the DGX machine, but please be good about sharing it.

Introduction

Monte Carlo simulation is used to determine the range of outcomes for a series of parameters, each of which has a probability distribution showing how likely each option is to happen. In this project, you will take a scenario and develop a Monte Carlo simulation of it, determining how likely a particular output is to happen.

The Scenario

A laser is pointed at a circle. The circle is defined by a center point (xc, yc) and a radius r. The beam comes out at a 30° angle and bounces off the circle. Underneath, level with the laser origin, is an infinite plate. Given all of this, does the beam hit the plate?

Normally this would be a pretty straightforward geometric calculation, but the circle is randomly changing location and size. So now, the laser beam might hit the plate or it might not, depending on the values of (xc, yc, r). OK, since it is not certain, what is the probability that it hits the plate? This is a job for GPU Monte Carlo simulation!

Because of the variability, the beam could miss the circle entirely (A). The circle might totally engulf the laser pointer (B). It could bounce off the circle and miss the plate entirely (C). Or, it could bounce off the circle and actually hit the plate (D).

So now the question is: “What is the probability that the beam hits the plate?”
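The per-trial geometry can be sketched as one function that a CUDA thread calls for its (xc, yc, r) values and that reports which of cases A-D occurred. This is a minimal sketch, not the template's code, and it rests on assumptions the text does not spell out: the laser sits at the origin, the 30° beam angle is measured up from the +x axis, and the plate is the line y = 0. The names traceBeam and Outcome are hypothetical. Marking the function __host__ __device__ lets the same code run inside a kernel or in a quick host-side check.

```cuda
#include <math.h>

// Let the same source compile with nvcc (GPU) or with a plain C++ compiler (host testing).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical outcome codes, one per case A-D in the text.
enum Outcome { MISS_CIRCLE = 0, ENGULFED = 1, MISS_PLATE = 2, HIT_PLATE = 3 };

// Assumed geometry: laser at the origin, beam along the unit vector (dx, dy),
// infinite plate along the line y = 0 (level with the laser origin).
__host__ __device__ int traceBeam(float xc, float yc, float r, float dx, float dy)
{
    // Beam p(t) = t*(dx, dy). Substituting into |p - center|^2 = r^2 with |d| = 1
    // gives t^2 + b*t + k = 0, where b = -2 (d . center) and k = |center|^2 - r^2.
    float b = -2.f * (dx * xc + dy * yc);
    float k = xc * xc + yc * yc - r * r;
    float disc = b * b - 4.f * k;
    if (disc < 0.f) return MISS_CIRCLE;            // case A: the beam's line never touches the circle
    if (k < 0.f) return ENGULFED;                  // case B: the laser origin is inside the circle

    float tmin = 0.5f * (-b - sqrtf(disc));        // nearer of the two intersections
    if (tmin < 0.f) return MISS_CIRCLE;            // both intersections lie behind the laser

    float hx = tmin * dx, hy = tmin * dy;          // first hit point on the circle
    float nx = (hx - xc) / r, ny = (hy - yc) / r;  // unit outward normal at the hit point
    float dn = dx * nx + dy * ny;
    float oy = dy - 2.f * dn * ny;                 // y part of the mirror reflection o = d - 2 (d . n) n

    // The reflected ray h + s*o reaches y = 0 at s = -hy / oy; it hits the plate only if s > 0.
    if (oy == 0.f) return MISS_PLATE;              // travels parallel to the plate
    return (-hy / oy > 0.f) ? HIT_PLATE : MISS_PLATE;  // case D or case C
}
```

For a 30° beam, a caller would pass dx = cosf(30° in radians) and dy = sinf(30° in radians), and a kernel thread would count a hit when traceBeam returns HIT_PLATE. Because the beam travels upward, the hit point on the circle is above the plate, so the reflected beam reaches the plate exactly when its vertical component points down.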

In My Opinion, Here Is How To Make Your Life Way, Way, Way Easier

IMHO, use Linux for this project. The compilation is orders of magnitude simpler, and you can try this out on OSU’s new DGX system, which will produce dazzling performance.

Also, before you use the DGX, do your development on the rabbit system (Slide #3 of the DGX noteset). It is a little friendlier because you don’t have to run your program through a batch submission. But don’t take any final performance numbers from rabbit; just get your program running there.

If you decide to use Visual Studio on your own machine, you must first install the CUDA Toolkit! It is available here: https://developer.nvidia.com/cuda-downloads

Requirements

Variable    Range
xc          0.0 - 2.0
yc          0.0 - 2.0
r           0.5 - 2.0
  1. The ranges are above.
    Note: these are not the same numbers as we used before!
  2. Run this for four BLOCKSIZEs (i.e., the number of threads per block) of 16, 32, 64, and 128, combined with NUMTRIALS sizes of 16K, 32K, 64K, 128K, 256K, 512K, and 1M.
  3. Be sure each NUMTRIALS value is a multiple of 1024; for example, use 32,768, not 32,000.
  4. Record timing for each combination. For performance, use some appropriate units like MegaTrials/Second or GigaTrials/Second.
  5. For this one, use CUDA timing, not OpenMP timing.
  6. Produce a table and two graphs:
    1. Performance vs. NUMTRIALS with multiple curves of BLOCKSIZE
    2. Performance vs. BLOCKSIZE with multiple curves of NUMTRIALS
  7. As before, fill the Xcs, Ycs, and Rs arrays ahead of time, then send them to the GPU, where they can be used as look-up tables.
  8. A template of what the code could look like can be found in the montecarloTemplate.cu file.
  9. You will also need six .h files:
    • helper_functions.h
    • helper_cuda.h
    • helper_image.h
    • helper_string.h
    • helper_timer.h
    • exception.h
  10. Your commentary PDF should address the following:
    1. Tell what machine you ran this on
    2. Show the table and the two graphs
    3. What patterns are you seeing in the performance curves?
    4. Why do you think the patterns look this way?
    5. Why is a BLOCKSIZE of 16 so much worse than the others?
    6. How do these performance results compare with what you got in Project #1? Why?
    7. What does this mean for the proper use of GPU parallel computing?
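Requirements 3, 4, 5, and 7 fit together in one host routine: check that NUMTRIALS divides evenly into blocks, fill the Xcs, Ycs, and Rs look-up tables on the host, copy them to the GPU, time only the kernel with CUDA events, and convert the elapsed time to MegaTrials/Second. The sketch below assumes uniform sampling within the ranges in the table above, and all of its names (runMonteCarlo, trialKernel, ranf, megaTrialsPerSecond, MonteCarloResult, CHECK) are hypothetical. To keep it short, the kernel only flags case B (the laser origin inside the circle); the full reflect-and-hit-the-plate test would take the place of that check. The helper_cuda.h file listed above provides a checkCudaErrors macro that plays the same role as CHECK.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line and the CUDA error string if a runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {       \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Ranges from the table above; sampling uniformly within them is an assumption.
const float XCMIN = 0.0f, XCMAX = 2.0f;
const float YCMIN = 0.0f, YCMAX = 2.0f;
const float RMIN  = 0.5f, RMAX  = 2.0f;

struct MonteCarloResult { float msec; long count; };  // kernel time and number of flagged trials

// Uniform random float in [lo, hi], host side only.
static float ranf(float lo, float hi)
{
    return lo + (hi - lo) * (float)rand() / (float)RAND_MAX;
}

// One thread per trial; each thread reads its parameters from the look-up tables.
// Placeholder per-trial test: flags case B only (laser origin inside the circle).
__global__ void trialKernel(const float *xcs, const float *ycs, const float *rs, int *flags, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid >= n) return;
    float xc = xcs[gid], yc = ycs[gid], r = rs[gid];
    flags[gid] = (xc * xc + yc * yc < r * r) ? 1 : 0;
}

// Convert a trial count and a CUDA-event time in milliseconds to MegaTrials/Second.
double megaTrialsPerSecond(int numTrials, float msec)
{
    return (double)numTrials / (msec / 1000.) / 1.e6;
}

MonteCarloResult runMonteCarlo(int blockSize, int numTrials)
{
    // Multiples of 1024 divide evenly by every BLOCKSIZE in {16, 32, 64, 128}.
    if (numTrials % 1024 != 0 || numTrials % blockSize != 0) {
        fprintf(stderr, "NUMTRIALS must be a multiple of 1024 and of BLOCKSIZE\n");
        exit(1);
    }
    const int numBlocks = numTrials / blockSize;

    // Fill the look-up tables on the host ahead of time.
    std::vector<float> hXcs(numTrials), hYcs(numTrials), hRs(numTrials);
    for (int i = 0; i < numTrials; i++) {
        hXcs[i] = ranf(XCMIN, XCMAX);
        hYcs[i] = ranf(YCMIN, YCMAX);
        hRs[i]  = ranf(RMIN, RMAX);
    }

    // Copy them to the GPU once, before timing starts.
    const size_t fbytes = numTrials * sizeof(float), ibytes = numTrials * sizeof(int);
    float *dXcs, *dYcs, *dRs;
    int *dFlags;
    CHECK(cudaMalloc(&dXcs, fbytes));
    CHECK(cudaMalloc(&dYcs, fbytes));
    CHECK(cudaMalloc(&dRs, fbytes));
    CHECK(cudaMalloc(&dFlags, ibytes));
    CHECK(cudaMemcpy(dXcs, hXcs.data(), fbytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dYcs, hYcs.data(), fbytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dRs, hRs.data(), fbytes, cudaMemcpyHostToDevice));

    // Time only the kernel, using CUDA events rather than a CPU clock.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    trialKernel<<<numBlocks, blockSize>>>(dXcs, dYcs, dRs, dFlags, numTrials);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    MonteCarloResult res = {0.f, 0};
    CHECK(cudaEventElapsedTime(&res.msec, start, stop));

    // Bring the per-trial flags back and count them on the host.
    std::vector<int> hFlags(numTrials);
    CHECK(cudaMemcpy(hFlags.data(), dFlags, ibytes, cudaMemcpyDeviceToHost));
    for (int f : hFlags) res.count += f;

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dXcs));
    CHECK(cudaFree(dYcs));
    CHECK(cudaFree(dRs));
    CHECK(cudaFree(dFlags));
    return res;
}
```

Calling runMonteCarlo for each BLOCKSIZE (16, 32, 64, 128) and each NUMTRIALS (16 * 1024 through 1024 * 1024), then printing megaTrialsPerSecond(numTrials, res.msec), gives one row of the performance table per combination. Writing one flag per trial and summing on the host avoids atomic operations; a per-block shared-memory reduction would shrink the copy back. The first kernel launch in a process can include one-time setup cost, so a warm-up launch before the timed runs is a common precaution.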

Grading

Feature                                                               Points
Monte Carlo performance table                                         20
Graph of performance vs. NUMTRIALS with multiple curves of BLOCKSIZE  25
Graph of performance vs. BLOCKSIZE with multiple curves of NUMTRIALS  25
Commentary                                                            30
Potential Total                                                       100