Skip to content
Snippets Groups Projects
Select Git revision
  • 2022 default
  • 2021
  • master protected
  • 2021
4 results

README.md

Blame
  • README.md 5.19 KiB

    OpenMP Advanced project

    About this exercise

    The aim of this exercise is to give hands-on experience in parallelizing a larger program, measure parallel performance and gain experience in what to expect from modern multi-core architectures.

    In the exercise you will use a dual hexadeca-core shared memory Intel Xeon E5-2698v3 Haswell node. There will be several nodes available on the Cray for interactive use during the lab and each group will have access to a node of their own. Running the program should therefore give you realistic timings and speedup characteristics.

    Your task is to parallelize a finite-volume solver for the two dimensional shallow water equations. Measure speedup and if you have time, tune the code. You don’t need to understand the numerics in order to solve this exercise (a short description is given in Appendix A). However, it assumes some prior experience with OpenMP, please refer to the lecture on shared memory programming if necessary.

    Algorithm

    For this exercise we solve the shallow water equations on a square domain using a simple dimensional splitting approach. Updating volumes Q with numerical fluxes F and G, first in the x and then in the y direction, more easily expressed with the following pseudo-code

    for each time step do
        Apply boundary conditions
        for each Q do
            Calculate fluxes F in the x-direction
            Update volume Q with fluxes F
        end
        for each Q do
            Calculate fluxes G in the y-direction
            Update volumes Q with fluxes G
        end
    end

    In order to obtain good parallel speedup with OpenMP, each sub-task assigned to a thread needs to be rather large. Since the nested loops contains a lot of numerical calculations the solver is a perfect candidate for OpenMP parallelization. But as you will see in this exercise, it’s fairly difficult to obtain optimal speedup on today’s multi-core computers. However, it should be fairly easy to obtain some speedup without too much effort. The difficult task is to make a good use of all the available cores.

    Choose to work with either the given serial C/Fortran 90 code or, if you think you have time, write your own implementation (but don’t waste time and energy). Compile the code by typing make and execute the program shwater2d with srun as described in the general instructions.

    1. Parallelize the code.

    Start with the file shwater2d.c or shwater2d.f, add OpenMP statements to make it run in parallel and make sure the computed solution is correct. Some advice are given below

    • How should the work be distributed among threads
    • Don’t parallelize everything
    • What’s the difference between
    !$omp parallel do
        do i=1,n
        ...
    !$omp end parallel do
    !$omp parallel do
        do j=1,m
        ...
    !$omp end parallel do

    and

    !$omp parallel
        !$omp do
            do i=1,n
            ...
        !$omp end do
        !$omp do
            do j=1,m
            ...
        !$omp end do
    !$omp end parallel

    Hint: How are threads created/destroyed by OpenMP? How can it impact performance?

    2. Measure parallel performance.

    In this exercise, parallel performance refers to the computational speedup Sn = T1/Tn, using n threads. Measure run time T for 1, 2, ..., 16 threads and calculate speedup. Is it linear? If not, why? Finally, is the obtained speedup acceptable? Also, try to increase the space discretization (M,N) and see if it affects the speedup.

    Recall from the OpenMP exercises that the number of threads is determined by an environment variable OMP_NUM_THREADS. One could change the variable or use the shell script provided in Appendix B.

    3. Optimize the code.

    The given serial code is not optimal, why? If you have time, go ahead and try to make it faster. Try to decrease the serial run time. Once the serial performance is optimal, redo the speedup measurements and comment on the result.

    For debugging purposes you might want to visualize the computed solution. Uncomment the line save_vtk. The result will be stored in result.vtk, which can be opened in ParaView, available on Tegner after module add paraview. Beware that the resulting file could be rather large, unless the space discretization (M,N) is decreased.

    A. About the Finite-Volume solver

    In this exercise we solve the shallow water equations in two dimensions given by

    Eq_1

    where h is the depth and (u,v) are the velocity vectors. To solve the equations we use a dimensional splitting approach, i.e. reducing the two dimensional problem to a sequence of one-dimensional problems

    Eq_2

    For this exercise we use the Lax-Friedrich’s scheme, with numerical fluxes F, G defined as

    Eq_3