@@ -8,7 +8,7 @@ Your task is to parallelize a finite-volume solver for the two dimensional shall
...
@@ -8,7 +8,7 @@ Your task is to parallelize a finite-volume solver for the two dimensional shall
## Algorithm
## Algorithm
For this exercise we solve the shallow water equations on a square domain using a simple dimensional splitting approach. Updating volumes *Q* with numerical fluxes *F* and *G*, first in the x and then in the y direction, more easily expressed with the following pseudo-code
For this exercise we solve the shallow water equations on a square domain using a simple dimensional splitting approach. The volumes *Q* are updated with numerical fluxes *F* and *G*, first in the x and then in the y direction, which is most easily expressed with the following pseudo-code:
```
```
for each time step do
for each time step do
...
@@ -30,8 +30,7 @@ Choose to work with either the given serial C/Fortran 90 code or, if you think y
...
@@ -30,8 +30,7 @@ Choose to work with either the given serial C/Fortran 90 code or, if you think y
## 1. Parallelize the code
## 1. Parallelize the code
A serial version of the code is provided here: [shwater2d.c](c/shwater2d.c) or [shwater2d.f](f90/shwater2d.f90). Remember not to try parallelising everything.
A serial version of the code is provided here: [shwater2d.c](c/shwater2d.c) or [shwater2d.f90](f90/shwater2d.f90). Add OpenMP statements to make it run in parallel, and make sure the computed solution is correct. Remember not to try to parallelize everything. Some advice is provided below.
add OpenMP statements to make it run in parallel and make sure the computed solution is correct.Do not parallelize everything Some advices are provided below.
### Tasks and questions to be addressed
### Tasks and questions to be addressed
...
@@ -67,23 +66,22 @@ and
...
@@ -67,23 +66,22 @@ and
_Hint: How are threads created/destroyed by OpenMP? How can it impact performance?_
_Hint: How are threads created/destroyed by OpenMP? How can it impact performance?_
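One way to think about the hint: every `parallel` construct forks a team of threads and joins it again at the end, so opening a new region for each small loop inside the time loop pays that cost repeatedly. The sketch below is not taken from `shwater2d.c`; `run_steps` and its trivial update are hypothetical stand-ins for a solver sweep. It opens a single parallel region around the whole time loop and uses only the work-sharing `for` construct inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a solver sweep: adds 1.0 to every cell,
 * nsteps times. Not the update used in shwater2d.c. */
void run_steps(double *q, int ncells, int nsteps)
{
    assert(q != NULL && ncells >= 0 && nsteps >= 0);

    /* One parallel region for the whole time loop: the thread team is
     * forked once here and joined once at the end of the region. */
    #pragma omp parallel
    for (int step = 0; step < nsteps; ++step) {
        /* Every thread runs the time loop; the iterations of the cell
         * loop are divided among the team. The implicit barrier at the
         * end of the work-sharing loop keeps the time steps in order. */
        #pragma omp for
        for (int i = 0; i < ncells; ++i)
            q[i] += 1.0;
    }
}
```

Compiled without OpenMP support the pragmas are ignored and the function runs serially with the same result, which makes it easy to check correctness first. Whether hoisting the region pays off depends on the runtime; many implementations keep threads in a pool, so the saving is mostly in fork/join synchronization rather than in actual thread creation.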
## 2. Measure parallel performance.
## 2. Measure parallel performance
In this exercise, we explore parallel performance refers to the computational speed-up *S*<sub>n</sub>_ = $\Delta$*T*<sub>1</sub>/$\Delta$*T*<sub>n</sub>_, using _n_ threads.
In this exercise, parallel performance refers to the computational speed-up *S*<sub>*n*</sub> = $\Delta$*T*<sub>1</sub>/$\Delta$*T*<sub>*n*</sub>, where *n* is the number of threads.
### Tasks and questions to be addressed
### Tasks and questions to be addressed
1) Measure run time $\Delta$T for 1, 2, ..., 24 threads and calculate the speed-up.
1) Measure the run time $\Delta$*T*<sub>*n*</sub> for *n* = 1, 2, ..., 24 threads and calculate the speed-up.
2) Is it linear? If not, why?
2) Is it linear? If not, why?
3) Finally, is the obtained speed-up acceptable?
3) Finally, is the obtained speed-up acceptable?
4) Try to increase the space discretization (M,N) and see if it affects the speed-up.
4) Try to increase the space discretization (M,N) and see if it affects the speed-up.
Recall from the OpenMP exercises that the number of threads is determined by an environment variable ``OMP_NUM_THREADS``. One could change the variable or use the shell script provided in Appendix B.
Recall from the OpenMP exercises that the number of threads is determined by an environment variable ``OMP_NUM_THREADS``. One could change the variable or use the shell script provided in Appendix B.
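Once the run times are collected, the speed-up follows directly from the definition above. The helpers below are a minimal sketch: the function names are made up for illustration, and the parallel efficiency *S*<sub>*n*</sub>/*n* is not part of the exercise text but is a common companion metric when judging whether a speed-up is close to linear.

```c
#include <assert.h>

/* Speed-up S_n = dT_1 / dT_n, with both run times in the same unit. */
double speedup(double t1, double tn)
{
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

/* Parallel efficiency E_n = S_n / n (an addition for illustration).
 * A value of 1.0 corresponds to perfectly linear speed-up. */
double efficiency(double t1, double tn, int n)
{
    assert(n > 0);
    return speedup(t1, tn) / (double)n;
}
```

For example, a serial run of 8 s that drops to 2 s on 4 threads gives `speedup(8.0, 2.0) == 4.0` and `efficiency(8.0, 2.0, 4) == 1.0`; these numbers are purely illustrative and not measurements.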
### 3. Optimize the code.
## 3. Optimize the code
The given serial code is not optimal, why? If you have time, go ahead and try to make it faster. Try to decrease the serial run time. Once the serial
The given serial code is not optimal. Why? If you have time, try to make it faster by decreasing the serial run time. Once the serial performance is optimal, redo the speed-up measurements and comment on the result.
performance is optimal, redo the speedup measurements and comment on the result.
For debugging purposes you might want to visualize the computed solution. Uncomment the line ``save_vtk``. The result will be stored in ``result.vtk``, which can be opened in ParaView, available on Tegner after ``module add paraview``. Beware that the resulting file could be rather large, unless the space discretization (M,N) is decreased.
For debugging purposes you might want to visualize the computed solution. Uncomment the line ``save_vtk``. The result will be stored in ``result.vtk``, which can be opened in ParaView, available on Tegner after ``module add paraview``. Beware that the resulting file could be rather large, unless the space discretization (M,N) is decreased.
...
@@ -105,10 +103,23 @@ where *f* and *g* are the flux functions, derived from (1). For simplicity we us
...
@@ -105,10 +103,23 @@ where *f* and *g* are the flux functions, derived from (1). For simplicity we us
<img src="image/eq_4.png" alt="Eq_4" width="800px"/>
<img src="image/eq_4.png" alt="Eq_4" width="800px"/>
## B. Run script for changing ``OMP_NUM_THREADS``
## B. Run scripts for changing ``OMP_NUM_THREADS``