@@ -9,14 +9,14 @@ To run your code, you need first to generate your executable. It is very importa
To compile your C OpenMP code using ``gcc``, therefore, use
```
gcc -O2 -fopenmp -o myprog.x myprog.c -lm
```
For Fortran, it is recommended to use the Intel compiler:
```
module load i-compilers
ifort -O2 -qopenmp -o myprog.x myprog.f90 -lm
```
To run your code, you will need to have an (e.g., interactive) allocation:
...
@@ -83,10 +83,10 @@ This implementation performs repeated execution of the benchmarked kernel to mak
### Tasks and questions to be addressed
1) Create a parallel version of the programs using a parallel construct: ``#pragma omp parallel for``. In addition to the parallel construct, you might need some runtime library routines:
- ``int omp_get_max_threads()`` to get the maximum number of threads
- ``int omp_get_thread_num()`` to get the thread ID
- ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
- ``omp_set_num_threads()`` to set the number of threads to be used
2) Run the parallel code and measure the execution time with 1, 2, 4, 12, 24 threads for different array lengths ``N``. Record the timings.
3) Produce a plot showing execution time as a function of array length for different numbers of threads.
4) How large does ``N`` have to be before using 2 threads becomes more beneficial than using a single thread?
...
@@ -132,10 +132,10 @@ A simple serial C code to calculate $\pi$ is the following:
### Tasks and questions to be addressed
1) Create a parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using a parallel construct: ``#pragma omp parallel``. Pay close attention to shared versus private variables. In addition to the parallel construct, you might need some runtime library routines:
- ``int omp_get_max_threads()`` to get the maximum number of threads
- ``int omp_get_thread_num()`` to get the thread ID
- ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
- ``omp_set_num_threads()`` to set the number of threads to be used
2) Run the parallel code and measure the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timings.
3) How does the execution time change as you vary the number of threads? Is it what you expected? If not, why do you think that is?
4) Is there any technique you heard of in class that could improve the scalability of this approach? How would you implement it?