diff --git a/intro_lab/README.md b/intro_lab/README.md
index 55c3332ecd032c1b8de0f6940f7ed3aa01998e88..47053af2891af480c41369547e42fedb20b2a2ac 100644
--- a/intro_lab/README.md
+++ b/intro_lab/README.md
@@ -9,14 +9,14 @@ To run your code, you need first to generate your executable. It is very importa
 To compile your C OpenMP code using ``gcc``, therefore, use
 
 ```
-gcc -O2 -openmp -o myprog.x myprog.c -lm
+gcc -O2 -fopenmp -o myprog.x myprog.c -lm
 ```
 
 In Fortran, it is recommended to use the Intel compiler
 
 ```
 module load i-compilers
-ifort -O2 -fopenmp -o myprog.x myprog.f90 -lm
+ifort -O2 -qopenmp -o myprog.x myprog.f90 -lm
 ```
 
 To run your code, you will need to have an (e.g., interactive) allocation:
@@ -83,10 +83,10 @@ This implementation performs repeated execution of the benchmarked kernel to mak
 ### Tasks and questions to be addressed
 
 1) Create a parallel version of the programs using a parallel construct: ``#pragma omp parallel for``. In addition to a parallel construct, you might need some runtime library routines:
-   - ``int omp_get_num_threads()`` to get the number of threads in a team
+   - ``int omp_get_max_threads()`` to get the maximum number of threads
    - ``int omp_get_thread_num()`` to get thread ID
    - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
-   - ``omp_set_num_threads()`` to request a number of threads in a team
+   - ``omp_set_num_threads()`` to set the number of threads to be used
 2) Run the parallel code and take the execution time with 1, 2, 4, 12, 24 threads for different array length ``N``. Record the timing.
 3) Produce a plot showing execution time as a function of array length for different number of threads.
 4) How large does ``N`` has to be for using 2 threads becoming more beneficial compared to a single thread?
@@ -132,10 +132,10 @@ A simple serial C code to calculate $\pi$ is the following:
 ### Tasks and questions to be addressed
 
 1) Create a parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using a parallel construct: ``#pragma omp parallel``. Pay close attention to shared versus private variables. In addition to a parallel construct, you might need some runtime library routines:
-   - ``int omp_get_num_threads()`` to get the number of threads in a team
+   - ``int omp_get_max_threads()`` to get the maximum number of threads
    - ``int omp_get_thread_num()`` to get thread ID
    - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
-   - ``omp_set_num_threads()`` to request a number of threads in a team
+   - ``omp_set_num_threads()`` to set the number of threads to be used
 2) Run the parallel code and take the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timing.
 3) How does the execution time change varying the number of threads? Is it what you expected? If not, why do you think it is so?
 4) Is there any technique you heard of in class to improve the scalability of the technique? How would you implement it?
@@ -180,4 +180,4 @@ Here we are going to implement a fourth parallel version of the [pi.c](pi.c) / [
 
 Hints:
 
-- To change the schedule, you can either change the environment variable with ``export OMP_SCHEDULE=type`` where ``type`` can be any of static, dynamic, guided or in the source code as ``omp parallel for schedule(type)``.
\ No newline at end of file
+- To change the schedule, you can either change the environment variable with ``export OMP_SCHEDULE=type`` where ``type`` can be any of static, dynamic, guided or in the source code as ``omp parallel for schedule(type)``.
diff --git a/intro_lab/pi.f90 b/intro_lab/pi.f90
index 127a5fe6c9b2703f9cbb2dc624f6a08e90b5ff57..0c8dcb2b636c1b533fbc8edb7942a5204ee1a6e0 100644
--- a/intro_lab/pi.f90
+++ b/intro_lab/pi.f90
@@ -27,6 +27,7 @@ enddo
 pi = pi * 4.0D0 * dx
 run_time = OMP_GET_WTIME() - start_time
 ref_pi = 4.0D0 * atan(1.0D0)
-print '("pi with ", i0, " steps is ", f16.10, " in ", f12.6, " seconds (error=", e12.6, ")")', NSTEPS, pi, run_time, abs(ref_pi - pi)
+print '("pi with ", i0, " steps is ", f16.10, " in ", f12.6, " seconds (error=", e12.6, ")")', &
+    NSTEPS, pi, run_time, abs(ref_pi - pi)
 
 end program
diff --git a/intro_lab/stream-triad.c b/intro_lab/stream-triad.c
index 9ee42ac9c73b6a68ea69f2b919202c6798fae045..1a77a89a92508808a4ef1c9672f77c4ee764f13a 100644
--- a/intro_lab/stream-triad.c
+++ b/intro_lab/stream-triad.c
@@ -14,18 +14,18 @@ int main() {
   int i, j;
   double s;
 
-	/* Initialise b, c and s */
+  /* Initialise b, c and s */
   s = 0.1;
   for (i = 0; i < N; i++) {
     b[i] = (double) i;
     c[i] = (double) i;
   }
 
-	/* Run benchmark loop M times */
-	for (j = 0; j < M; j++) {
+  /* Run benchmark loop M times */
+  for (j = 0; j < M; j++) {
     for (i = 0; i < N; i++)
       a[i] = b[i] + s * c[i];
   }
 
   return 0;
-}
\ No newline at end of file
+}