@@ -9,14 +9,14 @@ To run your code, you need first to generate your executable. It is very importa
To compile your C OpenMP code using ``gcc``, therefore, use
```
gcc -O2 -fopenmp -o myprog.x myprog.c -lm
```
For Fortran, it is recommended to use the Intel compiler:
```
module load i-compilers
ifort -O2 -qopenmp -o myprog.x myprog.f90 -lm
```
To run your code, you will need to have an (e.g., interactive) allocation:
...
@@ -83,10 +83,10 @@ This implementation performs repeated execution of the benchmarked kernel to mak
### Tasks and questions to be addressed
1) Create a parallel version of the programs using a parallel construct: ``#pragma omp parallel for``. In addition to the parallel construct, you might need some runtime library routines:
- ``int omp_get_max_threads()`` to get the maximum number of threads
- ``int omp_get_thread_num()`` to get the thread ID
- ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
- ``omp_set_num_threads()`` to set the number of threads to be used
2) Run the parallel code and measure the execution time with 1, 2, 4, 12, 24 threads for different array lengths ``N``. Record the timings.
3) Produce a plot showing execution time as a function of array length for different numbers of threads.
4) How large does ``N`` have to be before using 2 threads becomes more beneficial than using a single thread?
...
@@ -132,10 +132,10 @@ A simple serial C code to calculate $\pi$ is the following:
### Tasks and questions to be addressed
1) Create a parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using a parallel construct: ``#pragma omp parallel``. Pay close attention to shared versus private variables. In addition to the parallel construct, you might need some runtime library routines:
- ``int omp_get_max_threads()`` to get the maximum number of threads
- ``int omp_get_thread_num()`` to get the thread ID
- ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
- ``omp_set_num_threads()`` to set the number of threads to be used
2) Run the parallel code and measure the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timings.
3) How does the execution time change as you vary the number of threads? Is it what you expected? If not, why do you think that is?
4) Is there any technique you heard of in class that could improve the scalability of this approach? How would you implement it?