diff --git a/README.md b/README.md
index 133ba9cb6e381003e107b6553e4043e12b1d60e3..974006861c4bb14f425c0078f6acad4f6424d45d 100644
--- a/README.md
+++ b/README.md
@@ -22,20 +22,25 @@ The Cray automatically loads several [modules](https://www.pdc.kth.se/support/do
 - Heimdal - [Kerberos commands](https://www.pdc.kth.se/support/documents/login/login.html#general-information-about-kerberos)
 - OpenAFS - [AFS commands](https://www.pdc.kth.se/support/documents/data_management/afs.html)
 - SLURM - [batch jobs](https://www.pdc.kth.se/support/documents/run_jobs/queueing_jobs.html) and [interactive jobs](https://www.pdc.kth.se/support/documents/run_jobs/run_interactively.html)
-
+- Software development - [Programming environments and compilers](https://www.pdc.kth.se/support/documents/software_development/development.html)

 # Running OpenMP programs on Beskow

-First it is necessary to book a node for interactive use:
+After having compiled your code with the
+[correct compiler flags for OpenMP](https://www.pdc.kth.se/support/documents/software_development/development.html),
+it is necessary to book a node for interactive use:

 ```
 salloc -A <allocation-name> -N 1 -t 1:0:0
 ```

+You might also need to specify a **reservation** by adding the flag
+``--reservation=<name-of-reservation>``.
+
 An environment variable specifying the number of threads should also be set:

 ```
-export OMP_NUM_THREADS=<number-of-threads>
+export OMP_NUM_THREADS=32
 ```

 Then the srun command is used to launch an OpenMP application:
diff --git a/advanced_lab/README.md b/advanced_lab/README.md
index 303086cd660272fde7ca0b279a5506e0d15f06fa..1be5e1a5dca839bcb1ad3cf4bc9ad44bb7f672de 100644
--- a/advanced_lab/README.md
+++ b/advanced_lab/README.md
@@ -56,7 +56,8 @@ instructions](https://www.pdc.kth.se/support/documents/courses/summerschool.html

 ### 1. Parallelize the code.

-Start with the file ``shwater2d.(c/f90)``, add OpenMP statements to make it run in
+Start with the file [shwater2d.c](c/shwater2d.c) or
+[shwater2d.f90](f90/shwater2d.f90), add OpenMP statements to make it run in
 parallel and make sure the computed solution is correct. Some advice are given
 below
@@ -113,7 +114,7 @@ result.

 For debugging purposes you might want to visualize the computed solution.
 Uncomment the line ``save_vtk``. The result will be stored in ``result.vtk``, which can
-be opened in ParaView, available on the lab computers (and also on Tegner) after
+be opened in ParaView, available on Tegner after
 ``module add paraview``. Beware that the resulting file could be rather large, unless
 the space discretization (M,N) is decreased.
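The advanced lab hunk above asks for OpenMP statements to be added to the ``shwater2d`` loops. Purely as a sketch of the pattern involved (the grid size, array names and update formula below are invented for illustration and are not taken from shwater2d.c), a loop nest whose iterations are independent can be shared among threads with a single directive:

```
#include <stdio.h>
#include <omp.h>

#define M 1000                 /* made-up grid size, not the lab's defaults */
#define N 1000

int main(void) {
  static double u[M][N], unew[M][N];   /* illustrative arrays, not shwater2d's */

  /* Fill the grid with some arbitrary data. */
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      u[i][j] = (double)(i + j);

  double t0 = omp_get_wtime();

  /* Each (i, j) iteration writes a distinct element of unew, so the outer
     loop can be shared among threads without any synchronization. */
#pragma omp parallel for
  for (int i = 1; i < M - 1; i++)
    for (int j = 1; j < N - 1; j++)
      unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                           u[i][j - 1] + u[i][j + 1]);

  printf("unew[M/2][N/2] = %f, update took %f s on %d threads\n",
         unew[M / 2][N / 2], omp_get_wtime() - t0, omp_get_max_threads());
  return 0;
}
```

The same reasoning is what the lab asks you to verify: as long as each iteration writes its own element of the output array, no synchronization is needed inside the loop and the parallel solution should match the serial one. The executable is then launched with ``srun`` from an interactive allocation, as the README hunk above describes.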
diff --git a/advanced_lab/c/Makefile b/advanced_lab/c/Makefile
index e08914905d1f1b6f334c9ab36255d4f701a1ef6e..e0e17dbead2edc6861be47873243e3c9cea4d8c0 100644
--- a/advanced_lab/c/Makefile
+++ b/advanced_lab/c/Makefile
@@ -1,8 +1,8 @@
 CC = cc
 ifeq ($(CRAY_PRGENVCRAY), loaded)
-CFLAGS = -O2 -homp
-else ifeq ($(CRAY_PRGENVINTEL), loaded)
 CFLAGS = -O2 -openmp
+else ifeq ($(CRAY_PRGENVINTEL), loaded)
+CFLAGS = -O2 -openmp -D_Float128=__float128
 else ifeq ($(CRAY_PRGENVGNU), loaded)
 CFLAGS = -O2 -fopenmp
 else
diff --git a/advanced_lab/f90/Makefile b/advanced_lab/f90/Makefile
index c7d291edff2a45ad890909ac983c74656d582039..f9024308bfa829be311b1073410230ff418b5aec 100644
--- a/advanced_lab/f90/Makefile
+++ b/advanced_lab/f90/Makefile
@@ -1,6 +1,6 @@
 FC = ftn
 ifeq ($(CRAY_PRGENVCRAY), loaded)
-FFLAGS = -O2 -homp
+FFLAGS = -O2 -openmp
 else ifeq ($(CRAY_PRGENVINTEL), loaded)
 FFLAGS = -O2 -openmp
 else ifeq ($(CRAY_PRGENVGNU), loaded)
diff --git a/intro_lab/README.md b/intro_lab/README.md
index be5b483a9661e73ec4707afdf63590360212d528..48582505dfb4f0434092e10e5987bcf3f523cbfa 100644
--- a/intro_lab/README.md
+++ b/intro_lab/README.md
@@ -39,7 +39,7 @@ ftn -fpp -O2 -openmp -lm name_source.f90 -o name_exec
 To run your code on Beskow, you will need to have an interactive allocation:

 ```
-salloc -N 1 -t 4:00:00 -A edu18.summer --reservation=summer-2018-08-15
+salloc -N 1 -t 4:00:00 -A <name-of-allocation> --reservation=<name-of-reservation>
 ```

 To set the number of threads, you need to set the OpenMP environment variable:
@@ -98,7 +98,8 @@ Questions:

 _Concepts: Parallel, default data environment, runtime library calls_

-Here we are going to implement a first parallel version of the pi.c / pi.f90
+Here we are going to implement a first parallel version of the
+[pi.c](pi.c) / [pi.f90](pi.f90)
 code to calculate the value of π using the parallel construct. The figure
 below shows the numerical technique, we are going to use to calculate π.
@@ -133,7 +134,8 @@ A simple serial C code to calculate π is the following:
     pi *= 4.0 * dx;
 ```

-Instructions: Create a parallel version of the pi.c / pi.f90 program using a
+Instructions: Create a parallel version of the
+[pi.c](pi.c) / [pi.f90](pi.f90) program using a
 parallel construct: ``#pragma omp parallel``. Run the parallel code and take
 the execution time with 1, 2, 4, 8, 16, 32 threads. Record the timing.
@@ -154,9 +156,9 @@ Hints:

 Questions:

-- How does the execution time change varying the number of threads? Is what you
-  expected? If not, why you think it is so?
-- Is there any technique you heard in class to improve the scalability of the
+- How does the execution time change varying the number of threads? Is it what you
+  expected? If not, why do you think it is so?
+- Is there any technique you heard of in class to improve the scalability of the
   technique? How would you implement it?

 ## Exercise 3 - Calculate π using critical and atomic directives
@@ -164,10 +166,11 @@ Questions:

 _Concepts: parallel region, synchronization, critical, atomic_

 Here we are going to implement a second and a third parallel version of the
-pi.c / pi.f90 code to calculate the value of π using the critical and atomic
-directives.
+[pi.c](pi.c) / [pi.f90](pi.f90) code to calculate the value of π
+using the critical and atomic directives.

-Instructions: Create two new parallel versions of the pi.c / pi.f90 program
+Instructions: Create two new parallel versions of the
+[pi.c](pi.c) / [pi.f90](pi.f90) program
 using the parallel construct ``#pragma omp parallel`` and 1) ``#pragma omp critical``
 2) ``#pragma omp atomic``. Run the two new parallel codes and take the execution
 time with 1, 2, 4, 8, 16, 32 threads. Record the timing in a table.
@@ -191,10 +194,12 @@ Questions:

 _Concepts: worksharing, parallel loop, schedule, reduction_

-Here we are going to implement a fourth parallel version of the pi.c / pi.f90
+Here we are going to implement a fourth parallel version of the
+[pi.c](pi.c) / [pi.f90](pi.f90)
 code to calculate the value of π using ``omp for`` and ``reduction`` operations.

-Instructions: Create a new parallel versions of the pi.c / pi.f90 program using
+Instructions: Create a new parallel version of the
+[pi.c](pi.c) / [pi.f90](pi.f90) program using
 the parallel construct ``#pragma omp for`` and ``reduction`` operation. Run the
 new parallel code and take the execution time for 1, 2, 4, 8, 16, 32 threads.
 Record the timing in a table. Change the schedule to dynamic and guided and measure
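The π exercises at the end of the diff are compact enough that a sketch helps. For Exercise 3, a minimal version of the critical-section approach is shown below; the step count and variable names are illustrative and are not copied from pi.c. Each thread accumulates a private partial sum over a cyclic share of the iterations, and only the final update of the shared ``pi`` needs protection:

```
#include <stdio.h>
#include <omp.h>

int main(void) {
  const long num_steps = 100000000;      /* illustrative value, not from pi.c */
  const double dx = 1.0 / (double)num_steps;
  double pi = 0.0;
  double t0 = omp_get_wtime();

#pragma omp parallel
  {
    int nthreads = omp_get_num_threads();
    int id = omp_get_thread_num();
    double sum = 0.0;                    /* private partial sum for this thread */

    /* Cyclic distribution of the iterations over the threads. */
    for (long i = id; i < num_steps; i += nthreads) {
      double x = (i + 0.5) * dx;
      sum += 1.0 / (1.0 + x * x);
    }

    /* Only this single shared update needs protection;
       #pragma omp atomic would work equally well here. */
#pragma omp critical
    pi += sum;
  }

  pi *= 4.0 * dx;
  printf("pi = %.15f computed in %.3f s\n", pi, omp_get_wtime() - t0);
  return 0;
}
```

Replacing the ``critical`` directive with ``#pragma omp atomic`` on the same statement gives the second requested variant, so the two can be timed for 1, 2, 4, 8, 16 and 32 threads with no other changes.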
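For Exercise 4, the same computation collapses into a worksharing loop with a ``reduction`` clause; again the step count is an illustrative placeholder rather than the value used in pi.c, and the ``schedule`` clause is the part the exercise asks you to vary:

```
#include <stdio.h>
#include <omp.h>

int main(void) {
  const long num_steps = 100000000;      /* illustrative value, not from pi.c */
  const double dx = 1.0 / (double)num_steps;
  double pi = 0.0;
  double t0 = omp_get_wtime();

  /* reduction(+:pi) gives every thread a private copy of pi and adds the
     copies up at the end; swap schedule(static) for dynamic or guided to
     compare the schedules the exercise mentions. */
#pragma omp parallel for reduction(+:pi) schedule(static)
  for (long i = 0; i < num_steps; i++) {
    double x = (i + 0.5) * dx;
    pi += 1.0 / (1.0 + x * x);
  }

  pi *= 4.0 * dx;
  printf("pi = %.15f computed in %.3f s\n", pi, omp_get_wtime() - t0);
  return 0;
}
```

Since every iteration does the same amount of work, ``schedule(dynamic)`` and ``schedule(guided)`` mainly add scheduling overhead in this loop, which should become visible when the timings for the three schedules are compared.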