diff --git a/README.md b/README.md
index 23150f90499086993060e9fc0c44ddf7ed0be9d3..b720e814f0bc8448c3951cffd234468b020834fd 100644
--- a/README.md
+++ b/README.md
@@ -1,64 +1,47 @@
-**Instructions and hints on how to run for the OpenMP lab**
+# PDC Summer School: General Instructions for the OpenMP Labs
 
-# Where to run
+## Where to run
 
-The exercises will be run on PDC's CRAY XC-40 system [Beskow](https://www.pdc.kth.se/hpc-services/computing-systems):
+The exercises will be run on PDC's cluster [Tegner](https://www.pdc.kth.se/hpc-services/computing-systems/tegner-1.737437):
 
 ```
-beskow.pdc.kth.se
+tegner.pdc.kth.se
 ```
 
-# How to login
+## How to login
 
-To access PDC's cluster you should use your laptop and the Eduroam or KTH Open wireless networks.
+To access PDC's systems you need an account at PDC. Check the [instructions for obtaining an account](https://www.pdc.kth.se/support/documents/getting_access/get_access.html#apply-via-pdc-webpage).
 
-[Instructions on how to connect from various operating systems](https://www.pdc.kth.se/support/documents/login/login.html).
+Once you have an account, you can follow the [instructions on how to connect from various operating systems](https://www.pdc.kth.se/support/documents/login/login.html).
 
+For information about the Kerberos-based authentication environment, please check the [Kerberos commands documentation](https://www.pdc.kth.se/support/documents/login/login.html#general-information-about-kerberos).
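+
+As a minimal sketch of the typical workflow, you would obtain and inspect a Kerberos ticket before connecting. The realm shown here (``NADA.KTH.SE``) is an assumption based on PDC's documentation, so double-check it in the linked pages:
+
+```
+# obtain a forwardable Kerberos ticket for your PDC username
+kinit --forwardable <username>@NADA.KTH.SE
+# verify that you have a valid ticket
+klist
+```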
 
-# More about the environment on Beskow
+## More about the environment on Tegner
 
-The Cray automatically loads several [modules](https://www.pdc.kth.se/support/documents/run_jobs/job_scheduling.html#accessing-software) at login.
+Software that is not available by default needs to be loaded as a [module](https://www.pdc.kth.se/support/documents/run_jobs/job_scheduling.html#accessing-software). Use ``module avail`` to get a list of available modules. The following modules are of interest for these lab exercises:
 
-- Heimdal - [Kerberos commands](https://www.pdc.kth.se/support/documents/login/login.html#general-information-about-kerberos)
-- OpenAFS - [AFS commands](https://www.pdc.kth.se/support/documents/data_management/afs.html)
-- SLURM -  [batch jobs](https://www.pdc.kth.se/support/documents/run_jobs/queueing_jobs.html) and [interactive jobs](https://www.pdc.kth.se/support/documents/run_jobs/run_interactively.html)
-- Programming environment - [Compilers for software development](https://www.pdc.kth.se/support/documents/software_development/development.html)
+- Different versions of the GNU compiler suite (``gcc/*``)
+- Different versions of the Intel compiler suite (``i-compilers/*``)
 
-# Compiling MPI programs on Beskow
+For more information see the [software development documentation page](https://www.pdc.kth.se/support/documents/software_development/development.html).
 
-By default the cray compiler is loaded into your environment. In order to use another compiler you have to swap compiler modules:
+Home directories are provided through an OpenAFS service. See the [AFS data management page](https://www.pdc.kth.se/support/documents/data_management/afs.html) for more information.
 
-```
-module swap PrgEnv-cray PrgEnv-gnu
-```
-or
-```
-module swap PrgEnv-cray PrgEnv-intel
-```
+To use the Tegner compute nodes you have to submit [SLURM batch jobs](https://www.pdc.kth.se/support/documents/run_jobs/queueing_jobs.html) or run [SLURM interactive jobs](https://www.pdc.kth.se/support/documents/run_jobs/run_interactively.html).
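+
+As an illustration, a minimal batch script could look like the sketch below (the allocation name, reservation and time limit are placeholders you need to adapt); submit it with ``sbatch``:
+
+```
+#!/bin/bash -l
+# book one node for 30 minutes on your course allocation
+#SBATCH -A <allocation-name>
+#SBATCH --reservation=<name-of-reservation>
+#SBATCH -N 1
+#SBATCH -t 0:30:00
+
+# one OpenMP thread per core, one task launched with srun
+export OMP_NUM_THREADS=24
+srun -n 1 ./example.x
+```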
 
-On Beskow one should always use the *compiler wrappers* `cc`, `CC` or 
-`ftn` (for C, C++ and Fortran codes, respectively), 
-which will automatically link to MPI libraries and linear 
-algebra libraries like BLAS, LAPACK, etc.
+## Compiling programs
 
-Examples:
+By default you are provided with the compilers that come with the operating system, which are not the most recent versions. To use a recent version of the GNU compiler suite or the Intel compilers, use
 
 ```
-# Intel
-ftn -openmp source.f90
-cc -openmp source.c
-CC -openmp source.cpp
-# Cray
-ftn -openmp source.f90
-cc -openmp source.c
-CC -openmp source.cpp
-# GNU
-ftn -fopenmp source.f90
-cc -fopenmp source.c
-CC -fopenmp source.cpp 
+module load gcc
+```
+or
+```
+module load i-compilers
 ```
 
-# Running OpenMP programs on Beskow
+## Running OpenMP programs
 
 After having compiled your code with the 
 [correct compilers flags for OpenMP](https://www.pdc.kth.se/support/documents/software_development/development.html), 
@@ -68,13 +51,12 @@ it is necessary to book a node for interactive use:
 salloc -A <allocation-name> -N 1 -t 1:0:0
 ```
 
-You might also need to specify a **reservation** by adding the flag 
-``--reservation=<name-of-reservation>``.
+You might also need to specify a **reservation** by adding the flag ``--reservation=<name-of-reservation>``.
 
 An environment variable specifying the number of threads should also be set:
 
 ```
-export OMP_NUM_THREADS=32
+export OMP_NUM_THREADS=24
 ```
 
 Then the srun command is used to launch an OpenMP application:
@@ -83,23 +65,13 @@ Then the srun command is used to launch an OpenMP application:
 srun -n 1 ./example.x
 ```
 
-In this example we will start one task with 32 threads (there are 32 cores per node on the Beskow nodes).
-
-It is important to use the `srun` command since otherwise the job will run on the Beskow login node.
+In this example we will start one task with 24 threads.
 
-# OpenMP Exercises
+It is important to use the `srun` command since otherwise the job will run on the login node.
 
-The aim of these exercises is to give an introduction to OpenMP programming. 
-All examples are available in both C and Fortran90.
+## OpenMP Exercises
 
-- OpenMP Intro lab: 
-  - [Instructions](intro_lab/README.md)
-  - Simple hello world program [in C](intro_lab/hello.c) and [in Fortran](intro_lab/hello.f90)
-  - Calculate &pi; [in C](intro_lab/pi.c) and [in Fortran](intro_lab/pi.f90)
-  - Solutions will be made available later during the lab
-- OpenMP Advanced Lab: 
-  - [Instructions](advanced_lab/README.md)
-  - In C: [shwater2d.c](advanced_lab/c/shwater2d.c), [vtk_export.c](advanced_lab/c/vtk_export.c) and [Makefile](advanced_lab/c/Makefile)
-  - In Fortran: [shwater2d.f90](advanced_lab/f90/shwater2d.f90), [vtk_export.f90](advanced_lab/f90/vtk_export.f90) and [Makefile](advanced_lab/f90/Makefile)
-  - Solutions will be made available later during the lab
+The aim of these exercises is to give an introduction to OpenMP programming. All examples are available in both C and Fortran90.
 
+- [OpenMP Intro lab](intro_lab/README.md)
+- [OpenMP Advanced Lab](advanced_lab/README.md)
diff --git a/advanced_lab/README.md b/advanced_lab/README.md
index 1be5e1a5dca839bcb1ad3cf4bc9ad44bb7f672de..8ce624e287f138fe1fefaf7e9e3c2d287bf8a12e 100644
--- a/advanced_lab/README.md
+++ b/advanced_lab/README.md
@@ -1,30 +1,14 @@
-# OpenMP Advanced project
+# PDC Summer School: OpenMP Advanced Project
 
 ## About this exercise
 
-The aim of this exercise is to give hands-on experience in parallelizing a
-larger program, measure parallel performance and gain experience in what to
-expect from modern multi-core architectures.
+The aim of this exercise is to give hands-on experience in parallelizing a larger program, measuring parallel performance and learning what to expect from modern multi-core architectures.
 
-In the exercise you will use a dual hexadeca-core shared memory Intel Xeon
-E5-2698v3 Haswell node. There will be several nodes available on the Cray for
-interactive use during the lab and each group will have access to a node of
-their own. Running the program should therefore give you realistic timings and
-speedup characteristics.
-
-Your task is to parallelize a finite-volume solver for the two dimensional
-shallow water equations. Measure speedup and if you have time, tune the code.
-You don’t need to understand the numerics in order to solve this exercise (a
-short description is given in Appendix A). However, it assumes some prior
-experience with OpenMP, please refer to the lecture on shared memory
-programming if necessary.
+Your task is to parallelize a finite-volume solver for the two-dimensional shallow water equations. Measure the speed-up and, if you have time, tune the code. You do not need to understand the numerics in order to solve this exercise (a short description is given in Appendix A). However, it assumes some prior experience with OpenMP; please refer to the lecture on shared memory programming if necessary.
 
 ## Algorithm
 
-For this exercise we solve the shallow water equations on a square domain using
-a simple dimensional splitting approach. Updating volumes Q with numerical
-fluxes F and G, first in the x and then in the y direction, more easily
-expressed with the following pseudo-code
+For this exercise we solve the shallow water equations on a square domain using a simple dimensional splitting approach, updating the volumes *Q* with numerical fluxes *F* and *G*, first in the x and then in the y direction. This is more easily expressed with the following pseudo-code:
 
 ```
 for each time step do
@@ -40,30 +24,20 @@ for each time step do
 end
 ```
 
-In order to obtain good parallel speedup with OpenMP, each sub-task assigned to
-a thread needs to be rather large. Since the nested loops contains a lot of
-numerical calculations the solver is a perfect candidate for OpenMP
-parallelization. But as you will see in this exercise, it’s fairly difficult to
-obtain optimal speedup on today’s multi-core computers. However, it
-should be fairly easy to obtain some speedup without too much effort. The
-difficult task is to make a good use of all the available cores.
+In order to obtain good parallel speed-up with OpenMP, each sub-task assigned to a thread needs to be rather large. Since the nested loops contain a lot of numerical calculations, the solver is a perfect candidate for OpenMP parallelization. But as you will see in this exercise, it is fairly difficult to obtain optimal speed-up on today’s multi-core computers. However, it should be fairly easy to obtain some speed-up without too much effort. The difficult task is to make good use of all the available cores.
+
+Choose to work with either the given serial C/Fortran 90 code or, if you think you have time, write your own implementation (but do not waste time and energy). Compile the code by typing ``make`` and execute the program ``shwater2d`` with ``srun`` as described in the general instructions.
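+
+For example, from within an interactive allocation (see the general instructions), the steps might look like this; the thread count is just an illustration:
+
+```
+# build the provided code in the c/ or f90/ directory
+make
+# set the number of OpenMP threads and launch one task
+export OMP_NUM_THREADS=24
+srun -n 1 ./shwater2d
+```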
 
-Choose to work with either the given serial C/Fortran 90 code or, if you think
-you have time, write your own implementation (but don’t waste time and energy).
-Compile the code by typing make and execute the program ``shwater2d`` with ``srun`` as
-described in the [general
-instructions](https://www.pdc.kth.se/support/documents/courses/summerschool.html).
+## 1. Parallelize the code
 
-### 1. Parallelize the code. 
+A serial version of the code is provided here: [shwater2d.c](c/shwater2d.c) or [shwater2d.f90](f90/shwater2d.f90). Add OpenMP statements to make it run in parallel and make sure the computed solution is correct. Remember not to parallelize everything; some advice is provided below.
 
-Start with the file [shwater2d.c](c/shwater2d.c) or 
-[shwater2d.f](f90/shwater2d.f90), add OpenMP statements to make it run in
-parallel and make sure the computed solution is correct. Some advice are given
-below
+### Tasks and questions to be addressed
 
-- How should the work be distributed among threads 
-- Don’t parallelize everything
-- What’s the difference between
+1) How should the work be distributed among threads?
+2) Add OpenMP statements to make the code run in parallel without affecting its correctness.
+3) What is the difference between
 
 ```
 !$omp parallel do
@@ -93,59 +67,49 @@ and
 
 _Hint: How are threads created/destroyed by OpenMP? How can it impact performance?_
 
-### 2. Measure parallel performance.
+## 2. Measure parallel performance.
+
+In this exercise, parallel performance refers to the computational speed-up *S*<sub>n</sub> = *T*<sub>1</sub>/*T*<sub>n</sub>, using *n* threads.
 
-In this exercise, parallel performance refers to the computational speedup _S<sub>n</sub>_ =
-_T_<sub>1</sub>/_T<sub>n</sub>_, using _n_ threads. Measure run time T for 1, 2, ..., 16 threads and
-calculate speedup. Is it linear? If not, why? Finally, is the obtained speedup
-acceptable? Also, try to increase the space discretization (M,N) and see if it
-affects the speedup.
+### Tasks and questions to be addressed
 
-Recall from the OpenMP exercises that the number of threads is determined by an
-environment variable ``OMP_NUM_THREADS``. One could change the variable or use
-the shell script provided in Appendix B.
+1) Measure the run time *T* for 1, 2, ..., 24 threads and calculate the speed-up.
+2) Is it linear? If not, why?
+3) Finally, is the obtained speed-up acceptable?
+4) Try to increase the space discretization (M,N) and see if it affects the speed-up.
+
+Recall from the OpenMP exercises that the number of threads is determined by an environment variable ``OMP_NUM_THREADS``. One could change the variable or use the shell script provided in Appendix B.
 
-### 3. Optimize the code.
+## 3. Optimize the code.
 
-The given serial code is not optimal, why? If you have time, go ahead and try
-to make it faster. Try to decrease the serial run time. Once the serial
-performance is optimal, redo the speedup measurements and comment on the
-result.
+The given serial code is not optimal. Why? If you have time, go ahead and try to make it faster. Try to decrease the serial run time. Once the serial
+performance is optimal, redo the speedup measurements and comment on the result.
 
-For debugging purposes you might want to visualize the computed solution.
-Uncomment the line ``save_vtk``. The result will be stored in ``result.vtk``, which can
-be opened in ParaView, available on Tegner after 
-``module add paraview``. Beware that the resulting file could be rather large,
-unless the space discretization (M,N) is decreased.
+For debugging purposes you might want to visualize the computed solution. Uncomment the line ``save_vtk``. The result will be stored in ``result.vtk``, which can be opened in ParaView, available on Tegner after ``module add paraview``. Beware that the resulting file could be rather large, unless the space discretization (M,N) is decreased.
 
-### A. About the Finite-Volume solver
+## Appendix A. About the Finite-Volume solver
 
-In this exercise we solve the shallow water equations in two dimensions given
-by 
+In this exercise we solve the shallow water equations in two dimensions given by 
 
 <img src="image/eq_1.png" alt="Eq_1" width="800px"/>
 
-where _h_ is the depth and (_u_,_v_) are the velocity vectors. To solve the equations
-we use a dimensional splitting approach, i.e. reducing the two dimensional problem
-to a sequence of one-dimensional problems
+where _h_ is the depth and (_u_,_v_) are the velocity vectors. To solve the equations we use a dimensional splitting approach, i.e. reducing the two dimensional problem to a sequence of one-dimensional problems
 
 <img src="image/eq_2.png" alt="Eq_2" width="800px"/>
 
-For this exercise we use the Lax-Friedrich’s scheme, with numerical fluxes _F_, _G_
-defined as
+For this exercise we use the Lax-Friedrichs scheme, with numerical fluxes *F*, *G* defined as
 
 <img src="image/eq_3.png" alt="Eq_3" width="800px"/>
 
-where _f_ and _g_ are the flux functions, derived from (1). For simplicity we use
-reflective boundary conditions, thus at the boundary
+where *f* and *g* are the flux functions, derived from (1). For simplicity we use reflective boundary conditions, thus at the boundary
 
 <img src="image/eq_4.png" alt="Eq_4" width="800px"/>
 
-### B. Run script for changing OMP_NUM_THREADS
+## Appendix B. Run script for changing ``OMP_NUM_THREADS``
 
 ```
 #!/bin/csh
 foreach n (`seq 1 1 16`)
     env OMP_NUM_THREADS=$n srun -n 1 ./a.out
 end
-```
+```
\ No newline at end of file
diff --git a/advanced_lab/ompproj.pdf b/advanced_lab/ompproj.pdf
deleted file mode 100644
index 3f47aa86813a4895690c4afa84286725a353a4ba..0000000000000000000000000000000000000000
Binary files a/advanced_lab/ompproj.pdf and /dev/null differ
diff --git a/intro_lab/README.md b/intro_lab/README.md
index 48582505dfb4f0434092e10e5987bcf3f523cbfa..55c3332ecd032c1b8de0f6940f7ed3aa01998e88 100644
--- a/intro_lab/README.md
+++ b/intro_lab/README.md
@@ -1,42 +1,25 @@
-# OpenMP Lab Assignment
+# PDC Summer School: OpenMP Lab Assignment
 
 ## Overview
 
-The goal of these exercises is to familiarize you with OpenMP environment and
-make our first parallel codes with OpenMP. We will also record the code
-performance and understand race condition and false sharing. This laboratory
-contains four exercises, each with step-by-step instructions below.
+The goal of these exercises is to familiarize you with the OpenMP environment and to write your first parallel codes with OpenMP. We will also record the code performance and understand race conditions and false sharing. This laboratory contains five exercises, each with step-by-step instructions below.
 
-For your experiments, you are going to use a node of the
-[Beskow](https://www.pdc.kth.se/hpc-services/computing-systems/beskow-1.737436)
-supercomputer.  To run your code on Beskow, you need first to generate your
-executable. It is very important that you include a compiler flag telling the
-compiler that you are going to use OpenMP. If you forget the flag, the compiler
-will happily ignore all the OpenMP directives and create an executable that
-runs in serial. Different compilers have different flags. When using Cray
-compilers, the OpenMP flag is ``-openmp``.
+To run your code, you first need to generate an executable. It is very important that you include a compiler flag telling the compiler that you are going to use OpenMP. If you forget the flag, the compiler will happily ignore all the OpenMP directives and create an executable that runs in serial. Different compilers have different flags, but many follow the convention of the GNU compilers and accept the OpenMP flag ``-fopenmp``.
 
-To compile your C OpenMP code using the default Cray compilers:
+To compile your C OpenMP code using ``gcc``, therefore, use
 
 ```
-cc -O2 -openmp -lm name_source.c -o name_exec
-```
-
-Alternatively, compile your C OpenMP code using GNU compilers:
-
-```
-module swap PrgEnv-cray PrgEnv-gnu
-cc -O2 -fopenmp -lm name_source.c -o name_exec
+gcc -O2 -fopenmp -o myprog.x myprog.c -lm
 ```
 
 In Fortran, it is recommended to use the Intel compiler
 
 ```
-module swap PrgEnv-cray PrgEnv-intel
-ftn -fpp -O2 -openmp -lm name_source.f90 -o name_exec
+module load i-compilers
+ifort -O2 -fopenmp -o myprog.x myprog.f90 -lm
 ```
 
-To run your code on Beskow, you will need to have an interactive allocation:
+To run your code, you will need an allocation, for example an interactive one:
 
 ```
 salloc -N 1 -t 4:00:00 -A <name-of-allocation> --reservation=<name-of-reservation>
@@ -48,7 +31,7 @@ To set the number of threads, you need to set the OpenMP environment variable:
 export OMP_NUM_THREADS=<number-of-threads>
 ```
 
-To run an OpenMP code on a computing node of Beskow:
+To run an OpenMP code on a computing node:
 
 ```
 srun -n 1 ./name_exec
@@ -58,10 +41,7 @@ srun -n 1 ./name_exec
 
 _Concepts: Parallel regions, parallel, thread ID_
 
-Here we are going to implement the first OpenMP program. Expected knowledge
-includes basic understanding of OpenMP environment, how to compile an OpenMP
-program, how to set the number of OpenMP threads and retrieve the thread ID
-number at runtime.
+Here we are going to implement the first OpenMP program. Expected knowledge includes a basic understanding of the OpenMP environment, how to compile an OpenMP program, how to set the number of OpenMP threads, and how to retrieve the thread ID at runtime.
 
 Your code using 4 threads should behave similarly to:
 
@@ -80,29 +60,44 @@ Hello World from Thread 2
 Hello World from Thread 1
 ```
 
-Instructions: Write a C/Fortran code to make each OpenMP thread print "``Hello
-World from Thread X!``" with ``X`` = thread ID.
+### Tasks and questions to be addressed
+
+1) Write a C/Fortran code to make each OpenMP thread print "``Hello World from Thread X!``" with ``X`` = thread ID.
+2) How do you change the number of threads?
+3) How many different ways are there to change the number of threads? Which ones are they?
+4) How can you make the output ordered from thread 0 to thread 3?
 
 Hints:
 
 - Remember to include OpenMP library
 - Retrieve the ID of the thread with ``omp_get_thread_num()`` in C or in Fortran ``OMP_GET_THREAD_NUM()``.
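+
+For task 1, a minimal sketch in C (one possible structure based on the hints above; your own layout may differ) could look like:
+
+```
+#include <stdio.h>
+#include <omp.h>
+
+int main(void)
+{
+    /* each thread in the parallel region prints its own ID */
+    #pragma omp parallel
+    {
+        int tid = omp_get_thread_num();
+        printf("Hello World from Thread %d!\n", tid);
+    }
+    return 0;
+}
+```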
 
-Questions:
+## Exercise 2 - Parallel load/stores using ``pragma omp parallel for``
 
-- How do you change the number of threads?
-- How many different ways are there to change the number of threads? Which one are those?
-- How can you make the output ordered from thread 0 to thread 4?
+_Concepts: Parallel, default data environment, runtime library calls_
 
-## Exercise 2 - Creating Threads: calculate &pi; in parallel using pragma omp parallel
+Here we are considering the parallelisation of a widely used computational pattern, namely adding an array to a scaled array. Serial versions of this task are provided: [stream-triad.c](stream-triad.c) / [stream-triad.f90](stream-triad.f90).
 
-_Concepts: Parallel, default data environment, runtime library calls_
+This implementation performs repeated executions of the benchmarked kernel to improve the time measurements.
+
+### Tasks and questions to be addressed
+
+1) Create a parallel version of the programs using a parallel construct: ``#pragma omp parallel for``. In addition to a parallel construct, you might need some runtime library routines:
+   - ``int omp_get_num_threads()`` to get the number of threads in a team
+   - ``int omp_get_thread_num()`` to get thread ID
+   - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
+   - ``omp_set_num_threads()`` to request a number of threads in a team
+2) Run the parallel code and take the execution time with 1, 2, 4, 12, 24 threads for different array lengths ``N``. Record the timing.
+3) Produce a plot showing the execution time as a function of the array length for different numbers of threads.
+4) How large does ``N`` have to be before using 2 threads becomes more beneficial than using a single thread?
+5) How large does ``N`` need to be so that all arrays no longer fit into the L3 cache?
+6) Compare results for large ``N`` and 8 threads using different settings of ``OMP_PROC_BIND`` and reason about the observed performance differences.
+
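+If you want to check the shape of the construct before editing the provided files, here is a hedged, self-contained sketch of a parallel triad loop with timing (array length, values and variable names are illustrative and not taken from the provided code):
+
+```
+#include <stdio.h>
+#include <stdlib.h>
+#include <omp.h>
+
+int main(void)
+{
+    const long N = 1L << 24;              /* array length; vary this in the exercise */
+    const double s = 3.0;
+    double *a = malloc(N * sizeof(double));
+    double *b = malloc(N * sizeof(double));
+    double *c = malloc(N * sizeof(double));
+    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }
+
+    double t0 = omp_get_wtime();
+    /* distribute the triad iterations over the threads */
+    #pragma omp parallel for
+    for (long i = 0; i < N; i++)
+        a[i] = b[i] + s * c[i];
+    double t1 = omp_get_wtime();
+
+    printf("N = %ld, %d threads, time = %f s\n",
+           N, omp_get_max_threads(), t1 - t0);
+    free(a); free(b); free(c);
+    return 0;
+}
+```
+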
+## Exercise 3 - Parallel calculation of $\pi$ using ``pragma omp parallel``
 
-Here we are going to implement a first parallel version of the 
-[pi.c](pi.c) / [pi.f90](pi.f90)
-code to calculate the value of &pi; using the parallel construct.
+_Concepts: Parallel, default data environment, runtime library calls_
 
-The figure below shows the numerical technique, we are going to use to calculate &pi;.
+Here we are going to implement a first parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) code to calculate the value of &pi; using the parallel construct. The figure below shows the numerical technique we are going to use to calculate &pi;.
 
 <img src="image/pi_int.png" alt="PI_integral" width="350px"/>
 
@@ -114,9 +109,9 @@ We can approximate the integral as a sum of rectangles
 
 <img src="image/pi_eq_2.png" alt="PI_Eq_2" width="200px"/>
 
-where each rectangle has width &Delta;x and height F(x<sub>i</sub>) at the middle of interval i.
+where each rectangle has width &Delta;x and height F(x<sub>i</sub>) at the middle of interval i.
 
-A simple serial C code to calculate &pi; is the following:
+A simple serial C code to calculate $\pi$ is the following:
 
 ```
     unsigned long nsteps = 1<<27; /* around 10^8 steps */
@@ -134,86 +129,55 @@ A simple serial C code to calculate &pi; is the following:
     pi *= 4.0 * dx;
 ```
 
-Instructions: Create a parallel version of the 
-[pi.c](pi.c) / [pi.f90](pi.f90) program using a
-parallel construct: ``#pragma omp parallel``. Run the parallel code and take the
-execution time with 1, 2, 4, 8, 16, 32 threads. Record the timing.
-
-Pay close attention to shared versus private variables.
+### Tasks and questions to be addressed
 
-- In addition to a parallel construct, you might need the runtime library routines
-- ``int omp_get_num_threads()``; to get the number of threads in a team
-- ``int omp_get_thread_num()``; to get thread ID
-- ``double omp_get_wtime()``; to get the time in seconds since a fixed point in the past
-- ``omp_set_num_threads()``; to request a number of threads in a team
+1) Create a parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using a parallel construct: ``#pragma omp parallel``.  Pay close attention to shared versus private variables. In addition to a parallel construct, you might need some runtime library routines:
+   - ``int omp_get_num_threads()`` to get the number of threads in a team
+   - ``int omp_get_thread_num()`` to get thread ID
+   - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
+   - ``omp_set_num_threads()`` to request a number of threads in a team
+2) Run the parallel code and take the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timing.
+3) How does the execution time change varying the number of threads? Is it what you expected? If not, why do you think it is so?
+4) Is there any technique you heard of in class to improve the scalability of the technique? How would you implement it?
 
 Hints:
 
 - Use a parallel construct: ``#pragma omp parallel``.
 - Divide loop iterations between threads (use the thread ID and the number of threads).
-- Create an accumulator for each thread to hold partial sums that you can later
-  combine to generate the global sum.
-
-Questions:
+- Create an accumulator for each thread to hold partial sums that you can later combine to generate the global sum.
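+
+As a hedged illustration of these hints, and not necessarily the structure expected in your solution, the parallel region could be organized with one partial sum per thread like this:
+
+```
+#include <stdio.h>
+#include <stdlib.h>
+#include <omp.h>
+
+int main(void)
+{
+    const unsigned long nsteps = 1UL << 27;   /* around 10^8 steps, as in the serial code */
+    const double dx = 1.0 / nsteps;
+
+    int maxt = omp_get_max_threads();
+    double *partial = calloc(maxt, sizeof(double));   /* one accumulator per thread */
+
+    #pragma omp parallel
+    {
+        int tid = omp_get_thread_num();
+        int nthreads = omp_get_num_threads();
+        double sum = 0.0;                     /* private partial sum */
+        /* interleave the loop iterations across the threads */
+        for (unsigned long i = tid; i < nsteps; i += nthreads) {
+            double x = (i + 0.5) * dx;
+            sum += 1.0 / (1.0 + x * x);
+        }
+        partial[tid] = sum;
+    }
+
+    double pi = 0.0;
+    for (int t = 0; t < maxt; t++)
+        pi += partial[t];                     /* combine the partial sums */
+    pi *= 4.0 * dx;
+
+    printf("pi is approximately %.10f\n", pi);
+    free(partial);
+    return 0;
+}
+```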
 
-- How does the execution time change varying the number of threads? Is it what you
-  expected? If not, why do you think it is so?
-- Is there any technique you heard of in class to improve the scalability of the
-  technique? How would you implement it?
-
-## Exercise 3 - Calculate &pi; using critical and atomic directives
+## Exercise 4 - Calculate $\pi$ using critical and atomic directives
 
 _Concepts: parallel region, synchronization, critical, atomic_
 
-Here we are going to implement a second and a third parallel version of the
-[pi.c](pi.c) / [pi.f90](pi.f90) code to calculate the value of &pi; 
-using the critical and atomic directives.
+Here we are going to implement a second and a third parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) code to calculate the value of $\pi$ using the critical and atomic directives.
+
+### Tasks and questions to be addressed
 
-Instructions: Create two new parallel versions of the 
-[pi.c](pi.c) / [pi.f90](pi.f90) program
-using the parallel construct ``#pragma omp parallel`` and 1) ``#pragma omp critical``
-2) ``#pragma omp atomic``. Run the two new parallel codes and take the execution
-time with 1, 2, 4, 8, 16, 32 threads. Record the timing in a table.
+1) Create two new parallel versions of the [pi.c](pi.c) / [pi.f90](pi.f90) program using the parallel construct ``#pragma omp parallel`` and a) ``#pragma omp critical`` b) ``#pragma omp atomic``.
+2) Run the two new parallel codes and take the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timing in a table.
+3) What would happen if you had not used critical or atomic to protect the update of the shared variable?
+4) How does the execution time change varying the number of threads? Is it what you expected?
+5) Do the two versions of the code differ in performance? If so, what do you think is the reason?
 
 Hints:
 
-- We can use a shared variable &pi; to be updated concurrently by different
-  threads. However, this variable needs to be protected with a critical section
-  or an atomic access.
+- We can use a shared variable $\pi$ to be updated concurrently by different threads. However, this variable needs to be protected with a critical section or an atomic access.
 - Use critical and atomic before the update ``pi += step``
 
-Questions:
-
-- What would happen if you hadn’t used critical or atomic a shared variable?
-- How does the execution time change varying the number of threads? Is it what
-  you expected?
-- Do the two versions of the code differ in performance? If so, what do you
-  think is the reason?
-
-## Exercise 4 - Calculate &pi; with a loop and a reduction
+## Exercise 5 - Calculate &pi; with a loop and a reduction
 
 _Concepts: worksharing, parallel loop, schedule, reduction_
 
-Here we are going to implement a fourth parallel version of the 
-[pi.c](pi.c) / [pi.f90](pi.f90)
-code to calculate the value of &pi; using ``omp for`` and ``reduction`` operations.
+Here we are going to implement a fourth parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) code to calculate the value of $\pi$ using ``omp for`` and ``reduction`` operations.
 
-Instructions: Create a new parallel versions of the 
-[pi.c](pi.c) / [pi.f90](pi.f90) program using
-the parallel construct ``#pragma omp for`` and ``reduction`` operation. Run the new
-parallel code and take the execution time for 1, 2, 4, 8, 16, 32 threads. Record
-the timing in a table. Change the schedule to dynamic and guided and measure
-the execution time for 1, 2, 4, 8, 16, 32 threads.
+### Tasks and questions to be addressed
 
-Hints:
+1) Create a new parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using the ``#pragma omp for`` construct and a ``reduction`` operation.
+2) Run the new parallel code and take the execution time for 1, 2, 4, 8, 12, 24 threads. Record the timing in a table. Change the schedule to dynamic and guided and measure the execution time for 1, 2, 4, 8, 12, 24 threads.
+3) What is the scheduling that provides the best performance? What is the reason for that?
+4) What is the fastest parallel implementation of pi.c / pi.f90 program? What is the reason for it being the fastest? What would be an even faster implementation of pi.c / pi.f90 program?
 
-- To change the schedule, you can either change the environment variable with
-``export OMP_SCHEDULE=type`` where ``type`` can be any of static, dynamic, guided or in
-the source code as ``omp parallel for schedule(type)``.
-
-Questions:
+Hints:
 
-- What is the scheduling that provides the best performance? What is the reason for that?
-- What is the fastest parallel implementation of pi.c / pi.f90 program? What is
-  the reason for it being the fastest? What would be an even faster implementation
-  of pi.c / pi.f90 program?
+- To change the schedule, you can either set the environment variable with ``export OMP_SCHEDULE=type``, where ``type`` can be any of static, dynamic or guided (this takes effect when the loop uses ``schedule(runtime)``), or set it in the source code as ``omp parallel for schedule(type)``.
\ No newline at end of file