diff --git a/README.md b/README.md
index 7d0b4a598903973f768bf83978db16dbad43be8c..da5b2dfd81a9fd180f10c594200976f4e05822fc 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ For the 2022 PDC Summer School the reservation name is ``summer-<date>``, where
 An environment variable specifying the number of threads should also be set:
 
 ```
-export OMP_NUM_THREADS=128
+export OMP_NUM_THREADS=16
 ```
 
 Then the ``srun`` command is used to launch an OpenMP application:
@@ -57,7 +57,7 @@ Then the ``srun`` command is used to launch an OpenMP application:
 srun -n 1 ./example.x
 ```
 
-In this example we will start one task with 128 threads.
+In this example we will start one task with 16 threads.
 
 It is important to use the `srun` command since otherwise the job will run on the login node.
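+
+To check that the requested number of threads is actually picked up, a minimal OpenMP test program (only a sketch for verification, not part of the lab code) can be compiled with OpenMP enabled and launched with the same ``srun`` command:
+
+```
+#include <omp.h>
+#include <stdio.h>
+
+int main(void) {
+    /* Each thread prints its ID; the total should match OMP_NUM_THREADS. */
+    #pragma omp parallel
+    printf("Hello from thread %d of %d\n",
+           omp_get_thread_num(), omp_get_num_threads());
+    return 0;
+}
+```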
 
diff --git a/advanced_lab/README.md b/advanced_lab/README.md
index 30aa8a8c39ecfaa66a82bf2568387a87c8e9395c..7c6e6fa6f5642df4d96bbefbe3da3a23b28d0ddd 100644
--- a/advanced_lab/README.md
+++ b/advanced_lab/README.md
@@ -72,7 +72,7 @@ In this exercise, we explore parallel performance refers to the computational sp
 
 ### Tasks and questions to be addressed
 
-1) Measure run time $\Delta$*T*<sub>*n*</sub> for *n* = 1, 2, ..., 24 threads and calculate the speed-up.
+1) Measure run time $\Delta$*T*<sub>*n*</sub> for *n* = 1, 2, ..., 16 threads and calculate the speed-up (defined after this list).
 2) Is it linear? If not, why?
 3) Finally, is the obtained speed-up acceptable?
 4) Try to increase the space discretization (M,N) and see if it affects the speed-up.
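+
+Here the speed-up on *n* threads is $S_n = \Delta T_1 / \Delta T_n$, where $\Delta T_1$ is the single-thread run time, so ideal (linear) scaling corresponds to $S_n = n$.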
diff --git a/intro_lab/README.md b/intro_lab/README.md
index f9d988de973ac1705e9a260ad3fbc1443f657269..6210aae65e9cabed9af16ff0666807f676e5d794 100644
--- a/intro_lab/README.md
+++ b/intro_lab/README.md
@@ -86,7 +86,7 @@ This implementation performs repeated execution of the benchmarked kernel to mak
    - ``int omp_get_thread_num()`` to get thread ID
    - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
    - ``omp_set_num_threads()`` to set the number of threads to be used
-2) Run the parallel code and take the execution time with 1, 2, 4, 12, 24 threads for different array length ``N``. Record the timing.
+2) Run the parallel code and measure the execution time with 1, 2, 4, 8, and 16 threads for different array lengths ``N``. Record the timings (see the timing sketch after this list).
 3) Produce a plot showing execution time as a function of array length for different numbers of threads.
 4) How large does ``N`` have to be before using 2 threads becomes more beneficial than using a single thread?
 5) How large does ``N`` need to be so that the arrays no longer fit into the L3 cache?
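+
+Since the benchmarked kernel and its repetition are already given in the lab code, the following is only a sketch of how the runtime routines above can be combined; the single-pass array update and the command-line handling are placeholders:
+
+```
+#include <omp.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(int argc, char *argv[]) {
+    long N = (argc > 1) ? atol(argv[1]) : 10000000L;   /* array length */
+    double *a = malloc(N * sizeof *a);
+
+    omp_set_num_threads(argc > 2 ? atoi(argv[2]) : 1);
+
+    double t0 = omp_get_wtime();
+    #pragma omp parallel
+    {
+        int id   = omp_get_thread_num();
+        int nthr = omp_get_num_threads();
+        /* simple block decomposition of the iterations over the threads */
+        for (long i = id * N / nthr; i < (id + 1) * N / nthr; i++)
+            a[i] = 2.0 * (double)i;                    /* placeholder kernel */
+    }
+    printf("N = %ld  time = %f s\n", N, omp_get_wtime() - t0);
+
+    free(a);
+    return 0;
+}
+```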
@@ -135,7 +135,7 @@ A simple serial C code to calculate $\pi$ is the following:
    - ``int omp_get_thread_num()`` to get thread ID
    - ``double omp_get_wtime()`` to get the time in seconds since a fixed point in the past
    - ``omp_set_num_threads()`` to set the number of threads to be used
-2) Run the parallel code and take the execution time with 1, 2, 4, 8, 12, 24 threads. Record the timing.
+2) Run the parallel code and measure the execution time with 1, 2, 4, 8, and 16 threads. Record the timings.
 3) How does the execution time change as you vary the number of threads? Is it what you expected? If not, why do you think that is?
 4) Is there any technique you have heard of in class to improve the scalability of this implementation? How would you implement it?
 
@@ -145,7 +145,7 @@ Hints:
 - Divide loop iterations between threads (use the thread ID and the number of threads).
 - Create an accumulator for each thread to hold partial sums that you can later combine to generate the global sum (a skeleton of this approach is sketched below).
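+
+A skeleton of this SPMD approach (variable names are placeholders, not necessarily those used in [pi.c](pi.c)) might look like:
+
+```
+#include <omp.h>
+#include <stdio.h>
+
+#define NUM_THREADS 16
+#define NUM_STEPS   100000000
+
+int main(void) {
+    double step = 1.0 / (double)NUM_STEPS;
+    double partial[NUM_THREADS] = {0.0};   /* one accumulator per thread */
+
+    omp_set_num_threads(NUM_THREADS);
+    #pragma omp parallel
+    {
+        int id   = omp_get_thread_num();
+        int nthr = omp_get_num_threads();
+        /* each thread takes every nthr-th iteration */
+        for (int i = id; i < NUM_STEPS; i += nthr) {
+            double x = (i + 0.5) * step;
+            partial[id] += 4.0 / (1.0 + x * x);
+        }
+    }
+
+    /* combine the per-thread partial sums into the global result */
+    double pi = 0.0;
+    for (int i = 0; i < NUM_THREADS; i++)
+        pi += step * partial[i];
+    printf("pi = %.15f\n", pi);
+    return 0;
+}
+```
+
+Note that the per-thread accumulators in ``partial`` are adjacent in memory, which is worth keeping in mind for the scalability questions above.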
 
-## Exercise 3 - Calculate $\pi$ using critical and atomic directives
+## Exercise 4 - Calculate $\pi$ using critical and atomic directives
 
 _Concepts: parallel region, synchronization, critical, atomic_
 
@@ -164,7 +164,7 @@ Hints:
 - We can use a shared variable $\pi$ to be updated concurrently by different threads. However, this variable needs to be protected with a critical section or an atomic access.
 - Use ``critical`` and ``atomic`` before the update ``pi += step`` (both variants are sketched below).
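+
+With a private partial sum per thread, the protected update might look like the following sketch (names are placeholders; use either ``critical`` or ``atomic``, not both at once):
+
+```
+#include <omp.h>
+#include <stdio.h>
+
+#define NUM_STEPS 100000000
+
+int main(void) {
+    double step = 1.0 / (double)NUM_STEPS;
+    double pi = 0.0;                       /* shared accumulator */
+
+    #pragma omp parallel
+    {
+        int id   = omp_get_thread_num();
+        int nthr = omp_get_num_threads();
+        double sum = 0.0;                  /* private partial sum */
+        for (int i = id; i < NUM_STEPS; i += nthr) {
+            double x = (i + 0.5) * step;
+            sum += 4.0 / (1.0 + x * x);
+        }
+        /* protect the update of the shared variable with a critical section ... */
+        #pragma omp critical
+        pi += step * sum;
+        /* ... or, since it is a single scalar update, with an atomic operation:
+        #pragma omp atomic
+        pi += step * sum;
+        */
+    }
+    printf("pi = %.15f\n", pi);
+    return 0;
+}
+```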
 
-## Exercise 4 - Calculate &pi; with a loop and a reduction
+## Exercise 5 - Calculate &pi; with a loop and a reduction
 
 _Concepts: worksharing, parallel loop, schedule, reduction_
 
@@ -173,7 +173,7 @@ Here we are going to implement a fourth parallel version of the [pi.c](pi.c) / [
 ### Tasks and questions to be addressed
 
 1) Create a new parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using the ``#pragma omp for`` worksharing construct and the ``reduction`` clause (a minimal sketch follows after these questions).
-2) Run the new parallel code and take the execution time for 1, 2, 4, 8, 12, 24 threads. Record the timing in a table. Change the schedule to dynamic and guided and measure the execution time for 1, 2, 4, 8, 12, 24 threads.
+2) Run the new parallel code and measure the execution time for 1, 2, 4, 8, and 16 threads. Record the timings in a table. Then change the schedule to ``dynamic`` and ``guided`` and measure the execution time again for 1, 2, 4, 8, and 16 threads.
 3) What is the scheduling that provides the best performance? What is the reason for that?
 4) What is the fastest parallel implementation of the pi.c / pi.f90 program? Why is it the fastest? What would be an even faster implementation of the pi.c / pi.f90 program?
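+
+A minimal sketch of this version (the ``schedule`` kind shown is just one of the variants to be compared, and names are placeholders):
+
+```
+#include <omp.h>
+#include <stdio.h>
+
+#define NUM_STEPS 100000000
+
+int main(void) {
+    double step = 1.0 / (double)NUM_STEPS;
+    double sum = 0.0;
+
+    double t0 = omp_get_wtime();
+    #pragma omp parallel
+    {
+        /* try schedule(static), schedule(dynamic) and schedule(guided) here */
+        #pragma omp for schedule(static) reduction(+:sum)
+        for (int i = 0; i < NUM_STEPS; i++) {
+            double x = (i + 0.5) * step;
+            sum += 4.0 / (1.0 + x * x);
+        }
+    }
+    double pi = step * sum;
+
+    printf("pi = %.15f  time = %f s\n", pi, omp_get_wtime() - t0);
+    return 0;
+}
+```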