- We can use a shared variable ``pi`` that is updated concurrently by different threads. However, this variable needs to be protected with a critical section or an atomic access.
- Use ``critical`` and ``atomic`` before the update ``pi += step``.
## Exercise 5 - Calculate π with a loop and a reduction
Here we are going to implement a fourth parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program, this time using a worksharing loop and a reduction.
### Tasks and questions to be addressed
1) Create a new parallel version of the [pi.c](pi.c) / [pi.f90](pi.f90) program using the worksharing construct ``#pragma omp for`` and the ``reduction`` clause.
2) Run the new parallel code and measure the execution time for 1, 2, 4, 8, 16 threads. Record the timings in a table. Then change the schedule to ``dynamic`` and ``guided`` and measure the execution time again for 1, 2, 4, 8, 16 threads.
3) Which schedule provides the best performance? What is the reason for that?
4) What is the fastest parallel implementation of the pi.c / pi.f90 program? What is the reason for it being the fastest? What would be an even faster implementation of the pi.c / pi.f90 program?