From 872db15e29b3e04240a6ac84133e1f1837628625 Mon Sep 17 00:00:00 2001
From: Mathias Wagner <mathiasw@nvidia.com>
Date: Sat, 16 Nov 2019 11:00:09 +0100
Subject: [PATCH] fixes to the master notebook

---
 .../HandsOnGPUProgramming_master.ipynb        | 51 ++++++++++---------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/4-GPU/HandsOn/.master/HandsOnGPUProgramming_master.ipynb b/4-GPU/HandsOn/.master/HandsOnGPUProgramming_master.ipynb
index a8c157c..5d09371 100644
--- a/4-GPU/HandsOn/.master/HandsOnGPUProgramming_master.ipynb
+++ b/4-GPU/HandsOn/.master/HandsOnGPUProgramming_master.ipynb
@@ -20,7 +20,7 @@
     "\n",
     "**This contains the output for the solutions.**\n",
     "\n",
-    "The solutions are described in the solution section. The directory links to the solution source files should work though. For the _html_ and _pdf_ versions please navigate to the corresponding directory to find the solution profiles and sources.\n",
+    "The solutions are described in the solution section. Please navigate to the corresponding directory to find the solution profiles and sources.\n",
     "\n",
     "\n",
     "### GPU Programming\n",
@@ -51,7 +51,7 @@
     "\n",
     "### Survey\n",
     " \n",
-    " * [Suvery](#survey) Please remember to take the survey !\n",
+    " * Please remember to take the [survey](#survey)!\n",
     "\n",
     "---\n",
     "---"
@@ -69,9 +69,9 @@
     "\n",
     "#### Jupyter Lab execution\n",
     "\n",
-    "When using jupyter this notebook will guide you through the step. Note that if you execute a cell multiple times while optimizing the code the output will be replaced. You can however duplicate the cell you want to execute and keep its output. Check the _edit_ menu above.\n",
+    "When using Jupyter, this notebook will guide you through the tasks. Note that if you execute a cell multiple times while optimizing the code, the output will be replaced. You can, however, duplicate the cell you want to execute and keep its output. Check the _edit_ menu above.\n",
     "\n",
-    "You will always find links to a file browser of the corresponding task subdirectory as well as direct links to the source files you will need to edit as well as the profiling output you need to open locally.\n",
+    "You can always use the file browser to locate the source files you will need to edit as well as the profiling output you need to open locally.\n",
     "\n",
     "If you want, you can also get a terminal in your browser via *File -> New -> Terminal* in the Jupyter Lab menu bar.\n",
     "\n",
@@ -79,19 +79,19 @@
     "The tasks are placed in directories named `[C/FORTRAN]/task[0-6]`.<br>\n",
     "*Note: The tasks using NVSHMEM (4-6) are only available in C.* \n",
     "\n",
-    "The files you will need to edit are always the `poisson2d.(C|F03)` files.\n",
+    "The files you will need to edit are always the `poisson2d.(c|F03)` files.\n",
     "\n",
-    "The makefile targets execute everything to compile, run and profile the code. Please take a look at the cells containing the make calls as a guide.\n",
+    "The makefile targets execute everything to compile, run and profile the code. Please take a look at the cells containing the make calls as guidance.\n",
     "\n",
-    "The outputs of profiling runs be placed in the working directory of the current task and are named like `*.pgprof` or `pgprof.*.tar.gz` in case of multiple files. You can use _scp/sftp_ to transfer files to your machine and for viewing them in pgprof/nvprof.\n",
+    "The outputs of profiling runs will be placed in the working directory of the current task and are named like `*.pgprof` or `pgprof.*.tar.gz` in case of multiple files. You can use _scp/sftp_ to transfer files to your machine and for viewing them in pgprof/nvprof.\n",
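For example, a transfer with _scp_ might look like the following; the user name, login node, and remote paths are placeholders you need to replace with the ones for your account:

```shell
# Single profile file (placeholder host and path):
scp <user>@<login-node>:HandsOn/C/task1/poisson2d.pgprof .

# Archive containing multiple profiles; unpack it locally afterwards:
scp <user>@<login-node>:HandsOn/C/task1/pgprof.task1.tar.gz .
tar xzf pgprof.task1.tar.gz
```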
     "\n",
     "#### Viewing profiles in the NVIDIA Visual Profiler / PGI Profiler\n",
     "\n",
     "The profiles generated by _pgprof / nvprof_ should be viewed on your local machine. You can install the PGI Community Edition (pgprof) or the NVIDIA CUDA Toolkit on your notebook (Windows, Mac, Linux). You don't need an NVIDIA GPU in your machine to use the profiler GUI.\n",
     "\n",
     "There are USB Sticks in the room that contain the installers for various platforms, but for reference you can also download it from:\n",
-    "* [NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)\n",
-    "* [PGI Community Edition](https://www.pgroup.com/products/community.htm)\n",
+    "* [NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) \n",
+    "* [PGI Community Edition](https://www.pgroup.com/products/community.htm) _For Windows and Linux only, there is no GPU support for Mac_\n",
     "\n",
     "After downloading the profiler output (more infos below) follow the steps outlined in:\n",
     "* [Import Session](https://docs.nvidia.com/cuda/profiler-users-guide/index.html#import-session)\n",
@@ -161,9 +161,14 @@
    "source": [
     "# Tasks<a name=\"top\"></a>\n",
     "\n",
-    "This session comes with multiple tasks. All tasks are available in C or FORTRAN and can be found in the `[C|Fortan]/task[0-3]` subdirectories. There you will also find Makefiles that are set up so that you can compile and submit all necessary tasks.\n",
+    "This session includes multiple tasks. The first tasks are available in C or FORTRAN and can be found in the `[C|Fortran]/task[0-3]` subdirectories. The *advanced / optional* NVSHMEM tasks are available only in C and located in the `C/task[4-6]` directories.\n",
+    "\n",
+    "*If you want to go for the advanced NVSHMEM tasks you should complete Task 2 but can skip Task 3 (or postpone it until the end).*\n",
+    "\n",
+    "In any case you will also find Makefiles that are set up so that you can compile and submit all necessary tasks.\n",
+    "\n",
+    "Please choose from the tasks below.\n",
     "\n",
-    "Please choose from the task below. *If you want to go for the advanced NVSHMEM tasks you should complete Task 2 but can skip Task 3 (or postpone it until the end).*\n",
     "\n",
     "\n",
     "### GPU Programming\n",
@@ -190,7 +195,7 @@
     "\n",
     "### Survey\n",
     " \n",
-    " * [Suvery](#survey) Please remember to take the survey !"
+    " * Please remember to take the [survey](#survey)!"
    ]
   },
   {
@@ -250,7 +255,7 @@
     "\n",
     "You can open the source code in a terminal: navigate to `(C|Fortran)/task0/` and open `poisson2d.c` in an editor of your choice.\n",
     "\n",
-    "If your are using the jupyter approach by following the link (for the language of your choice), This will open the source code in an editor in a new browser tab/window.\n",
+    "If you are using the Jupyter approach, follow the link below (for the language of your choice). This will open the source code in an editor in a new browser tab/window.\n",
     "\n",
     "* [C Version](./C/task0/poisson2d.c)\n",
     "* [Fortran Version](./FORTRAN/task0/poisson2d.F03)\n",
@@ -1634,9 +1639,9 @@
     "\n",
     "---\n",
     "\n",
-    "NVSHMEM enables efficient communication among GPUs.It supports an API for direct communication among GPUs, either initiated by the CPU or by GPUs inside of compute kernels. Inside compute kernels, NVSHMEM also supports direct load/store accesses to remote memory over PCIe or NVLink. The ability to initiate communication from inside kernels eliminates GPU-host-synchronization and associated overheads. It can also benefit from latency tolerance mechanisms available within GPUs. The tasks illustrate that progressing from an MPI-only app to an app that uses NVSHMEM can be straightforward.\n",
+    "NVSHMEM enables efficient communication among GPUs. It supports an API for direct communication among GPUs, either initiated by the CPU or by GPUs inside of compute kernels. Inside compute kernels, NVSHMEM also supports direct load/store accesses to remote memory over PCIe or NVLink. The ability to initiate communication from inside kernels eliminates GPU-host-synchronization and associated overheads. It can also benefit from latency tolerance mechanisms available within GPUs. The tasks illustrate that progressing from an MPI-only app to an app that uses NVSHMEM can be straightforward.\n",
     "\n",
-    "**NOTE**: Covering all feature of NVSHMEM, incuding communication calls in kernels, is not easily accessible through OpenACC and also exceed the scope of this tutorial. However, the OpenACC examples should give you a basic introduction to NVSHMEM.\n",
+    "**NOTE**: Covering all features of NVSHMEM, including communication calls in kernels, is not easily possible through OpenACC and also exceeds the scope of this tutorial. However, the OpenACC examples should give you a basic introduction to NVSHMEM.\n",
     "\n",
     "You can check the developer guide and the other presentations \n",
     "\n",
@@ -1668,9 +1673,9 @@
     "\n",
     "\n",
     "\n",
-    "**For interoperability with OpenSHMEM NVSHMEM can also be set up to prefix all calls to NVHSMEM with `nv`. Please make sure to use these version, e.g. use `nvshmem_barrier` instead of `shmem_barrier`. The developer guide mostly uses the unprefixed versions.**\n",
+    "**For interoperability with OpenSHMEM, NVSHMEM can also be set up to prefix all NVSHMEM calls with `nv`. Please make sure to use these versions, e.g. use `nvshmem_barrier` instead of `shmem_barrier`. The developer guide mostly uses the unprefixed versions.**\n",
     "\n",
-    "_Look for_ __TODOs__.\n",
+    "_Look for_ __TODOs__ in the code.\n",
     "\n",
     "\n",
     "\n",
@@ -2028,10 +2033,10 @@
    "source": [
     "## Task 5: <a name=\"task5\"></a>Make communication asynchronous\n",
     "\n",
-    "NVSHMEM allows you to put communications in *CUDA streams / OpenACC async queues*. This allows the CPU already set up communication and kernel launches while the GPU is still communicationg, effectively hiding the time spend in API calls.\n",
+    "NVSHMEM allows you to put communications in *CUDA streams / OpenACC async queues*. This allows the CPU to already set up communication and kernel launches while the GPU is still communicating, effectively hiding the time spent in API calls.\n",
     "\n",
     "To do this you need to:\n",
-    "* use the `async` and `wait` keywords in the OpenACC pragmas to excute the kernels asynchronously in the OpenACC default queu\n",
+    "* use the `async` and `wait` keywords in the OpenACC pragmas to execute the kernels asynchronously in the OpenACC default queue\n",
     "* replace `nvshmem_double_put` calls with the `nvshmemx_double_put_on_stream` version.<br>\n",
     "  use `acc_get_cuda_stream` and `acc_get_default_async` to get the `cudaStream_t cudaStream` corresponding to the OpenACC default async queue.\n",
     "* make sure to synchronize before copying the data back to the CPU\n",
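The steps above can be sketched as follows. This is an illustrative fragment, not the task solution: it assumes an initialized NVSHMEM job and the PGI/NVHPC OpenACC runtime, and `dst`, `src`, `count`, and `peer` are placeholders for the real halo buffers and neighbor rank.

```c
// Illustrative sketch of the async pattern (not the task solution).
#include <cuda_runtime.h>
#include <openacc.h>
#include <nvshmem.h>
#include <nvshmemx.h>

void put_halo_async(double *dst, const double *src, size_t count, int peer)
{
    /* Map the OpenACC default async queue to its underlying CUDA stream. */
    cudaStream_t stream =
        (cudaStream_t) acc_get_cuda_stream(acc_get_default_async());

    /* Enqueue the put on that stream: it is ordered after kernels
       launched with `async` and returns immediately on the CPU. */
    nvshmemx_double_put_on_stream(dst, src, count, peer, stream);
}
```

Before copying results back to the CPU, a `#pragma acc wait` on the default queue ensures that both the kernels and the enqueued transfers have completed.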
@@ -2399,7 +2404,7 @@
    "source": [
     "## Task 6: <a name=\"task5\"></a>Use direct load/store to remote memory\n",
     "\n",
-    "NVSHMEM allows you to put communications in the GPU kernels. Howerver, the `nvhsmem_put / nvshmem_get` calls are not easily avilable in OpenACC kernels. However, for *intranode* communication when all GPUs can use P2P (as in the nodes in Ascent and Summit) you can get a pointer to a remote GPUs memory using `nvshmem_ptr`.\n",
+    "NVSHMEM allows you to issue communication from inside GPU kernels. However, the `nvshmem_put / nvshmem_get` calls are not easily available in OpenACC kernels. For *intranode* communication, when all GPUs can use P2P (as in the nodes in Ascent and Summit), you can instead get a pointer to a remote GPU's memory using `nvshmem_ptr`.\n",
     "\n",
     "To do this you need to:\n",
     "* use `nvshmem_ptr` to get pointers to your neighboring (top/bottom) `d_A` allocations\n",
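As a hedged sketch of this first step (not the solution): assuming `d_A` is a symmetric allocation from `nvshmem_malloc` and all PEs reside on one node with P2P access, the neighbor pointers can be obtained like this.

```c
// Illustrative sketch: map the neighbors' copies of d_A via P2P.
#include <nvshmem.h>

void map_neighbors(double *d_A, double **top, double **bottom)
{
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    /* Pointers into the neighboring PEs' copies of d_A; kernels can
       load/store through them directly instead of calling put/get. */
    *top    = (double *) nvshmem_ptr(d_A, (mype - 1 + npes) % npes);
    *bottom = (double *) nvshmem_ptr(d_A, (mype + 1) % npes);
}
```

`nvshmem_ptr` returns `NULL` when the remote PE is not accessible via direct load/store, so a real implementation should check the returned pointers.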
@@ -4139,13 +4144,13 @@
     "## Solution 4:<a name=\"solution4\"></a>\n",
     "\n",
     "\n",
-    "Include NVSHMEM headers\n",
+    "First, include NVSHMEM headers\n",
     "\n",
     "```C\n",
     "#include <nvshmem.h>\n",
     "#include <nvshmemx.h>\n",
     "```\n",
-    "and initalize NVSHMEM with MPI\n",
+    "and initialize NVSHMEM with MPI\n",
     "```C\n",
     "MPI_Comm mpi_comm = MPI_COMM_WORLD;\n",
     "nvshmemx_init_attr_t attr;\n",
@@ -4842,7 +4847,7 @@
     "exercise": "solution"
    },
    "source": [
-    "## Solution 6:<a name=\"solution6\"></a> TODO\n",
+    "## Solution 6:<a name=\"solution6\"></a>\n",
     "\n",
     "\n",
     "\n",
-- 
GitLab