diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..763513e910f7036a1a3cdb21dce8e57da2451891 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.ipynb_checkpoints diff --git a/00_Introduction to IPython.ipynb b/00_Introduction to IPython.ipynb index f98177502321cf4127543a0724b5ca2b4862731a..285480644cc5be233ab56292c5ca80499e77e464 100644 --- a/00_Introduction to IPython.ipynb +++ b/00_Introduction to IPython.ipynb @@ -3,9 +3,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to IPython and Jupyter Notebook" ] }, @@ -20,7 +22,7 @@ }, "source": [ "<div class=\"dateauthor\">\n", - "12 June 2023 | Jan H. Meinke\n", + "07 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -156,7 +158,7 @@ "source": [ "If you didn't get the documentation but only ``Object `random` not found``. Try importing the module first. Start by typing ``im`` and hit the tab key. Then type ``r`` and hit tab again. You'll get a dropdown box with available modules. You can continue typing until your choice is unique or select an item from the list. Give it a try. \n", "\n", - "Try ``random?`` after importing the module. If you use ``??`` instead of ``?`` you get the source code." + "Try ``random?`` after importing the module. If you use ``??`` instead of ``?`` you get the source code. Note: If you would like to temporarily hide a cell (e.g. an output cell with very long output), just click on the blue bar displayed to the right of the cell."
] }, { @@ -266,9 +268,6 @@ "import matplotlib.pyplot as plt\n", "```\n", "\n", - "Alternatively, the command ``%pylab inline`` sets up interactive plotting and pulls all functions and modules from ``numpy`` and ``matplotlib.pyplot`` into the namespace.\n", - "\n", - "\n", "[matplotlib]: http://matplotlib.org/" ] }, @@ -283,7 +282,7 @@ "outputs": [], "source": [ "%matplotlib inline \n", - "# widget is an interactive alternative to inline\n", + "# depending on the installation, widget and ipympl are interactive alternatives to inline\n", "import matplotlib.pyplot as plt\n", "import numpy" ] }, @@ -903,9 +902,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024 (local)", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -917,7 +916,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.6" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/01_Bottlenecks.ipynb b/01_Bottlenecks.ipynb index 742c473076d110c9458a601cfa6992196aba7511..65dfc452510ff100a81d8ab182315812d4b7952f 100644 --- a/01_Bottlenecks.ipynb +++ b/01_Bottlenecks.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Bottlenecks\n", "\n", "<div class=\"dateauthor\">\n", - "12 Jun 2023 | Jan H. Meinke\n", + "10 Jun 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## High-performance computing is computing at the limit" @@ -29,6 +33,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -41,6 +46,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -53,6 +59,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -65,6 +72,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -77,6 +85,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -89,9 +98,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## CPU\n", @@ -101,9 +112,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "\n", @@ -118,9 +131,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "There are 64 cores per socket\n", @@ -130,9 +145,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This is the limit most people think of first, but it's often not the crucial one. Each core on JUSUF can perform ca. 36 GFlop/s if the code is completely *vectorized* and performs a *multiply and an add operation* at *each step*. 
If your code doesn't fulfill those requirements its peak performance will be less.\n", @@ -143,9 +160,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Memory hierarchy\n", @@ -155,9 +174,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "* L1 (per core):\n", @@ -174,9 +195,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "\n", @@ -186,9 +209,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The memory bandwidth of a JUSUF node is about " @@ -197,9 +222,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "190.7 GiB/s (~400 cycles latency)" @@ -208,9 +235,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## A simple operation" @@ -219,9 +248,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "c = c + a * b (multiply-add)" @@ -230,9 +261,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "3 DP read, 1 DP write -> 24 bytes read, 8 bytes write" @@ -241,9 +274,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "190 GB/s / 24 bytes/op = 8 Gop/s (multiply-add -> 16 GFLOP/s)" @@ -252,9 +287,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "I assume that we are 
dealing with double precision numbers (8 bytes). Then I have to read 3 * 8 bytes = 24 bytes and write 8 bytes. This is a multiply-add operation, so each core can do 18 billion of those per second, but it only receives 190 GB/s. 190GB/s / 24 B/op = 8 Gop/s (16 GFLOP/s). This operation is clearly memory bound, if we have to get all the data from main memory." @@ -263,9 +300,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Matrix multiplication" @@ -274,9 +313,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This operation is quite common. Let's look at a matrix multiplication $C=AB$. To calculate the element i, j of the result matrix C, we multiply row i of A with column j of B and sum the results. This is the scalar or dot product of row i of A and column j of B. In code this looks like this:" @@ -285,6 +326,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -303,6 +345,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -316,6 +359,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -351,9 +395,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "Let's take two small matrices A and B and see how long the above function takes." 
@@ -363,9 +409,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -378,9 +426,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -390,9 +440,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "A matrix multiplication of two n by n matrices performs $2n^3$ operations. The dot function achieves" @@ -402,9 +454,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -414,9 +468,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Wow, that's bad. Let's see if we can make this faster." 
@@ -425,9 +481,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Numba" @@ -437,13 +495,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ - "from numba import njit as jit\n", + "from numba import njit as jit # This is the default for numba 0.59.0 and later\n", "jdot = jit(dot)" ] }, @@ -451,9 +511,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -464,9 +526,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -476,9 +540,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Access order and cache lines" @@ -487,9 +553,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "From our estimate above, we should be able to get at least ten times this, but that's assuming we can achieve the maximum memory bandwidth. \n", @@ -505,9 +573,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -540,9 +610,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now, elements in b are accessed in the proper order and a[i, k] is constant for the loop. This changes our estimate, because, now we read 8 bytes/op in the innermost loop. 
This gives us a maximum of 190 GB/s / 8 bytes/op = 24 Gop/s (48 GFLOP/s) making this compute bound on a single core." @@ -551,9 +623,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Effect on matrix multiplication" @@ -563,9 +637,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -576,9 +652,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -588,21 +666,25 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "This is not much better. Let's take a look at a bigger matrix." + "This is much better. Let's take a look at a bigger matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -615,9 +697,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -628,9 +712,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -640,12 +726,14 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { - "slide_type": "notes" - } + "slide_type": "skip" + }, + "tags": [] }, "source": [ - "This is even worse and corresponds to a bandwidth of about 8 GB/s.\n", + "This is worse and corresponds to a bandwidth of about 18 GB/s on JUSUF and almost twice that in the cloud.\n", "\n", "A possible explanation is that a single core may not be able to 
access the full bandwidth of the socket.\n", "\n", @@ -655,9 +743,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { - "slide_type": "subslide" - } + "slide_type": "skip" + }, + "tags": [] }, "source": [ "## Numpy" @@ -666,9 +756,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's see how long numpy takes for this:" @@ -678,6 +770,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -692,9 +785,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -707,9 +802,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -718,7 +815,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "The maximum clock frequency of the processor is 3.4 GHz, which corresponds to a peak performance of about 54 GFLOP/s. This is pretty close." ] @@ -727,6 +830,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -741,6 +845,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -754,9 +859,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The numpy version we use here, uses a fast math library. 
That's what you want!\n", @@ -767,9 +874,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The roofline model" @@ -778,9 +887,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The roofline model shows the memory bandwidth bound and compute bound with respect to the computational intensity. The computational intensity is just given by the number of bytes used divided by the number of operations performed." @@ -789,9 +900,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "" @@ -800,9 +913,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Depending on your algorithm, different limits may be relevant, for example, we only used a single thread, but used the peak performance of the entire processor with 64 cores. If the data fits completely in L2 cache the available bandwidth is higher once the data has been loaded. The following shows a plot with a few more limits." 
@@ -811,9 +926,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "" @@ -822,9 +939,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## I/O" @@ -833,9 +952,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "### GPFS File System\n", @@ -845,9 +966,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "Each node connected to file system with $\\mathcal{O}(100)$ GBit/s or about 12.5 GB/s." @@ -856,9 +979,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The scratch file system achieves read/write bandwidths that are very similar to the main memory bandwidth, but not for a single node. Each node is connected to the GPFS file system with $\\mathcal{O}(100)$ GBit/s connection. In other words, we can read/write about 12.5 GB/s. If we had to load the data in the previous calculation from disk, we could only achieve 12.5 GB/s / 24 bytes/op = 520 Mop/s. The main memory bandwidth or the peak performance of the CPU don't matter in this case." 
@@ -868,9 +993,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [] @@ -878,9 +1005,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -892,7 +1019,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/02_NumPy_concepts.ipynb b/02_NumPy_concepts.ipynb index 743b74bb52b32d5fc47f18f8e4c60abf0222d974..16164c67c78afb3676ada74917d67eee1dfd5cb3 100644 --- a/02_NumPy_concepts.ipynb +++ b/02_NumPy_concepts.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# NumPy - an HPC perspective\n", "\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Olav Zimmermann\n", + "7 June 2024 | Olav Zimmermann\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Python is an interpreted language and as such it is extremely flexible, allowing to define everything, including code itself, \n", @@ -133,9 +137,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -147,9 +153,40 @@ { "cell_type": "markdown", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "Note: In case of a wrapper like FlexiBlas ``show_config()`` will only show the wrapper. 
One way to get a hint about which BLAS implementation NumPy is likely linked against is the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from threadpoolctl import threadpool_info\n", + "threadpool_info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## ndarray" ] }, @@ -355,9 +392,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -369,7 +406,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/03_ThinkVector.ipynb b/03_ThinkVector.ipynb index 1c5fc8141855d69d78bba5be977d8940f9e707be..52af6ed56d10a3b385b0ee08e39e77b05f7c0ad2 100644 --- a/03_ThinkVector.ipynb +++ b/03_ThinkVector.ipynb @@ -3,6 +3,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -12,16 +13,18 @@ "# Think Vector\n", "\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Jan H. Meinke\n", + "07 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Dot product" ] }, @@ -440,9 +443,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's look at an example. We start with a 2d array of random values and fix the left boundary to a value of 0 and the right boundary to a value of 1. We do not want to change these boundary values. 
The top and bottom boundaries are connected so that our system forms a cylinder (periodic boundary conditions along y)." @@ -452,13 +457,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ - "A_orig = numpy.random.random((10, 10))\n", + "A_orig = numpy.random.random((30, 30))\n", "A_orig[:, 0] = 0\n", "A_orig[:, -1] = 1" ] @@ -504,12 +511,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ + "# note the use of the modulo operator % to encode the periodic boundary condition\n", "for i in range(A.shape[0]):\n", " for j in range(1, A.shape[1] - 1):\n", " B[i, j] = 0.25 * (A[(i + 1) % A.shape[0], j] + A[i - 1, j] \n", @@ -796,6 +806,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -808,18 +819,20 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, "tags": [] }, "source": [ - "Multiplying to complex numbers is more interesting: " + "Multiplying two complex numbers is more interesting: " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -869,21 +882,24 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ - "In short, we can use complex numbers just like any other numerical type. Here is a function that calculates the series and return the iteration at which $|z| > 2$:" + "In short, we can use complex numbers just like any other numerical type. 
Here is a function that calculates the series and returns the iteration at which $|z| > 2$:" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Escape time algorithm" ] }, @@ -947,18 +963,22 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "**Hints:** \n", "\n", "* You can use `numpy.meshgrid()` to generate your 2D array of points.\n", " \n", - " If you have `x = numpy.array([-1.1, -1, 0, 1.2])` and `y = numpy([-0.5j, 0j, 0.75j])` and call `XX, YY = numpy.meshgrid(x, y)`, it returns two arrays of shape 3 by 4. The first one contains 3 rows where each row is a copy of x. The second one contains 4 columns where each colomn is a copy of y.\n", + " If you have `x = numpy.array([-1.1, -1, 0, 1.2])` and `y = numpy.array([-0.5j, 0j, 0.75j])` and call `XX, YY = numpy.meshgrid(x, y)`, it returns two arrays of shape 3 by 4. The first one contains 3 rows where each row is a copy of x. The second one contains 4 columns where each column is a copy of y.\n", " \n", " Now you can add those two array to get points in the complex plane. `P = XX + YY`.\n", + "\n", + "* Another (even faster) way is to use broadcasting. 
For this you need to insert an empty dimension with `np.newaxis` (or `None`).\n", " \n", "* You somehow need to mask the points that already diverged in future iterations.\n", "* You don't have to put this in a function" @@ -979,7 +999,7 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2023 (local)", "language": "python", "name": "hpcpy23" }, @@ -993,7 +1013,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/04_Particle Dynamics.ipynb b/04_Particle Dynamics.ipynb index 2d6c356f8a912f14cd9d9ac19f858846a3c3fc8a..780c2af089f124d102e9e1e196f1c426cb20b3e8 100644 --- a/04_Particle Dynamics.ipynb +++ b/04_Particle Dynamics.ipynb @@ -12,7 +12,7 @@ "source": [ "# Particle Dynamics with Python\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Jan H. Meinke\n", + "10 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -21,6 +21,7 @@ "execution_count": null, "id": "5822f3b3-bc03-4e2f-85f1-57cb246e3a05", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -38,6 +39,7 @@ "execution_count": null, "id": "f7d1939b-7d73-4c0c-9d8a-d6ea39d48b49", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -45,6 +47,7 @@ }, "outputs": [], "source": [ + "# Note: if available on your installation 'ipympl' and 'widget' provide interactive alternatives to 'inline'\n", "%matplotlib inline" ] }, @@ -52,6 +55,7 @@ "cell_type": "markdown", "id": "b6798959-bbef-4f71-b696-e1069554c403", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -67,6 +71,7 @@ "cell_type": "markdown", "id": "9f9b8f9d-c834-4b86-9ef1-e385694d4b8c", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -83,6 +88,7 @@ "cell_type": "markdown", "id": "2c250750-32b7-4a74-8c3e-5c3eb6c4a13d", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -102,6 +108,7 
@@ "cell_type": "markdown", "id": "00ee5853-283f-4786-bd4c-81ca9ab7b3b2", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -115,6 +122,7 @@ "cell_type": "markdown", "id": "a6e75808-f266-4a57-9837-5b9aa69ee436", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -128,6 +136,7 @@ "cell_type": "markdown", "id": "27adecd9-7499-4a86-bb62-15dd40377c72", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -141,6 +150,7 @@ "cell_type": "markdown", "id": "35260044-1b70-46c5-8bfd-8475566037b4", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -155,6 +165,7 @@ "cell_type": "markdown", "id": "0167c3d7-4abc-4635-b53d-aa38072ff922", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -168,6 +179,7 @@ "cell_type": "markdown", "id": "96292513-eaee-4617-bacd-4d13a1f6f8ab", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -182,6 +194,10 @@ "cell_type": "markdown", "id": "cbab8258-28f9-41db-9dda-7f4a5be57603", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "source": [ @@ -192,6 +208,7 @@ "cell_type": "markdown", "id": "c55acb8e-6cb4-459c-9241-9e42eb364b72", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -209,6 +226,7 @@ "cell_type": "markdown", "id": "d36faa34-7345-4e94-b19b-62e4419417e0", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -222,6 +240,7 @@ "cell_type": "markdown", "id": "32f7c975-ed21-4c70-9168-5b7bfa5ca276", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -234,7 +253,13 @@ { "cell_type": "markdown", "id": "4288de12-8bf3-41b2-96ca-5c3c47fc0d84", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "1. Calculate the force on each particle by summing up all the forces acting on it.\n", "2. 
Integrate the equation of motion\n", @@ -248,6 +273,7 @@ "cell_type": "markdown", "id": "efba7cbf-301a-4e5c-81d4-1394c5ec3c9f", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -275,6 +301,7 @@ "cell_type": "markdown", "id": "76d2db76-3bac-4465-9512-babcef5e721b", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -290,6 +317,7 @@ "execution_count": null, "id": "b4525c8a-378a-45b7-b1e2-b67f5f07d397", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -318,6 +346,7 @@ "cell_type": "markdown", "id": "8fd053d2-8c88-4666-82ed-0316fe21ac34", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -331,6 +360,7 @@ "cell_type": "markdown", "id": "ac5e70be-cafd-41cd-b866-5b98ee28fb0a", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -344,6 +374,10 @@ "cell_type": "markdown", "id": "c1d0d68d-23a4-45e1-a431-91e575056e21", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "source": [ @@ -354,6 +388,7 @@ "cell_type": "markdown", "id": "0b29d4d1-b6ef-4615-ab11-0bed26267252", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -368,6 +403,7 @@ "execution_count": null, "id": "338142b6-f973-4f7a-b5a4-77e76f3b758f", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -385,6 +421,7 @@ "cell_type": "markdown", "id": "d0156a2d-13ae-46dd-b3a8-cb7eb1aca0bf", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -399,6 +436,7 @@ "execution_count": null, "id": "e841a076-504d-445b-b006-b931e3cb0bc2", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -418,6 +456,7 @@ "cell_type": "markdown", "id": "3de052ac-7591-4477-8285-cc15c0019a7a", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -431,6 +470,7 @@ "cell_type": "markdown", "id": "235e1971-24e0-4cf8-ac27-779e5ae37684", "metadata": { + 
"editable": true, "slideshow": { "slide_type": "slide" }, @@ -445,6 +485,10 @@ "execution_count": null, "id": "1133b4bb-111b-4aca-9326-22a7c29c8522", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "outputs": [], @@ -458,6 +502,7 @@ "cell_type": "markdown", "id": "ccea23e5-4f4b-4ff6-b379-8d45e3fe15f4", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -471,6 +516,7 @@ "cell_type": "markdown", "id": "dba27f9b-350e-4e65-9f42-e3615ee30a84", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -484,7 +530,13 @@ "cell_type": "code", "execution_count": null, "id": "5ddc24f9-eaf3-491c-bf81-232efa584c1c", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "x = [i + v * dt + 0.5 * f / m * dt * dt for i, v, f in zip(x, vx, Fx)]\n", @@ -496,6 +548,7 @@ "cell_type": "markdown", "id": "52959ed7-d454-40fb-98f1-9df161873c87", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -510,6 +563,7 @@ "execution_count": null, "id": "2266d4e8-8f67-4979-ae47-abf8508673a4", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -526,6 +580,7 @@ "cell_type": "markdown", "id": "e4cff076-759c-477c-9758-41bb730cd606", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -539,6 +594,7 @@ "cell_type": "markdown", "id": "92a88a32-4ee1-44ce-b371-afd412359a3b", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -552,7 +608,13 @@ "cell_type": "code", "execution_count": null, "id": "bf48e0d0-34f6-47ba-8a30-ba0c1e19489d", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "ax = plt.figure(figsize=(5, 5)).add_subplot(projection='3d')\n", @@ -564,6 +626,7 @@ "cell_type": "markdown", "id": "65984f53-4b54-4f6d-aaa1-6de391150539", "metadata": { 
+ "editable": true, "slideshow": { "slide_type": "skip" }, @@ -577,6 +640,7 @@ "cell_type": "markdown", "id": "f1f30004-a9c3-4499-84e0-976937b9f8a8", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -592,6 +656,7 @@ "execution_count": null, "id": "039819a6-698f-43a6-a4f0-4f7b8852fbb1", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -604,6 +669,7 @@ "cell_type": "markdown", "id": "5a141c1e-22b6-40be-80d5-25ad2648972c", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -616,9 +682,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/05_Profiling a simple md code.ipynb b/05_Profiling a simple md code.ipynb index 33a7b5601c19ff31668884fb152a7747ba3de164..9f1dce8af79a2e00a01f0c9359ac47bea91d3e23 100644 --- a/05_Profiling a simple md code.ipynb +++ b/05_Profiling a simple md code.ipynb @@ -3,23 +3,27 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Profiling\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiler" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "cprofiler (standard module)" + "[cProfile][] (standard module)\n", + "\n", + "[cProfile]: https://docs.python.org/3.12/library/profile.html" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "line_profiler" + "[line_profiler][]\n", + "\n", + "[line_profiler]: https://kernprof.readthedocs.io/en/latest/" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "Intel Advisor since 2017 beta" + "[Intel VTune][]\n", + "\n", + "[Intel VTune]: https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2024-1/python-code-analysis.html" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ - "Scalene" + "[Scalene][]\n", + "\n", + "[Scalene]: https://github.com/plasma-umass/scalene" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -85,9 +105,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Before you start to optimize a program, you should generate a profile. 
A profile shows how much time a program spends in which function, line of code, or even assembler instruction.\n", @@ -100,9 +122,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiling a simple particle dynamics code" @@ -111,9 +135,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "pair_force()\n", @@ -131,9 +157,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -278,9 +306,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The main program" @@ -289,9 +319,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Initialization" @@ -300,7 +332,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "# 1000 particles\n", @@ -321,9 +359,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### The algorithm\n", @@ -339,9 +379,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -354,9 +396,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This took quite some time. Let's measure how long it takes. Add a %%timeit statement just before nsteps (same line)." 
@@ -365,9 +409,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The base line" @@ -377,9 +423,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -392,9 +440,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "OK, that's our base line. Next, we want to know where all this time is spent. I mentioned the cProfile module at the beginning. IPython has a magic for that called %%prun. Use it in front of the loop this time." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiling with %%prun" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "%%prun -r nsteps=1\n", @@ -426,9 +484,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ " 2003007 function calls in 8.561 seconds\n", @@ -452,9 +512,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The overhead shouldn't be too bad. I got about 10%. Most of the time (about 80%) is spent in pair_force. 
And 20% of that time is spent on np.array." @@ -463,9 +525,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Line by line profiling" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Unfortunately, this is a rather coarse grained profile. We don't know which part is the expensive part of this calculation and what we can do about it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -499,9 +567,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -511,9 +581,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "Total time: 9.86691 s \n", @@ -531,9 +603,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Timing individual operations" @@ -554,9 +628,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -568,9 +644,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -582,9 +660,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -596,9 +676,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { 
"slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -610,9 +692,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "We can now change the code so it calculates dx, dy, and dz first and then uses them later in the calculation. We can also use numba to speed up the simulation." @@ -621,9 +705,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## Exercise: Time the other operations and optimize the code" @@ -643,9 +729,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -657,7 +743,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/06_LocalParallel.ipynb b/06_LocalParallel.ipynb index e716e9c43cba437fc60fb2df5e3c49a0642bcbdc..610d2ae67f9abeb6f1cfbde4a95df7562bba317a 100644 --- a/06_LocalParallel.ipynb +++ b/06_LocalParallel.ipynb @@ -3,40 +3,45 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Interactive Parallel Computing with IPython Parallel\n", "\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "*Computers have more than one core.* Wouldn't it be nice if we could use all the cores of our local machine or a compute node of a cluster from our [Jupyter][IP] notebook? \n", "\n", "Click on the ``+``-sign at the top of the Files tab on the left to start a new launcher. 
In the launcher click on Terminal. A terminal will open as a new tab. Grab the tab and pull it to the right to have the terminal next to your notebook.\n", "\n", - "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2318/hpcpy23`.\n", + "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2421/hpcpy24`.\n", "\n", "In the terminal type ``ipcluster``. You'll see the help message telling you that you need to give it a subcommand. Take a look at the message and then enter \n", "\n", "``` bash\n", - "export OMP_NUM_THREADS=32\n", + "export OMP_NUM_THREADS=XX\n", "ipcluster start --n=4\n", "```\n", + "with XX=32 if you are on a JUSUF node and XX=4 if you are on a JSCCloud instance.\n", "\n", - "This will start a cluster with four engines and should limit the number of threads to 32 threads per engine to avoid oversubscription.\n", + "This will start a cluster with four engines and should limit the number of threads per engine to avoid oversubscription.\n", "\n", "> If you use the classical [Jupyter][IP] notebook, this is even easier if you have the cluster extension installed. (We don't have that one on our JupyterHub, yet). One of the tabs of your browser has the title \"Home\". If you switch to that tab, there are several tabs within the web page. One of them is called \"IPython Clusters\". Click on \"IPython Clusters\", increase the number of engines in the \"default\" profile to 4, and click on Start. The status changes from stopped to running. 
After you did that come back to this tab.\n", "\n", @@ -59,9 +64,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Overview" @@ -105,9 +112,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now let's see how we access the \"Cluster\". Originally, [ipyparallel][IPp] was developed as a part of [IPython][IP]. In the meantime it's developed separately. It is used to access the engines, we just started. We first need to import Client.\n", @@ -227,9 +236,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Before we go into the details of the interface of a `DirectView`--that's the name of the class, let's look at IPython magic.\n", @@ -243,9 +254,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -362,9 +375,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -456,18 +471,34 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Magic commands are blocking by default, i.e., the next cell can only be executed after all the engines have finished their work. We can pass the option ``--noblock`` to change that behavior." 
] }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "In the next cell use XX=32 if you are on a JUSUF node and XX=4 if you are on a JSCCloud instance:" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -477,7 +508,7 @@ "source": [ "%%px --local\n", "import threadpoolctl\n", - "threadpoolctl.threadpool_limits(limits=32, user_api='blas')" + "threadpoolctl.threadpool_limits(limits=4, user_api='blas')" ] }, { @@ -875,9 +906,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -997,9 +1030,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1010,9 +1045,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1066,7 +1103,7 @@ } }, "source": [ - "Latency (the time until something happens) and bandwidth (the amount of data we get through the network) are two important properties of your parallel system that define what is practical and what is not. We will use the ``%timeit`` magic to measure these properties. ``%timeit`` and its sibbling ``%%timeit`` measure the run time of a statement (cell in the case of ``%%timeit``) by executing the statement multiple times (by default at least 7 repeats). For short running routines a loop of many executions is performed per repeat and the minimum time measured is then displayed. The number of loops and the number of repeats can be adjusted. Take a look at the documentation. Give it a try." 
+ "Latency (the time until something happens) and bandwidth (the amount of data we get through the network) are two important properties of your parallel system that define what is practical and what is not. We will use the ``%timeit`` magic to measure these properties. ``%timeit`` and its sibling ``%%timeit`` measure the run time of a statement (cell in the case of ``%%timeit``) by executing the statement multiple times (by default at least 7 repeats). For short running routines a loop of many executions is performed per repeat and the minimum time measured is then displayed. The number of loops and the number of repeats can be adjusted. Take a look at the documentation. Give it a try." ] }, { @@ -1343,6 +1380,20 @@ "dview.block=True" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px --local\n", + "to_delete=[\"a\", \"b\", \"c\", \"A\", \"B\", \"C\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -1366,8 +1417,8 @@ }, "outputs": [], "source": [ - "%timeit -n 20 dview.push(dict(a=a))\n", - "%timeit -n 20 dview.push(dict(a=a[:128*1024]))\n", + "# %timeit -n 20 dview.push(dict(a=a))\n", + "# %timeit -n 20 dview.push(dict(a=a[:128*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:64*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:32*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:16*1024]))\n", @@ -1398,8 +1449,8 @@ }, "outputs": [], "source": [ - "bwmax = len(rc) * 256 * 8 / 9.83-3\n", - "bwmin = len(rc) * 8 / 4.25e-3\n", + "bwmax = len(rc) * 64 * 8 / 42.2e-3\n", + "bwmin = len(rc) * 8 / 18.5e-3\n", "print(\"The bandwidth is between %.2f kB/s and %.2f kB/s.\" %( bwmin, bwmax))" ] }, @@ -1613,7 +1664,7 @@ }, "outputs": [], "source": [ - "n = 4096\n", + "n = 2048\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])" ] @@ -1702,26 +1753,13 @@ }, "outputs": [], "source": [ - 
"%%timeit -o\n", + "%%timeit\n", "c00 = np.dot(a00, b00) + np.dot(a01, b10)\n", "c01 = np.dot(a00, b01) + np.dot(a01, b11)\n", "c10 = np.dot(a10, b00) + np.dot(a11, b10)\n", "c11 = np.dot(a10, b01) + np.dot(a11, b11)" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, - "outputs": [], - "source": [ - "_.best / tdot.best" - ] - }, { "cell_type": "markdown", "metadata": { @@ -1812,6 +1850,20 @@ "c11 = c11h.get()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px --local\n", + "to_delete=[\"a\", \"b\", \"c\", \"A\", \"B\", \"C\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -1845,21 +1897,14 @@ "\n", "The code is not any faster, because our implementation of numpy already blocks the matrices and uses all cores, but it shows the principle. Also, remember that we are transferring the data to the engines in every call!" 
] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy" }, "language_info": { "codemirror_mode": { @@ -1871,7 +1916,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/07_LocalTaskParallel.ipynb b/07_LocalTaskParallel.ipynb index bdd868d7b777474ff12dc906f73536d85a982960..0fd1c4552d076d7538b689da9ca020fcfc1a4332 100644 --- a/07_LocalTaskParallel.ipynb +++ b/07_LocalTaskParallel.ipynb @@ -7,6 +7,15 @@ "# Parallel, Task-Based Computing with Load Balancing on your Local Machine" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<div class=\"dateauthor\">\n", + "11 June 2024 | Jan H. Meinke\n", + "</div>" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -19,7 +28,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from ipyparallel import Client" @@ -28,7 +39,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "rc = Client()" @@ -44,7 +57,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "lview = rc.load_balanced_view()" @@ -53,7 +68,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "%px import numpy as np\n", @@ -63,21 +80,25 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "%%px --local\n", "import threadpoolctl\n", - "threadpoolctl.threadpool_limits(limits=32, user_api='blas')" + 
"threadpoolctl.threadpool_limits(limits=4, user_api='blas')" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "n = 4096\n", + "n = 2048\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])\n", "C = np.dot(A, B)" @@ -86,7 +107,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "tnp = %timeit -o A@B" @@ -95,7 +118,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "a00 = A[:n // 2, :n // 2]\n", @@ -111,7 +136,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00h = lview.apply(lambda a, b, c, d : np.dot(a, b) + np.dot(c, d), a00, b00, a01, b10)\n", @@ -123,7 +150,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00h.wait()\n", @@ -135,7 +164,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00 = c00h.get()\n", @@ -177,6 +208,26 @@ "It's probably about the same, so why would we use the *load-balanced view*? For starters, we can throw more tasks at our engines than there are workers. In the previous example, we split our matrices in four blocks. Let's write a function that takes a square matrix with n rows and columns, where n is multiple of threshold, that uses tiles of size threshold." 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before we continue let's free some memory (in the drop-down menu that opens on a right mouse click you can open the Variable Inspector that also shows the size of the arrays):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "to_delete=[\"a00\", \"a01\", \"a10\", \"a11\", \"b00\", \"b01\", \"b10\", \"b11\", \"c00\", \"c01\", \"c10\", \"c11\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -337,6 +388,13 @@ "BlockMatrixMultiply?" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**The next cells will not work on a Cloud instance due to the amount of RAM required.** (A dummy value was inserted to avoid accidental killing of the engines)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -350,7 +408,7 @@ "metadata": {}, "outputs": [], "source": [ - "n = 16384\n", + "n=8 # 16384 # switch value only on a computer with sufficient RAM!\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])\n", "C = np.dot(A, B)" @@ -396,9 +454,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy" }, "language_info": { "codemirror_mode": { @@ -410,7 +468,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/08_Numba vectorize.ipynb b/08_Numba vectorize.ipynb index 7a3da648d7ba77c8356e15dd321ed2583dc9fb52..5fe81a1c880237bc3fd110cd59b31405cad223e1 100644 --- a/08_Numba vectorize.ipynb +++ b/08_Numba vectorize.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Numba 
vectorize\n", "\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Numba offers a decorator `@vectorize` that allows us to generate **fast** [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). " @@ -30,9 +34,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -46,9 +52,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## A simple trig function" @@ -57,9 +65,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Let's implement a simple trig function:" @@ -69,9 +79,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -82,9 +94,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -96,9 +110,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Passing numpy arrays as arguments" @@ -108,13 +124,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ - "n = 1000000\n", + "n = 1_000_000\n", "a = np.ones(n, dtype='int8')\n", "b = 2 * a" ] @@ -123,9 +141,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, 
"outputs": [], "source": [ @@ -135,20 +155,24 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "The function sinasinb is only defined for scalars, so we have to do something if we want to pass an array." + "The error is expected. The function `sinasinb` is only defined for scalars, so we have to do something if we want to pass an array." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## numpy.vectorize" @@ -157,9 +181,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "NumPy provides the function `vectorize`." @@ -169,9 +195,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -182,9 +210,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -194,9 +224,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## numba.vectorize" @@ -205,9 +237,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Dynamic ufuncs" @@ -217,9 +251,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -230,9 +266,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -242,9 +280,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, 
"slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The function usinacosb is a *dynamic ufunc*. The arguments are determined when the function is called and only then is the function compiled." @@ -253,9 +293,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### Eager compilation" @@ -264,9 +306,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Assume, we know with what kind of arguments a function is called, then numba can generate code as soon as we call numba vectorize. The decorator can take a list of [type specification](https://numba.readthedocs.io/en/stable/reference/types.html#signatures) strings of the form \"f8(f8, f8)\", where the type before the parentheses is the return type and the types within the parentheses are the argument types." @@ -276,9 +320,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -290,9 +336,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### target" @@ -301,9 +349,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If I use eager compilation I can give an addition keyword argument: *target*." 
@@ -312,9 +362,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"cpu\": default, run in a single thread on the CPU" @@ -323,9 +375,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"parallel\": run in multiple threads" @@ -334,9 +388,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"cuda\": run on a CUDA-capable GPU" @@ -346,9 +402,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -359,19 +417,28 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ + "numba.set_num_threads(16) # Limit the number of threads numba uses.\n", "%timeit pusinacosb(a,b)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [ "n = 100_000_000\n", @@ -382,19 +449,29 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [ "%timeit usinacosb(a, b)\n", - "%timeit pusinacosb(a, b) " + "for t in [2, 4, 8, 16,32]:\n", + " numba.set_num_threads(t) # Limit the number of threads numba uses.\n", + " %timeit pusinacosb(a, b) " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Exercise: The Mandelbrot set" @@ -403,9 +480,11 @@ { "cell_type": "markdown", "metadata": { + "editable": 
true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The Mandelbrot set is the set of points *c* in the complex plane for which" @@ -413,7 +492,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "$$z_{i+1} = z_i^2 + c$$" ] @@ -421,9 +506,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "does not diverge.\n", @@ -434,9 +521,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Since it is impracticable to calculate an infinite number of iterations, one usually sets an upper limit for the number of iterations, for example, 20." @@ -445,9 +534,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### Escape time algorithm" @@ -456,9 +547,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "A simple implementation of this algorithm is the following:" @@ -468,9 +561,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -497,9 +592,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Todo:\n", @@ -515,6 +612,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -524,25 +622,13 @@ "source": [ "%timeit M = escape_time_vec(P, 50)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "skip" - }, - "tags": [] - }, - "outputs": [], - "source": 
[] } ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -554,7 +640,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/09_NumbaIntro.ipynb b/09_NumbaIntro.ipynb index 084c6f9233c9a1a994a2b4632e124286b804162a..6fd62c78ceebfd44ffd499f8d8b0cf899665299e 100644 --- a/09_NumbaIntro.ipynb +++ b/09_NumbaIntro.ipynb @@ -11,7 +11,7 @@ "# Introduction to Numba's jit compiler\n", "\n", "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -278,7 +278,7 @@ "Sum: 5033.24 in 0.717281 µs. 13941.5 MFLOP. \n", "```\n", "\n", - "The function takes about 0.7 µs. This is more than 10,000 times faster than the interpreted Python loop. \n", + "The function takes about 0.7 µs. This is more than 1,000 times faster than the interpreted Python loop. \n", "Wouldn't it be great if we could take the Python code in `python_sum` and compile it to machine \n", "code to get some of this speedup?" ] @@ -542,7 +542,7 @@ } }, "source": [ - "OK, the Python loop is about 30000 times slower than numpy's `dot` method. Let's see if we can't make this faster using numba. This time, we'll use `jit` as a decorator." + "OK, the Python loop is about 4,500 times slower than numpy's `dot` method. Let's see if we can't make this faster using numba. This time, we'll use `jit` as a decorator." ] }, { @@ -970,7 +970,7 @@ } }, "source": [ - "Now, this is interesting. If you look at Line 3 of the version called with float32, it still defines\n", + "Now, this is interesting. If you look at Line 4 of the version called with float32, it still defines\n", "`res` as a double precision number! 
This will prevent it from vectorizing the loop using single \n", "precision arguments, which potentially cuts performance in half!\n", "\n", @@ -1050,8 +1050,8 @@ "source": [ "Doesn't look like it. \n", "\n", - "Let's dig a little deeper. A speedup would come from the fact that the Skylake-X processor used for \n", - "JUWELS Cluster can operate on 16 single precision numbers at once compared to 8 double precision \n", + "Let's dig a little deeper. A speedup would come from the fact that the AMD EPYC 7742 processor used for \n", + "JUSUF Cluster can operate on 8 single precision numbers at once compared to 4 double precision \n", "numbers, but that assumes it's using the right instructions. For that we have to look at the assembler.\n", "\n", "We define a helper function to find instructions in the assembler code." @@ -1224,9 +1224,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1238,7 +1238,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/10_Speeding up your code with Cython.ipynb b/10_Speeding up your code with Cython.ipynb index f3b666962390f340416fd0781135126104bb9d25..4de402df09212343e152b8532fbee652fb7d5782 100644 --- a/10_Speeding up your code with Cython.ipynb +++ b/10_Speeding up your code with Cython.ipynb @@ -20,7 +20,7 @@ }, "source": [ "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, @@ -170,7 +170,7 @@ } }, "source": [ - "Elementwise access to NumPy arrays is often slower as elementwise access to lists.\n", + "Elementwise access to NumPy arrays is often slower than elementwise access to lists.\n", "\n", "Now let us invoke Cython" ] @@ -603,6 +603,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -634,6 +635,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -655,7 +657,7 @@ } }, "source": [ - "Our data set is too small to benefit from parallelization. The overhead due to starting multiple threads is too large for this problem size." + "Our data set is too small to benefit from parallelization. The overhead due to starting multiple threads is too large for this problem size. Play around with the number of threads to see how many threads are beneficial." ] }, { @@ -709,9 +711,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "\n", @@ -722,9 +726,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -734,9 +740,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Building a Cython extension outside of a notebook" @@ -804,7 +812,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "```python\n", "from setuptools import Extension, setup\n", @@ -827,12 +841,14 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ - "**Exercise:** Take the Cython code that defines dot using `prange` in 
[Adding OpenMP](#Adding-OpenMP) and write it to `dot.pyx` using the `%%writefile` magic. Make sure to comment out the `cython magic`. Take the above code for setup.py and copy it into a file called `setup.py`. Change the setup.py code to build a module named dot and use `dot.pyx`. Then build the extension in a terminal window with the command. **Note:** Make sure our environment is loaded `source hpcpy21`.\n", + "**Exercise:** Take the Cython code that defines dot using `prange` in [Adding OpenMP](#Adding-OpenMP) and write it to `dot.pyx` using the `%%writefile` magic. Make sure to comment out the `cython magic`. Take the above code for setup.py and copy it into a file called `setup.py`. Change the setup.py code to build a module named dot and use `dot.pyx`. Then build the extension in a terminal window with the command. **Note:** Make sure your environment is loaded: `source $PROJECT_training2421/hpcpy24`.\n", "\n", "```bash\n", "python setup.py build_ext --inplace\n", "```\n", "\n", @@ -840,7 +856,7 @@ "If the build fails with `#include \"numpy/arrayobject.h\" not found`, you need to add the include path for numpy. Luckily, numpy has a function for that: `numpy.get_include()`. Add the include path to the extra_compile_args. Include paths are added using `-I/path/to/be/included`. Since `setup.py` is a Python script you can call `numpy.get_include()` in the script and don't have to hardcode the path.\n", "\n", - "Write a test program that loads and tests the extension. 
Add a doc string to the dot function and include an example section like this:\n", + "Let's add a doc string to the dot function and include an example section that loads and tests the extension, like this:\n", "\n", "```python\n", "def dot(...):\n", @@ -864,9 +880,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Comparison with Numba" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Numba can generate fast functions, too." ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -906,9 +928,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -925,9 +949,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -938,9 +964,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -951,9 +979,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -965,9 +995,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -978,9 +1010,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -994,9 +1028,11 @@ "cell_type": "code", 
"execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1008,9 +1044,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1030,9 +1068,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Finally, let's compare the performance for a larger data set. Remember the last version of our dot function uses OpenMP." @@ -1042,9 +1082,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1054,15 +1096,18 @@ "with threadpoolctl.threadpool_limits(16):\n", " %timeit dot(v,w)\n", "%timeit udot(v,w)\n", + "%timeit udotg(v,w)\n", "%timeit np.dot(v,w)" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Cython and classes" @@ -1071,9 +1116,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Sometimes, we want to do more than just wrap a function. We might want an efficient data type that implements some operators, for example. 
For this Cython allows us to declare classes just like in Python:" @@ -1083,9 +1130,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1103,9 +1152,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1116,9 +1167,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1128,9 +1181,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Extension types" @@ -1139,9 +1194,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "There is a second type of classes called *[extension types](http://cython.readthedocs.io/en/latest/src/userguide/extension_types.html)*. An extension type stores its members and methods in a C struct instead of a Python dictionary. This makes them more efficient but also more restrictive. Let's look at an example:" @@ -1151,9 +1208,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1184,9 +1243,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The first thing to note is the definition using `cdef class`. It's the reason extension types are also referred to as cdef classes. We can define functions that are only visible to C using `cdef` and Python functions using `def` (or both at once with `cpdef`). 
For functions defined with `cdef`, we need to give the type of self as well as a return type.\n", @@ -1198,9 +1259,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1211,9 +1274,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Exercise" @@ -1222,9 +1287,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Try which methods of Point you can call." @@ -1267,9 +1334,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1281,9 +1350,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [] @@ -1291,9 +1362,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1305,7 +1376,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/11_Writing your own Python bindings.ipynb b/11_Writing your own Python bindings.ipynb index 96beaf4ce4b378d1374c460ca1cf7cc266919ff9..3b1a3b1c70d1f19fc8226212befcc3650bad6950 100644 --- a/11_Writing your own Python bindings.ipynb +++ b/11_Writing your own Python bindings.ipynb @@ -3,9 +3,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Writing language bindings" @@ -13,19 +15,27 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": 
{ + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Why bindings?" @@ -77,9 +87,12 @@ { "cell_type": "markdown", "metadata": { + "editable": true, + "jp-MarkdownHeadingCollapsed": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Preparations\n", @@ -92,15 +105,17 @@ "\n", "Wait until the build has finished and then continue with this notebook.\n", "\n", - "**Tip:** You can open a terminal from within JupyterLab by going to File->New->Terminal. To get the right environment in a terminal `source $PROJECT_training2318/hpcpy23`." + "**Tip:** You can open a terminal from within JupyterLab by going to File->New->Terminal. To get the right environment in a terminal `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## Ctypes" @@ -186,9 +201,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -199,9 +216,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -217,9 +236,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "What if word_frequency had been written in Fortran?" 
@@ -228,9 +249,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "```Fortran\n", @@ -247,9 +270,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "We can access Fortran functions almost like C functions. The exact function name may differ, though. The default symbol \n", @@ -261,14 +286,16 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Exercise\n", "Use the terminal that you used earlier to run `build.sh` or open a new one. Make sure you are in the \n", - "tutorial directory. Source `hpcpy23` using `source $PROJECT/hpcpy23`. Change into code/textstats/ and compile \n", + "tutorial directory. Source `hpcpy24` using `source $PROJECT/hpcpy24`. Change into code/text_stats/ and compile \n", "the file word_frequency.F90 with the following command:\n", "\n", "```bash\n", @@ -288,9 +315,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -306,9 +335,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If you compiled the library with the option `-fno-underscoring`, you could use the original declaration without underscore with libwf.so.\n", @@ -485,9 +516,14 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "notes" + }, + "tags": [] + }, "source": [ - "**Note** Unfortunately, this doesn't work the way it's supposed to. Although `-L` should add the path to the library to the search path of the linker, the linker still doesn't find the library. 
To make it work, I added the path to libtext_stats.so to the `LD_LIBRARY_PATH` when the kernel is loaded." + "**Note** Unfortunately, this doesn't work the way it's supposed to *inside a JupyterLab*. Although `-L` should add the path to the library to the search path of the linker, the linker still doesn't find the library. To make it work, I added the path to libtext_stats.so to the `LD_LIBRARY_PATH` when the kernel is loaded." ] }, { @@ -928,6 +964,23 @@ } }, "outputs": [], + "source": [ + "p = PyPoint3D(1,1,1)\n", + "p.translate(-0.5, -0.5, -0.5)\n", + "p.coordinates()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, + "outputs": [], "source": [ "t_point_cython = %timeit -o p = PyPoint3D(1,1,1); p.translate(-0.5, -0.5, -0.5);p.coordinates()" ] @@ -1035,9 +1088,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1047,9 +1102,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Using the extension" @@ -1059,9 +1116,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1072,9 +1131,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1084,9 +1145,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Note that we didn't have to convert our string at all. It's done automatically by PyBind11." 
@@ -1095,9 +1158,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Wrapping a class with Pybind11" @@ -1106,9 +1171,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "PyBind11 can deal with classes, too. The following code wraps the Point3D class:" @@ -1117,9 +1184,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "```c++\n", @@ -1282,7 +1351,10 @@ }, "outputs": [], "source": [ - "!f2py -c code/point/points.f90 -m points_f" + "buildlog = !f2py -c code/point/points.f90 -m points_f\n", + "print('\\n'.join(buildlog[:8]))\n", + "print('...')\n", + "print('\\n'.join(buildlog[-1:]))" ] }, { @@ -1481,9 +1553,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1495,7 +1567,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/12_Introduction to MPI.ipynb b/12_Introduction to MPI.ipynb index dd9cf88fce022a9e30469bc08cacd06ba6ee6b46..5639bc53e2b3fb3d650aece54fc76af9d1095c12 100644 --- a/12_Introduction to MPI.ipynb +++ b/12_Introduction to MPI.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to MPI\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "MPI (Message Passing Interface) is the most used protocol for communicating between processes. It doesn't matter if the processes that want to talk to each other are on the same or different nodes (i.e., computers). In this tutorial, we'll use `mpi4py` to learn about MPI and its API.\n", @@ -192,6 +196,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -200,6 +205,8 @@ "outputs": [], "source": [ "%%writefile hello_mpi.py\n", + "#!/usr/bin/env python3\n", + "\n", "from mpi4py import MPI\n", "comm = MPI.COMM_WORLD\n", "rank = comm.Get_rank()\n", @@ -211,6 +218,22 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "!chmod u+x hello_mpi.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -218,15 +241,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --reservation tr2318-20230615-cpu python3 hello_mpi.py " + "!srun -n 4 -p batch -A training2421 --reservation hpcwp_20240613 ./hello_mpi.py " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Point to point" @@ -276,9 +301,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -308,6 +335,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -315,15 +343,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation 
tr2318-20230615-cpu python3 hello_ptp.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 hello_ptp.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "**Note**, how we used `rank` to perform different work on the task with rank 0 and the task with rank 1 using if statements. This is a common pattern in MPI programs. The task with rank 0 is often referred to as *root*." @@ -369,6 +399,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -398,6 +429,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -405,15 +437,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 hello_sendrecv.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 hello_sendrecv.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Parallel reduction" @@ -492,6 +526,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -527,9 +562,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Since we are dealing with NumPy arrays, we can use the efficient uppercase versions of the MPI calls. Scatter distributes an array evenly among all nodes. Note, the sendbuf only needs to be allocated on node zero, but the variable must exist everywhere." 
@@ -551,6 +588,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -558,15 +596,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_reduction.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_reduction.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Upper vs. lowercase in mpi4py" @@ -575,17 +615,25 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "`mpi4py` offers two version of many calls. The first one is written in uppercase. It uses memory buffers, e.g., `numpy.array`, and maps the call directly to the appropriate C call. The second version is written in lower case and takes arbitrary Python object. The result is given as the return value. Note, that for the uppercase versions all `a_partial` must have the same size!" + "`mpi4py` offers two versions of many calls. The first one is written in uppercase. It uses memory buffers, e.g., `numpy.array`, and maps the call directly to the appropriate C call. The second version is written in lowercase and takes arbitrary Python objects. The result is given as the return value. Note that for the uppercase versions all `a_partial` must have the same size!"
] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "```python\n", "a_partial = numpy.empty(N)\n", @@ -620,6 +668,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -658,6 +707,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -665,15 +715,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_upper.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_upper.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The following code uses the lowercase versions of the calls and works independent of the size of a_partial:" @@ -682,6 +734,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -707,6 +760,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -744,6 +798,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -751,15 +806,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_lower.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_lower.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now, `a_all` contains a `list` of `np.array`s.\n", @@ -800,9 +857,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + 
}, + "tags": [] }, "source": [ "## Domain decomposition" @@ -839,9 +898,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -854,9 +915,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -868,7 +931,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "plt.figure(figsize=(15, 5))\n", @@ -878,9 +947,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The system is basically a square grid. " @@ -1010,20 +1081,26 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ - "I would recommend using an editor for this exercise. Jupyter Lab comes with an editor that supports syntax highlighting but no auto completion. You can find it under File->New->Text File. The new file will be called `Untitled.txt`. You can change the file name by righ-clicking on the editor tab or right-clicking on the file in the file browser view on the left." + "I would recommend using an editor for this exercise. Jupyter Lab comes with an editor that supports syntax highlighting but no auto completion. You can find it under File->New->Text File. The new file will be called `Untitled.txt`. You can change the file name by right-clicking on the editor tab or right-clicking on the file in the file browser view on the left. Use one of the srun commands we used earlier to start your program from a terminal.\n", "\n", "**Note**: The terminal does not have the same modules loaded as the notebook. 
To fix that type `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "1. Take the program from the [Stencil][TV_Stencils] and use a 1d domain decomposition as described \n", @@ -1084,9 +1161,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "$$t_1 = t_s + t_p$$" @@ -1095,9 +1174,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The runtime for $n$ processors is then" @@ -1228,9 +1309,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1242,9 +1325,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1255,6 +1340,10 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "outputs": [], @@ -1268,9 +1357,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Speedup using Amdahl's law" @@ -1279,7 +1370,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "plt.figure(figsize=(5, 2.5), dpi=150)\n", @@ -1291,9 +1388,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## IPyParallel and MPI" @@ -1302,9 +1401,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + 
"tags": [] }, "source": [ "### Starting the engines" @@ -1336,6 +1437,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1344,12 +1446,13 @@ "source": [ "Click on the ``+``-sign at the top of the Files tab on the left to start a new launcher. In the launcher click on Terminal. A terminal will open as a new tab. Grab the tab and pull it to the right to have the terminal next to your notebook.\n", "\n", - "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2318/hpcpy23`." + "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1362,7 +1465,7 @@ "\n", "```bash\n", "export OMP_NUM_THREADS=32\n", - "srun -n 4 -c 32 --ntasks-per-node 4 --time 00:30:00 -A training2318 --reservation tr2318-20230615-cpu ipengine start\n", + "srun -n 4 -c 32 --ntasks-per-node 4 --time 00:30:00 -A training2421 --reservation hpcwp_20240613 ipengine start\n", "```\n", "\n", "**Note**, you can can start the controller and the engines in separate terminals. That will keep the output separate." 
@@ -1371,9 +1474,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Connecting to the engines" @@ -1394,9 +1499,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1740,16 +1847,22 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/13_Introduction to CuPy.ipynb b/13_Introduction to CuPy.ipynb index e2cafd82e34cb46fdf02a52671fb6913d4869f42..fc083c5454f9d3c5dea21557539f89e4c02761e7 100644 --- a/13_Introduction to CuPy.ipynb +++ b/13_Introduction to CuPy.ipynb @@ -10,7 +10,7 @@ "source": [ "# Introduction to CuPy\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>\n", "<img src=\"images/cupy.png\" style=\"float:right\">" ] @@ -134,12 +134,13 @@ }, "outputs": [], "source": [ - "!srun --pty -N 1 -p gpus -A training2318 --time 00:10:00 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul.py" + "!srun --pty -N 1 -p gpus -A training2421 --time 00:10:00 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -207,6 +208,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -214,30 +216,18 @@ }, "outputs": [], "source": [ - "!srun --pty -N 1 -p gpus -A training2318 --time 00:10:00 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul_w_timing.py" + "!srun --pty -N 1 -p gpus -A training2421 --time 00:10:00 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul_w_timing.py" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, "tags": [] }, - "outputs": [], - "source": [ - "!srun --pty -N 1 -p develgpus -A training2318 --time 00:10:00 python3 cupy_matrix_mul_w_timing.py" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, "source": [ "### Exercise\n", "In [Think Vector][TV], you [calculated the Mandelbrot set][TV_Mandelbrot] using [NumPy][] and vectorization. Take either your solution or ours and convert it to [CuPy][]. 
Visualize the result.\n", @@ -261,9 +251,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -281,6 +273,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -288,7 +281,7 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A slbio python cupy_mandelbrot_exercise.py\n", + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python cupy_mandelbrot_exercise.py\n", "image = matplotlib.image.imread(\"cupy_mandelbrot_exercise.png\")\n", "plt.imshow(image)\n", "plt.axis('off')" @@ -350,6 +343,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -382,6 +376,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -389,12 +384,13 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul_w_timing2.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul_w_timing2.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -408,6 +404,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -427,9 +424,9 @@ " B = numpy.random.random((N, N)).astype(numpy.float32)\n", " #C = A @ B\n", "\n", - " for nt in [16, 32, 64, 128, 256]:\n", + " for nt in [16, 32, 64, 128]: # This part is not required for the exercise\n", " t0 = time.time()\n", - " with threadpoolctl.threadpool_limits(limits=nt, user_api='blas'):\n", + " with threadpoolctl.threadpool_limits(limits=nt, user_api='openmp'): # May have to use blas instead of openmp in other environments\n", " for r in range(repeats):\n", " C = A @ B\n", " t1 = 
time.time()\n", @@ -441,6 +438,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -448,12 +446,13 @@ }, "outputs": [], "source": [ - "!srun -p batch -n 1 -c 256 -A training2318 --pty --reservation tr2318-20230615-cpu python3 numpy_matrix_mul_w_timing2.py" + "!OMP_NUM_THREADS=128 srun -p batch -n 1 -c 128 --hint=nomultithread -A training2421 --pty --reservation hpcwp_20240613 python3 numpy_matrix_mul_w_timing2.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -575,6 +574,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -582,12 +582,13 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cupy_to_and_fro.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cupy_to_and_fro.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -670,7 +671,7 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2023 (local)", "language": "python", "name": "hpcpy23" }, @@ -684,7 +685,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/14_CUDA for Python.ipynb b/14_CUDA for Python.ipynb index 4964eb38d57825c9a645595c8bc66a1a93e61f73..32062e163c9ef8d3ebc420662a2b5d6045483ef9 100644 --- a/14_CUDA for Python.ipynb +++ b/14_CUDA for Python.ipynb @@ -3,15 +3,17 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Numba and GPUs\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, @@ -19,9 +21,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -37,9 +41,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Ufunc" @@ -48,9 +54,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "We already learned how to vectorize a function. Remember the Mandelbrot set. We defined a function that returns the number of iterations needed to decide if the algorithm diverges." @@ -60,9 +68,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -84,9 +94,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -100,9 +112,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -114,9 +128,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -126,9 +142,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If you replace `target=\"parallel\"` with `target=\"cuda\"` the function runs on the GPU instead. 
Give it a try and compare the performance for different sizes of the grid:" @@ -138,9 +156,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -175,6 +195,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -182,7 +203,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython mandelbrot_vectorize_cuda.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython mandelbrot_vectorize_cuda.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean():.3f}±{t_gpu.std():.3f} s.\")" ] @@ -190,9 +211,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## CUDA for Python" @@ -379,9 +402,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -406,9 +431,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -420,9 +447,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Notice that for every pair (i, j), we calculate the escape time. 
This makes\n", @@ -826,6 +855,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -833,7 +863,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot1.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot1.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -841,9 +871,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "b) The kernel calculates dx and dy for every pixel although it is the same for all of them. Change the kernel so that it takes dx and dy as arguments and calculate dx and dy before you call the kernel. Does this improve the performance?" @@ -853,9 +885,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -900,6 +934,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -907,7 +942,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot2.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot2.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -915,9 +950,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "c) Add an additional argument `maxtime` to the kernel, so that you can time the kernel for different escape time values. Don't forget to add the new argument to the documentation." 
@@ -959,6 +996,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -966,7 +1004,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot3.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot3.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -1140,6 +1178,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1147,7 +1186,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot4.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot4.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -1200,6 +1239,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1207,15 +1247,17 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cuda_matrixmul.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cuda_matrixmul.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Using shared memory" @@ -1224,14 +1266,16 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "As you learned in Bottlenecks, the matrix matrix multiplication tends to be memory bandwidth bound. This is true on the GPU, too.\n", "\n", - "The way to make it faster is to use faster memory. 
On a CPU this usually means, dividing the matrix into blocks that fit in cache and hope for the best. On a GPU at lease part of the fast memory is usually programmable. In CUDA this memory is called *shared memory*.\n", + "The way to make it faster is to use faster memory. On a CPU this usually means dividing the matrix into blocks that fit in cache and hoping for the best. On a GPU at least part of the fast memory is usually programmable. In CUDA this memory is called *shared memory*.\n", "\n", "Shared memory is available to all *threads in a thread block*. Usually, each thread loads data from device memory into shared memory. This is followed by a barrier, so that all threads are finished reading. Then the shared memory is reused as often as possible. Another barrier makes sure that all threads are done." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's look at an example:" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Matrix multiplication with shared memory" ] }, @@ -1378,9 +1426,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/15_CUDA and MPI.ipynb b/15_CUDA and MPI.ipynb index 8f0eafbe5f7eab32910ddac524e2601f144eb1fb..b6633651f25d56d210b94f658fdef0c5f6d3b733 100644 --- a/15_CUDA and MPI.ipynb +++ b/15_CUDA and MPI.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# CUDA for Python and MPI4Py\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The Kernel" @@ -52,9 +56,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "source": [ "```python\n", @@ -256,13 +262,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ - "%writefile parallel_shift.py\n", + "%%writefile parallel_shift.py\n", "\n", "Your code goes here\n", "\n", @@ -273,9 +281,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Picking a device" @@ -297,9 +307,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "source": [ "```python\n", @@ -344,7 +356,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [] }, @@ -476,6 +494,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -543,6 +562,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -550,13 +570,19 @@ }, "outputs": [], "source": [ - "!srun -p gpus -n 4 -A training2318 --reservation tr2318-20230615-gpu xenv -L mpi-settings/CUDA python3 cuda_aware_mpi_shift.py" + "!srun -p gpus -n 4 -A training2421 --reservation hpcwp_gpu_20240613 --cuda-mps --pty xenv -L MPI-settings/CUDA python3 cuda_aware_mpi_shift.py" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": 
[] } @@ -577,7 +603,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.3" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/16_Introduction to Dask.ipynb b/16_Introduction to Dask.ipynb index 8df4e8f0f98a604af0c256340176fa79f9b21f2e..e908899f2f1c28708f358fb4e3fcf21fdcd92222 100644 --- a/16_Introduction to Dask.ipynb +++ b/16_Introduction to Dask.ipynb @@ -3,27 +3,31 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to Dask\n", "\n", "<div class=\"dateauthor\">\n", - "16 June 2023 | Olav Zimmermann\n", + "14 June 2024 | Olav Zimmermann\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "Dask implements flexible **intra- and inter-node parallel execution based on a task model**. It features data structures that 'feel' like ordinary numpy ndarrys or pandas dataframes but under the hood have been enabled to work on **distributed data**.\n", + "Dask implements flexible **intra- and inter-node parallel execution based on a task model**. 
It features data structures that 'feel' like ordinary numpy ndarrays or pandas dataframes but under the hood have been enabled to work on **distributed data**.\n", "While the task based scheduling enables parallel execution of even highly irregular computation pipelines, the distributed data structures make dask also an interesting choice for processing of data volumes that are larger than main memory.\n", "\n", "Among the distinctive features of dask are peer-to-peer data sharing between workers and the high resilience provided by nanny processes that can restart failing workers.\n", @@ -34,9 +38,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.delayed " ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "`dask.delayed` can be used to formulate arbitrary task graphs. \n", "\n", "It can either be employed as a decorator `@delayed` (not shown in this tutorial) or as a wrapper function `dask.delayed(func)`. \n", - "This function marks a function to be scheduled by Dask. Delayed functions will be evaluated lazily, e.g. not before their result is needed. " + "This function marks a function to be scheduled by Dask. Delayed functions will be evaluated lazily, i.e., not before their result is needed. " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "This is similar in spirit to other lazy evaluation schemes in python (e.g. `eval()`, `lambda` or `concurrent.futures`) and also similar to other task frameworks such as tensorflow. \n", "\n", - "As dask.delayed works on the level of individual functions, the user remains in control which functions will be evaluated eagerly and which ones lazily. 
Although Dask has a sophisticated scheduler for lazy task evaluation, eager evaluation can be preferable in some situations, e.g. for functions that control routing in the task graph, such as functions calculating data used in `if-`statements." + "As dask.delayed works on the level of individual functions, the user remains in control of which functions will be evaluated eagerly and which ones lazily. Although Dask has a sophisticated scheduler for lazy task evaluation, eager evaluation can be preferable in some situations, e.g., for functions that control routing in the task graph, such as functions calculating data used in `if-`statements." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "We first do some settings:" ] }, @@ -80,9 +96,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -93,9 +111,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The next cell implements some dummy functions and builds a simple pipeline with some data dependencies." @@ -105,9 +125,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -146,23 +168,27 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Try to think about it ahead of running the next cells:\n", "- What is the minimal wall time possible?\n", - "- How many tasks does the task graph have for range(8) in prepared?\n", + "- How many tasks does the task graph in `prepared` have?\n", "- How many inputs could you process maximally in the same time it takes for 8 inputs?"
] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "The task graph generated by `dask` can be visualized (don't try this for large graphs, i.e. more input tasks!)." @@ -171,7 +197,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "import graphviz\n", @@ -181,9 +213,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "The computation of any of the tasks is delayed until the execution is triggered by an explicit command to compute dlresult upon which the individual tasks are scheduled according to the dependency structure." @@ -192,7 +226,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "%time dlresult.compute()" @@ -201,13 +241,15 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "- How close to optimal is the observed scheduling?\n", - "- What is the largest number of inputs you can process under 8 seconds? Why?\n", + "- What is the largest number of inputs you can process in under 8 seconds? Why?\n", "- Change the program in a way that enables you to estimate how much overhead per task is incurred by Dask." 
] }, @@ -252,7 +294,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "from operator import add\n", @@ -262,22 +310,30 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "l=[x for x in range(1000000)]\n", - "s= db.from_sequence(l,npartitions=4) # you can manually set the number of partitions\n", + "s= db.from_sequence(l,npartitions=4)  # you can manually set the number of partitions\n", "mysum=s.fold(add) # fold performs a parallel reduction \n", - "mysum.dask # another inpection method for task graphs in dask" + "mysum.dask  # another inspection method for task graphs in dask" ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -287,7 +343,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "%time result=mysum.compute()\n", @@ -299,9 +361,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -311,9 +375,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "(The syntax is kind of unfortunate since Python is moving away from filter and map to list comprehensions and generator expressions.)" @@ -351,17 +417,29 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import dask.dataframe as 
dd\n", - "df = dd.read_csv(\"data/iris.csv\") # not a reasonably sized task (too small!)" + "df = dd.read_csv(\"data/iris.csv\") # not a reasonably sized task (too small!)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "h=df.groupby(df.Name).SepalLength.mean()\n", @@ -371,9 +449,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.array\n", @@ -384,7 +464,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "import dask.array as da\n", @@ -396,9 +482,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -408,9 +496,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "## numpy or dask.array?\n", @@ -424,9 +514,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -437,9 +529,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -453,9 +547,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -467,9 +563,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -477,16 +575,18 @@ 
"x_rechunked=x_dask.rechunk((2500,3000)) # larger chunks are no longer better for dot product calculation\n", "y_dask = x_rechunked.transpose()\n", "result=x_dask.dot(y_dask)\n", - "#with ProgressBar():\n", - "%timeit result.compute(scheduler=\"threads\")" + "with ProgressBar():\n", + " %timeit result.compute(scheduler=\"threads\")" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.distributed\n", @@ -582,7 +682,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.3" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/17_Debugging.ipynb b/17_Debugging.ipynb index dbd87d3b7b918b6cec67cd9dd28d88982d5a3f9e..3b34e256104f9319087184a00251336626cd4bdd 100644 --- a/17_Debugging.ipynb +++ b/17_Debugging.ipynb @@ -2,17 +2,29 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "# Debugging Python\n", "<div class=\"dateauthor\">\n", - "06 June 2023 | Jan H. Meinke, Olav Zimmermann\n", + "14 June 2024 | Jan H. Meinke, Olav Zimmermann\n", "</div>" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "What do you do if a program doesn't produce the results you want? You can stare at the code and try to figure out the mistake. You can add lots of print statements to your code. Or you can use a debugger.\n", "\n", @@ -46,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Before running the following cell try to guess what will happen: will it throw an error or a warning or will it execute normally? \n", + "**Before running the following cell** please read the code and try to guess what will happen: will it throw an error or a warning or will it execute normally? 
\n", "If it is one of the latter two cases, what will it print?" ] }, @@ -75,13 +87,13 @@ "source": [ "Using a debugger to execute a code (or part of it) step by step is also called **runtime debugging**. \n", "\n", - "You can switch on JupyterLab's internal debugger by clicking on the small bug icon at the top right of the notebook, next to the kernel name. You will see several panels appear in the right sidebar. In addition, each code cell of the notebook now got line numbers.\n", + "You can switch on JupyterLab's internal debugger by clicking on the small bug icon at the top right of the notebook, before the kernel name. You will see several panels appear in the right sidebar. In addition, each code cell of the notebook now got line numbers.\n", "\n", "Click on the line number of line 11 in the code cell above. A red dot appearing in front of the line number indicates that you just set a **break point**. At a break point the debugger will stop, allowing you to inspect the state of each variable that is defined at this point. To start the debugger and let it execute the code up to the break point just re-execute the cell [Shift-Return].\n", "\n", - "The navigation symbols at the top of the CallStack panel will now no longer be grayed out and allow you to execute the code line by line. With \"next\" you step over function calls within the line. With \"step in\" you can jump into the python functions called in this line of code (but not into any C library functions).\n", + "The navigation symbols at the top of the CallStack panel (depending on your Jupyter version you may have to click on the Bug symbol in the right side bar first) will now no longer be grayed out and allow you to execute the code line by line. With \"next\" you step over function calls within the line. 
With \"step in\" you can jump into the python functions called in this line of code (but not into any C library functions).\n", "\n", - "The \"Variables\" panel allows you to view either the global or the local variables and to switch between tree and table view. (for arrays the table view is preferable)\n", + "The \"Variables\" panel allows you to view either the global or the local variables and to switch between tree and table view (the **table view** is generally preferable, in particular for numpy arrays).\n", "\n", "**Exercise:** Try to find the bug in the code above. You can set a break point at any line. In case that you want to reset the kernel use the circle arrow button at the top of the notebook.\n", "\n", @@ -115,7 +127,7 @@ "metadata": {}, "outputs": [], "source": [ - "#%%writefile buggy.py\n", + "%%writefile buggy.py\n", "def imabuggyincrement(i,a):\n", " \"\"\"Increment a[i] by 1.\"\"\"\n", " if ii < len(a):\n", @@ -176,7 +188,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next start pudb in a terminal with the script name as an argument. If you haven't done this in this terminal shell before, you need to source hpcpy23:" + "Next start pudb in a terminal with the script name as an argument. 
If you haven't done this in this terminal shell before, you need to source hpcpy24:" ] }, { @@ -184,7 +196,7 @@ "metadata": {}, "source": [ "```bash\n", - "source $PROJECT_training2318/hpcpy23\n", + "source $PROJECT_training2421/hpcpy24\n", "pudb buggy.py\n", "```" ] @@ -259,14 +271,16 @@ "* [pdb][] (builtin)\n", "* [pudb][]\n", "* IDEs (All the IDEs we mentioned have debugging support)\n", - "* [Linaro DDT][], former name ARMForge DDT (commercial, support for debugging parallel codes and C/C++ code, only rudimentary Python support)\n", - "* [TotalView][] (commercial, support for debugging parallel codes and C/C++ code, requires debug version of CPython, supports mixed language debugging, aware of cython, pybind11 and other bindings)\n", + "* [Linaro DDT][], former name ARMForge DDT (commercial, support for debugging parallel codes and C/C++ code, only rudimentary Python support: see [here][])\n", + "* [TotalView][] (commercial, support for debugging parallel codes and C/C++ code, requires debug version of CPython, supports mixed language debugging, aware of cython, pybind11 and other bindings. 
However, debugging of the python code itself, i.e., stepping or breakpoints, is not supported, see [TotalView User Guide][])\n", "\n", "[pdb]: https://docs.python.org/3/library/pdb.html\n", "[pudb]: https://github.com/inducer/pudb\n", "[Linaro DDT]: https://www.linaroforge.com/linaroDdt/\n", + "[here]: https://docs.linaroforge.com/24.0.1/html/forge/ddt/get_started_ddt/python_debugging.html\n", "[ARMForge DDT]: https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge/arm-ddt\n", - "[TotalView]: https://help.totalview.io/current/HTML/index.html#page/TotalView/totalviewlhug-python.13.01.html#ww1893192" + "[TotalView]: https://help.totalview.io/current/HTML/index.html#page/TotalView/totalviewlhug-python.13.01.html#ww1893192\n", + "[TotalView User Guide]: https://help.totalview.io/current/PDFs/TotalView_User_Guide.pdf#G12.1893806" ] }, { @@ -280,7 +294,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For example, PyDev, Wing Personal, Visual Studio, and PyCharm Professional (199 €/a with perpetual fallback license) support remote debugging. It can also be done with the ``ptvsd`` and Visual Studio Code." + "Some IDEs like PyDev, Wing Pro, Visual Studio, and PyCharm Professional support remote debugging. For Visual Studio Code there is [debugpy][] that supports [debugging via SSH][].\n", + "\n", + "[debugpy]: https://github.com/microsoft/debugpy/\n", + "[debugging via SSH]: https://github.com/microsoft/debugpy/wiki/Debugging-over-SSH" ] }, { @@ -294,9 +311,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The following video shows how to debug mixed Python and C++ code using Visual Studio.\n", + "The following video shows how to debug mixed Python and C++ code using Visual Studio Code and gdb.\n", "\n", - "You can go back to to the beginning of the video to learn how write a Python extension in Visual Studio." 
+ "You can go back to to the beginning of the video to learn how write a Python extension in Visual Studio Code." ] }, { @@ -320,9 +337,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "HPC Python 2024 (local)", "language": "python", - "name": "python3" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -334,7 +351,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.6" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/18_IDEs.ipynb b/18_IDEs.ipynb index ba2acfd2506f6c71080e8dec05a8cd0fe28ff46a..939785f2c5ebbf257464b8c003b83b5d7c033dc4 100644 --- a/18_IDEs.ipynb +++ b/18_IDEs.ipynb @@ -268,7 +268,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.5" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/HPC_misc_2024.ipynb b/HPC_misc_2024.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9b366ae47815368b13e70063687ba3ed54bec8a9 --- /dev/null +++ b/HPC_misc_2024.ipynb @@ -0,0 +1,190 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4d4d8c6e-50c4-412a-9d09-c8e9aa8c5c52", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "# Miscellaneous\n", + "\n", + "<div class=\"dateauthor\">\n", + "14 June 2024 | Olav Zimmermann\n", + "</div>" + ] + }, + { + "cell_type": "markdown", + "id": "5291e2ec-7168-40c1-8348-4495f1126940", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "## Python as an HPC language" + ] + }, + { + "cell_type": "markdown", + "id": "b7a6c503-493e-45e9-b9b7-ec170561ddfe", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "fragment" + }, + "tags": [] + }, + "source": [ + "### benefits\n", + "- established as an alternative to C++, Fortran, etc. 
for many HPC use cases\n", + "- maintains high development performance\n", + "- speed improvements without learning another language\n", + "- sometimes module replacement just works \n", + "- many ways to start and expand\n", + "- many HPC tools/frameworks for Python are open source\n", + "- not only more speed but also higher (energy) efficiency" + ] + }, + { + "cell_type": "markdown", + "id": "726e2aa8-ce42-430f-a418-138627e0b5d4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "### no free lunch: challenges\n", + "\n", + "- many tricks of the trade\n", + "- not implemented, implemented differently\n", + "- parallel: flops/byte bottleneck, overhead, deadlocks, resilience\n", + "- task based computing: debugging\n", + "- mixed language: environment, not many tools (debuggers, profilers)\n", + "- licenses, longevity, security\n", + "- infrastructure access: heterogeneity, schedulers, resilience" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4a01e44c-d039-4a11-a5dd-420676807709", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "## Things we did not cover but which may be worth looking at...\n", + "\n", + "## Tools:\n", + "\n", + "- Profilers: [scalene](https://github.com/plasma-umass/scalene), [Intel Advisor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/advisor.html)\n", + "- Debuggers: [Linaro DDT]( https://docs.linaroforge.com/24.0.1/html/forge/ddt/get_started_ddt/python_debugging.html)\n", + "- Parallelisation frameworks: [Joblib](https://joblib.readthedocs.io/en/stable/), [Ray](https://docs.ray.io/en/latest/), [RAPIDS](https://rapids.ai/), [Legion](https://legion.stanford.edu/)([cuNumeric](https://github.com/nv-legate/cunumeric))\n", + "- Combinations: [ipython on MPI](https://ipyparallel.readthedocs.io/en/latest/reference/mpi.html), [Dask on 
CUDA](https://docs.rapids.ai/api/dask-cuda/stable/), [Dask on Ray](https://docs.ray.io/en/latest/ray-more-libs/dask-on-ray.html), etc.\n", + "- [Pandas](https://pandas.pydata.org/) and its HPC derivatives: [cuDF](https://github.com/rapidsai/cudf), [modin](https://github.com/modin-project/modin), [vaex](https://vaex.io/)\n", + "- ML frameworks and derivatives: [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [JAX](https://github.com/google/jax), [HeAT](https://github.com/helmholtz-analytics/heat) (=Helmholtz Analytics Toolkit)\n", + "- IDEs: [VS Code](https://code.visualstudio.com/), [PyCharm](https://www.jetbrains.com/pycharm/), etc." + ] + }, + { + "cell_type": "markdown", + "id": "c6984029-51b8-4186-a30f-6a3a00d59e54", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "## big data:\n", + "\n", + "### have a look into:\n", + "- indexing, hashing, precalculation for random access\n", + "- compression, memory mapped files for faster data availability\n", + "- [HDF5](https://docs.h5py.org/en/stable/), [NetCDF](https://github.com/Unidata/netcdf4-python), [SIONlib](https://www.fz-juelich.de/en/ias/jsc/services/user-support/jsc-software-tools/sionlib), [MPI-I/O](https://mpi4py.readthedocs.io/en/stable/tutorial.html#mpi-io) for parallel file access\n", + "- see also https://www.fz-juelich.de/en/ias/jsc/education/training-courses/training-materials/course-material-parallel-i-o-and-portable-data-formats\n", + "- scalable database management systems for complex data (many with python API):\n", + " - object-relational *,\n", + " - array *, \n", + " - graph *, \n", + " - in-memory *, \n", + " - key-value stores, \n", + " - object * \n", + " - etc. 
(*= database management system)" + ] + }, + { + "cell_type": "markdown", + "id": "7416f9c5-5919-4192-ae7a-30f6fb9eda72", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "slide" + }, + "tags": [] + }, + "source": [ + "## perhaps the next big waves on the HPC Python horizon:\n", + "\n", + "- [Python 3.13](https://docs.python.org/3.13/whatsnew/3.13.html) (final release due October 2024) will allow building it either without the GIL or with a JIT compiler \n", + "- Python compilers and compiled Python-like languages: [codon](https://github.com/exaloop/codon), [bend](https://github.com/HigherOrderCO/Bend), [Taichi](https://github.com/taichi-dev/taichi), [Mojo](https://www.modular.com/mojo)\n", + "- AI-assisted coding: [ChatGPT](https://realpython.com/chatgpt-coding-mentor-python/), [Github Copilot](https://realpython.com/github-copilot-python/), [HPC-GPT for HPC programming](https://dl.acm.org/doi/fullHtml/10.1145/3624062.3624172)...\n", + "- HPC on cloud(s)\n", + "\n", + "There are many HPC Python pages on the web including other courses and tutorials:\n", + "https://abpcomputing.web.cern.ch/guides/hpc_python/\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8569bac9-b9fd-4e1f-b398-16f3c474cf3a", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/README.md b/README.md new file mode 100644 index 0000000000000000000000000000000000000000..175a1f3267d089a4a319dd1eeaebc27887d562a6 --- /dev/null +++ b/README.md @@ -0,0 +1,49 
@@ +# High-Performance Computing with Python @ JSC + +Python is increasingly used in high-performance computing projects. It can be used either as a high-level interface to existing HPC applications and libraries, as an embedded interpreter, or directly. + +This course combines lectures and hands-on sessions. It shows how Python can be used on parallel architectures and how to optimize critical parts of the kernel using various tools. + +The following topics will be covered: + +- Short review of vectorized programming with NumPy +- Interactive parallel programming with IPython +- Profiling and optimization +- High-performance NumPy +- Just-in-time compilation with numba +- Distributed-memory parallel programming with Python and MPI +- Bindings to other programming languages and HPC libraries +- Interfaces to GPUs + +This course is aimed at scientists who wish to explore the productivity gains made possible by Python for HPC. + +## Setting up the environment + +After cloning or downloading the repository, you can create a Python environment similar to the one used on the HPC Systems at JSC based on the packages listed in `hpcpy24.yaml`[^1] using [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) ([Conda](https://conda.io/projects/conda/en/latest/index.html) or [Mamba](https://mamba.readthedocs.io/en/latest/index.html) should work, too, but have not been tested). First change into the directory with the material and then execute +```bash +micromamba create -f hpcpy24.yaml +``` +This downloads about 1 GB of data and creates the environment `hpcpy_jsc_2024` (the name set in `hpcpy24.yaml`). Activate the environment using +```bash +micromamba activate hpcpy_jsc_2024 +``` +You can now start Jupyter Lab +```bash +jupyter lab +``` +and work with the notebooks. + +[^1]: Windows users may want to change gcc to clang, gxx to clangxx, and gfortran to flang, and the version from 12.3.0 to 18.1.7 in hpcpy24.yaml. 
+ +## Updating your copy of the repository + +If you cloned the repository, you can pull updates by first committing the changes you made +```bash +git commit -a -m "add a short message describing your changes" +``` +(replace `add a short message describing your changes` with a short description of your changes) and then pulling the updates with + +`git pull --rebase` + + + diff --git a/build.sh b/build.sh index df60a479ee86c74ff1e5991b1a70d6f7ea579276..42b9dc911589aa31cc07b525d78e399c38cafb3b 100755 --- a/build.sh +++ b/build.sh @@ -1,5 +1,5 @@ #!/bin/bash -source $PROJECT_training2318/hpcpy23 +source $PROJECT_training2421/hpcpy24 # Build points pushd code/point rm -rf build diff --git a/hpcpy23 b/hpcpy24 similarity index 50% rename from hpcpy23 rename to hpcpy24 index 447fd63d8484bbd3ef267c5ba2d0015066b785b9..304315ab7adacf3bec7df3be8632c08d660b8f00 100755 --- a/hpcpy23 +++ b/hpcpy24 @@ -1,6 +1,6 @@ #!/bin/bash module purge -module load Stages/2023 +module load Stages/2024 module load GCC module load ParaStationMPI module load CMake @@ -10,17 +10,18 @@ module load numba module load dask module load mpi4py module load h5py -#module load Jupyter +module load jupyter-server/.2.14.0 +module load ipyparallel/.8.8.0 module load CUDA module load cuTENSOR module load NCCL module load cuDNN +module load CuPy #export NUMBAPRO_NVVM=$CUDA_HOME/nvvm/lib64/libnvvm.so #export NUMBAPRO_LIBDEVICE=$CUDA_HOME/nvvm/libdevice -export LD_LIBRARY_PATH=/p/project/training2318/resources/code/text_stats/build:$LD_LIBRARY_PATH -export LD_LIBRARY_PATH=/p/project/training2318/resources/code/point/build:$LD_LIBRARY_PATH -export PYTHONPATH=/p/project/training2318/packages/lib/python3.10/site-packages:$PYTHONPATH -export PATH=/p/project/training2318/packages/bin:$PATH -export HPCPY2023=1 -#exec $(which python) -m ipykernel $@ - +export LD_LIBRARY_PATH=/p/project/training2421/resources/code/text_stats/build:$LD_LIBRARY_PATH +export 
LD_LIBRARY_PATH=/p/project/training2421/resources/code/point/build:$LD_LIBRARY_PATH +export PYTHONPATH=/p/project/training2421/packages/lib/python3.11/site-packages:$PYTHONPATH +export PATH=/p/project/training2421/packages/bin:$PATH +export HPCPY2024=1 +# exec $(which python) -m ipykernel $@ diff --git a/hpcpy24.yaml b/hpcpy24.yaml new file mode 100644 index 0000000000000000000000000000000000000000..65b5b03a36ebebd98b2c03e212eeab00b1ff2c1c --- /dev/null +++ b/hpcpy24.yaml @@ -0,0 +1,24 @@ +name: hpcpy_jsc_2024 +channels: + - conda-forge +dependencies: + - python = 3.11.3 + - numpy = 1.25.1 + - scipy = 1.11.1 + - matplotlib = 3.7.2 + - numba = 0.58.1 + - dask = 2023.9.2 + - mpi4py = 3.1.4 + - ipyparallel = 8.8.0 + - cython = 0.29.35 + - pybind11 = 2.11.1 + - cffi = 1.15.1 + - python-graphviz = 0.20.3 + - cupy = 12.2.0 + - jupyterlab = 4.2.1 + - gcc = 12.3.0 + - gfortran = 12.3.0 + - gxx = 12.3.0 + - cmake = 3.26.3 + - threadpoolctl = 3.1.0 + - pudb diff --git a/solutions/00_Introduction to IPython.ipynb b/solutions/00_Introduction to IPython.ipynb index 6f3cf8c9dea11c5f171ec78714bf7a649ad4cdef..4eec41a34470c969fc68e9a91e35db02a150da67 100644 --- a/solutions/00_Introduction to IPython.ipynb +++ b/solutions/00_Introduction to IPython.ipynb @@ -3,9 +3,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to IPython and Jupyter Notebook" @@ -20,7 +22,7 @@ }, "source": [ "<div class=\"dateauthor\">\n", - "06 June 2023 | Jan H. Meinke\n", + "07 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -156,7 +158,7 @@ "source": [ "If you didn't get the documentation but only ``Object `random` not found``. Try importing the module first. Start by typing ``im`` and hit the tab key. Then type ``r`` and hit tab again. You'll get a dropdown box with available modules. You can continue typing until your choice is unique or select an item from the list. Give it a try. 
\n", "\n", - "Try ``random?`` after importing the module. If you use ``??`` instead of ``?`` you get the source code." + "Try ``random?`` after importing the module. If you use ``??`` instead of ``?`` you get the source code. Note: If you would like to temporarily hide a cell (e.g. an output cell with a very long text) just click on the blue bar displayed to the right of the cell." ] }, { @@ -266,9 +268,6 @@ "import matplotlib.pyplot as plt\n", "```\n", "\n", - "Alternatively, the command ``%pylab inline`` sets up interactive plotting and pulls all functions and modules from ``numpy`` and ``matplotlib.pyplot`` into the namespace.\n", - "\n", - "\n", "[matplotlib]: http://matplotlib.org/" ] }, @@ -283,7 +282,7 @@ "outputs": [], "source": [ "%matplotlib inline \n", - "# widget is an interactive alternative to inline\n", + "# depending on the installation widget and ipympl are interactive alternatives to inline\n", "import matplotlib.pyplot as plt\n", "import numpy" ] @@ -903,9 +902,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -917,7 +916,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/01_Bottlenecks.ipynb b/solutions/01_Bottlenecks.ipynb index 742c473076d110c9458a601cfa6992196aba7511..65dfc452510ff100a81d8ab182315812d4b7952f 100644 --- a/solutions/01_Bottlenecks.ipynb +++ b/solutions/01_Bottlenecks.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Bottlenecks\n", "\n", "<div class=\"dateauthor\">\n", - "12 Jun 2023 | Jan H. Meinke\n", + "10 Jun 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## High-performance computing is computing at the limit" @@ -29,6 +33,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -41,6 +46,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -53,6 +59,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -65,6 +72,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -77,6 +85,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -89,9 +98,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## CPU\n", @@ -101,9 +112,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "\n", @@ -118,9 +131,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "There are 64 cores per socket\n", @@ -130,9 +145,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This is the limit most people think of first, but it's often not the crucial one. Each core on JUSUF can perform ca. 36 GFlop/s if the code is completely *vectorized* and performs a *multiply and an add operation* at *each step*. 
If your code doesn't fulfill those requirements its peak performance will be less.\n", @@ -143,9 +160,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Memory hierarchy\n", @@ -155,9 +174,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "* L1 (per core):\n", @@ -174,9 +195,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "\n", @@ -186,9 +209,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The memory bandwidth of a JUSUF node is about " @@ -197,9 +222,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "190.7 GiB/s (~400 cycles latency)" @@ -208,9 +235,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## A simple operation" @@ -219,9 +248,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "c = c + a * b (multiply-add)" @@ -230,9 +261,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "3 DP read, 1 DP write -> 24 bytes read, 8 bytes write" @@ -241,9 +274,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "190 GB/s / 24 bytes/op = 8 Gop/s (multiply-add -> 16 GFLOP/s)" @@ -252,9 +287,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "I assume that we are 
dealing with double precision numbers (8 bytes). Then I have to read 3 * 8 bytes = 24 bytes and write 8 bytes. This is a multiply-add operation, so each core can do 18 billion of those per second, but it only receives 190 GB/s. 190GB/s / 24 B/op = 8 Gop/s (16 GFLOP/s). This operation is clearly memory bound, if we have to get all the data from main memory." @@ -263,9 +300,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Matrix multiplication" @@ -274,9 +313,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This operation is quite common. Let's look at a matrix multiplication $C=AB$. To calculate the element i, j of the result matrix C, we multiply row i of A with column j of B and sum the results. This is the scalar or dot product of row i of A and column j of B. In code this looks like this:" @@ -285,6 +326,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -303,6 +345,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -316,6 +359,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -351,9 +395,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "Let's take two small matrices A and B and see how long the above function takes." 
@@ -363,9 +409,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -378,9 +426,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -390,9 +440,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "A matrix multiplication of two n by n matrices performs $2n^3$ operations. The dot function achieves" @@ -402,9 +454,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -414,9 +468,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Wow, that's bad. Let's see if we can make this faster." 
@@ -425,9 +481,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Numba" @@ -437,13 +495,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ - "from numba import njit as jit\n", + "from numba import njit as jit # This is the default for numba 0.59.0 and later\n", "jdot = jit(dot)" ] }, @@ -451,9 +511,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -464,9 +526,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -476,9 +540,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Access order and cache lines" @@ -487,9 +553,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "From our estimate above, we should be able to get at least ten times this, but that's assuming we can achieve the maximum memory bandwidth. \n", @@ -505,9 +573,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -540,9 +610,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now, elements in b are accessed in the proper order and a[i, k] is constant for the loop. This changes our estimate, because, now we read 8 bytes/op in the innermost loop. 
This gives us a maximum of 190 GB/s / 8 bytes/op = 24 Gop/s (48 GFLOP/s) making this compute bound on a single core." @@ -551,9 +623,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Effect on matrix multiplication" @@ -563,9 +637,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -576,9 +652,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -588,21 +666,25 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "This is not much better. Let's take a look at a bigger matrix." + "This is much better. Let's take a look at a bigger matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -615,9 +697,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -628,9 +712,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -640,12 +726,14 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { - "slide_type": "notes" - } + "slide_type": "skip" + }, + "tags": [] }, "source": [ - "This is even worse and corresponds to a bandwidth of about 8 GB/s.\n", + "This is worse and corresponds to a bandwidth of about 18 GB/s on JUSUF and almost twice that in the cloud.\n", "\n", "A possible explanation is that a single core may not be able to 
access the full bandwidth of the socket.\n", "\n", @@ -655,9 +743,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { - "slide_type": "subslide" - } + "slide_type": "skip" + }, + "tags": [] }, "source": [ "## Numpy" @@ -666,9 +756,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's see how long numpy takes for this:" @@ -678,6 +770,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -692,9 +785,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -707,9 +802,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -718,7 +815,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "The maximum clock frequency of the processor is 3.4 GHz, which corresponds to a peak performance of about 54 GFLOP/s. This is pretty close." ] @@ -727,6 +830,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -741,6 +845,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -754,9 +859,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The numpy version we use here, uses a fast math library. 
That's what you want!\n", @@ -767,9 +874,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The roofline model" @@ -778,9 +887,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The roofline model shows the memory bandwidth bound and compute bound with respect to the computational intensity. The computational intensity is just given by the number of bytes used divided by the number of operations performed." @@ -789,9 +900,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "" @@ -800,9 +913,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Depending on your algorithm, different limits may be relevant, for example, we only used a single thread, but used the peak performance of the entire processor with 64 cores. If the data fits completely in L2 cache the available bandwidth is higher once the data has been loaded. The following shows a plot with a few more limits." 
@@ -811,9 +926,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "" @@ -822,9 +939,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## I/O" @@ -833,9 +952,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "### GPFS File System\n", @@ -845,9 +966,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "Each node connected to file system with $\\mathcal{O}(100)$ GBit/s or about 12.5 GB/s." @@ -856,9 +979,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The scratch file system achieves read/write bandwidths that are very similar to the main memory bandwidth, but not for a single node. Each node is connected to the GPFS file system with $\\mathcal{O}(100)$ GBit/s connection. In other words, we can read/write about 12.5 GB/s. If we had to load the data in the previous calculation from disk, we could only achieve 12.5 GB/s / 24 bytes/op = 520 Mop/s. The main memory bandwidth or the peak performance of the CPU don't matter in this case." 
@@ -868,9 +993,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [] @@ -878,9 +1005,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -892,7 +1019,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/02_NumPy_concepts.ipynb b/solutions/02_NumPy_concepts.ipynb index 743b74bb52b32d5fc47f18f8e4c60abf0222d974..16164c67c78afb3676ada74917d67eee1dfd5cb3 100644 --- a/solutions/02_NumPy_concepts.ipynb +++ b/solutions/02_NumPy_concepts.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# NumPy - an HPC perspective\n", "\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Olav Zimmermann\n", + "7 June 2024 | Olav Zimmermann\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Python is an interpreted language and as such it is extremely flexible, allowing to define everything, including code itself, \n", @@ -133,9 +137,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -147,9 +153,40 @@ { "cell_type": "markdown", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "Note: In case of a wrapper like FlexiBlas ``show_config()`` will only show the wrapper. 
One possibility to get a hint which BLAS implementation NumPy is probably linked against is the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from threadpoolctl import threadpool_info\n", + "threadpool_info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## ndarray" @@ -355,9 +392,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -369,7 +406,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/03_ThinkVector.ipynb b/solutions/03_ThinkVector.ipynb index 404514f480e4e0e8538a0bb2cd85c8925a0833e1..d54afcbd966efcb58e0b620788f16a6f41e5bac3 100644 --- a/solutions/03_ThinkVector.ipynb +++ b/solutions/03_ThinkVector.ipynb @@ -3,6 +3,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -12,16 +13,18 @@ "# Think Vector\n", "\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Jan H. Meinke\n", + "07 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Dot product" @@ -475,9 +478,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's look at an example. We start with a 2d array of random values and fix the left boundary to a value of 0 and the right boundary to a value of 1. We do not want to change these boundary values. 
The top and bottom boundaries are connected so that our system forms a cylinder (periodic boundary conditions along y)." @@ -487,13 +492,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ - "A_orig = numpy.random.random((10, 10))\n", + "A_orig = numpy.random.random((30, 30))\n", "A_orig[:, 0] = 0\n", "A_orig[:, -1] = 1" ] @@ -539,12 +546,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ + "# note the use of the modulo operator % to encode the periodic boundary condition\n", "for i in range(A.shape[0]):\n", " for j in range(1, A.shape[1] - 1):\n", " B[i, j] = 0.25 * (A[(i + 1) % A.shape[0], j] + A[i - 1, j] \n", @@ -899,6 +909,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -911,18 +922,20 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, "tags": [] }, "source": [ - "Multiplying to complex numbers is more interesting: " + "Multiplying two complex numbers is more interesting: " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -972,21 +985,24 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, "tags": [] }, "source": [ - "In short, we can use complex numbers just like any other numerical type. Here is a function that calculates the series and return the iteration at which $|z| > 2$:" + "In short, we can use complex numbers just like any other numerical type. 
Here is a function that calculates the series and returns the iteration at which $|z| > 2$:" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Escape time algorithm" ] }, @@ -1050,18 +1066,22 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "**Hints:** \n", "\n", "* You can use `numpy.meshgrid()` to generate your 2D array of points.\n", " \n", - " If you have `x = numpy.array([-1.1, -1, 0, 1.2])` and `y = numpy([-0.5j, 0j, 0.75j])` and call `XX, YY = numpy.meshgrid(x, y)`, it returns two arrays of shape 3 by 4. The first one contains 3 rows where each row is a copy of x. The second one contains 4 columns where each colomn is a copy of y.\n", + " If you have `x = numpy.array([-1.1, -1, 0, 1.2])` and `y = numpy.array([-0.5j, 0j, 0.75j])` and call `XX, YY = numpy.meshgrid(x, y)`, it returns two arrays of shape 3 by 4. The first one contains 3 rows where each row is a copy of x. The second one contains 4 columns where each column is a copy of y.\n", " \n", " Now you can add those two arrays to get points in the complex plane. `P = XX + YY`.\n", + "\n", + "* Another (even faster) way is to use broadcasting. 
For this you need to insert an empty dimension with `np.newaxis` (or `None`).\n", " \n", "* You somehow need to mask the points that already diverged in future iterations.\n", "* You don't have to put this in a function" @@ -1070,6 +1090,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1278,9 +1299,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1292,7 +1313,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/04_Particle Dynamics.ipynb b/solutions/04_Particle Dynamics.ipynb index 76c01d9ea8865e6d176d9538c88a6e0bdaf6db37..611758ced0e88fa845e945bf838a74ac3f97bdfe 100644 --- a/solutions/04_Particle Dynamics.ipynb +++ b/solutions/04_Particle Dynamics.ipynb @@ -12,7 +12,7 @@ "source": [ "# Particle Dynamics with Python\n", "<div class=\"dateauthor\">\n", - "12 June 2023 | Jan H. Meinke\n", + "10 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, @@ -21,6 +21,7 @@ "execution_count": null, "id": "5822f3b3-bc03-4e2f-85f1-57cb246e3a05", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -38,6 +39,7 @@ "execution_count": null, "id": "f7d1939b-7d73-4c0c-9d8a-d6ea39d48b49", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -45,6 +47,7 @@ }, "outputs": [], "source": [ + "# Note: if available on your installation 'ipympl' and 'widget' provide interactive alternatives to 'inline'\n", "%matplotlib inline" ] }, @@ -52,6 +55,7 @@ "cell_type": "markdown", "id": "b6798959-bbef-4f71-b696-e1069554c403", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -67,6 +71,7 @@ "cell_type": "markdown", "id": "9f9b8f9d-c834-4b86-9ef1-e385694d4b8c", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -83,6 +88,7 @@ "cell_type": "markdown", "id": "2c250750-32b7-4a74-8c3e-5c3eb6c4a13d", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -102,6 +108,7 @@ "cell_type": "markdown", "id": "00ee5853-283f-4786-bd4c-81ca9ab7b3b2", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -115,6 +122,7 @@ "cell_type": "markdown", "id": "a6e75808-f266-4a57-9837-5b9aa69ee436", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -128,6 +136,7 @@ "cell_type": "markdown", "id": "27adecd9-7499-4a86-bb62-15dd40377c72", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -141,6 +150,7 @@ "cell_type": "markdown", "id": "35260044-1b70-46c5-8bfd-8475566037b4", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -155,6 +165,7 @@ "cell_type": "markdown", "id": "0167c3d7-4abc-4635-b53d-aa38072ff922", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -168,6 +179,7 @@ "cell_type": "markdown", "id": "96292513-eaee-4617-bacd-4d13a1f6f8ab", "metadata": { + "editable": true, "slideshow": { 
"slide_type": "notes" }, @@ -182,6 +194,10 @@ "cell_type": "markdown", "id": "cbab8258-28f9-41db-9dda-7f4a5be57603", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "source": [ @@ -192,6 +208,7 @@ "cell_type": "markdown", "id": "c55acb8e-6cb4-459c-9241-9e42eb364b72", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -209,6 +226,7 @@ "cell_type": "markdown", "id": "d36faa34-7345-4e94-b19b-62e4419417e0", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -222,6 +240,7 @@ "cell_type": "markdown", "id": "32f7c975-ed21-4c70-9168-5b7bfa5ca276", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -234,7 +253,13 @@ { "cell_type": "markdown", "id": "4288de12-8bf3-41b2-96ca-5c3c47fc0d84", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "1. Calculate the force on each particle by summing up all the forces acting on it.\n", "2. 
Integrate the equation of motion\n", @@ -248,6 +273,7 @@ "cell_type": "markdown", "id": "efba7cbf-301a-4e5c-81d4-1394c5ec3c9f", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -275,6 +301,7 @@ "cell_type": "markdown", "id": "76d2db76-3bac-4465-9512-babcef5e721b", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -290,6 +317,7 @@ "execution_count": null, "id": "b4525c8a-378a-45b7-b1e2-b67f5f07d397", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -318,6 +346,7 @@ "cell_type": "markdown", "id": "8fd053d2-8c88-4666-82ed-0316fe21ac34", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -331,6 +360,7 @@ "cell_type": "markdown", "id": "ac5e70be-cafd-41cd-b866-5b98ee28fb0a", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -344,6 +374,10 @@ "cell_type": "markdown", "id": "c1d0d68d-23a4-45e1-a431-91e575056e21", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "source": [ @@ -354,6 +388,7 @@ "cell_type": "markdown", "id": "0b29d4d1-b6ef-4615-ab11-0bed26267252", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -368,6 +403,7 @@ "execution_count": null, "id": "338142b6-f973-4f7a-b5a4-77e76f3b758f", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -385,6 +421,7 @@ "cell_type": "markdown", "id": "d0156a2d-13ae-46dd-b3a8-cb7eb1aca0bf", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -399,6 +436,7 @@ "execution_count": null, "id": "e841a076-504d-445b-b006-b931e3cb0bc2", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -418,6 +456,7 @@ "cell_type": "markdown", "id": "3de052ac-7591-4477-8285-cc15c0019a7a", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -431,6 +470,7 @@ "cell_type": "markdown", "id": "235e1971-24e0-4cf8-ac27-779e5ae37684", "metadata": { + 
"editable": true, "slideshow": { "slide_type": "slide" }, @@ -445,6 +485,10 @@ "execution_count": null, "id": "1133b4bb-111b-4aca-9326-22a7c29c8522", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "outputs": [], @@ -458,6 +502,7 @@ "cell_type": "markdown", "id": "ccea23e5-4f4b-4ff6-b379-8d45e3fe15f4", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -471,6 +516,7 @@ "cell_type": "markdown", "id": "dba27f9b-350e-4e65-9f42-e3615ee30a84", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -484,7 +530,13 @@ "cell_type": "code", "execution_count": null, "id": "5ddc24f9-eaf3-491c-bf81-232efa584c1c", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "x = [i + v * dt + 0.5 * f / m * dt * dt for i, v, f in zip(x, vx, Fx)]\n", @@ -496,6 +548,7 @@ "cell_type": "markdown", "id": "52959ed7-d454-40fb-98f1-9df161873c87", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" }, @@ -510,6 +563,7 @@ "execution_count": null, "id": "2266d4e8-8f67-4979-ae47-abf8508673a4", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -526,6 +580,7 @@ "cell_type": "markdown", "id": "e4cff076-759c-477c-9758-41bb730cd606", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -539,6 +594,7 @@ "cell_type": "markdown", "id": "92a88a32-4ee1-44ce-b371-afd412359a3b", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -552,7 +608,13 @@ "cell_type": "code", "execution_count": null, "id": "bf48e0d0-34f6-47ba-8a30-ba0c1e19489d", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "ax = plt.figure(figsize=(5, 5)).add_subplot(projection='3d')\n", @@ -564,6 +626,7 @@ "cell_type": "markdown", "id": "65984f53-4b54-4f6d-aaa1-6de391150539", "metadata": { 
+ "editable": true, "slideshow": { "slide_type": "skip" }, @@ -577,6 +640,7 @@ "cell_type": "markdown", "id": "f1f30004-a9c3-4499-84e0-976937b9f8a8", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -592,6 +656,7 @@ "execution_count": null, "id": "039819a6-698f-43a6-a4f0-4f7b8852fbb1", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -604,6 +669,10 @@ "cell_type": "markdown", "id": "8cb45f43-29e2-49df-a976-bf7790fe5a44", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [ "Solution" ] @@ -617,6 +686,7 @@ "execution_count": null, "id": "ccfc4eca-c09e-448f-93db-2122be7b484c", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -635,6 +705,7 @@ "execution_count": null, "id": "599a1169-0356-4841-a495-2b113021c652", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -653,6 +724,7 @@ "execution_count": null, "id": "405b8627-e887-4ec8-85cf-c391877c0b19", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -674,6 +746,7 @@ "cell_type": "markdown", "id": "0e3c0e11-dc33-46e0-a6da-05cb42ecfd9a", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -694,6 +767,7 @@ "execution_count": null, "id": "8d8cad39-dd89-4379-b549-64a67465db3f", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -714,6 +788,7 @@ "execution_count": null, "id": "ba5403cc-d3d0-46d8-a1bb-01f5056d0963", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -733,6 +808,7 @@ "execution_count": null, "id": "500347dc-2dfd-4481-a81b-4276cbc00863", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -752,6 +828,7 @@ "execution_count": null, "id": "e0745105-fa07-4054-8ddc-274de4a510f8", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -771,6 +848,7 @@ "execution_count": null, "id": "699fb1a4-349b-46ad-acfc-177465aade2a", "metadata": { + 
"editable": true, "slideshow": { "slide_type": "skip" }, @@ -790,6 +868,7 @@ "execution_count": null, "id": "3fe70091-fffd-4613-a798-1f9c54bfbfa4", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -808,6 +887,7 @@ "cell_type": "markdown", "id": "5a141c1e-22b6-40be-80d5-25ad2648972c", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -820,9 +900,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/solutions/05_Profiling a simple md code.ipynb b/solutions/05_Profiling a simple md code.ipynb index eeca5b2485065930e1290fa83e0fd6272cd1dd78..7e7261af0be388c0f50708d9b3cb02f330827ac7 100644 --- a/solutions/05_Profiling a simple md code.ipynb +++ b/solutions/05_Profiling a simple md code.ipynb @@ -3,23 +3,27 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Profiling\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiler" ] }, @@ -28,51 +32,67 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "cprofiler (standard module)" + "[cProfile][] (standard module)\n", + "\n", + "[cProfile]: https://docs.python.org/3.12/library/profile.html" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "line_profiler" + "[line_profiler][]\n", + "\n", + "[line_profiler]: https://kernprof.readthedocs.io/en/latest/" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ - "Intel Advisor since 2017 beta" + "[Intel VTune][]\n", + "\n", + "[Intel VTune]: https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2024-1/python-code-analysis.html" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ - "Scalene" + "[Scalene][]\n", + "\n", + "[Scalene]: https://github.com/plasma-umass/scalene" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -85,9 +105,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Before you start to optimize a program, you should generate a profile. 
A profile shows how much time a program spends in which function, line of code, or even assembler instruction.\n", @@ -100,9 +122,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiling a simple particle dynamics code" @@ -111,9 +135,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "pair_force()\n", @@ -131,9 +157,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -278,9 +306,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The main program" @@ -289,9 +319,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Initialization" @@ -300,7 +332,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "# 1000 particles\n", @@ -321,9 +359,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### The algorithm\n", @@ -339,9 +379,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -354,9 +396,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "This took quite some time. Let's measure how long it takes. Add a %%timeit statement just before nsteps (same line)." 
@@ -365,9 +409,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The base line" @@ -377,9 +423,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -392,9 +440,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "OK, that's our base line. Next, we want to know where all this time is spent. I mentioned the cprofile module at the beginning. IPython has a magic for that called %%prun. Use it in front of the loop this time." @@ -403,9 +453,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Profiling with %%prun" @@ -414,7 +466,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "%%prun -r nsteps=1\n", @@ -426,9 +484,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ " 2003007 function calls in 8.561 seconds\n", @@ -452,9 +512,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The overhead shouldn't be too bad. I got about 10%. Most of the time (about 80%) is spent in pair_force. 
And 20% of that time is spent on np.array>" @@ -463,9 +525,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Line by line profiling" @@ -474,9 +538,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Unfortunately, this is a rather coarse grained profile. We don't know which part is the expensive part of this calculation and what we can do about it." @@ -486,9 +552,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -499,9 +567,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -511,9 +581,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "Total time: 9.86691 s \n", @@ -531,9 +603,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Timing individual operations" @@ -554,9 +628,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -568,9 +644,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -582,9 +660,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -596,9 +676,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { 
"slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -610,9 +692,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "We can now change the code so it calculates dx, dy, and dz first and then uses them later in the calculation. We can also use numba to speed up the simulation." @@ -621,9 +705,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## Exercise: Time the other operations and optimize the code" @@ -674,9 +760,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -688,7 +774,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/06_LocalParallel.ipynb b/solutions/06_LocalParallel.ipynb index e716e9c43cba437fc60fb2df5e3c49a0642bcbdc..80745acaa5f9a8e8d30daaf47051a36e372daaf5 100644 --- a/solutions/06_LocalParallel.ipynb +++ b/solutions/06_LocalParallel.ipynb @@ -3,40 +3,45 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Interactive Parallel Computing with IPython Parallel\n", "\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "*Computers have more than one core.* Wouldn't it be nice if we could use all the cores of our local machine or a compute node of a cluster from our [Jupyter][IP] notebook? 
\n", "\n", "Click on the ``+``-sign at the top of the Files tab on the left to start a new launcher. In the launcher click on Terminal. A terminal will open as a new tab. Grab the tab and pull it to the right to have the terminal next to your notebook.\n", "\n", - "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2318/hpcpy23`.\n", + "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2421/hpcpy24`.\n", "\n", "In the terminal type ``ipcluster``. You'll see the help message telling you that you need to give it subcommand. Take a look at the message and then enter \n", "\n", "``` bash\n", - "export OMP_NUM_THREADS=32\n", + "export OMP_NUM_THREADS=XX\n", "ipcluster start --n=4\n", "```\n", + "with XX=32 if you are on a JUSUF node and XX=4 if you are on a JSCCloud instance.\n", "\n", - "This will start a cluster with four engines and should limit the number of threads to 32 threads per engine to avoid oversubscription.\n", + "This will start a cluster with four engines and should limit the number of threads per engine to avoid oversubscription.\n", "\n", "> If you use the classical [Jupyter][IP] notebook, this is even easier if you have the cluster extension installed. (We don't have that one on our JupyterHub, yet). One of the tabs of your browser has the title \"Home\". If you switch to that tab, there are several tabs within the web page. One of them is called \"IPython Clusters\". Click on \"IPython Clusters\", increase the number of engines in the \"default\" profile to 4, and click on Start. The status changes from stopped to running. 
After you did that come back to this tab.\n", "\n", @@ -59,9 +64,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Overview" @@ -105,9 +112,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now let's see how we access the \"Cluster\". Originally, [ipyparallel][IPp] was developed as a part of [IPython][IP]. In the meantime it's developed separately. It is used to access the engines, we just started. We first need to import Client.\n", @@ -227,9 +236,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Before we go into the details of the interface of a `DirectView`--that's the name of the class, let's look at IPython magic.\n", @@ -243,9 +254,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -362,9 +375,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -456,18 +471,34 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Magic commands are blocking by default, i.e., the next cell can only be executed after all the engines have finished their work. We can pass the option ``--noblock`` to change that behavior." 
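The blocking behaviour discussed here is the usual future/handle pattern. A self-contained sketch using only the standard library (no engines needed): `ThreadPoolExecutor` stands in for the ipyparallel engines, and `Future.result()` plays the role of fetching the result of a non-blocking `%%px` cell.

```python
# Submitting work returns immediately (non-blocking); collecting the
# results blocks until every "engine" has finished, like a blocking %%px.
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    time.sleep(0.05)              # stands in for work on a remote engine
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    handles = [pool.submit(slow_square, i) for i in range(4)]  # returns at once
    results = [h.result() for h in handles]                    # blocks here

print(results)  # [0, 1, 4, 9]
```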
] }, + { + "cell_type": "markdown", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "In the next cell, set the `limits` value to 32 if you are on a JUSUF node and leave it at 4 if you are on a JSCCloud instance:" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -477,7 +508,7 @@ "source": [ "%%px --local\n", "import threadpoolctl\n", - "threadpoolctl.threadpool_limits(limits=32, user_api='blas')" + "threadpoolctl.threadpool_limits(limits=4, user_api='blas')" ] }, { @@ -875,9 +906,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [
+ "Latency (the time until something happens) and bandwidth (the amount of data we get through the network) are two important properties of your parallel system that define what is practical and what is not. We will use the ``%timeit`` magic to measure these properties. ``%timeit`` and its sibling ``%%timeit`` measure the run time of a statement (cell in the case of ``%%timeit``) by executing the statement multiple times (by default at least 7 repeats). For short running routines a loop of many executions is performed per repeat and the minimum time measured is then displayed. The number of loops and the number of repeats can be adjusted. Take a look at the documentation. Give it a try." ] }, { @@ -1343,6 +1380,20 @@ "dview.block=True" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px --local\n", + "to_delete=[\"a\", \"b\", \"c\", \"A\", \"B\", \"C\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -1366,8 +1417,8 @@ }, "outputs": [], "source": [ - "%timeit -n 20 dview.push(dict(a=a))\n", - "%timeit -n 20 dview.push(dict(a=a[:128*1024]))\n", + "# %timeit -n 20 dview.push(dict(a=a))\n", + "# %timeit -n 20 dview.push(dict(a=a[:128*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:64*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:32*1024]))\n", "%timeit -n 20 dview.push(dict(a=a[:16*1024]))\n", @@ -1398,8 +1449,8 @@ }, "outputs": [], "source": [ - "bwmax = len(rc) * 256 * 8 / 9.83-3\n", - "bwmin = len(rc) * 8 / 4.25e-3\n", + "bwmax = len(rc) * 64 * 8 / 42.2e-3\n", + "bwmin = len(rc) * 8 / 18.5e-3\n", "print(\"The bandwidth is between %.2f kB/s and %.2f kB/s.\" %( bwmin, bwmax))" ] }, @@ -1613,7 +1664,7 @@ }, "outputs": [], "source": [ - "n = 4096\n", + "n = 2048\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])" ] @@ -1702,26 +1753,13 @@ }, "outputs": [], "source": [ - 
"%%timeit -o\n", + "%%timeit\n", "c00 = np.dot(a00, b00) + np.dot(a01, b10)\n", "c01 = np.dot(a00, b01) + np.dot(a01, b11)\n", "c10 = np.dot(a10, b00) + np.dot(a11, b10)\n", "c11 = np.dot(a10, b01) + np.dot(a11, b11)" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, - "outputs": [], - "source": [ - "_.best / tdot.best" - ] - }, { "cell_type": "markdown", "metadata": { @@ -1812,6 +1850,20 @@ "c11 = c11h.get()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%px --local\n", + "to_delete=[\"a\", \"b\", \"c\", \"A\", \"B\", \"C\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -1845,21 +1897,14 @@ "\n", "The code is not any faster, because our implementation of numpy already blocks the matrices and uses all cores, but it shows the principle. Also, remember that we are transferring the data to the engines in every call!" 
] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024 (local)", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1871,7 +1916,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/solutions/07_LocalTaskParallel.ipynb b/solutions/07_LocalTaskParallel.ipynb index bdd868d7b777474ff12dc906f73536d85a982960..0fd1c4552d076d7538b689da9ca020fcfc1a4332 100644 --- a/solutions/07_LocalTaskParallel.ipynb +++ b/solutions/07_LocalTaskParallel.ipynb @@ -7,6 +7,15 @@ "# Parallel, Task-Based Computing with Load Balancing on your Local Machine" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<div class=\"dateauthor\">\n", + "11 June 2024 | Jan H. 
Meinke\n", + "</div>" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -19,7 +28,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from ipyparallel import Client" @@ -28,7 +39,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "rc = Client()" @@ -44,7 +57,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "lview = rc.load_balanced_view()" @@ -53,7 +68,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "%px import numpy as np\n", @@ -63,21 +80,25 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "%%px --local\n", "import threadpoolctl\n", - "threadpoolctl.threadpool_limits(limits=32, user_api='blas')" + "threadpoolctl.threadpool_limits(limits=4, user_api='blas')" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "n = 4096\n", + "n = 2048\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])\n", "C = np.dot(A, B)" @@ -86,7 +107,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "tnp = %timeit -o A@B" @@ -95,7 +118,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "a00 = A[:n // 2, :n // 2]\n", @@ -111,7 +136,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00h = lview.apply(lambda a, b, c, d : np.dot(a, b) + np.dot(c, d), a00, b00, a01, b10)\n", @@ -123,7 +150,9 @@ { "cell_type": "code", "execution_count": null, - 
"metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00h.wait()\n", @@ -135,7 +164,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "c00 = c00h.get()\n", @@ -177,6 +208,26 @@ "It's probably about the same, so why would we use the *load-balanced view*? For starters, we can throw more tasks at our engines than there are workers. In the previous example, we split our matrices in four blocks. Let's write a function that takes a square matrix with n rows and columns, where n is multiple of threshold, that uses tiles of size threshold." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before we continue let's free some memory (in the drop-down menu that opens on a right mouse click you can open the Variable Inspector that also shows the size of the arrays):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "to_delete=[\"a00\", \"a01\", \"a10\", \"a11\", \"b00\", \"b01\", \"b10\", \"b11\", \"c00\", \"c01\", \"c10\", \"c11\"]\n", + "for x in dir():\n", + " if x in to_delete:\n", + " del globals()[x]\n", + " print(f'{x} deleted.')" + ] + }, { "cell_type": "code", "execution_count": null, @@ -337,6 +388,13 @@ "BlockMatrixMultiply?" 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**The next cells will not work on a Cloud instance due to the amount of RAM required.** (A dummy value was inserted to avoid accidental killing of the engines)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -350,7 +408,7 @@ "metadata": {}, "outputs": [], "source": [ - "n = 16384\n", + "n=8 # 16384 # switch value only on a computer with sufficient RAM!\n", "A = np.random.random([n, n])\n", "B = np.random.random([n, n])\n", "C = np.dot(A, B)" @@ -396,9 +454,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy" }, "language_info": { "codemirror_mode": { @@ -410,7 +468,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/08_Numba vectorize.ipynb b/solutions/08_Numba vectorize.ipynb index 85f5eaa9aae3df49f0e0c72347379684e96627cf..c56de98923b9cc3f345a296b7ae9db1212065c77 100644 --- a/solutions/08_Numba vectorize.ipynb +++ b/solutions/08_Numba vectorize.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Numba vectorize\n", "\n", "<div class=\"dateauthor\">\n", - "13 June 2023 | Jan H. Meinke\n", + "11 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Numba offers a decorator `@vectorize` that allows us to generate **fast** [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html). 
" @@ -30,9 +34,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -46,9 +52,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## A simple trig function" @@ -57,9 +65,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Let's implement a simple trig function:" @@ -69,9 +79,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -82,9 +94,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -96,9 +110,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Passing numpy arrays as arguments" @@ -108,13 +124,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ - "n = 1000000\n", + "n = 1_000_000\n", "a = np.ones(n, dtype='int8')\n", "b = 2 * a" ] @@ -123,9 +141,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -135,20 +155,24 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "The function sinasinb is only defined for scalars, so we have to do something if we want to pass an array." + "The error is expected. 
The function `sinasinb` is only defined for scalars, so we have to do something if we want to pass an array." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## numpy.vectorize" @@ -157,9 +181,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "NumPy provides the function `vectorize`." @@ -169,9 +195,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -182,9 +210,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -194,9 +224,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## numba.vectorize" @@ -205,9 +237,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Dynamic ufuncs" @@ -217,9 +251,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -230,9 +266,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -242,9 +280,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The function usinacosb is a *dynamic ufunc*. The arguments are determined when the function is called and only then is the function compiled." 
@@ -253,9 +293,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### Eager compilation" @@ -264,9 +306,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Assume, we know with what kind of arguments a function is called, then numba can generate code as soon as we call numba vectorize. The decorator can take a list of [type specification](https://numba.readthedocs.io/en/stable/reference/types.html#signatures) strings of the form \"f8(f8, f8)\", where the type before the parentheses is the return type and the types within the parentheses are the argument types." @@ -276,9 +320,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -290,9 +336,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### target" @@ -301,9 +349,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If I use eager compilation I can give an addition keyword argument: *target*." 
@@ -312,9 +362,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"cpu\": default, run in a single thread on the CPU" @@ -323,9 +375,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"parallel\": run in multiple threads" @@ -334,9 +388,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "target=\"cuda\": run on a CUDA-capable GPU" @@ -346,9 +402,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -359,19 +417,28 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ + "numba.set_num_threads(16) # Limit the number of threads numba uses.\n", "%timeit pusinacosb(a,b)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [ "n = 100_000_000\n", @@ -382,19 +449,29 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [ "%timeit usinacosb(a, b)\n", - "%timeit pusinacosb(a, b) " + "for t in [2, 4, 8, 16,32]:\n", + " numba.set_num_threads(t) # Limit the number of threads numba uses.\n", + " %timeit pusinacosb(a, b) " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Exercise: The Mandelbrot set" @@ -403,9 +480,11 @@ { "cell_type": "markdown", "metadata": { + "editable": 
true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The Mandelbrot set is the set of points *c* in the complex plane for which" @@ -413,7 +492,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "$$z_{i+1} = z_i^2 + c$$" ] @@ -421,9 +506,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "does not diverge.\n", @@ -434,9 +521,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Since it is impracticable to calculate an infinite number of iterations, one usually sets an upper limit for the number of iterations, for example, 20." @@ -445,9 +534,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "### Escape time algorithm" @@ -456,9 +547,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "A simple implementation of this algorithm is the following:" @@ -468,9 +561,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -497,9 +592,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Todo:\n", @@ -515,6 +612,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -538,6 +636,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -556,6 +655,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, 
"slideshow": { "slide_type": "skip" }, @@ -572,6 +672,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -581,25 +682,13 @@ "source": [ "%timeit M = escape_time_vec(P, 50)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "skip" - }, - "tags": [] - }, - "outputs": [], - "source": [] } ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -611,7 +700,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/09_NumbaIntro.ipynb b/solutions/09_NumbaIntro.ipynb index f2f784c0331c9060cea40da0fe8923c415c8a9c6..9accc95abb1dcaf4df59e48820f6566d9d30030e 100644 --- a/solutions/09_NumbaIntro.ipynb +++ b/solutions/09_NumbaIntro.ipynb @@ -11,7 +11,7 @@ "# Introduction to Numba's jit compiler\n", "\n", "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -295,7 +295,7 @@ "Sum: 5033.24 in 0.717281 µs. 13941.5 MFLOP. \n", "```\n", "\n", - "The function takes about 0.7 µs. This is more than 10,000 times faster than the interpreted Python loop. \n", + "The function takes about 0.7 µs. This is more than 1,000 times faster than the interpreted Python loop. \n", "Wouldn't it be great if we could take the Python code in `python_sum` and compile it to machine \n", "code to get some of this speedup?" ] @@ -576,7 +576,7 @@ } }, "source": [ - "OK, the Python loop is about 30000 times slower than numpy's `dot` method. Let's see if we can't make this faster using numba. This time, we'll use `jit` as a decorator." + "OK, the Python loop is about 4,500 times slower than numpy's `dot` method. 
Let's see if we can't make this faster using numba. This time, we'll use `jit` as a decorator." ] }, { @@ -1214,7 +1214,7 @@ } }, "source": [ - "Now, this is interesting. If you look at Line 3 of the version called with float32, it still defines\n", + "Now, this is interesting. If you look at Line 4 of the version called with float32, it still defines\n", "`res` as a double precision number! This will prevent it from vectorizing the loop using single \n", "precision arguments, which potentially cuts performance in half!\n", "\n", @@ -1294,8 +1294,8 @@ "source": [ "Doesn't look like it. \n", "\n", - "Let's dig a little deeper. A speedup would come from the fact that the Skylake-X processor used for \n", - "JUWELS Cluster can operate on 16 single precision numbers at once compared to 8 double precision \n", + "Let's dig a little deeper. A speedup would come from the fact that the AMD EPYC 7742 processor used for \n", + "JUSUF Cluster can operate on 8 single precision numbers at once compared to 4 double precision \n", "numbers, but that assumes it's using the right instructions. For that we have to look at the assembler.\n", "\n", "We define a helper function to find instructions in the assembler code." 
@@ -1502,9 +1502,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1516,7 +1516,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/10_Speeding up your code with Cython.ipynb b/solutions/10_Speeding up your code with Cython.ipynb index b7f8e44e14834de7f5035697086424801d8c0984..fd531522226c7e81911279ca73f179aeb6c98380 100644 --- a/solutions/10_Speeding up your code with Cython.ipynb +++ b/solutions/10_Speeding up your code with Cython.ipynb @@ -20,7 +20,7 @@ }, "source": [ "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -170,7 +170,7 @@ } }, "source": [ - "Elementwise access to NumPy arrays is often slower as elementwise access to lists.\n", + "Elementwise access to NumPy arrays is often slower than elementwise access to lists.\n", "\n", "Now let us invoke Cython" ] @@ -603,6 +603,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -634,6 +635,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -655,7 +657,7 @@ } }, "source": [ - "Our data set is too small to benefit from parallelization. The overhead due to starting multiple threads is too large for this problem size." + "Our data set is too small to benefit from parallelization. The overhead due to starting multiple threads is too large for this problem size. Play around with the number of threads to see how many threads are beneficial." 
] }, { @@ -710,6 +712,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -729,6 +732,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -744,9 +748,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "\n", @@ -757,9 +763,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -770,6 +778,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -789,6 +798,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -804,9 +814,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Building a Cython extension outside of a notebook" @@ -874,7 +886,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "```python\n", "from setuptools import Extension, setup\n", @@ -897,12 +915,14 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ - "**Exercise:** Take the Cython code that defines dot using `prange` in [Adding OpenMP](#Adding-OpenMP) and write it to `dot.pyx` using the `%%writefile` magic. Make sure to comment out the `cython magic`. Take the above code for setup.py and copy it into a file called `setup.py`. Change the setup.py code to build a module named dot and use `dot.pyx`. Then build the extension in a terminal window with the command. 
**Note:** Make sure our environment is loaded `source hpcpy21`.\n", + "**Exercise:** Take the Cython code that defines dot using `prange` in [Adding OpenMP](#Adding-OpenMP) and write it to `dot.pyx` using the `%%writefile` magic. Make sure to comment out the `cython magic`. Take the above code for setup.py and copy it into a file called `setup.py`. Change the setup.py code to build a module named dot and use `dot.pyx`. Then build the extension in a terminal window with the command below. **Note:** Make sure our environment is loaded with `source $PROJECT_training2421/hpcpy24`.\n", "\n", "```bash\n", "python setup.py build_ext --inplace\n", "```\n", @@ -910,7 +930,7 @@ "\n", "If the build fails with `#include \"numpy/arrayobject.h\" not found`, you need to add the include path for numpy. Luckily, numpy has a function for that: `numpy.get_include()`. Add the include path to the extra_compile_args. Include paths are added using `-I/path/to/be/included`. Since `setup.py` is a Python script you can call `numpy.get_include()` in the script and don't have to hardcode the path.\n", "\n", - "Write a test program that loads and tests the extension. Add a doc string to the dot function and include an example section like this:\n", + "Let's add a doc string to the dot function and include an example section like the following, which loads and tests the extension:\n", "\n", "```python\n", "def dot(...):\n", @@ -934,9 +954,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Comparison with Numba" @@ -945,9 +967,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Numba, can generate fast functions, too."
@@ -957,9 +981,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -976,9 +1002,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -995,9 +1023,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1008,9 +1038,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1021,9 +1053,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1035,9 +1069,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1048,9 +1084,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1064,9 +1102,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1078,9 +1118,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1100,9 +1142,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Finally, let's compare the performance for a larger data set. Remember the last version of our dot function uses OpenMP." 
@@ -1112,9 +1156,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1124,15 +1170,18 @@ "with threadpoolctl.threadpool_limits(16):\n", " %timeit dot(v,w)\n", "%timeit udot(v,w)\n", + "%timeit udotg(v,w)\n", "%timeit np.dot(v,w)" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Cython and classes" @@ -1141,9 +1190,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Sometimes, we want to do more than just wrap a function. We might want an efficient data type that implements some operators, for example. For this Cython allows us to declare classes just like in Python:" @@ -1153,9 +1204,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1173,9 +1226,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1186,9 +1241,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1198,9 +1255,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Extension types" @@ -1209,9 +1268,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "There is a second type of classes called *[extension types](http://cython.readthedocs.io/en/latest/src/userguide/extension_types.html)*. 
An extension type stores its members and methods in a C struct instead of a Python dictionary. This makes them more efficient but also more restrictive. Let's look at an example:" @@ -1221,9 +1282,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1254,9 +1317,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The first thing to note is the definition using `cdef class`. It's the reason extension types are also referred to as cdef classes. We can define functions that are only visible to C using `cdef` and Python functions using `def` (or both at once with `cpdef`). For functions defined with `cdef`, we need to give the type of self as well as a return type.\n", @@ -1268,9 +1333,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1281,9 +1348,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Exercise" @@ -1292,9 +1361,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Try which methods of Point you can call." 
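The `Point` in this exercise is a Cython cdef class and needs the Cython build step to run. As a rough pure-Python analogue of the "members in a C struct instead of a Python dictionary" restriction (an analogy only, not the same mechanism), `__slots__` also pins down the attribute layout and rejects attributes that were not declared:

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedPoint:
    __slots__ = ("x", "y")  # fixed layout: instances get no per-object __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

p = PlainPoint(1.0, 2.0)
p.color = "red"             # fine: attributes live in a dict

q = SlottedPoint(1.0, 2.0)
try:
    q.color = "red"         # rejected: layout is fixed, loosely like a cdef class member
except AttributeError as e:
    print("rejected:", e)
```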
@@ -1355,9 +1426,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1369,9 +1442,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [] @@ -1379,9 +1454,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1393,7 +1468,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/11_Writing your own Python bindings.ipynb b/solutions/11_Writing your own Python bindings.ipynb index ec69f6d7cc68e7e07b456d106fc0bed6f2f921ae..ebca2354da974f9be4baa4b23baf7a79db95585e 100644 --- a/solutions/11_Writing your own Python bindings.ipynb +++ b/solutions/11_Writing your own Python bindings.ipynb @@ -3,9 +3,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Writing language bindings" @@ -13,19 +15,27 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "<div class=\"dateauthor\">\n", - "14 June 2023 | Jan H. Meinke\n", + "12 June 2024 | Jan H. Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Why bindings?" 
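The simplest binding route this notebook covers is `ctypes`, which ships with Python. Since the course's `libtext_stats.so` is not available outside the tutorial environment, the C math library stands in here as "some compiled library" to show the load/declare/call pattern:

```python
import ctypes
import ctypes.util

# Locate and load a shared library; libm stands in for the course's library.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path or "libm.so.6")

# Declare the C signature: double cos(double).
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0
```

Without the `restype`/`argtypes` declarations, ctypes would assume an `int`-returning function taking whatever Python objects it is handed, which silently corrupts `double` arguments.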
@@ -77,9 +87,12 @@ { "cell_type": "markdown", "metadata": { + "editable": true, + "jp-MarkdownHeadingCollapsed": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Preparations\n", @@ -92,15 +105,17 @@ "\n", "Wait until the build has finished and then continue with this notebook.\n", "\n", - "**Tip:** You can open a terminal from within JupyterLab by going to File->New->Terminal. To get the right environment in a terminal `source $PROJECT_training2318/hpcpy23`." + "**Tip:** You can open a terminal from within JupyterLab by going to File->New->Terminal. To get the right environment in a terminal `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## Ctypes" @@ -186,9 +201,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -199,9 +216,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -217,9 +236,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "What if word_frequency had been written in Fortran?" @@ -228,9 +249,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "```Fortran\n", @@ -247,9 +270,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "We can access Fortran functions almost like C functions. The exact function name may differ, though. 
The default symbol \n", @@ -261,14 +286,16 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Exercise\n", "Use the terminal that you used earlier to run `build.sh` or open a new one. Make sure you are in the \n", - "tutorial directory. Source `hpcpy23` using `source $PROJECT/hpcpy23`. Change into code/textstats/ and compile \n", + "tutorial directory. Source `hpcpy24` using `source $PROJECT/hpcpy24`. Change into code/text_stats/ and compile \n", "the file word_frequency.F90 with the following command:\n", "\n", "```bash\n", @@ -288,9 +315,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -307,6 +336,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -329,9 +359,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If you compiled the library with the option `-fno-underscoring`, you could use the original declaration without underscore with libwf.so.\n", @@ -352,7 +384,7 @@ }, "outputs": [], "source": [ - "# Solution if compiled with ifort and -assume nounderscore (or gfortran and -fno-underscoring)\n", + "# Solution if compiled with ifort/ifx and -assume nounderscore (or gfortran and -fno-underscoring)\n", "ffi = cffi.FFI()\n", "ffi.cdef(\"\"\"\n", " int word_frequency(char* filename, char* word);\n", @@ -531,9 +563,14 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "notes" + }, + "tags": [] + }, "source": [ - "**Note** Unfortunately, this doesn't work the way it's supposed to. Although `-L` should add the path to the library to the search path of the linker, the linker still doesn't find the library. 
To make it work, I added the path to libtext_stats.so to the `LD_LIBRARY_PATH` when the kernel is loaded." + "**Note** Unfortunately, this doesn't work the way it's supposed to *inside a JupyterLab*. Although `-L` should add the path to the library to the search path of the linker, the linker still doesn't find the library. To make it work, I added the path to libtext_stats.so to the `LD_LIBRARY_PATH` when the kernel is loaded." ] }, { @@ -974,6 +1011,23 @@ } }, "outputs": [], + "source": [ + "p = PyPoint3D(1,1,1)\n", + "p.translate(-0.5, -0.5, -0.5)\n", + "p.coordinates()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, + "outputs": [], "source": [ "t_point_cython = %timeit -o p = PyPoint3D(1,1,1); p.translate(-0.5, -0.5, -0.5);p.coordinates()" ] @@ -1081,9 +1135,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1093,9 +1149,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Using the extension" @@ -1105,9 +1163,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1118,9 +1178,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1130,9 +1192,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Note that we didn't have to convert our string at all. It's done automatically by PyBind11." 
@@ -1141,9 +1205,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "### Wrapping a class with Pybind11" @@ -1152,9 +1218,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "PyBind11 can deal with classes, too. The following code wraps the Point3D class:" @@ -1163,9 +1231,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "```c++\n", @@ -1328,7 +1398,10 @@ }, "outputs": [], "source": [ - "!f2py -c code/point/points.f90 -m points_f" + "buildlog = !f2py -c code/point/points.f90 -m points_f\n", + "print('\\n'.join(buildlog[:8]))\n", + "print('...')\n", + "print('\\n'.join(buildlog[-1:]))" ] }, { @@ -1527,9 +1600,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -1541,7 +1614,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/12_Introduction to MPI.ipynb b/solutions/12_Introduction to MPI.ipynb index dc37c65ac761011aa573c411bd5ece7e15778983..4a2731bb1b16d5f16e146734a120d657f06b4e18 100644 --- a/solutions/12_Introduction to MPI.ipynb +++ b/solutions/12_Introduction to MPI.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to MPI\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "MPI (Message Passing Interface) is the most used protocol for communicating between processes. It doesn't matter if the processes that want to talk to each other are on the same or different nodes (i.e., computers). In this tutorial, we'll use `mpi4py` to learn about MPI and its API.\n", @@ -192,6 +196,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -200,6 +205,8 @@ "outputs": [], "source": [ "%%writefile hello_mpi.py\n", + "#!/usr/bin/env python3\n", + "\n", "from mpi4py import MPI\n", "comm = MPI.COMM_WORLD\n", "rank = comm.Get_rank()\n", @@ -211,6 +218,22 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "!chmod u+x hello_mpi.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -218,15 +241,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --reservation tr2318-20230615-cpu python3 hello_mpi.py " + "!srun -n 4 -p batch -A training2421 --reservation hpcwp_20240613 ./hello_mpi.py " ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Point to point" @@ -276,9 +301,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -308,6 +335,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -315,15 +343,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation 
tr2318-20230615-cpu python3 hello_ptp.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 hello_ptp.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "**Note**, how we used `rank` to perform different work on the task with rank 0 and the task with rank 1 using if statements. This is a common pattern in MPI programs. The task with rank 0 is often referred to as *root*." @@ -369,6 +399,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -398,6 +429,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -405,15 +437,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 hello_sendrecv.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 hello_sendrecv.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Parallel reduction" @@ -492,6 +526,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -527,9 +562,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Since we are dealing with NumPy arrays, we can use the efficient uppercase versions of the MPI calls. Scatter distributes an array evenly among all nodes. Note, the sendbuf only needs to be allocated on node zero, but the variable must exist everywhere." 
@@ -551,6 +588,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -558,15 +596,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_reduction.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_reduction.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Upper vs. lowercase in mpi4py" @@ -575,17 +615,25 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "`mpi4py` offers two version of many calls. The first one is written in uppercase. It uses memory buffers, e.g., `numpy.array`, and maps the call directly to the appropriate C call. The second version is written in lower case and takes arbitrary Python object. The result is given as the return value. Note, that for the uppercase versions all `a_partial` must have the same size!" + "`mpi4py` offers two versions of many calls. The first one is written in uppercase. It uses memory buffers, e.g., `numpy.array`, and maps the call directly to the appropriate C call. The second version is written in lowercase and takes arbitrary Python objects. The result is given as the return value. Note, that for the uppercase versions all `a_partial` must have the same size!" 
] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "```python\n", "a_partial = numpy.empty(N)\n", @@ -620,6 +668,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -658,6 +707,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -665,15 +715,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_upper.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_upper.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The following code uses the lowercase versions of the calls and works independent of the size of a_partial:" @@ -682,6 +734,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" }, @@ -707,6 +760,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -744,6 +798,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -751,15 +806,17 @@ }, "outputs": [], "source": [ - "!srun --pty -n 4 -p batch -A training2318 --time 00:10:00 --reservation tr2318-20230615-cpu python3 mpi_lower.py" + "!srun -n 4 -p batch -A training2421 --time 00:10:00 --reservation hpcwp_20240613 python3 mpi_lower.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Now, `a_all` contains a `list` of `np.array`s.\n", @@ -815,6 +872,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { 
"slide_type": "skip" }, @@ -854,6 +912,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -891,22 +950,28 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, "tags": [ "Solution" ] }, "outputs": [], "source": [ - "!srun --pty -n 4 -A training2318 --time 00:10:00 python3 mpi_ptp1.py\n", - "!srun --pty -n 4 -A training2318 --time 00:10:00 python3 mpi_ptp2.py" + "!srun -n 4 -p batch -A training2421 --reservation hpcwp_20240613 --time 00:10:00 python3 mpi_ptp1.py\n", + "!srun -n 4 -p batch -A training2421 --reservation hpcwp_20240613 --time 00:10:00 python3 mpi_ptp2.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Domain decomposition" @@ -943,9 +1008,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -958,9 +1025,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -972,7 +1041,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "plt.figure(figsize=(15, 5))\n", @@ -982,9 +1057,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The system is basically a square grid. " @@ -1114,20 +1191,26 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ - "I would recommend using an editor for this exercise. 
Jupyter Lab comes with an editor that supports syntax highlighting but no auto completion. You can find it under File->New->Text File. The new file will be called `Untitled.txt`. You can change the file name by righ-clicking on the editor tab or right-clicking on the file in the file browser view on the left." + "I would recommend using an editor for this exercise. Jupyter Lab comes with an editor that supports syntax highlighting but no auto completion. You can find it under File->New->Text File. The new file will be called `Untitled.txt`. You can change the file name by right-clicking on the editor tab or right-clicking on the file in the file browser view on the left. Use one of the srun commands we used earlier to start your program from a terminal.\n", + "\n", + "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that, type `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "1. 

Take the program from the [Stencil][TV_Stencils] and use a 1d domain decomposition as described \n", @@ -1154,6 +1237,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1296,9 +1380,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "source": [ "$$t_1 = t_s + t_p$$" @@ -1307,9 +1393,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "The runtime for $n$ processors is then" @@ -1440,9 +1528,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1454,9 +1544,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1467,6 +1559,10 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "outputs": [], @@ -1480,9 +1576,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Speedup using Amdahl's law" @@ -1491,7 +1589,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "plt.figure(figsize=(5, 2.5), dpi=150)\n", @@ -1503,9 +1607,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "## IPyParallel and MPI" @@ -1514,9 +1620,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Starting the engines" @@ 
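The Amdahl's-law speedup discussed and plotted earlier in this notebook (starting from $t_1 = t_s + t_p$, with the serial part $t_s$ fixed) reduces to a one-line function; the 5% serial fraction below is illustrative:

```python
def amdahl_speedup(n, serial_fraction):
    """Speedup on n processes when a fraction of the runtime stays serial:
    S(n) = t_1 / (t_s + t_p / n), normalized so that t_1 = 1 and t_s = serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# A 5% serial part caps the speedup at 1/0.05 = 20, no matter how many processes run.
for n in (1, 4, 16, 256):
    print(n, round(amdahl_speedup(n, 0.05), 2))
```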
-1548,6 +1656,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1556,12 +1665,13 @@ "source": [ "Click on the ``+``-sign at the top of the Files tab on the left to start a new launcher. In the launcher click on Terminal. A terminal will open as a new tab. Grab the tab and pull it to the right to have the terminal next to your notebook.\n", "\n", - "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2318/hpcpy23`." + "**Note**: The terminal does not have the same modules loaded as the notebook. To fix that type `source $PROJECT_training2421/hpcpy24`." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1574,7 +1684,7 @@ "\n", "```bash\n", "export OMP_NUM_THREADS=32\n", - "srun -n 4 -c 32 --ntasks-per-node 4 --time 00:30:00 -A training2318 --reservation tr2318-20230615-cpu ipengine start\n", + "srun -n 4 -c 32 --ntasks-per-node 4 --time 00:30:00 -A training2421 --reservation hpcwp_20240613 ipengine start\n", "```\n", "\n", "**Note**, you can can start the controller and the engines in separate terminals. That will keep the output separate." 
@@ -1583,9 +1693,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "### Connecting to the engines" @@ -1606,9 +1718,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1952,16 +2066,22 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "skip" + }, + "tags": [] + }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/solutions/13_Introduction to CuPy.ipynb b/solutions/13_Introduction to CuPy.ipynb index 807eb348f8c01c4c666a8ccbf9504315882b0e40..03aee5576248e34910c2364eb98e20f1a98f30bf 100644 --- a/solutions/13_Introduction to CuPy.ipynb +++ b/solutions/13_Introduction to CuPy.ipynb @@ -10,7 +10,7 @@ "source": [ "# Introduction to CuPy\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>\n", "<img src=\"images/cupy.png\" style=\"float:right\">" ] @@ -134,12 +134,13 @@ }, "outputs": [], "source": [ - "!srun --pty -N 1 -p gpus -A training2318 --time 00:10:00 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul.py" + "!srun --pty -N 1 -p gpus -A training2421 --time 00:10:00 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -207,6 +208,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -214,30 +216,18 @@ }, "outputs": [], "source": [ - "!srun --pty -N 1 -p gpus -A training2318 --time 00:10:00 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul_w_timing.py" + "!srun --pty -N 1 -p gpus -A training2421 --time 00:10:00 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul_w_timing.py" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, "tags": [] }, - "outputs": [], - "source": [ - "!srun --pty -N 1 -p develgpus -A training2318 --time 00:10:00 python3 cupy_matrix_mul_w_timing.py" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, "source": [ "### Exercise\n", "In [Think Vector][TV], you [calculated the Mandelbrot set][TV_Mandelbrot] using [NumPy][] and vectorization. Take either your solution or ours and convert it to [CuPy][]. 
Visualize the result.\n", @@ -261,9 +251,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -281,6 +273,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -288,7 +281,7 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A slbio python cupy_mandelbrot_exercise.py\n", + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python cupy_mandelbrot_exercise.py\n", "image = matplotlib.image.imread(\"cupy_mandelbrot_exercise.png\")\n", "plt.imshow(image)\n", "plt.axis('off')" @@ -298,6 +291,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -337,6 +331,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -408,6 +403,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -440,6 +436,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -447,12 +444,13 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cupy_matrix_mul_w_timing2.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cupy_matrix_mul_w_timing2.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -466,6 +464,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -485,9 +484,9 @@ " B = numpy.random.random((N, N)).astype(numpy.float32)\n", " #C = A @ B\n", "\n", - " for nt in [16, 32, 64, 128, 256]:\n", + " for nt in [16, 32, 64, 128]: # This part is not required for the exercise\n", " t0 = time.time()\n", - " with 
threadpoolctl.threadpool_limits(limits=nt, user_api='blas'):\n", + " with threadpoolctl.threadpool_limits(limits=nt, user_api='openmp'): # You may have to use blas instead of openmp in other environments\n", " for r in range(repeats):\n", " C = A @ B\n", " t1 = time.time()\n", @@ -499,6 +498,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -506,12 +506,13 @@ }, "outputs": [], "source": [ - "!srun -p batch -n 1 -c 256 -A training2318 --pty --reservation tr2318-20230615-cpu python3 numpy_matrix_mul_w_timing2.py" + "!OMP_NUM_THREADS=128 srun -p batch -n 1 -c 128 --hint=nomultithread -A training2421 --pty --reservation hpcwp_20240613 python3 numpy_matrix_mul_w_timing2.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -633,6 +634,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -658,6 +660,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -665,12 +668,13 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cupy_to_and_fro.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cupy_to_and_fro.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" }, @@ -753,9 +757,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -767,7 +771,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.11.3" } }, "nbformat": 4, diff --git a/solutions/14_CUDA for Python.ipynb b/solutions/14_CUDA for Python.ipynb index 
6c82362a4f7ef4e45c727abbf832ae7aa32b30ca..4122dc6fb95e22a702e32547fb42964430170fcb 100644 --- a/solutions/14_CUDA for Python.ipynb +++ b/solutions/14_CUDA for Python.ipynb @@ -3,15 +3,17 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Numba and GPUs\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. Meinke\n", "</div>" ] }, @@ -19,9 +21,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -37,9 +41,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Ufunc" @@ -48,9 +54,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "We already learned how to vectorize a function. Remember the Mandelbrot set. We defined a function that returns the number of iterations needed to decide if the algorithm diverges." 
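Stripped of any Numba decorator, the escape-time function described here can be sketched in plain Python (`maxiter` and the escape radius 2 follow the usual Mandelbrot convention; the exact limits in the notebook's version may differ):

```python
def escape_time(c, maxiter=50):
    """Number of iterations of z -> z*z + c before |z| exceeds 2.
    Returns maxiter if the orbit never escapes (point likely in the set)."""
    z = 0j
    for n in range(maxiter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return maxiter

print(escape_time(0j))        # origin never escapes
print(escape_time(1 + 1j))    # escapes almost immediately
```

Applying `numba.vectorize` to a scalar function of this shape is what turns it into a ufunc that can be evaluated over a whole complex grid at once.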
@@ -60,9 +68,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -84,9 +94,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -100,9 +112,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -114,9 +128,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -126,9 +142,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "If you replace `target=\"parallel\"` with `target=\"cuda\"` the function runs on the GPU instead. 
Give it a try and compare the performance for different sizes of the grid:" @@ -138,9 +156,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -175,6 +195,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -214,6 +235,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -221,7 +243,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython mandelbrot_vectorize_cuda.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython mandelbrot_vectorize_cuda.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean():.3f}±{t_gpu.std():.3f} s.\")" ] @@ -229,9 +251,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## CUDA for Python" @@ -418,9 +442,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "fragment" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -445,9 +471,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -459,9 +487,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Notice that for every pair (i, j), we calculate the escape time. 
This makes\n", @@ -865,6 +895,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -930,6 +961,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -937,7 +969,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot1.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot1.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -945,9 +977,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "b) The kernel calculates dx and dy for every pixel although it is the same for all of them. Change the kernel so that it takes dx and dy as arguments and calculate dx and dy before you call the kernel. Does this improve the performance?" 
@@ -957,9 +991,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -1004,6 +1040,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1071,6 +1108,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1078,7 +1116,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot2.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot2.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -1086,9 +1124,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "c) Add an additional argument `maxtime` to the kernel, so that you can time the kernel for different escape time values. Don't forget to add the new argument to the documentation." 
@@ -1200,6 +1240,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1207,7 +1248,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot3.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot3.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -1381,6 +1422,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1452,6 +1494,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1459,7 +1502,7 @@ }, "outputs": [], "source": [ - "res = !srun -p gpus -A training2318 --reservation tr2318-20230615-gpu ipython cuda_mandelbrot4.ipy\n", + "res = !srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 ipython cuda_mandelbrot4.ipy\n", "t_gpu = numpy.array(eval(res[-1]))\n", "print(f\"Runtime: {t_gpu.mean() * 1000:.3f}±{t_gpu.std() * 1000:.3f} ms.\")" ] @@ -1467,6 +1510,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1526,6 +1570,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1574,6 +1619,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -1581,15 +1627,17 @@ }, "outputs": [], "source": [ - "!srun -p gpus -A training2318 --reservation tr2318-20230615-gpu python3 cuda_matrixmul.py" + "!srun -p gpus -A training2421 --reservation hpcwp_gpu_20240613 python3 cuda_matrixmul.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Using shared memory" 
@@ -1598,14 +1646,16 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "As you learned in Bottlenecks, the matrix matrix multiplication tends to be memory bandwidth bound. This is true on the GPU, too.\n", "\n", - "The way to make it faster is to use faster memory. On a CPU this usually means, dividing the matrix into blocks that fit in cache and hope for the best. On a GPU at lease part of the fast memory is usually programmable. In CUDA this memory is called *shared memory*.\n", + "The way to make it faster is to use faster memory. On a CPU this usually means dividing the matrix into blocks that fit in cache and hoping for the best. On a GPU at least part of the fast memory is usually programmable. In CUDA this memory is called *shared memory*.\n", "\n", "Shared memory is available to all *threads in a thread block*. Usually, each thread loads data from device memory into shared memory. This is followed by barrier, so that all threads are finished reading. Then the shared memory is reused as often as possible. Another barrier makes sure that all threads are done."
] @@ -1613,9 +1663,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "Let's look at an example:" @@ -1624,9 +1676,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "subslide" - } + }, + "tags": [] }, "source": [ "## Matrix multiplication with shared memory" @@ -1752,9 +1806,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/solutions/15_CUDA and MPI.ipynb b/solutions/15_CUDA and MPI.ipynb index 982c4a3aa2c129fec067be8b1f5ea86a4a85bd13..fd7be9dcd84f4324a6086bdb9e86ede320dd84e7 100644 --- a/solutions/15_CUDA and MPI.ipynb +++ b/solutions/15_CUDA and MPI.ipynb @@ -3,24 +3,28 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# CUDA for Python and MPI4Py\n", "\n", "<div class=\"dateauthor\">\n", - "15 June 2023 | Jan H. Meinke\n", + "13 June 2024 | Jan H. 
Meinke\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## The Kernel" @@ -52,9 +56,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "source": [ "```python\n", @@ -256,13 +262,15 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ - "%writefile parallel_shift.py\n", + "%%writefile parallel_shift.py\n", "\n", "Your code goes here\n", "\n", @@ -274,6 +282,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -342,21 +351,27 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [ "Solution" ] }, "outputs": [], "source": [ - "!srun -n 4 -c 32 -p gpus -A training2318 python3 parallel_shift.py" + "!srun -n 4 -c 32 -p gpus -A training2421 --reservation hpcwp_gpu_20240613 --cuda-mps --pty python3 parallel_shift.py" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## Picking a device" @@ -378,9 +393,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "-" - } + }, + "tags": [] }, "source": [ "```python\n", @@ -393,6 +410,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -497,6 +515,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -595,6 +614,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -604,13 +624,19 @@ }, "outputs": [], "source": [ - "!srun -n 4 -c 32 -p gpus -A 
training2318 python3 cuda_mpi_mandelbrot.py" + "!srun -n 4 -c 32 -p gpus -A training2421 --reservation hpcwp_gpu_20240613 --cuda-mps --pty python3 cuda_mpi_mandelbrot.py" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [] }, @@ -742,6 +768,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -809,6 +836,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -816,22 +844,28 @@ }, "outputs": [], "source": [ - "!srun -p gpus -n 4 -A training2318 --reservation tr2318-20230615-gpu xenv -L mpi-settings/CUDA python3 cuda_aware_mpi_shift.py" + "!srun -p gpus -n 4 -A training2421 --reservation hpcwp_gpu_20240613 --cuda-mps --pty xenv -L MPI-settings/CUDA python3 cuda_aware_mpi_shift.py" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/solutions/16_Introduction to Dask.ipynb b/solutions/16_Introduction to Dask.ipynb index c57ff9a764d016a54105b794cb165416b0e3c87e..cafa9429ff52202216ede87008365081414b25e1 100644 --- a/solutions/16_Introduction to Dask.ipynb +++ b/solutions/16_Introduction to Dask.ipynb @@ -3,27 +3,31 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "# Introduction to Dask\n", "\n", "<div class=\"dateauthor\">\n", - "16 June 2023 | Olav Zimmermann\n", + "14 June 2024 | Olav Zimmermann\n", "</div>" ] }, { "cell_type": 
"markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ - "Dask implements flexible **intra- and inter-node parallel execution based on a task model**. It features data structures that 'feel' like ordinary numpy ndarrys or pandas dataframes but under the hood have been enabled to work on **distributed data**.\n", + "Dask implements flexible **intra- and inter-node parallel execution based on a task model**. It features data structures that 'feel' like ordinary numpy ndarrays or pandas dataframes but under the hood have been enabled to work on **distributed data**.\n", "While the task based scheduling enables parallel execution of even highly irregular computation pipelines, the distributed data structures make dask also an interesting choice for processing of data volumes that are larger than main memory.\n", "\n", "Among the distinctive features of dask is peer-to-peer data sharing between workers, and high resilience provided by nanny processes that can restart failing workers.\n", @@ -34,9 +38,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.delayed " @@ -44,33 +50,43 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "`dask.delayed` can be used to formulate arbitrary task graphs. \n", "\n", "It can either be employed as a decorator `@delayed` (not show in this tutorial) or as a wrapper function `dask.delayed(func)`. \n", - "This function marks a function to be scheduled by Dask. Delayed functions will be evaluated lazily, e.g. not before their result is needed. " + "This function marks a function to be scheduled by Dask. Delayed functions will be evaluated lazily, e.g., not before their result is needed. 
" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "This is similar in spirit to other lazy evaluation schemes in python (e.g. `eval()`, `lambda` or `concurrent.futures`) and also similar to other task frameworks such as tensorflow. \n", "\n", - "As dask.delayed works on the level of individual functions, the user remains in control which functions will be evaluated eagerly and which ones lazily. Although Dask has a sophisticated scheduler for lazy task evaluation, eager evaluation can be preferable in some situations, e.g. for functions that control routing in the task graph, such as functions calculating data used in `if-`statements." + "As dask.delayed works on the level of individual functions, the user remains in control of which functions will be evaluated eagerly and which ones lazily. Although Dask has a sophisticated scheduler for lazy task evaluation, eager evaluation can be preferable in some situations, e.g., for functions that control routing in the task graph, such as functions calculating data used in `if-`statements." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "We first do some settings:" @@ -80,9 +96,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -93,9 +111,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "The next cell implements some dummy functions and builds a simple pipeline with some data dependencies."
@@ -105,9 +125,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -146,20 +168,23 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "Try to think about it ahead of running the next cells:\n", "- What is the minimal wall time possible?\n", - "- How many tasks does the task graph have for range(8) in prepared?\n", + "- How many tasks does the task graph in `prepared` have?\n", "- How many inputs could you process maximally in the same time it takes for 8 inputs?" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -178,9 +203,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "The task graph generated by `dask` can be visualized (don't try this for large graphs, i.e. more input tasks!)." @@ -189,7 +216,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "import graphviz\n", @@ -199,9 +232,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "The computation of any of the tasks is delayed until the execution is triggered by an explicit command to compute dlresult upon which the individual tasks are scheduled according to the dependency structure." 
@@ -210,7 +245,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "%time dlresult.compute()" ] }, @@ -219,19 +260,22 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "- How close to optimal is the observed scheduling?\n", - "- What is the largest number of inputs you can process under 8 seconds? Why?\n", + "- What is the largest number of inputs you can process in under 8 seconds? Why?\n", "- Change the program in a way that enables you to estimate how much overhead per task is incurred by Dask." ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -287,7 +331,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "from operator import add\n", @@ -297,22 +347,30 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "l=[x for x in range(1000000)]\n", - "s= db.from_sequence(l,npartitions=4) # you can manually set the number of partitions\n", + "s= db.from_sequence(l,npartitions=4) # you can manually set the number of partitions\n", "mysum=s.fold(add) # fold performs a parallel reduction \n", - "mysum.dask # another inpection method for task graphs in dask" + "mysum.dask # another inspection method for task graphs in dask" ] }, { "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -322,7 +380,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, +
"slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "%time result=mysum.compute()\n", @@ -334,9 +398,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -346,9 +412,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "source": [ "(The syntax is kind of unfortunate since Python is moving away from filter and map to list comprehensions and generator expressions.)" @@ -373,6 +441,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -387,6 +456,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -402,6 +472,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -419,6 +490,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -434,6 +506,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -450,6 +523,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -465,6 +539,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -484,6 +559,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -499,6 +575,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -515,6 +592,7 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" }, @@ -530,6 +608,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { 
"slide_type": "skip" }, @@ -549,6 +628,10 @@ { "cell_type": "markdown", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [ "Solution" ] @@ -576,17 +659,29 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "import dask.dataframe as dd\n", - "df = dd.read_csv(\"data/iris.csv\") # not a reasonably sized task (too small!)" + "df = dd.read_csv(\"data/iris.csv\") # not a reasonably sized task (too small!)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "h=df.groupby(df.Name).SepalLength.mean()\n", @@ -596,9 +691,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.array\n", @@ -609,7 +706,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [], "source": [ "import dask.array as da\n", @@ -621,9 +724,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -633,9 +738,11 @@ { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "notes" - } + }, + "tags": [] }, "source": [ "## numpy or dask.array?\n", @@ -649,9 +756,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -662,9 +771,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -678,9 
+789,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -692,9 +805,11 @@ "cell_type": "code", "execution_count": null, "metadata": { + "editable": true, "slideshow": { "slide_type": "skip" - } + }, + "tags": [] }, "outputs": [], "source": [ @@ -702,16 +817,18 @@ "x_rechunked=x_dask.rechunk((2500,3000)) # larger chunks are no longer better for dot product calculation\n", "y_dask = x_rechunked.transpose()\n", "result=x_dask.dot(y_dask)\n", - "#with ProgressBar():\n", - "%timeit result.compute(scheduler=\"threads\")" + "with ProgressBar():\n", + " %timeit result.compute(scheduler=\"threads\")" ] }, { "cell_type": "markdown", "metadata": { + "editable": true, "slideshow": { "slide_type": "slide" - } + }, + "tags": [] }, "source": [ "## dask.distributed\n", @@ -793,9 +910,9 @@ ], "metadata": { "kernelspec": { - "display_name": "HPC Python 2023 (local)", + "display_name": "HPC Python 2024", "language": "python", - "name": "hpcpy23" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { diff --git a/solutions/17_Debugging.ipynb b/solutions/17_Debugging.ipynb index dbd87d3b7b918b6cec67cd9dd28d88982d5a3f9e..4444058701054f44959e796f4cbfa01c0f74728b 100644 --- a/solutions/17_Debugging.ipynb +++ b/solutions/17_Debugging.ipynb @@ -6,7 +6,7 @@ "source": [ "# Debugging Python\n", "<div class=\"dateauthor\">\n", - "06 June 2023 | Jan H. Meinke, Olav Zimmermann\n", + "08 June 2024 | Jan H. Meinke, Olav Zimmermann\n", "</div>" ] }, @@ -46,7 +46,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Before running the following cell try to guess what will happen: will it throw an error or a warning or will it execute normally? \n", + "**Before running the following cell** please read the code and try to guess what will happen: will it throw an error or a warning or will it execute normally? 
\n", "If it is one of the latter two cases, what will it print?" ] }, @@ -75,13 +75,13 @@ "source": [ "Using a debugger to execute a code (or part of it) step by step is also called **runtime debugging**. \n", "\n", - "You can switch on JupyterLab's internal debugger by clicking on the small bug icon at the top right of the notebook, next to the kernel name. You will see several panels appear in the right sidebar. In addition, each code cell of the notebook now got line numbers.\n", + "You can switch on JupyterLab's internal debugger by clicking on the small bug icon at the top right of the notebook, before the kernel name. You will see several panels appear in the right sidebar. In addition, each code cell of the notebook now has line numbers.\n", "\n", "Click on the line number of line 11 in the code cell above. A red dot appearing in front of the line number indicates that you just set a **break point**. At a break point the debugger will stop, allowing you to inspect the state of each variable that is defined at this point. To start the debugger and let it execute the code up to the break point just re-execute the cell [Shift-Return].\n", "\n", - "The navigation symbols at the top of the CallStack panel will now no longer be grayed out and allow you to execute the code line by line. With \"next\" you step over function calls within the line. With \"step in\" you can jump into the python functions called in this line of code (but not into any C library functions).\n", + "The navigation symbols at the top of the CallStack panel (depending on your Jupyter version you may have to click on the Bug symbol in the right side bar first) will now no longer be grayed out and allow you to execute the code line by line. With \"next\" you step over function calls within the line. 
With \"step in\" you can jump into the python functions called in this line of code (but not into any C library functions).\n", "\n", - "The \"Variables\" panel allows you to view either the global or the local variables and to switch between tree and table view. (for arrays the table view is preferable)\n", + "The \"Variables\" panel allows you to view either the global or the local variables and to switch between tree and table view (the **table view** is generally preferable, in particular for numpy arrays).\n", "\n", "**Exercise:** Try to find the bug in the code above. You can set a break point at any line. In case that you want to reset the kernel use the circle arrow button at the top of the notebook.\n", "\n", @@ -115,7 +115,7 @@ "metadata": {}, "outputs": [], "source": [ - "#%%writefile buggy.py\n", + "%%writefile buggy.py\n", "def imabuggyincrement(i,a):\n", " \"\"\"Increment a[i] by 1.\"\"\"\n", " if ii < len(a):\n", @@ -176,7 +176,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next start pudb in a terminal with the script name as an argument. If you haven't done this in this terminal shell before, you need to source hpcpy23:" + "Next start pudb in a terminal with the script name as an argument. 
If you haven't done this in this terminal shell before, you need to source hpcpy24:" ] }, { @@ -184,7 +184,7 @@ "metadata": {}, "source": [ "```bash\n", - "source $PROJECT_training2318/hpcpy23\n", + "source $PROJECT_training2421/hpcpy24\n", "pudb buggy.py\n", "```" ] @@ -259,14 +259,16 @@ "* [pdb][] (builtin)\n", "* [pudb][]\n", "* IDEs (All the IDEs we mentioned have debugging support)\n", - "* [Linaro DDT][], former name ARMForge DDT (commercial, support for debugging parallel codes and C/C++ code, only rudimentary Python support)\n", - "* [TotalView][] (commercial, support for debugging parallel codes and C/C++ code, requires debug version of CPython, supports mixed language debugging, aware of cython, pybind11 and other bindings)\n", + "* [Linaro DDT][], former name ARMForge DDT (commercial, support for debugging parallel codes and C/C++ code, only rudimentary Python support: see [here][])\n", + "* [TotalView][] (commercial, support for debugging parallel codes and C/C++ code, requires debug version of CPython, supports mixed language debugging, aware of cython, pybind11 and other bindings. 
However, debugging of the python code itself, i.e., stepping or breakpoints, is not supported, see [TotalView User Guide][])\n", "\n", "[pdb]: https://docs.python.org/3/library/pdb.html\n", "[pudb]: https://github.com/inducer/pudb\n", "[Linaro DDT]: https://www.linaroforge.com/linaroDdt/\n", + "[here]: https://docs.linaroforge.com/24.0.1/html/forge/ddt/get_started_ddt/python_debugging.html\n", "[ARMForge DDT]: https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge/arm-ddt\n", - "[TotalView]: https://help.totalview.io/current/HTML/index.html#page/TotalView/totalviewlhug-python.13.01.html#ww1893192" + "[TotalView]: https://help.totalview.io/current/HTML/index.html#page/TotalView/totalviewlhug-python.13.01.html#ww1893192\n", + "[TotalView User Guide]: https://help.totalview.io/current/PDFs/TotalView_User_Guide.pdf#G12.1893806" ] }, { @@ -280,7 +282,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For example, PyDev, Wing Personal, Visual Studio, and PyCharm Professional (199 €/a with perpetual fallback license) support remote debugging. It can also be done with the ``ptvsd`` and Visual Studio Code." + "Some IDEs like PyDev, Wing Pro, Visual Studio, and PyCharm Professional support remote debugging. For Visual Studio Code there is [debugpy][] that supports [debugging via SSH][].\n", + "\n", + "[debugpy]: https://github.com/microsoft/debugpy/\n", + "[debugging via SSH]: https://github.com/microsoft/debugpy/wiki/Debugging-over-SSH" ] }, { @@ -294,9 +299,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The following video shows how to debug mixed Python and C++ code using Visual Studio.\n", + "The following video shows how to debug mixed Python and C++ code using Visual Studio Code and gdb.\n", "\n", - "You can go back to to the beginning of the video to learn how write a Python extension in Visual Studio." 
+ "You can go back to the beginning of the video to learn how to write a Python extension in Visual Studio Code." ] }, @@ -320,9 +325,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "HPC Python 2024 (local)", "language": "python", - "name": "python3" + "name": "hpcpy24" }, "language_info": { "codemirror_mode": { @@ -334,7 +339,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.6" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/solutions/code b/solutions/code index c787d1ee6a68815e557245d52ad924db7e184eae..2edff2610e81084123a9969fc73223981f6d87b8 120000 --- a/solutions/code +++ b/solutions/code @@ -1 +1 @@ -../code/ \ No newline at end of file +../code \ No newline at end of file