From 551cde26dfb8b42fd2f8ff3de46611a28fe5511c Mon Sep 17 00:00:00 2001
From: Jens Henrik Goebbert <j.goebbert@fz-juelich.de>
Date: Wed, 24 Apr 2024 08:41:19 +0200
Subject: [PATCH] update dask

---
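The notebook `1_dask_MonteCarloPi.ipynb` asks twice whether the Monte Carlo error beats the `numpy.finfo(numpy.float32)` resolution (1e-06). A rough answer follows from the standard error of the estimator 4 * N_circ / N, which is about 4 * sqrt(p * (1 - p) / N) with p = pi / 4. The sketch below assumes independent uniform samples and 16 bytes per point (two float64 coordinates, matching `size_in_bytes / 8 / 2` in `calculate_pi_single` and `calculate_pi_dask`). The helper names are hypothetical and not part of the notebook.

```python
import math

# Assumption: samples are independent and uniform on the unit square.
P_INSIDE = math.pi / 4  # probability that a point lies inside the quarter circle
BYTES_PER_POINT = 16  # two float64 coordinates per (x, y) position


def mc_pi_standard_error(n_points):
    """Standard error of the estimator 4 * N_circ / N for n_points samples."""
    return 4 * math.sqrt(P_INSIDE * (1 - P_INSIDE) / n_points)


def bytes_for_target_error(target_error):
    """Approximate size_in_bytes for which one standard error equals target_error."""
    n_points = 16 * P_INSIDE * (1 - P_INSIDE) / target_error**2
    return BYTES_PER_POINT * n_points


for size_in_bytes in (1e9, 1e10, 1e11, 1e12):  # 1 GB ... 1 TB, as in the notebook
    n_points = size_in_bytes / BYTES_PER_POINT
    print(f"{size_in_bytes / 1e9:7.0f} GB -> standard error ~ {mc_pi_standard_error(n_points):.1e}")

# 1e-06 is the value reported by numpy.finfo(numpy.float32).resolution
print(f"~{bytes_for_target_error(1e-6) / 1e12:.0f} TB needed for a standard error of 1e-06")
```

Under these assumptions the standard error is roughly 2e-4 for 1 GB, 2e-5 for 100 GB and 7e-6 for 1 TB, while a standard error of 1e-06 needs about 43 TB of random positions.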
 .../2_dask/1-Introduction-to-Dask.ipynb       |  52 -
 .../2_dask/1_dask_MonteCarloPi.ipynb          | 374 ++++++++
 .../2_dask/2_dask_example.ipynb               | 892 ------------------
 3 files changed, 374 insertions(+), 944 deletions(-)
 delete mode 100644 day2_hpcenv/4_parallel-programming/2_dask/1-Introduction-to-Dask.ipynb
 create mode 100644 day2_hpcenv/4_parallel-programming/2_dask/1_dask_MonteCarloPi.ipynb
 delete mode 100644 day2_hpcenv/4_parallel-programming/2_dask/2_dask_example.ipynb
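`calculate_pi_dask` in the new notebook relies on `dask.array` splitting the random positions into `number_of_chunks` blocks, reducing each block to a partial count and combining the counts, so the 100 GB run should not need 100 GB of memory at once. The sketch below imitates that chunk-wise map-reduce sequentially with plain NumPy to make the idea explicit; it is not how Dask schedules tasks internally. `calculate_pi_chunked` and `count_inside` are hypothetical names, and the per-chunk random streams derived with `numpy.random.SeedSequence.spawn` are an assumption of this sketch.

```python
import numpy


def count_inside(n_points, rng):
    """Generate one chunk of (x, y) points and reduce it to a single count."""
    xy = rng.uniform(low=0.0, high=1.0, size=(n_points, 2))
    return int(((xy ** 2).sum(axis=1) < 1).sum())


def calculate_pi_chunked(size_in_bytes, number_of_chunks, seed=0):
    """Sequential sketch of a chunk-wise Monte Carlo pi estimate.

    Only one chunk of random positions is held in memory at a time; each chunk
    is reduced to an integer count and the counts are summed at the end.
    """
    n_total = size_in_bytes // 16  # two float64 values (16 bytes) per point
    chunk = n_total // number_of_chunks
    sizes = [chunk] * number_of_chunks
    sizes[-1] += n_total - chunk * number_of_chunks  # remainder goes to the last chunk
    # One independent random stream per chunk (one "task" each).
    seeds = numpy.random.SeedSequence(seed).spawn(number_of_chunks)
    counts = [count_inside(n, numpy.random.default_rng(s)) for n, s in zip(sizes, seeds)]
    return 4 * sum(counts) / n_total
```

For example, `calculate_pi_chunked(16_000_000, 7)` draws one million points in seven chunks; peak memory then scales with the chunk size (about 2.3 MB of positions here) rather than with the full 16 MB array.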

diff --git a/day2_hpcenv/4_parallel-programming/2_dask/1-Introduction-to-Dask.ipynb b/day2_hpcenv/4_parallel-programming/2_dask/1-Introduction-to-Dask.ipynb
deleted file mode 100644
index d603fb5..0000000
--- a/day2_hpcenv/4_parallel-programming/2_dask/1-Introduction-to-Dask.ipynb
+++ /dev/null
@@ -1,52 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "linear-bangkok",
-   "metadata": {},
-   "source": [
-    "### HIGH THROUGHPUT COMPUTING WITH DASK\n",
-    "\n",
-    "**Organisers:** Alan O’Cais, David Swenson  \n",
-    "**Website:** https://www.cecam.org/workshop-details/1022\n",
-    "\n",
-    "**Synopsis:**\n",
-    "High-throughput (task-based) computing is a flexible approach to parallelisation. It involves splitting a problem into loosely-coupled tasks. A scheduler then orchestrates the parallel execution of those tasks, allowing programs to adaptively scale their resource usage. E-CAM has extended the data-analytics framework Dask with a capable and eﬃcient library to handle such workloads. This workshop will be held as a series of virtual seminars/tutorials on tools in the Dask HPC ecosystem.\n",
-    "\n",
-    "**Programme:**\n",
-    "- 21 January 2021, 3pm CET (2pm UTC): Dask - a flexible library for parallel computing in Python\n",
-    "  - YouTube link: https://youtu.be/Tl8rO-baKuY\n",
-    "  - GitHub Repo: https://github.com/jacobtomlinson/dask-video-tutorial-2020\n",
-    "\n",
-    "- 4 February 2021, 3pm CET (2pm UTC): Dask-Jobqueue - a library that integrates Dask with standard HPC queuing systems, such as SLURM or PBS\n",
-    "  - YouTube link: https://youtu.be/iNxhHXzmJ1w\n",
-    "  - GitHub Repo: https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02\n",
-    "\n",
-    "- 11 February 2021, 3pm CET (2pm UTC) : Jobqueue-Features - a library that enables functionality aimed at enhancing scalability\n",
-    "  - YouTube link: https://youtu.be/FpMua8iJeTk\n",
-    "  - GitHub Repo: https://github.com/E-CAM/jobqueue_features_workshop_materials"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/day2_hpcenv/4_parallel-programming/2_dask/1_dask_MonteCarloPi.ipynb b/day2_hpcenv/4_parallel-programming/2_dask/1_dask_MonteCarloPi.ipynb
new file mode 100644
index 0000000..d81f977
--- /dev/null
+++ b/day2_hpcenv/4_parallel-programming/2_dask/1_dask_MonteCarloPi.ipynb
@@ -0,0 +1,374 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Dask local cluster example"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What is Dask? (https://docs.dask.org/en/latest/)\n",
+    "\n",
+    "* combine a blocked algorithm approach\n",
+    "* with dynamic and memory aware task scheduling\n",
+    "* to realise a parallel out-of-core NumPy clone\n",
+    "* optimized for interactive computational workloads\n",
+    "\n",
+    "-----------------------------------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### WORKSHOP on DASK - HIGH THROUGHPUT COMPUTING WITH DASK\n",
+    "\n",
+    "**Organisers:** Alan O’Cais, David Swenson  \n",
+    "**Website:** https://www.cecam.org/workshop-details/1022\n",
+    "\n",
+    "**Synopsis:**\n",
+    "High-throughput (task-based) computing is a flexible approach to parallelisation. It involves splitting a problem into loosely coupled tasks. A scheduler then orchestrates the parallel execution of those tasks, allowing programs to adaptively scale their resource usage. E-CAM has extended the data-analytics framework Dask with a capable and efficient library to handle such workloads. This workshop was held as a series of virtual seminars/tutorials on tools in the Dask HPC ecosystem.\n",
+    "\n",
+    "**Programme:**\n",
+    "- 21 January 2021, 3pm CET (2pm UTC): Dask - a flexible library for parallel computing in Python\n",
+    "  - YouTube link: https://youtu.be/Tl8rO-baKuY\n",
+    "  - GitHub Repo: https://github.com/jacobtomlinson/dask-video-tutorial-2020\n",
+    "\n",
+    "- 4 February 2021, 3pm CET (2pm UTC): Dask-Jobqueue - a library that integrates Dask with standard HPC queuing systems, such as SLURM or PBS\n",
+    "  - YouTube link: https://youtu.be/iNxhHXzmJ1w\n",
+    "  - GitHub Repo: https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02\n",
+    "\n",
+    "- 11 February 2021, 3pm CET (2pm UTC): Jobqueue-Features - a library that enables functionality aimed at enhancing scalability\n",
+    "  - YouTube link: https://youtu.be/FpMua8iJeTk\n",
+    "  - GitHub Repo: https://github.com/E-CAM/jobqueue_features_workshop_materials\n",
+    "  \n",
+    "------------------------------------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example problem: Monte-Carlo estimate of $\\pi$\n",
+    "\n",
+    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif\" width=\"25%\" align=left alt=\"PI monte-carlo estimate\"/>\n",
+    "\n",
+    "## Problem description\n",
+    "\n",
+    "Suppose we want to estimate the number $\\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods), i.e. obtain a numerical estimate based on a random sampling approach, and that we want at least single precision floating point accuracy.\n",
+    "\n",
+    "We use the fact that the area of a quarter circle with unit radius is $\\pi/4$; hence the probability that a point chosen uniformly at random in the unit square lies within that quarter circle is also $\\pi/4$.\n",
+    "\n",
+    "So for $N$ randomly chosen pairs $(x, y)$ with $x\\in[0, 1)$ and $y\\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $x^2 + y^2 < 1$ and estimate $\\pi \\approx 4 \\cdot N_{circ} / N$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Monte-Carlo estimate with NumPy on a single CPU\n",
+    "\n",
+    "* NumPy is the fundamental package for scientific computing with Python (https://numpy.org/).\n",
+    "* It contains a powerful n-dimensional array object and useful random number capabilities."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import numpy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "def calculate_pi_single(size_in_bytes):\n",
+    "    \n",
+    "    \"\"\"Calculate pi using a Monte Carlo method.\"\"\"\n",
+    "    \n",
+    "    rand_array_shape = (int(size_in_bytes / 8 / 2), 2)\n",
+    "    \n",
+    "    # 2D random array with positions (x, y)\n",
+    "    xy = numpy.random.uniform(low=0.0, high=1.0, size=rand_array_shape)\n",
+    "    \n",
+    "    # check if position (x, y) is in unit circle\n",
+    "    xy_inside_circle = (xy ** 2).sum(axis=1) < 1\n",
+    "\n",
+    "    # pi is 4 times the fraction of points inside the circle\n",
+    "    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size\n",
+    "\n",
+    "    print(f\"\\nfrom {xy.nbytes / 1e9} GB randomly chosen positions\")\n",
+    "    print(f\"   pi estimate: {pi}\")\n",
+    "    print(f\"   pi error: {abs(pi - numpy.pi)}\\n\")\n",
+    "    \n",
+    "    return pi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Let's calculate...\n",
+    "\n",
+    "Observe how the error decreases with an increasing number of randomly chosen positions!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%time pi = calculate_pi_single(size_in_bytes=10_000_000) # 10 MB\n",
+    "%time pi = calculate_pi_single(size_in_bytes=100_000_000) # 100 MB\n",
+    "%time pi = calculate_pi_single(size_in_bytes=1_000_000_000) # 1 GB"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Are we already better than single precision floating point resolution?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "numpy.finfo(numpy.float32)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## We won't be able to scale the problem to several Gigabytes or Terabytes!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Problems\n",
+    "\n",
+    "* slowness of the NumPy-only, single-CPU approach, which also has to hold the whole random array in memory (we could speed it up using the [multiprocessing](https://docs.python.org/3.8/library/multiprocessing.html) and/or [threading](https://docs.python.org/3.8/library/threading.html) libraries)\n",
+    "* frontend/login node compute resources are shared, so the CPU, memory (and I/O bandwidth) demands of different users will collide"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Monte-Carlo estimate with Dask on multiple CPUs\n",
+    "\n",
+    "We define a local Dask cluster with one worker that runs 8 threads (CPUs) and has 24 GB of memory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import dask.distributed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "cluster = dask.distributed.LocalCluster(\n",
+    "    n_workers=1, threads_per_worker=8, memory_limit=24e9,\n",
+    "    ip=\"0.0.0.0\"\n",
+    ")\n",
+    "\n",
+    "client = dask.distributed.Client(cluster)\n",
+    "client"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Use dask.array for randomly chosen positions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import numpy, dask.array"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "def calculate_pi_dask(size_in_bytes, number_of_chunks):\n",
+    "    \n",
+    "    \"\"\"Calculate pi using a Monte Carlo method.\"\"\"\n",
+    "    \n",
+    "    array_shape = (int(size_in_bytes / 8 / 2), 2)\n",
+    "    chunk_size = (int(array_shape[0] / number_of_chunks), 2)\n",
+    "    \n",
+    "    # 2D random positions array using dask.array\n",
+    "    xy = dask.array.random.uniform(\n",
+    "        low=0.0, high=1.0, size=array_shape,\n",
+    "        # the chunk size determines the number of chunks, i.e. tasks\n",
+    "        chunks=chunk_size )\n",
+    "  \n",
+    "    xy_inside_circle = (xy ** 2).sum(axis=1) < 1\n",
+    "\n",
+    "    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size\n",
+    "    \n",
+    "    # start Dask calculation\n",
+    "    pi = pi.compute()\n",
+    "\n",
+    "    print(f\"\\nfrom {xy.nbytes / 1e9} GB randomly chosen positions\")\n",
+    "    print(f\"   pi estimate: {pi}\")\n",
+    "    print(f\"   pi error: {abs(pi - numpy.pi)}\\n\")\n",
+    "    display(xy)\n",
+    "    \n",
+    "    return pi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Let's calculate again...\n",
+    "Compare the wall times of the 1 GB and 10 GB random-sample $\\pi$ estimates with the single-CPU NumPy run above!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%time pi = calculate_pi_dask(size_in_bytes=1_000_000_000, number_of_chunks=10) # 1 GB"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%time pi = calculate_pi_dask(size_in_bytes=10_000_000_000, number_of_chunks=100) # 10 GB"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Let's go larger than memory...\n",
+    "Because Dask splits the computation into small, manageable tasks and keeps only a few chunks in memory at a time, we can scale beyond the cluster memory!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%time pi = calculate_pi_dask(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Are we now better than single precision floating point resolution?\n",
+    "Not at all, especially if we require an error an order of magnitude below that resolution..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "numpy.finfo(numpy.float32)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## We could increase the local cluster CPU resources...\n",
+    "However, a `LocalCluster` like the one above is always limited by the memory and CPU resources of a single compute node."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# %time pi = calculate_pi_dask(size_in_bytes=1_000_000_000_000, number_of_chunks=2_500) # 1 TB"
+   ]
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/day2_hpcenv/4_parallel-programming/2_dask/2_dask_example.ipynb b/day2_hpcenv/4_parallel-programming/2_dask/2_dask_example.ipynb
deleted file mode 100644
index 23e4668..0000000
--- a/day2_hpcenv/4_parallel-programming/2_dask/2_dask_example.ipynb
+++ /dev/null
@@ -1,892 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Dask local cluster example"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## What is Dask? (https://docs.dask.org/en/latest/)\n",
-    "\n",
-    "* combine a blocked algorithm approach\n",
-    "* with dynamic and memory aware task scheduling\n",
-    "* to realise a parallel out-of-core NumPy clone\n",
-    "* optimized for interactive computational workloads\n",
-    "\n",
-    "-----------------------------------"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Example problem: Monte-Carlo estimate of $\\pi$\n",
-    "\n",
-    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif\" width=\"25%\" align=left alt=\"PI monte-carlo estimate\"/>\n",
-    "\n",
-    "## Problem description\n",
-    "\n",
-    "Suppose we want to estimate the number $\\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods), i.e. obtain a numerical estimate based on a random sampling approach, and that we want at least single precision floating point accuracy.\n",
-    "\n",
-    "We take advantage of the fact that the area of a quarter circle with unit radius is $\\pi/4$ and that hence the probability of a randomly chosen point inside a unit square to lie within that circle is $\\pi/4$ as well.\n",
-    "\n",
-    "So for N randomly chosen pairs $(x, y)$ with $x\\in[0, 1)$ and $y\\in[0, 1)$ we count the number $N_{circ}$ of pairs that also satisfy $(x^2 + y^2) < 1$ and estimage $\\pi \\approx 4 \\cdot N_{circ} / N$."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Monte-Carlo estimate with NumPy on a single CPU\n",
-    "\n",
-    "* NumPy is the fundamental package for scientific computing with Python (https://numpy.org/).\n",
-    "* It contains a powerful n-dimensional array object and useful random number capabilities."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import numpy"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "def calculate_pi_single(size_in_bytes):\n",
-    "    \n",
-    "    \"\"\"Calculate pi using a Monte Carlo method.\"\"\"\n",
-    "    \n",
-    "    rand_array_shape = (int(size_in_bytes / 8 / 2), 2)\n",
-    "    \n",
-    "    # 2D random array with positions (x, y)\n",
-    "    xy = numpy.random.uniform(low=0.0, high=1.0, size=rand_array_shape)\n",
-    "    \n",
-    "    # check if position (x, y) is in unit circle\n",
-    "    xy_inside_circle = (xy ** 2).sum(axis=1) < 1\n",
-    "\n",
-    "    # pi is the fraction of points in circle x 4\n",
-    "    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size\n",
-    "\n",
-    "    print(f\"\\nfrom {xy.nbytes / 1e9} GB randomly chosen positions\")\n",
-    "    print(f\"   pi estimate: {pi}\")\n",
-    "    print(f\"   pi error: {abs(pi - numpy.pi)}\\n\")\n",
-    "    \n",
-    "    return pi"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Let's calculate...\n",
-    "\n",
-    "Observe how the error decreases with an increasing number of randomly chosen positions!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "from 0.01 GB randomly chosen positions\n",
-      "   pi estimate: 3.1451904\n",
-      "   pi error: 0.0035977464102070478\n",
-      "\n",
-      "CPU times: user 25 ms, sys: 8.79 ms, total: 33.8 ms\n",
-      "Wall time: 31.3 ms\n",
-      "\n",
-      "from 0.1 GB randomly chosen positions\n",
-      "   pi estimate: 3.14238272\n",
-      "   pi error: 0.0007900664102069577\n",
-      "\n",
-      "CPU times: user 224 ms, sys: 44.5 ms, total: 269 ms\n",
-      "Wall time: 261 ms\n",
-      "\n",
-      "from 1.0 GB randomly chosen positions\n",
-      "   pi estimate: 3.141662784\n",
-      "   pi error: 7.01304102070921e-05\n",
-      "\n",
-      "CPU times: user 1.94 s, sys: 424 ms, total: 2.37 s\n",
-      "Wall time: 2.28 s\n"
-     ]
-    }
-   ],
-   "source": [
-    "%time pi = calculate_pi_single(size_in_bytes=10_000_000) # 10 MB\n",
-    "%time pi = calculate_pi_single(size_in_bytes=100_000_000) # 100 MB\n",
-    "%time pi = calculate_pi_single(size_in_bytes=1_000_000_000) # 1 GB"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Are we already better than single precision floating point resolution?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)"
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "numpy.finfo(numpy.float32)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## We won't be able to scale the problem to several Gigabytes or Terabytes!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Problems\n",
-    "\n",
-    "* slowness of the numpy-only single CPU approach! (we could scale the problem using the [multiprocessing](https://docs.python.org/3.8/library/multiprocessing.html) and/or [threading](https://docs.python.org/3.8/library/threading.html) libraries)\n",
-    "* frontend/login node compute resources are shared and CPU, memory (and IO bandwidth) user demands will collide"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Monte-Carlo estimate with Dask on multiple CPUs\n",
-    "\n",
-    "We define a Dask cluster with 8 CPUs and 24 GB of memory."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import dask.distributed"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "    <div style=\"width: 24px; height: 24px; background-color: #e1e1e1; border: 3px solid #9D9D9D; border-radius: 5px; position: absolute;\"> </div>\n",
-       "    <div style=\"margin-left: 48px;\">\n",
-       "        <h3 style=\"margin-bottom: 0px;\">Client</h3>\n",
-       "        <p style=\"color: #9D9D9D; margin-bottom: 0px;\">Client-7f9fc6c4-5433-11ed-8324-3cecef1f6772</p>\n",
-       "        <table style=\"width: 100%; text-align: left;\">\n",
-       "\n",
-       "        <tr>\n",
-       "        \n",
-       "            <td style=\"text-align: left;\"><strong>Connection method:</strong> Cluster object</td>\n",
-       "            <td style=\"text-align: left;\"><strong>Cluster type:</strong> distributed.LocalCluster</td>\n",
-       "        \n",
-       "        </tr>\n",
-       "\n",
-       "        \n",
-       "            <tr>\n",
-       "                <td style=\"text-align: left;\">\n",
-       "                    <strong>Dashboard: </strong> <a href=\"http://134.94.0.100:8787/status\" target=\"_blank\">http://134.94.0.100:8787/status</a>\n",
-       "                </td>\n",
-       "                <td style=\"text-align: left;\"></td>\n",
-       "            </tr>\n",
-       "        \n",
-       "\n",
-       "        </table>\n",
-       "\n",
-       "        \n",
-       "            <details>\n",
-       "            <summary style=\"margin-bottom: 20px;\"><h3 style=\"display: inline;\">Cluster Info</h3></summary>\n",
-       "            <div class=\"jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output\">\n",
-       "    <div style=\"width: 24px; height: 24px; background-color: #e1e1e1; border: 3px solid #9D9D9D; border-radius: 5px; position: absolute;\">\n",
-       "    </div>\n",
-       "    <div style=\"margin-left: 48px;\">\n",
-       "        <h3 style=\"margin-bottom: 0px; margin-top: 0px;\">LocalCluster</h3>\n",
-       "        <p style=\"color: #9D9D9D; margin-bottom: 0px;\">7914e7e8</p>\n",
-       "        <table style=\"width: 100%; text-align: left;\">\n",
-       "            <tr>\n",
-       "                <td style=\"text-align: left;\">\n",
-       "                    <strong>Dashboard:</strong> <a href=\"http://134.94.0.100:8787/status\" target=\"_blank\">http://134.94.0.100:8787/status</a>\n",
-       "                </td>\n",
-       "                <td style=\"text-align: left;\">\n",
-       "                    <strong>Workers:</strong> 1\n",
-       "                </td>\n",
-       "            </tr>\n",
-       "            <tr>\n",
-       "                <td style=\"text-align: left;\">\n",
-       "                    <strong>Total threads:</strong> 8\n",
-       "                </td>\n",
-       "                <td style=\"text-align: left;\">\n",
-       "                    <strong>Total memory:</strong> 22.35 GiB\n",
-       "                </td>\n",
-       "            </tr>\n",
-       "            \n",
-       "            <tr>\n",
-       "    <td style=\"text-align: left;\"><strong>Status:</strong> running</td>\n",
-       "    <td style=\"text-align: left;\"><strong>Using processes:</strong> True</td>\n",
-       "</tr>\n",
-       "\n",
-       "            \n",
-       "        </table>\n",
-       "\n",
-       "        <details>\n",
-       "            <summary style=\"margin-bottom: 20px;\">\n",
-       "                <h3 style=\"display: inline;\">Scheduler Info</h3>\n",
-       "            </summary>\n",
-       "\n",
-       "            <div style=\"\">\n",
-       "    <div>\n",
-       "        <div style=\"width: 24px; height: 24px; background-color: #FFF7E5; border: 3px solid #FF6132; border-radius: 5px; position: absolute;\"> </div>\n",
-       "        <div style=\"margin-left: 48px;\">\n",
-       "            <h3 style=\"margin-bottom: 0px;\">Scheduler</h3>\n",
-       "            <p style=\"color: #9D9D9D; margin-bottom: 0px;\">Scheduler-96886d6c-baf6-48eb-95dc-2cca09abbe70</p>\n",
-       "            <table style=\"width: 100%; text-align: left;\">\n",
-       "                <tr>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Comm:</strong> tcp://134.94.0.100:42495\n",
-       "                    </td>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Workers:</strong> 1\n",
-       "                    </td>\n",
-       "                </tr>\n",
-       "                <tr>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Dashboard:</strong> <a href=\"http://134.94.0.100:8787/status\" target=\"_blank\">http://134.94.0.100:8787/status</a>\n",
-       "                    </td>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Total threads:</strong> 8\n",
-       "                    </td>\n",
-       "                </tr>\n",
-       "                <tr>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Started:</strong> Just now\n",
-       "                    </td>\n",
-       "                    <td style=\"text-align: left;\">\n",
-       "                        <strong>Total memory:</strong> 22.35 GiB\n",
-       "                    </td>\n",
-       "                </tr>\n",
-       "            </table>\n",
-       "        </div>\n",
-       "    </div>\n",
-       "\n",
-       "    <details style=\"margin-left: 48px;\">\n",
-       "        <summary style=\"margin-bottom: 20px;\">\n",
-       "            <h3 style=\"display: inline;\">Workers</h3>\n",
-       "        </summary>\n",
-       "\n",
-       "        \n",
-       "        <div style=\"margin-bottom: 20px;\">\n",
-       "            <div style=\"width: 24px; height: 24px; background-color: #DBF5FF; border: 3px solid #4CC9FF; border-radius: 5px; position: absolute;\"> </div>\n",
-       "            <div style=\"margin-left: 48px;\">\n",
-       "            <details>\n",
-       "                <summary>\n",
-       "                    <h4 style=\"margin-bottom: 0px; display: inline;\">Worker: 0</h4>\n",
-       "                </summary>\n",
-       "                <table style=\"width: 100%; text-align: left;\">\n",
-       "                    <tr>\n",
-       "                        <td style=\"text-align: left;\">\n",
-       "                            <strong>Comm: </strong> tcp://134.94.0.100:40747\n",
-       "                        </td>\n",
-       "                        <td style=\"text-align: left;\">\n",
-       "                            <strong>Total threads: </strong> 8\n",
-       "                        </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <td style=\"text-align: left;\">\n",
-       "                            <strong>Dashboard: </strong> <a href=\"http://134.94.0.100:46353/status\" target=\"_blank\">http://134.94.0.100:46353/status</a>\n",
-       "                        </td>\n",
-       "                        <td style=\"text-align: left;\">\n",
-       "                            <strong>Memory: </strong> 22.35 GiB\n",
-       "                        </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <td style=\"text-align: left;\">\n",
-       "                            <strong>Nanny: </strong> tcp://134.94.0.100:33715\n",
-       "                        </td>\n",
-       "                        <td style=\"text-align: left;\"></td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <td colspan=\"2\" style=\"text-align: left;\">\n",
-       "                            <strong>Local directory: </strong> /tmp/dask-worker-space/worker-pxxiovlw\n",
-       "                        </td>\n",
-       "                    </tr>\n",
-       "\n",
-       "                    \n",
-       "\n",
-       "                    \n",
-       "\n",
-       "                </table>\n",
-       "            </details>\n",
-       "            </div>\n",
-       "        </div>\n",
-       "        \n",
-       "\n",
-       "    </details>\n",
-       "</div>\n",
-       "\n",
-       "        </details>\n",
-       "    </div>\n",
-       "</div>\n",
-       "            </details>\n",
-       "        \n",
-       "\n",
-       "    </div>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "<Client: 'tcp://134.94.0.100:42495' processes=1 threads=8, memory=22.35 GiB>"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cluster = dask.distributed.LocalCluster(\n",
-    "    n_workers=1, threads_per_worker=8, memory_limit=24e9,\n",
-    "    ip=\"0.0.0.0\"\n",
-    ")\n",
-    "\n",
-    "client = dask.distributed.Client(cluster)\n",
-    "client"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Use dask.array for randomly chosen positions"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import numpy, dask.array"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def calculate_pi_dask(size_in_bytes, number_of_chunks):\n",
-    "    \n",
-    "    \"\"\"Calculate pi using a Monte Carlo method.\"\"\"\n",
-    "    \n",
-    "    array_shape = (int(size_in_bytes / 8 / 2), 2)\n",
-    "    chunk_size = (int(array_shape[0] / number_of_chunks), 2)\n",
-    "    \n",
-    "    # 2D random positions array using dask.array\n",
-    "    xy = dask.array.random.uniform(\n",
-    "        low=0.0, high=1.0, size=array_shape,\n",
-    "        # specify chunk size, i.e. task number\n",
-    "        chunks=chunk_size )\n",
-    "  \n",
-    "    xy_inside_circle = (xy ** 2).sum(axis=1) < 1\n",
-    "\n",
-    "    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size\n",
-    "    \n",
-    "    # start Dask calculation\n",
-    "    pi = pi.compute()\n",
-    "\n",
-    "    print(f\"\\nfrom {xy.nbytes / 1e9} GB randomly chosen positions\")\n",
-    "    print(f\"   pi estimate: {pi}\")\n",
-    "    print(f\"   pi error: {abs(pi - numpy.pi)}\\n\")\n",
-    "    display(xy)\n",
-    "    \n",
-    "    return pi"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Let's calculate again...\n",
-    "Observe how the wall time decreases for the 1 GB and 10 GB random-sample $\\pi$ estimates!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "from 1.0 GB randomly chosen positions\n",
-      "   pi estimate: 3.141517184\n",
-      "   pi error: 7.546958979309792e-05\n",
-      "\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <td>\n",
-       "            <table>\n",
-       "                <thead>\n",
-       "                    <tr>\n",
-       "                        <td> </td>\n",
-       "                        <th> Array </th>\n",
-       "                        <th> Chunk </th>\n",
-       "                    </tr>\n",
-       "                </thead>\n",
-       "                <tbody>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Bytes </th>\n",
-       "                        <td> 0.93 GiB </td>\n",
-       "                        <td> 95.37 MiB </td>\n",
-       "                    </tr>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Shape </th>\n",
-       "                        <td> (62500000, 2) </td>\n",
-       "                        <td> (6250000, 2) </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <th> Count </th>\n",
-       "                        <td> 1 Graph Layer </td>\n",
-       "                        <td> 10 Chunks </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                    <th> Type </th>\n",
-       "                    <td> float64 </td>\n",
-       "                    <td> numpy.ndarray </td>\n",
-       "                    </tr>\n",
-       "                </tbody>\n",
-       "            </table>\n",
-       "        </td>\n",
-       "        <td>\n",
-       "        <svg width=\"75\" height=\"170\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
-       "\n",
-       "  <!-- Horizontal lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"25\" y2=\"0\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"0\" y1=\"12\" x2=\"25\" y2=\"12\" />\n",
-       "  <line x1=\"0\" y1=\"24\" x2=\"25\" y2=\"24\" />\n",
-       "  <line x1=\"0\" y1=\"36\" x2=\"25\" y2=\"36\" />\n",
-       "  <line x1=\"0\" y1=\"48\" x2=\"25\" y2=\"48\" />\n",
-       "  <line x1=\"0\" y1=\"60\" x2=\"25\" y2=\"60\" />\n",
-       "  <line x1=\"0\" y1=\"72\" x2=\"25\" y2=\"72\" />\n",
-       "  <line x1=\"0\" y1=\"84\" x2=\"25\" y2=\"84\" />\n",
-       "  <line x1=\"0\" y1=\"96\" x2=\"25\" y2=\"96\" />\n",
-       "  <line x1=\"0\" y1=\"108\" x2=\"25\" y2=\"108\" />\n",
-       "  <line x1=\"0\" y1=\"120\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Vertical lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"25\" y1=\"0\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Colored Rectangle -->\n",
-       "  <polygon points=\"0.0,0.0 25.412616514582485,0.0 25.412616514582485,120.0 0.0,120.0\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
-       "\n",
-       "  <!-- Text -->\n",
-       "  <text x=\"12.706308\" y=\"140.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >2</text>\n",
-       "  <text x=\"45.412617\" y=\"60.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(-90,45.412617,60.000000)\">62500000</text>\n",
-       "</svg>\n",
-       "        </td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "dask.array<uniform, shape=(62500000, 2), dtype=float64, chunksize=(6250000, 2), chunktype=numpy.ndarray>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 83.3 ms, sys: 17.7 ms, total: 101 ms\n",
-      "Wall time: 686 ms\n"
-     ]
-    }
-   ],
-   "source": [
-    "%time pi = calculate_pi_dask(size_in_bytes=1_000_000_000, number_of_chunks=10) # 1 GB"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 26,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "from 10.0 GB randomly chosen positions\n",
-      "   pi estimate: 3.141718944\n",
-      "   pi error: 0.00012629041020684184\n",
-      "\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <td>\n",
-       "            <table>\n",
-       "                <thead>\n",
-       "                    <tr>\n",
-       "                        <td> </td>\n",
-       "                        <th> Array </th>\n",
-       "                        <th> Chunk </th>\n",
-       "                    </tr>\n",
-       "                </thead>\n",
-       "                <tbody>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Bytes </th>\n",
-       "                        <td> 9.31 GiB </td>\n",
-       "                        <td> 95.37 MiB </td>\n",
-       "                    </tr>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Shape </th>\n",
-       "                        <td> (625000000, 2) </td>\n",
-       "                        <td> (6250000, 2) </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <th> Count </th>\n",
-       "                        <td> 1 Graph Layer </td>\n",
-       "                        <td> 100 Chunks </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                    <th> Type </th>\n",
-       "                    <td> float64 </td>\n",
-       "                    <td> numpy.ndarray </td>\n",
-       "                    </tr>\n",
-       "                </tbody>\n",
-       "            </table>\n",
-       "        </td>\n",
-       "        <td>\n",
-       "        <svg width=\"75\" height=\"170\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
-       "\n",
-       "  <!-- Horizontal lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"25\" y2=\"0\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"0\" y1=\"6\" x2=\"25\" y2=\"6\" />\n",
-       "  <line x1=\"0\" y1=\"12\" x2=\"25\" y2=\"12\" />\n",
-       "  <line x1=\"0\" y1=\"18\" x2=\"25\" y2=\"18\" />\n",
-       "  <line x1=\"0\" y1=\"25\" x2=\"25\" y2=\"25\" />\n",
-       "  <line x1=\"0\" y1=\"31\" x2=\"25\" y2=\"31\" />\n",
-       "  <line x1=\"0\" y1=\"37\" x2=\"25\" y2=\"37\" />\n",
-       "  <line x1=\"0\" y1=\"43\" x2=\"25\" y2=\"43\" />\n",
-       "  <line x1=\"0\" y1=\"50\" x2=\"25\" y2=\"50\" />\n",
-       "  <line x1=\"0\" y1=\"56\" x2=\"25\" y2=\"56\" />\n",
-       "  <line x1=\"0\" y1=\"62\" x2=\"25\" y2=\"62\" />\n",
-       "  <line x1=\"0\" y1=\"68\" x2=\"25\" y2=\"68\" />\n",
-       "  <line x1=\"0\" y1=\"75\" x2=\"25\" y2=\"75\" />\n",
-       "  <line x1=\"0\" y1=\"81\" x2=\"25\" y2=\"81\" />\n",
-       "  <line x1=\"0\" y1=\"87\" x2=\"25\" y2=\"87\" />\n",
-       "  <line x1=\"0\" y1=\"93\" x2=\"25\" y2=\"93\" />\n",
-       "  <line x1=\"0\" y1=\"100\" x2=\"25\" y2=\"100\" />\n",
-       "  <line x1=\"0\" y1=\"106\" x2=\"25\" y2=\"106\" />\n",
-       "  <line x1=\"0\" y1=\"112\" x2=\"25\" y2=\"112\" />\n",
-       "  <line x1=\"0\" y1=\"120\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Vertical lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"25\" y1=\"0\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Colored Rectangle -->\n",
-       "  <polygon points=\"0.0,0.0 25.412616514582485,0.0 25.412616514582485,120.0 0.0,120.0\" style=\"fill:#8B4903A0;stroke-width:0\"/>\n",
-       "\n",
-       "  <!-- Text -->\n",
-       "  <text x=\"12.706308\" y=\"140.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >2</text>\n",
-       "  <text x=\"45.412617\" y=\"60.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(-90,45.412617,60.000000)\">625000000</text>\n",
-       "</svg>\n",
-       "        </td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "dask.array<uniform, shape=(625000000, 2), dtype=float64, chunksize=(6250000, 2), chunktype=numpy.ndarray>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 564 ms, sys: 56.4 ms, total: 621 ms\n",
-      "Wall time: 4.43 s\n"
-     ]
-    }
-   ],
-   "source": [
-    "%time pi = calculate_pi_dask(size_in_bytes=10_000_000_000, number_of_chunks=100) # 10 GB"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Let's go larger than memory...\n",
-    "Because Dask splits the computation into small, manageable tasks (one per chunk), the full array never has to fit into the worker's 24 GB memory limit, so we can scale up easily!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 32,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "from 100.0 GB randomly chosen positions\n",
-      "   pi estimate: 3.14160807168\n",
-      "   pi error: 1.541809020677576e-05\n",
-      "\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <td>\n",
-       "            <table>\n",
-       "                <thead>\n",
-       "                    <tr>\n",
-       "                        <td> </td>\n",
-       "                        <th> Array </th>\n",
-       "                        <th> Chunk </th>\n",
-       "                    </tr>\n",
-       "                </thead>\n",
-       "                <tbody>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Bytes </th>\n",
-       "                        <td> 93.13 GiB </td>\n",
-       "                        <td> 381.47 MiB </td>\n",
-       "                    </tr>\n",
-       "                    \n",
-       "                    <tr>\n",
-       "                        <th> Shape </th>\n",
-       "                        <td> (6250000000, 2) </td>\n",
-       "                        <td> (25000000, 2) </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                        <th> Count </th>\n",
-       "                        <td> 1 Graph Layer </td>\n",
-       "                        <td> 250 Chunks </td>\n",
-       "                    </tr>\n",
-       "                    <tr>\n",
-       "                    <th> Type </th>\n",
-       "                    <td> float64 </td>\n",
-       "                    <td> numpy.ndarray </td>\n",
-       "                    </tr>\n",
-       "                </tbody>\n",
-       "            </table>\n",
-       "        </td>\n",
-       "        <td>\n",
-       "        <svg width=\"75\" height=\"170\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
-       "\n",
-       "  <!-- Horizontal lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"25\" y2=\"0\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"0\" y1=\"6\" x2=\"25\" y2=\"6\" />\n",
-       "  <line x1=\"0\" y1=\"12\" x2=\"25\" y2=\"12\" />\n",
-       "  <line x1=\"0\" y1=\"18\" x2=\"25\" y2=\"18\" />\n",
-       "  <line x1=\"0\" y1=\"24\" x2=\"25\" y2=\"24\" />\n",
-       "  <line x1=\"0\" y1=\"31\" x2=\"25\" y2=\"31\" />\n",
-       "  <line x1=\"0\" y1=\"37\" x2=\"25\" y2=\"37\" />\n",
-       "  <line x1=\"0\" y1=\"44\" x2=\"25\" y2=\"44\" />\n",
-       "  <line x1=\"0\" y1=\"50\" x2=\"25\" y2=\"50\" />\n",
-       "  <line x1=\"0\" y1=\"56\" x2=\"25\" y2=\"56\" />\n",
-       "  <line x1=\"0\" y1=\"62\" x2=\"25\" y2=\"62\" />\n",
-       "  <line x1=\"0\" y1=\"69\" x2=\"25\" y2=\"69\" />\n",
-       "  <line x1=\"0\" y1=\"75\" x2=\"25\" y2=\"75\" />\n",
-       "  <line x1=\"0\" y1=\"82\" x2=\"25\" y2=\"82\" />\n",
-       "  <line x1=\"0\" y1=\"88\" x2=\"25\" y2=\"88\" />\n",
-       "  <line x1=\"0\" y1=\"94\" x2=\"25\" y2=\"94\" />\n",
-       "  <line x1=\"0\" y1=\"100\" x2=\"25\" y2=\"100\" />\n",
-       "  <line x1=\"0\" y1=\"107\" x2=\"25\" y2=\"107\" />\n",
-       "  <line x1=\"0\" y1=\"113\" x2=\"25\" y2=\"113\" />\n",
-       "  <line x1=\"0\" y1=\"120\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Vertical lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"25\" y1=\"0\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Colored Rectangle -->\n",
-       "  <polygon points=\"0.0,0.0 25.412616514582485,0.0 25.412616514582485,120.0 0.0,120.0\" style=\"fill:#8B4903A0;stroke-width:0\"/>\n",
-       "\n",
-       "  <!-- Text -->\n",
-       "  <text x=\"12.706308\" y=\"140.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >2</text>\n",
-       "  <text x=\"45.412617\" y=\"60.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(-90,45.412617,60.000000)\">6250000000</text>\n",
-       "</svg>\n",
-       "        </td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "dask.array<uniform, shape=(6250000000, 2), dtype=float64, chunksize=(25000000, 2), chunktype=numpy.ndarray>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 3.73 s, sys: 374 ms, total: 4.1 s\n",
-      "Wall time: 38.8 s\n"
-     ]
-    }
-   ],
-   "source": [
-    "%time pi = calculate_pi_dask(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Are we now better than single precision floating point resolution?\n",
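-    "Not at all: the error of about 1.5e-5 is still more than ten times the float32 resolution of 1e-6. Since the Monte Carlo error only shrinks as $1/\\sqrt{N}$, each further order of magnitude in accuracy needs about 100 times more samples..."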
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "numpy.finfo(numpy.float32)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## We could increase the local cluster CPU resources...\n",
-    "However, a `LocalCluster` such as the one above is always limited by the memory and CPU resources of a single compute node."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# %time pi = calculate_pi_dask(size_in_bytes=1_000_000_000_000, number_of_chunks=2_500) # 1 TB"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "------------------------------------\n",
-    "\n",
-    "### More on Dask - HIGH THROUGHPUT COMPUTING WITH DASK\n",
-    "\n",
-    "**Organisers:** Alan O’Cais, David Swenson  \n",
-    "**Website:** https://www.cecam.org/workshop-details/1022\n",
-    "\n",
-    "**Synopsis:**\n",
-    "High-throughput (task-based) computing is a flexible approach to parallelisation. It involves splitting a problem into loosely-coupled tasks. A scheduler then orchestrates the parallel execution of those tasks, allowing programs to adaptively scale their resource usage. E-CAM has extended the data-analytics framework Dask with a capable and efficient library to handle such workloads. This workshop will be held as a series of virtual seminars/tutorials on tools in the Dask HPC ecosystem.\n",
-    "\n",
-    "**Programme:**\n",
-    "- 21 January 2021, 3pm CET (2pm UTC): Dask - a flexible library for parallel computing in Python\n",
-    "  - YouTube link: https://youtu.be/Tl8rO-baKuY\n",
-    "  - GitHub Repo: https://github.com/jacobtomlinson/dask-video-tutorial-2020  \n",
-    "  \n",
-    "- 4 February 2021, 3pm CET (2pm UTC): Dask-Jobqueue - a library that integrates Dask with standard HPC queuing systems, such as SLURM or PBS\n",
-    "  - YouTube link: https://youtu.be/iNxhHXzmJ1w\n",
-    "  - GitHub Repo: https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02  \n",
-    "  \n",
-    "- 11 February 2021, 3pm CET (2pm UTC): Jobqueue-Features - a library that enables functionality aimed at enhancing scalability\n",
-    "  - YouTube link: https://youtu.be/FpMua8iJeTk\n",
-    "  - GitHub Repo: https://github.com/E-CAM/jobqueue_features_workshop_materials"
-   ]
-  }
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
-- 
GitLab