{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Profiling\n",
    "<div class=\"dateauthor\">\n",
    "21 June 2022 | Jan H. Meinke\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Profiler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "cprofiler (standard module)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "line_profiler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Intel Advisor since 2017 beta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "Scalene"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Before you start to optimize a program, you should generate a profile. A profile shows how much time a program spends in which function, line of code, or even assembler instruction.\n",
    "\n",
    "There are several profiling tools,e.g.,  cprofile, which comes with Python and is always available. It measures performance on a function call level. We'll also look at line_profiler. Let's look at an example.\n",
    "\n",
    "The following functions implement a simple n-body simulation using a long range potential. This could be part of, e.g., an astrophysical simulation, a simulation of a many-electron system, or a molecular dynamics simulation.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Profiling a simple particle dynamics code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "pair_force()\n",
    "\n",
    "force()\n",
    "\n",
    "calculate_all_forces()\n",
    "\n",
    "step()\n",
    "\n",
    "propagate_all_variables()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "#%%writefile md.py\n",
    "import numpy as np\n",
    "from numba import jit\n",
    "\n",
    "#@jit\n",
    "def pair_force(x0, y0, z0, m0, x1, y1, z1, m1):\n",
    "    \"\"\"Calculate the force on p0 due to p1.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    x0: float\n",
    "        x-coordinate of the p0\n",
    "    y0: float\n",
    "        y-coordinate of the p0\n",
    "    z0: float\n",
    "        z-coordinate of the p0\n",
    "    m0: float\n",
    "        mass of the p0\n",
    "    x1: float\n",
    "        x-coordinate of the p1\n",
    "    y1: float\n",
    "        y-coordinate of the p1\n",
    "    z1: float\n",
    "        z-coordinate of the p1\n",
    "    m1: float\n",
    "        mass of the p1\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    f: ndarray\n",
    "        force on p0 due to p1\n",
    "        \n",
    "    \"\"\"\n",
    "    r2 = (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2\n",
    "    f = m0 * m1 * np.array([(x1 - x0), (y1 - y0), (z1 - z0)]) * r2 ** (-1.5) if r2 else np.zeros(3)\n",
    "    return f\n",
    "\n",
    "#@jit\n",
    "def force(x0, y0, z0, m0, x, y, z, m):\n",
    "    \"\"\"Calculates the force on the particle at (x0, y0, z0) due to all other particles.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    x0: float\n",
    "        x-coordinate of the particle\n",
    "    y0: float\n",
    "        y-coordinate of the particle\n",
    "    z0: float\n",
    "        z-coordinate of the particle\n",
    "    m0: float\n",
    "        mass of the particle\n",
    "    x: ndarray\n",
    "        x-coordinates of all particles\n",
    "    y: ndarray\n",
    "        y-coordinates of all particles\n",
    "    z: ndarray\n",
    "        z-coordinates of all particles\n",
    "    m: ndarray\n",
    "        masses of all particles.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    f: ndarray\n",
    "        force on particle with mass m0 at (x0, y0, z0)\n",
    "    \"\"\"\n",
    "    f = np.zeros(3)\n",
    "    for x1, y1, z1, m1 in zip(x, y, z, m):\n",
    "        f += pair_force(x0, y0, z0, m0, x1, y1, z1, m1)\n",
    "    return f\n",
    "\n",
    "\n",
    "def calculate_all_forces(x, y, z, m):\n",
    "    \"\"\"Calculates the force on each particle p due to all other particles.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    x: ndarray\n",
    "        x-coordinates of all particles\n",
    "    y: ndarray\n",
    "        y-coordinates of all particles\n",
    "    z: ndarray\n",
    "        z-coordinates of all particles\n",
    "    m: ndarray\n",
    "        masses of all particles.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    f: ndarray\n",
    "        force on each particle due to all other particles.\n",
    "    \"\"\"\n",
    "    return np.array([force(x[i], y[i], z[i], m[i], x, y, z, m) for i in range(n)])\n",
    "\n",
    "def step(x, y, z, vx, vy, vz, m, f, dt):\n",
    "    \"\"\"Propagate the position and velocities.\n",
    "    \n",
    "    Starting from the current positions, velocities, and forces, propogate positions\n",
    "    and velocities by one time step of lenght dt.\n",
    "    \n",
    "    .. note:: This algorithm should not be used for real simulations!\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    x: ndarray\n",
    "        x-coordinates of all particles\n",
    "    y: ndarray\n",
    "        y-coordinates of all particles\n",
    "    z: ndarray\n",
    "        z-coordinates of all particles\n",
    "    vx: ndarray\n",
    "        x-component of the velocity of all particles\n",
    "    vy: ndarray\n",
    "        y-component of the velocity of all particles\n",
    "    vz: ndarray\n",
    "        z-component of the velocity of all particles\n",
    "    m: ndarray\n",
    "        masses of all particles.\n",
    "    f: ndarray\n",
    "        forces on particles\n",
    "    dt: float\n",
    "        time step\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    x, y, z, vx, vy, vz at t + dt\n",
    "    \"\"\"\n",
    "    xn = x + vx * dt + 0.5 * f[0] / m * dt * dt\n",
    "    yn = y + vy * dt + 0.5 * f[1] / m * dt * dt\n",
    "    zn = y + vz * dt + 0.5 * f[2] / m * dt * dt\n",
    "    vxn = vx + f[0] / m * dt\n",
    "    vyn = vy + f[1] / m * dt\n",
    "    vzn = vz + f[2] / m * dt\n",
    "    return xn, yn, zn, vxn, vyn, vzn\n",
    "\n",
    "def propagate_all_variables(x, y, z, vx, vy, vz, m, f, dt):\n",
    "    for i in range(n):\n",
    "        x[i], y[i], z[i], vx[i], vy[i], vz[i] = step(x[i], y[i], z[i], vx[i], vy[i], vz[i], m[i], f[i], dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The main program"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1000 particles\n",
    "n = 1000\n",
    "# time step of 0.01\n",
    "dt = 0.01\n",
    "\n",
    "# Initialize coordinates and velocities to random values.\n",
    "x = np.random.random(n)\n",
    "y = np.random.random(n)\n",
    "z = np.random.random(n)\n",
    "vx = np.zeros_like(x)\n",
    "vy = np.zeros_like(x)\n",
    "vz = np.zeros_like(x)\n",
    "m = np.ones_like(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The algorithm\n",
    "\n",
    "There are basically two steps to this algorithm:\n",
    "\n",
    "1. Calculate the forces on all particles\n",
    "2. Propagate all variables for a time step\n",
    "3. Continue at 1. for nstep steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "nsteps = 2\n",
    "for i in range(nsteps):\n",
    "    f = calculate_all_forces(x, y, z, m)\n",
    "    propagate_all_variables(x, y, z, vx, vy, vz, m, f, dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "This took quite some time. Let's measure how long it takes. Add a %%timeit statement just before nsteps (same line)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## The base line"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%timeit nsteps = 1\n",
    "for i in range(nsteps):\n",
    "    f = calculate_all_forces(x, y, z, m)\n",
    "    propagate_all_variables(x, y, z, vx, vy, vz, m, f, dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "OK, that's our base line. Next, we want to know where all this time is spent. I mentioned the cprofile module at the beginning. IPython has a magic for that called %%prun. Use it in front of the loop this time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Profiling with %%prun"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%prun -r nsteps=1\n",
    "for i in range(nsteps):\n",
    "    f = calculate_all_forces(x, y, z, m)\n",
    "    propagate_all_variables(x, y, z, vx, vy, vz, m, f, dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "    2003007 function calls in 8.561 seconds\n",
    "\n",
    "    Ordered by: internal time\n",
    "\n",
    "    ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
    "    1000000  5.209    0.000    7.073    0.000 <ipython-input-3-66a2b576efa3>:4(pair_force)\n",
    "    999001   1.863    0.000    1.863    0.000 {built-in method numpy.core.multiarray.array}\n",
    "     1000    1.480    0.001    8.555    0.009 <ipython-input-3-66a2b576efa3>:36(force)\n",
    "     2000    0.003    0.000    0.003    0.000 {built-in method numpy.core.multiarray.zeros}\n",
    "     1000    0.003    0.000    0.003    0.000 <ipython-input-3-66a2b576efa3>:89(step)\n",
    "        1    0.001    0.001    0.004    0.004 <ipython-input-3-66a2b576efa3>:130(propagate_all_variables)\n",
    "        1    0.001    0.001    8.557    8.557 <ipython-input-3-66a2b576efa3>:87(<listcomp>)\n",
    "        1    0.000    0.000    8.561    8.561 {built-in method builtins.exec}\n",
    "        1    0.000    0.000    8.557    8.557 <ipython-input-3-66a2b576efa3>:68(calculate_all_forces)\n",
    "        1    0.000    0.000    8.561    8.561 <string>:1(<module>)\n",
    "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The overhead shouldn't be too bad. I got about 10%. Most of the time (about 80%) is spent in pair_force. And 20% of that time is spent on np.array>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Line by line profiling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Unfortunately, this is a rather coarse grained profile. We don't know which part is the expensive part of this calculation and what we can do about it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext line_profiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%lprun -f pair_force force(x[0], y[0], z[0], m[0], x, y, z, m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Total time: 9.86691 s                                                                                                             \n",
    "File: md.py                                                                                                                        \n",
    "Function: pair_force at line 3                                                                                                     \n",
    "                                                                                                                                   \n",
    "    Line #      Hits         Time  Per Hit   % Time  Line Contents\n",
    "    ==============================================================                                                       \n",
    "        32   1000000      2269473      2.3     23.0      r2 = (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2\n",
    "        33   1000000      6864624      6.9     69.6      f = m0 * m1 * np.array([(x1 - x0), (y1 - y0), (z1 - z0)])  \n",
    "                                                         * r2 **(-1.5) if r2 else np.zeros(3)\n",
    "        34   1000000       732809      0.7      7.4      return f                                                                      \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Timing individual operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Calculating r2 takes 2.3 mus per call. Let's use %timeit to see how much time each operation takes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%timeit x0 = x[0]; y0=y[0]; z0 = z[0]; x1 = x[1]; y1=y[1]; z1 = z[1];\n",
    "(x1 - x0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%timeit x0 = x[0]; y0=y[0]; z0 = z[0]; x1 = x[1]; y1=y[1]; z1 = z[1];\n",
    "(x1 - x0) ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%timeit x0 = x[0]; y0=y[0]; z0 = z[0]; x1 = x[1]; y1=y[1]; z1 = z[1];\n",
    "(x1 - x0) * (x1 - x0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%timeit x0 = x[0]; y0=y[0]; z0 = z[0]; x1 = x[1]; y1=y[1]; z1 = z[1];\n",
    "dx = (x1 - x0)\n",
    "dx * dx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "We can now change the code so it calculates dx, dy, and dz first and then uses them later in the calculation. We can also use numba to speed up the simulation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Exercise: Time the other operations and optimize the code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "HPC Python 2022 (local)",
   "language": "python",
   "name": "hpcpy22"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}