Add more on Seaborn palettes; add poll feedback results; fix PDF

Commit 86da99cf authored Feb 27, 2019 by Andreas Herten
--- a/Introduction-to-Pandas--JURECA--solution.ipynb
+++ b/Introduction-to-Pandas--JURECA--solution.ipynb
--- a/Introduction-to-Pandas--JURECA--tasks.ipynb
+++ b/Introduction-to-Pandas--JURECA--tasks.ipynb
-{"cells": [{"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["# *Introduction to* Data Analysis and Plotting with Pandas\n", "## JSC Tutorial\n", "\n", "Andreas Herten, Forschungszentrum J\u00fclich, 26 February 2019"]}, {"cell_type": "markdown", "metadata": {"exercise": "onlytask", "slideshow": {"slide_type": "skip"}}, "source": ["**Version: Tasks**"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["## Task Outline\n", "\n", "* [Task 1](#task1)\n", "* [Task 2](#task2)\n", "* [Task 3](#task3)\n", "* [Task 4](#task4)\n", "* [Task 5](#task5)\n", "* [Task 6](#task6)\n", "* [Task 7](#task7)\n", "* [Bonus Task](#taskb)"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 1\n", "<a name=\"task1\"></a>\n", "\n", "* Create data frame with\n", "    - 10 names of dinosaurs, \n", "    - their favourite prime number, \n", "    - and their favourite color\n", "* Play around with the frame\n", "* Tell me on poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "nopresentation", "slideshow": {"slide_type": "skip"}}, "source": ["Jupyter Notebook 101:\n", "\n", "* Execute cell: `shift+enter`\n", "* New cell in front of current cell: `a`\n", "* New cell after current cell: `b`"]}, {"cell_type": "code", "execution_count": 21, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["happy_dinos = {\n", "    \"Dinosaur Name\": [],\n", "    \"Favourite Prime\": [],\n", "    \"Favourite Color\": []\n", "}\n", "#df_dinos = "]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 
2\n", "<a name=\"task2\"></a>\n", "\n", "* Read in `nest-data.csv` to `DataFrame`; call it `df`  \n", " *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html), Pandas works **very** well together with JUBE*\n", "* Get to know it and play a bit with it\n", "* Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 30, "metadata": {"exercise": "task"}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay\n", "5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5\n", "5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5\n"]}], "source": ["!cat nest-data.csv | head -3"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "subslide"}}, "source": ["## Task 3\n", "<a name=\"task3\"></a>\n", "\n", "* Add a column to the Nest data frame called `Virtual Processes` which is the total number of threads across all nodes (i.e. 
the product of threads per task and tasks per node and nodes)\n", "* Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 56, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import matplotlib.pyplot as plt\n", "%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 4\n", "<a name=\"task4\"></a>\n", "\n", "* Sort the data frame by the virtual proccesses\n", "* Plot `\"Presim. Time / s\"` and `\"Sim. Time / s\"` of our data frame `df` as a function of the virtual processes\n", "* Use a dashed, red line for `\"Presim. Time / s\"`, a blue line for `\"Sim. Time / s\"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))\n", "* Don't forget to label your axes and to add a legend\n", "* Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 5\n", "<a name=\"task5\"></a>\n", "\n", "Use the NEST data frame `df` to:\n", "\n", "1. Make the virtual processes the index of the data frame (`.set_index()`)\n", "2. Plot `\"Presim. Program / s\"` and `\"Sim. Time / s`\" individually\n", "3. Plot them onto one common canvas!\n", "4. Make them have the same line colors and styles as before\n", "5. Add a legend, add missing labels\n", "\n", "* Done? Tell me! 
[pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 6\n", "<a name=\"task6\"></a>\n", "\n", "* To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the difference of program runtime, average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.  \n", "(*I know this is technically not super correct, but it will do for our example.*)\n", "* Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes\n", "* Remember: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 7\n", "<a name=\"task7\"></a>\n", "\n", "* Create a pivot table based on the NEST `df` data frame\n", "* Let the `x` axis show the number of nodes; display the values of the simulation time `\"Sim. Time / s\"` for the tasks per node and threas per task configurations\n", "* Please plot a bar plot\n", "* Done? [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["<a name=\"taskb\"></a>\n", "\n", "* Bonus task\n", "    - Use `Sim. Time / s` and `Presim. Time / s` as values to show\n", "    - Show a stack of those two values inside the pivot table"]}, {"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["<span class=\"feedback\">Tell me what you think about this tutorial! 
<a href=\"mailto:a.herten@fz-juelich.de\">a.herten@fz-juelich.de</a></span>\n", "\n", "Next slide: Further reading"]}], "metadata": {"kernelspec": {"display_name": "JSC Pandas Tutorial", "language": "python", "name": "pandas"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}
+{"cells": [{"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["# *Introduction to* Data Analysis and Plotting with Pandas\n", "## JSC Tutorial\n", "\n", "Andreas Herten, Forschungszentrum J\u00fclich, 26 February 2019"]}, {"cell_type": "markdown", "metadata": {"exercise": "onlytask", "slideshow": {"slide_type": "skip"}}, "source": ["**Version: Tasks**"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["## Task Outline\n", "\n", "* [Task 1](#task1)\n", "* [Task 2](#task2)\n", "* [Task 3](#task3)\n", "* [Task 4](#task4)\n", "* [Task 5](#task5)\n", "* [Task 6](#task6)\n", "* [Task 7](#task7)\n", "* [Bonus Task](#taskb)"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 1\n", "<a name=\"task1\"></a>\n", "\n", "* Create data frame with\n", "    - 10 names of dinosaurs, \n", "    - their favourite prime number, \n", "    - and their favourite color\n", "* Play around with the frame\n", "* Tell me on poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "nopresentation", "slideshow": {"slide_type": "skip"}}, "source": ["Jupyter Notebook 101:\n", "\n", "* Execute cell: `shift+enter`\n", "* New cell in front of current cell: `a`\n", "* New cell after current cell: `b`"]}, {"cell_type": "code", "execution_count": 21, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["happy_dinos = {\n", "    \"Dinosaur Name\": [],\n", "    \"Favourite Prime\": [],\n", "    \"Favourite Color\": []\n", "}\n", "#df_dinos = "]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 
2\n", "<a name=\"task2\"></a>\n", "\n", "* Read in `nest-data.csv` to `DataFrame`; call it `df`  \n", " *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html), Pandas works **very** well together with JUBE*\n", "* Get to know it and play a bit with it\n", "* Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 30, "metadata": {"exercise": "task"}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay\n", "5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5\n", "5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5\n"]}], "source": ["!cat nest-data.csv | head -3"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "subslide"}}, "source": ["## Task 3\n", "<a name=\"task3\"></a>\n", "\n", "* Add a column to the Nest data frame called `Virtual Processes` which is the total number of threads across all nodes (i.e. 
the product of threads per task and tasks per node and nodes)\n", "* Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 56, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import matplotlib.pyplot as plt\n", "%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 4\n", "<a name=\"task4\"></a>\n", "\n", "* Sort the data frame by the virtual proccesses\n", "* Plot `\"Presim. Time / s\"` and `\"Sim. Time / s\"` of our data frame `df` as a function of the virtual processes\n", "* Use a dashed, red line for `\"Presim. Time / s\"`, a blue line for `\"Sim. Time / s\"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))\n", "* Don't forget to label your axes and to add a legend\n", "* Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 5\n", "<a name=\"task5\"></a>\n", "\n", "Use the NEST data frame `df` to:\n", "\n", "1. Make the virtual processes the index of the data frame (`.set_index()`)\n", "2. Plot `\"Presim. Program / s\"` and `\"Sim. Time / s`\" individually\n", "3. Plot them onto one common canvas!\n", "4. Make them have the same line colors and styles as before\n", "5. Add a legend, add missing labels\n", "\n", "* Done? Tell me! 
[pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 6\n", "<a name=\"task6\"></a>\n", "\n", "* To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the difference of program runtime, average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.  \n", "(*I know this is technically not super correct, but it will do for our example.*)\n", "* Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes\n", "* Remember: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 7\n", "<a name=\"task7\"></a>\n", "\n", "* Create a pivot table based on the NEST `df` data frame\n", "* Let the `x` axis show the number of nodes; display the values of the simulation time `\"Sim. Time / s\"` for the tasks per node and threas per task configurations\n", "* Please plot a bar plot\n", "* Done? [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["<a name=\"taskb\"></a>\n", "\n", "* Bonus task\n", "    - Same pivot table as before (that is, `x` with nodes, and columns for Tasks/Node and Threads/Task)\n", "    - But now, use `Sim. Time / s` and `Presim. Time / s` as values to show\n", "    - Show them as a stack of those two values inside the pivot table"]}, {"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["<span class=\"feedback\">Tell me what you think about this tutorial! 
<a href=\"mailto:a.herten@fz-juelich.de\">a.herten@fz-juelich.de</a></span>\n", "\n", "Next slide: Further reading"]}], "metadata": {"kernelspec": {"display_name": "JSC Pandas Tutorial", "language": "python", "name": "pandas"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}
 %% Cell type:markdown id: tags:

 # *Introduction to* Data Analysis and Plotting with Pandas
 ## JSC Tutorial

 Andreas Herten, Forschungszentrum Jülich, 26 February 2019

 %% Cell type:markdown id: tags:

 **Version: Tasks**

 %% Cell type:markdown id: tags:

 ## Task Outline

 * [Task 1](#task1)
 * [Task 2](#task2)
 * [Task 3](#task3)
 * [Task 4](#task4)
 * [Task 5](#task5)
 * [Task 6](#task6)
 * [Task 7](#task7)
 * [Bonus Task](#taskb)

 %% Cell type:code id: tags:

 ``` python
 import pandas as pd
 ```

 %% Cell type:markdown id: tags:

 ## Task 1
 <a name="task1"></a>

 * Create data frame with
    - 10 names of dinosaurs,
    - their favourite prime number,
    - and their favourite color
 * Play around with the frame
 * Tell me on poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)

 %% Cell type:markdown id: tags:

 Jupyter Notebook 101:

 * Execute cell: `shift+enter`
 * New cell in front of current cell: `a`
 * New cell after current cell: `b`

 %% Cell type:code id: tags:

 ``` python
 happy_dinos = {
    "Dinosaur Name": [],
    "Favourite Prime": [],
    "Favourite Color": []
 }
 #df_dinos =
 ```
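
One possible way to fill in the scaffold above, as a minimal sketch: the dinosaur names, primes, and colors are made-up example entries (not part of the tutorial material), and `df_dinos` is simply the name suggested by the commented-out line.

```python
import pandas as pd

# Hypothetical example entries; any ten dinosaurs, primes, and colors will do.
happy_dinos = {
    "Dinosaur Name": [
        "Tyrannosaurus", "Triceratops", "Stegosaurus", "Velociraptor", "Brachiosaurus",
        "Diplodocus", "Ankylosaurus", "Allosaurus", "Iguanodon", "Spinosaurus",
    ],
    "Favourite Prime": [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    "Favourite Color": [
        "green", "blue", "red", "yellow", "purple",
        "orange", "teal", "pink", "brown", "grey",
    ],
}
df_dinos = pd.DataFrame(happy_dinos)

# A few ways to play around with the frame
print(df_dinos.head())                                   # first five rows
print(df_dinos["Favourite Prime"].sum())                 # sum over one column
print(df_dinos.sort_values("Favourite Prime", ascending=False).head(3))
```

Passing a dictionary of equally long lists to `pd.DataFrame` turns each key into a column name, which is why the scaffold is laid out this way.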

 %% Cell type:markdown id: tags:

 ## Task 2
 <a name="task2"></a>

 * Read in `nest-data.csv` to `DataFrame`; call it `df`
 *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html), Pandas works **very** well together with JUBE*
 * Get to know it and play a bit with it
 * Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)

 %% Cell type:code id: tags:

 ``` python
 !cat nest-data.csv | head -3
 ```

 %% Output

    id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
    5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
    5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
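
A minimal sketch of reading this data, assuming only the three lines printed above: they are wrapped in a string so the example runs without the file, and with the real file the call would presumably be `pd.read_csv("nest-data.csv")`.

```python
import io

import pandas as pd

# The header and the first two data rows, copied from the output above.
csv_excerpt = """id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_excerpt))

# Getting to know the frame
print(df.shape)
print(df.dtypes)
print(df[["Nodes", "Tasks/Node", "Threads/Task", "Sim. Time / s"]])
```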

 %% Cell type:markdown id: tags:

 ## Task 3
 <a name="task3"></a>

 * Add a column to the NEST data frame called `Virtual Processes`, which is the total number of threads across all nodes (i.e. the product of threads per task, tasks per node, and nodes)
 * Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
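
The product described in this task can be sketched as follows. The small frame only carries the three relevant columns of the two rows printed earlier; with the full data, `df` would come from `nest-data.csv` instead.

```python
import pandas as pd

# Only the three columns needed here, taken from the two rows shown above
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [2, 4],
    "Threads/Task": [4, 4],
})

# Column arithmetic is element-wise, so this yields one product per row
df["Virtual Processes"] = df["Nodes"] * df["Tasks/Node"] * df["Threads/Task"]
print(df)
```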

 %% Cell type:code id: tags:

 ``` python
 import matplotlib.pyplot as plt
 %matplotlib inline
 ```

 %% Cell type:markdown id: tags:

 ## Task 4
 <a name="task4"></a>

 * Sort the data frame by the virtual processes
 * Plot `"Presim. Time / s"` and `"Sim. Time / s"` of our data frame `df` as a function of the virtual processes
 * Use a dashed, red line for `"Presim. Time / s"`, a blue line for `"Sim. Time / s"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))
 * Don't forget to label your axes and to add a legend
 * Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
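
A minimal sketch of one way to approach this with plain Matplotlib. It assumes a tiny frame holding the two rows printed earlier (listed out of order on purpose) and selects a non-interactive backend so it also runs outside a notebook; inside the notebook, `%matplotlib inline` takes that role.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Values of the two rows shown earlier, deliberately unsorted
df = pd.DataFrame({
    "Virtual Processes": [16, 8],
    "Presim. Time / s": [7.87, 17.26],
    "Sim. Time / s": [142.97, 311.52],
})

# Sort first so the lines are drawn from left to right
df = df.sort_values("Virtual Processes")

fig, ax = plt.subplots()
ax.plot(df["Virtual Processes"], df["Presim. Time / s"],
        linestyle="dashed", color="red", label="Presim. Time / s")
ax.plot(df["Virtual Processes"], df["Sim. Time / s"],
        color="blue", label="Sim. Time / s")
ax.set_xlabel("Virtual Processes")
ax.set_ylabel("Time / s")
ax.legend()
```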

 %% Cell type:markdown id: tags:

 ## Task 5
 <a name="task5"></a>

 Use the NEST data frame `df` to:

 1. Make the virtual processes the index of the data frame (`.set_index()`)
 2. Plot `"Presim. Time / s"` and `"Sim. Time / s"` individually
 3. Plot them onto one common canvas!
 4. Make them have the same line colors and styles as before
 5. Add a legend, add missing labels

 * Done? Tell me! [pollev.com/aherten538](https://pollev.com/aherten538)
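
The steps above can be sketched with the plotting methods built into Pandas. This is one possible approach, not necessarily the intended solution; the frame again holds only the two rows printed earlier, and the backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

cols = ["Presim. Time / s", "Sim. Time / s"]
df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
})

# Step 1: the index becomes the x axis of every plot
df = df.set_index("Virtual Processes")

# Step 2: one separate axes per column
df[cols].plot(subplots=True)

# Steps 3 to 5: both columns on one canvas, styled as in Task 4
fig, ax = plt.subplots()
df[cols].plot(ax=ax, style=["r--", "b-"])
ax.set_ylabel("Time / s")
ax.legend()
```

Passing `ax=ax` is what puts both lines onto the same canvas; the x label is taken from the index name automatically.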

 %% Cell type:markdown id: tags:

 ## Task 6
 <a name="task6"></a>

 * To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the program runtime minus the average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.
 (*I know this is technically not super correct, but it will do for our example.*)
 * Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes
 * Remember: [pollev.com/aherten538](https://pollev.com/aherten538)
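
A minimal sketch of the subtraction and the stacked bar plot, assuming the two rows printed earlier plus the `Virtual Processes` column from Task 3. The helper list `parts` is a name introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import pandas as pd

df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Runtime Program / s": [420.42, 200.84],
    "Avg. Neuron Build Time / s": [0.29, 0.15],
    "Min. Edge Build Time / s": [88.12, 46.03],
    "Min. Init. Time / s": [1.14, 0.70],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
})

# The accounted-for parts of the runtime (hypothetical helper name)
parts = [
    "Avg. Neuron Build Time / s",
    "Min. Edge Build Time / s",
    "Min. Init. Time / s",
    "Presim. Time / s",
    "Sim. Time / s",
]

# Runtime minus the row-wise sum of all parts
df["Unaccounted Time / s"] = df["Runtime Program / s"] - df[parts].sum(axis=1)

# One bar per virtual-process count, with the parts stacked on top of each other
ax = (
    df.set_index("Virtual Processes")[parts + ["Unaccounted Time / s"]]
    .plot(kind="bar", stacked=True)
)
ax.set_ylabel("Time / s")
```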

 %% Cell type:markdown id: tags:

 ## Task 7
 <a name="task7"></a>

 * Create a pivot table based on the NEST `df` data frame
 * Let the `x` axis show the number of nodes; display the values of the simulation time `"Sim. Time / s"` for the tasks per node and threads per task configurations
 * Please plot a bar plot
 * Done? [pollev.com/aherten538](https://pollev.com/aherten538)
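
One way to build such a pivot table, as a sketch. The rows for `Nodes = 1` are taken from the output shown earlier; the rows for `Nodes = 2` are hypothetical values added only so the table has more than one row.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import pandas as pd

df = pd.DataFrame({
    "Nodes": [1, 1, 2, 2],
    "Tasks/Node": [2, 4, 2, 4],
    "Threads/Task": [4, 4, 4, 4],
    # last two values are hypothetical, not from nest-data.csv
    "Sim. Time / s": [311.52, 142.97, 160.0, 75.0],
})

# Rows: nodes; columns: every (Tasks/Node, Threads/Task) configuration
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values="Sim. Time / s",
)
print(pivot)

# The index ends up on the x axis, one bar per configuration
ax = pivot.plot(kind="bar")
ax.set_ylabel("Sim. Time / s")
```

`pivot_table` aggregates with the mean by default, which matters as soon as several rows share the same nodes and configuration.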

 %% Cell type:markdown id: tags:

 <a name="taskb"></a>

 * Bonus task
-    - Use `Sim. Time / s` and `Presim. Time / s` as values to show
-    - Show a stack of those two values inside the pivot table
+    - Same pivot table as before (that is, `x` with nodes, and columns for Tasks/Node and Threads/Task)
+    - But now, use `Sim. Time / s` and `Presim. Time / s` as values to show
+    - Show them as a stack of those two values inside the pivot table
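
One possible reading of the bonus task, sketched under the same assumptions as the Task 7 example (rows for `Nodes = 2` are hypothetical values, not from `nest-data.csv`): pass both columns as `values` and let Pandas stack the bars. This is not necessarily the intended solution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import pandas as pd

df = pd.DataFrame({
    "Nodes": [1, 1, 2, 2],
    "Tasks/Node": [2, 4, 2, 4],
    "Threads/Task": [4, 4, 4, 4],
    # last two values of each timing column are hypothetical
    "Presim. Time / s": [17.26, 7.87, 9.0, 4.5],
    "Sim. Time / s": [311.52, 142.97, 160.0, 75.0],
})

# Same index and columns as in Task 7, but with two value columns
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values=["Presim. Time / s", "Sim. Time / s"],
)
print(pivot)

# stacked=True piles all columns of a row on top of each other
ax = pivot.plot(kind="bar", stacked=True)
ax.set_ylabel("Time / s")
```

With two value columns, the resulting column index has three levels: the value name, `Tasks/Node`, and `Threads/Task`.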

 %% Cell type:markdown id: tags:

 <span class="feedback">Tell me what you think about this tutorial! <a href="mailto:a.herten@fz-juelich.de">a.herten@fz-juelich.de</a></span>

 Next slide: Further reading
--- a/Introduction-to-Pandas--master.ipynb
+++ b/Introduction-to-Pandas--master.ipynb
--- a/Introduction-to-Pandas--slides.html
+++ b/Introduction-to-Pandas--slides.html
--- a/Introduction-to-Pandas--slides.ipynb
+++ b/Introduction-to-Pandas--slides.ipynb
--- a/Introduction-to-Pandas--slides.pdf
+++ b/Introduction-to-Pandas--slides.pdf
--- a/Introduction-to-Pandas--solution.ipynb
+++ b/Introduction-to-Pandas--solution.ipynb
--- a/Introduction-to-Pandas--tasks.ipynb
+++ b/Introduction-to-Pandas--tasks.ipynb
-{"cells": [{"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["# *Introduction to* Data Analysis and Plotting with Pandas\n", "## JSC Tutorial\n", "\n", "Andreas Herten, Forschungszentrum J\u00fclich, 26 February 2019"]}, {"cell_type": "markdown", "metadata": {"exercise": "onlytask", "slideshow": {"slide_type": "skip"}}, "source": ["**Version: Tasks**"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["## Task Outline\n", "\n", "* [Task 1](#task1)\n", "* [Task 2](#task2)\n", "* [Task 3](#task3)\n", "* [Task 4](#task4)\n", "* [Task 5](#task5)\n", "* [Task 6](#task6)\n", "* [Task 7](#task7)\n", "* [Bonus Task](#taskb)"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 1\n", "<a name=\"task1\"></a>\n", "\n", "* Create data frame with\n", "    - 10 names of dinosaurs, \n", "    - their favourite prime number, \n", "    - and their favourite color\n", "* Play around with the frame\n", "* Tell me on poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "nopresentation", "slideshow": {"slide_type": "skip"}}, "source": ["Jupyter Notebook 101:\n", "\n", "* Execute cell: `shift+enter`\n", "* New cell in front of current cell: `a`\n", "* New cell after current cell: `b`"]}, {"cell_type": "code", "execution_count": 21, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["happy_dinos = {\n", "    \"Dinosaur Name\": [],\n", "    \"Favourite Prime\": [],\n", "    \"Favourite Color\": []\n", "}\n", "#df_dinos = "]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 
2\n", "<a name=\"task2\"></a>\n", "\n", "* Read in `nest-data.csv` to `DataFrame`; call it `df`  \n", " *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html), Pandas works **very** well together with JUBE*\n", "* Get to know it and play a bit with it\n", "* Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 30, "metadata": {"exercise": "task"}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay\n", "5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5\n", "5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5\n"]}], "source": ["!cat nest-data.csv | head -3"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "subslide"}}, "source": ["## Task 3\n", "<a name=\"task3\"></a>\n", "\n", "* Add a column to the Nest data frame called `Virtual Processes` which is the total number of threads across all nodes (i.e. 
the product of threads per task and tasks per node and nodes)\n", "* Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 56, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import matplotlib.pyplot as plt\n", "%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 4\n", "<a name=\"task4\"></a>\n", "\n", "* Sort the data frame by the virtual proccesses\n", "* Plot `\"Presim. Time / s\"` and `\"Sim. Time / s\"` of our data frame `df` as a function of the virtual processes\n", "* Use a dashed, red line for `\"Presim. Time / s\"`, a blue line for `\"Sim. Time / s\"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))\n", "* Don't forget to label your axes and to add a legend\n", "* Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 5\n", "<a name=\"task5\"></a>\n", "\n", "Use the NEST data frame `df` to:\n", "\n", "1. Make the virtual processes the index of the data frame (`.set_index()`)\n", "2. Plot `\"Presim. Program / s\"` and `\"Sim. Time / s`\" individually\n", "3. Plot them onto one common canvas!\n", "4. Make them have the same line colors and styles as before\n", "5. Add a legend, add missing labels\n", "\n", "* Done? Tell me! 
[pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 6\n", "<a name=\"task6\"></a>\n", "\n", "* To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the difference of program runtime, average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.  \n", "(*I know this is technically not super correct, but it will do for our example.*)\n", "* Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes\n", "* Remember: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 7\n", "<a name=\"task7\"></a>\n", "\n", "* Create a pivot table based on the NEST `df` data frame\n", "* Let the `x` axis show the number of nodes; display the values of the simulation time `\"Sim. Time / s\"` for the tasks per node and threas per task configurations\n", "* Please plot a bar plot\n", "* Done? [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["<a name=\"taskb\"></a>\n", "\n", "* Bonus task\n", "    - Use `Sim. Time / s` and `Presim. Time / s` as values to show\n", "    - Show a stack of those two values inside the pivot table"]}, {"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["<span class=\"feedback\">Tell me what you think about this tutorial! 
<a href=\"mailto:a.herten@fz-juelich.de\">a.herten@fz-juelich.de</a></span>\n", "\n", "Next slide: Further reading"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}
\ No newline at end of file
+{"cells": [{"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["# *Introduction to* Data Analysis and Plotting with Pandas\n", "## JSC Tutorial\n", "\n", "Andreas Herten, Forschungszentrum J\u00fclich, 26 February 2019"]}, {"cell_type": "markdown", "metadata": {"exercise": "onlytask", "slideshow": {"slide_type": "skip"}}, "source": ["**Version: Tasks**"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["## Task Outline\n", "\n", "* [Task 1](#task1)\n", "* [Task 2](#task2)\n", "* [Task 3](#task3)\n", "* [Task 4](#task4)\n", "* [Task 5](#task5)\n", "* [Task 6](#task6)\n", "* [Task 7](#task7)\n", "* [Bonus Task](#taskb)"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 1\n", "<a name=\"task1\"></a>\n", "\n", "* Create data frame with\n", "    - 10 names of dinosaurs, \n", "    - their favourite prime number, \n", "    - and their favourite color\n", "* Play around with the frame\n", "* Tell me on poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "nopresentation", "slideshow": {"slide_type": "skip"}}, "source": ["Jupyter Notebook 101:\n", "\n", "* Execute cell: `shift+enter`\n", "* New cell in front of current cell: `a`\n", "* New cell after current cell: `b`"]}, {"cell_type": "code", "execution_count": 21, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["happy_dinos = {\n", "    \"Dinosaur Name\": [],\n", "    \"Favourite Prime\": [],\n", "    \"Favourite Color\": []\n", "}\n", "#df_dinos = "]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 
2\n", "<a name=\"task2\"></a>\n", "\n", "* Read in `nest-data.csv` to `DataFrame`; call it `df`  \n", " *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html), Pandas works **very** well together with JUBE*\n", "* Get to know it and play a bit with it\n", "* Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 30, "metadata": {"exercise": "task"}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay\n", "5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5\n", "5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5\n"]}], "source": ["!cat nest-data.csv | head -3"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "subslide"}}, "source": ["## Task 3\n", "<a name=\"task3\"></a>\n", "\n", "* Add a column to the Nest data frame called `Virtual Processes` which is the total number of threads across all nodes (i.e. 
the product of threads per task and tasks per node and nodes)\n", "* Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "code", "execution_count": 56, "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "outputs": [], "source": ["import matplotlib.pyplot as plt\n", "%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 4\n", "<a name=\"task4\"></a>\n", "\n", "* Sort the data frame by the virtual proccesses\n", "* Plot `\"Presim. Time / s\"` and `\"Sim. Time / s\"` of our data frame `df` as a function of the virtual processes\n", "* Use a dashed, red line for `\"Presim. Time / s\"`, a blue line for `\"Sim. Time / s\"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))\n", "* Don't forget to label your axes and to add a legend\n", "* Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 5\n", "<a name=\"task5\"></a>\n", "\n", "Use the NEST data frame `df` to:\n", "\n", "1. Make the virtual processes the index of the data frame (`.set_index()`)\n", "2. Plot `\"Presim. Program / s\"` and `\"Sim. Time / s`\" individually\n", "3. Plot them onto one common canvas!\n", "4. Make them have the same line colors and styles as before\n", "5. Add a legend, add missing labels\n", "\n", "* Done? Tell me! 
[pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 6\n", "<a name=\"task6\"></a>\n", "\n", "* To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the difference of program runtime, average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.  \n", "(*I know this is technically not super correct, but it will do for our example.*)\n", "* Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes\n", "* Remember: [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "slide"}}, "source": ["## Task 7\n", "<a name=\"task7\"></a>\n", "\n", "* Create a pivot table based on the NEST `df` data frame\n", "* Let the `x` axis show the number of nodes; display the values of the simulation time `\"Sim. Time / s\"` for the tasks per node and threas per task configurations\n", "* Please plot a bar plot\n", "* Done? [pollev.com/aherten538](https://pollev.com/aherten538)"]}, {"cell_type": "markdown", "metadata": {"exercise": "task", "slideshow": {"slide_type": "fragment"}}, "source": ["<a name=\"taskb\"></a>\n", "\n", "* Bonus task\n", "    - Same pivot table as before (that is, `x` with nodes, and columns for Tasks/Node and Threads/Task)\n", "    - But now, use `Sim. Time / s` and `Presim. Time / s` as values to show\n", "    - Show them as a stack of those two values inside the pivot table"]}, {"cell_type": "markdown", "metadata": {"exercise": "task"}, "source": ["<span class=\"feedback\">Tell me what you think about this tutorial! 
<a href=\"mailto:a.herten@fz-juelich.de\">a.herten@fz-juelich.de</a></span>\n", "\n", "Next slide: Further reading"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}
\ No newline at end of file
 %% Cell type:markdown id: tags:

 # *Introduction to* Data Analysis and Plotting with Pandas
 ## JSC Tutorial

 Andreas Herten, Forschungszentrum Jülich, 26 February 2019

 %% Cell type:markdown id: tags:

 **Version: Tasks**

 %% Cell type:markdown id: tags:

 ## Task Outline

 * [Task 1](#task1)
 * [Task 2](#task2)
 * [Task 3](#task3)
 * [Task 4](#task4)
 * [Task 5](#task5)
 * [Task 6](#task6)
 * [Task 7](#task7)
 * [Bonus Task](#taskb)

 %% Cell type:code id: tags:

 ``` python
 import pandas as pd
 ```

 %% Cell type:markdown id: tags:

 ## Task 1
 <a name="task1"></a>

 * Create a data frame with
    - 10 names of dinosaurs,
    - their favourite prime number,
    - and their favourite color
 * Play around with the frame
 * Tell me on the poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)

 %% Cell type:markdown id: tags:

 Jupyter Notebook 101:

 * Execute cell: `shift+enter`
 * New cell in front of current cell: `a`
 * New cell after current cell: `b`

 %% Cell type:code id: tags:

 ``` python
 happy_dinos = {
    "Dinosaur Name": [],
    "Favourite Prime": [],
    "Favourite Color": []
 }
 #df_dinos =
 ```
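
One way the skeleton above could be completed, sketched with three made-up entries instead of ten (all names and values are placeholders for illustration, not tutorial data):

```python
import pandas as pd

# Placeholder values for illustration only; the task asks for ten dinosaurs.
happy_dinos = {
    "Dinosaur Name": ["Tyrannosaurus", "Triceratops", "Stegosaurus"],
    "Favourite Prime": [7, 13, 2],
    "Favourite Color": ["red", "blue", "green"],
}

# Each dictionary key becomes a column; each list holds that column's values.
df_dinos = pd.DataFrame(happy_dinos)

# A few first things to try: sorting and filtering rows.
print(df_dinos.sort_values("Favourite Prime"))
print(df_dinos[df_dinos["Favourite Prime"] > 5])
```

Building the frame from a dictionary of equally long lists keeps the column names next to their data, which is convenient for small hand-written tables like this one.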

 %% Cell type:markdown id: tags:

 ## Task 2
 <a name="task2"></a>

 * Read `nest-data.csv` into a `DataFrame`; call it `df`
 *Data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html); Pandas works **very** well together with JUBE*
 * Get to know it and play a bit with it
 * Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)

 %% Cell type:code id: tags:

 ``` python
 !cat nest-data.csv | head -3
 ```

 %% Output

    id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
    5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
    5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
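
A minimal sketch of the reading step, assuming the three lines printed above are representative of the file. To stay runnable without `nest-data.csv`, it parses those lines from an in-memory string; with the real file, `pd.read_csv("nest-data.csv")` would take the place of the `io.StringIO` call.

```python
import io

import pandas as pd

# Header and first two data rows, copied from the `head -3` output above.
csv_head = """id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
"""

# With the real file: df = pd.read_csv("nest-data.csv")
df = pd.read_csv(io.StringIO(csv_head))

# Typical first looks at an unknown data frame.
print(df.shape)       # (rows, columns)
df.info()             # column names and dtypes
print(df.describe())  # summary statistics of the numeric columns
```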

 %% Cell type:markdown id: tags:

 ## Task 3
 <a name="task3"></a>

 * Add a column to the NEST data frame called `Virtual Processes`, which is the total number of threads across all nodes (i.e. the product of threads per task, tasks per node, and nodes)
 * Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
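
The product described above can be sketched as element-wise column arithmetic. The small frame below only carries the three relevant columns, with the two configurations from the CSV excerpt; on the full `df` the last assignment would be the same.

```python
import pandas as pd

# Two configurations taken from the first rows of nest-data.csv shown earlier.
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [2, 4],
    "Threads/Task": [4, 4],
})

# Column arithmetic is element-wise, so this yields one product per row.
df["Virtual Processes"] = df["Nodes"] * df["Tasks/Node"] * df["Threads/Task"]
print(df)
```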

 %% Cell type:code id: tags:

 ``` python
 import matplotlib.pyplot as plt
 %matplotlib inline
 ```

 %% Cell type:markdown id: tags:

 ## Task 4
 <a name="task4"></a>

 * Sort the data frame by the virtual processes
 * Plot `"Presim. Time / s"` and `"Sim. Time / s"` of our data frame `df` as a function of the virtual processes
 * Use a dashed, red line for `"Presim. Time / s"`, a blue line for `"Sim. Time / s"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))
 * Don't forget to label your axes and to add a legend
 * Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
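
One possible approach to the steps above, sketched on a two-row frame built from the CSV excerpt (the rows are deliberately given out of order so that the sort has an effect). The `Agg` backend line is an assumption made so the sketch also runs outside a notebook; inside the notebook, `%matplotlib inline` takes its place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Values from the first two rows of nest-data.csv, listed out of order on purpose.
df = pd.DataFrame({
    "Virtual Processes": [16, 8],
    "Presim. Time / s": [7.87, 17.26],
    "Sim. Time / s": [142.97, 311.52],
})

# Sort first so that the lines are drawn from left to right.
df = df.sort_values("Virtual Processes")

fig, ax = plt.subplots()
ax.plot(df["Virtual Processes"], df["Presim. Time / s"],
        linestyle="dashed", color="red", label="Presim. Time / s")
ax.plot(df["Virtual Processes"], df["Sim. Time / s"],
        color="blue", label="Sim. Time / s")
ax.set_xlabel("Virtual Processes")
ax.set_ylabel("Time / s")
ax.legend()
```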

 %% Cell type:markdown id: tags:

 ## Task 5
 <a name="task5"></a>

 Use the NEST data frame `df` to:

 1. Make the virtual processes the index of the data frame (`.set_index()`)
 2. Plot `"Presim. Time / s"` and `"Sim. Time / s"` individually
 3. Plot them onto one common canvas!
 4. Make them have the same line colors and styles as before
 5. Add a legend, add missing labels

 * Done? Tell me! [pollev.com/aherten538](https://pollev.com/aherten538)
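
The steps above can be sketched with Pandas' own `.plot()` interface. This is one possible approach on a two-row frame built from the CSV excerpt; passing the same `ax` to both calls is what puts the two curves onto one canvas. The `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Values from the first two rows of nest-data.csv.
df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
})

# Step 1: the index becomes the x axis of every subsequent .plot() call.
df = df.set_index("Virtual Processes")

# Steps 2-4: plot each column separately, but onto the same Axes object.
fig, ax = plt.subplots()
df["Presim. Time / s"].plot(ax=ax, style="r--")  # dashed red line
df["Sim. Time / s"].plot(ax=ax, style="b-")      # solid blue line

# Step 5: add the missing y label and the legend.
ax.set_ylabel("Time / s")
ax.legend()
```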

 %% Cell type:markdown id: tags:

 ## Task 6
 <a name="task6"></a>

 * To your `df` NEST data frame, add a column with the unaccounted time (`Unaccounted Time / s`), which is the program runtime minus the average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.
 (*I know this is technically not super correct, but it will do for our example.*)
 * Plot a stacked bar plot of all these columns (except for program runtime) over the virtual processes
 * Remember: [pollev.com/aherten538](https://pollev.com/aherten538)
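
A minimal sketch of the unaccounted-time column and the stacked bar plot, again on the two rows from the CSV excerpt. The list name `parts` is a helper introduced here for illustration; the `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

# Values from the first two rows of nest-data.csv.
df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Runtime Program / s": [420.42, 200.84],
    "Avg. Neuron Build Time / s": [0.29, 0.15],
    "Min. Edge Build Time / s": [88.12, 46.03],
    "Min. Init. Time / s": [1.14, 0.70],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
}).set_index("Virtual Processes")

# The accounted-for parts of the runtime (helper list, name chosen here).
parts = [
    "Avg. Neuron Build Time / s",
    "Min. Edge Build Time / s",
    "Min. Init. Time / s",
    "Presim. Time / s",
    "Sim. Time / s",
]

# Whatever remains of the program runtime after subtracting all parts.
df["Unaccounted Time / s"] = df["Runtime Program / s"] - df[parts].sum(axis=1)

# One bar per number of virtual processes, one stacked segment per column.
ax = df[parts + ["Unaccounted Time / s"]].plot(kind="bar", stacked=True)
ax.set_ylabel("Time / s")
```

Summing the parts row-wise with `sum(axis=1)` avoids a long chain of subtractions and makes it easy to reuse the same column list for the plot.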

 %% Cell type:markdown id: tags:

 ## Task 7
 <a name="task7"></a>

 * Create a pivot table based on the NEST `df` data frame
 * Let the `x` axis show the number of nodes; display the values of the simulation time `"Sim. Time / s"` for the tasks per node and threads per task configurations
 * Please plot a bar plot
 * Done? [pollev.com/aherten538](https://pollev.com/aherten538)
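
One possible approach with `pivot_table`, sketched on a four-row toy frame. The two rows with `Nodes == 1` come from the CSV excerpt; the two rows with `Nodes == 2` are invented here purely so that the pivot has more than one row. The `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

df = pd.DataFrame({
    "Nodes": [1, 1, 2, 2],
    "Tasks/Node": [2, 4, 2, 4],
    "Threads/Task": [4, 4, 4, 4],
    # First two values are from nest-data.csv; the last two are made up.
    "Sim. Time / s": [311.52, 142.97, 160.0, 75.0],
})

# Rows: number of nodes. Columns: one per (Tasks/Node, Threads/Task) pair.
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values="Sim. Time / s",
)
print(pivot)

# The index (Nodes) ends up on the x axis, with one bar per configuration.
ax = pivot.plot(kind="bar")
ax.set_ylabel("Sim. Time / s")
```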

 %% Cell type:markdown id: tags:

 <a name="taskb"></a>

 * Bonus task
-    - Use `Sim. Time / s` and `Presim. Time / s` as values to show
-    - Show a stack of those two values inside the pivot table
+    - Same pivot table as before (that is, `x` with nodes, and columns for Tasks/Node and Threads/Task)
+    - But now, use `Sim. Time / s` and `Presim. Time / s` as values to show
+    - Show them as a stack of those two values inside the pivot table
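
A sketch of one reading of the bonus task: pass both columns as `values` and draw the result with `stacked=True`. Whether this matches the intended solution is an assumption. As before, the rows with `Nodes == 2` are invented for illustration, and the `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

df = pd.DataFrame({
    "Nodes": [1, 1, 2, 2],
    "Tasks/Node": [2, 4, 2, 4],
    "Threads/Task": [4, 4, 4, 4],
    # First two values per column are from nest-data.csv; the rest are made up.
    "Presim. Time / s": [17.26, 7.87, 9.0, 4.0],
    "Sim. Time / s": [311.52, 142.97, 160.0, 75.0],
})

# A list of value columns adds one more level on top of the column index.
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values=["Presim. Time / s", "Sim. Time / s"],
)
print(pivot)

# stacked=True piles all columns of a row into a single bar per node count.
ax = pivot.plot(kind="bar", stacked=True)
ax.set_ylabel("Time / s")
```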

 %% Cell type:markdown id: tags:

 <span class="feedback">Tell me what you think about this tutorial! <a href="mailto:a.herten@fz-juelich.de">a.herten@fz-juelich.de</a></span>

 Next slide: Further reading
--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,8 @@ subnotebooks: $(SUBNOTEBOOKS)
 	> $@

 %.pdf: %.html $(DEP_PRESENTATION)
-	decktape --size "1280x720" reveal $< $@
+	# This needs to have artificially large paper size in order to fix bug https://github.com/astefanutti/decktape/issues/151#issuecomment-456166075
+	decktape --size "2560x1440" reveal $< $@

 Introduction-to-Pandas--slides.ipynb: $(MASTER_NOTEBOOK)
 	./notebook-task-filter.py $< --keep task --keep solution --keep onlypresentation --remove onlytask --remove onlysolution --remove nopresentation -o $@

--- a/img/poll-results.png
+++ b/img/poll-results.png