Commit 86da99cf authored by Andreas Herten

Add more on Seaborn palettes; add poll feedback results; fix PDF

parent 94a683c6
%% Cell type:markdown id: tags:
# *Introduction to* Data Analysis and Plotting with Pandas
## JSC Tutorial
Andreas Herten, Forschungszentrum Jülich, 26 February 2019
%% Cell type:markdown id: tags:
**Version: Tasks**
%% Cell type:markdown id: tags:
## Task Outline
* [Task 1](#task1)
* [Task 2](#task2)
* [Task 3](#task3)
* [Task 4](#task4)
* [Task 5](#task5)
* [Task 6](#task6)
* [Task 7](#task7)
* [Bonus Task](#taskb)
%% Cell type:code id: tags:
``` python
import pandas as pd
```
%% Cell type:markdown id: tags:
## Task 1
<a name="task1"></a>
* Create a data frame with
    - 10 names of dinosaurs,
    - their favourite prime number,
    - and their favourite color
* Play around with the frame
* Tell me via the poll when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
%% Cell type:markdown id: tags:
Jupyter Notebook 101:
* Execute cell: `shift+enter`
* New cell above the current cell: `a` (in command mode, i.e. after pressing `Esc`)
* New cell below the current cell: `b` (in command mode)
%% Cell type:code id: tags:
``` python
happy_dinos = {
"Dinosaur Name": [],
"Favourite Prime": [],
"Favourite Color": []
}
#df_dinos =
```
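%% Cell type:markdown id: tags:
One possible way to fill in the stub above is to pass the dictionary of lists to `pd.DataFrame`, where each key becomes a column. The dinosaurs, primes, and colors below are arbitrary illustrative choices. The last lines are only a few suggestions for playing around with the frame.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Illustrative entries only: any ten dinosaurs, primes and colors will do
happy_dinos = {
    "Dinosaur Name": ["Tyrannosaurus", "Triceratops", "Stegosaurus", "Velociraptor",
                      "Brachiosaurus", "Diplodocus", "Ankylosaurus", "Iguanodon",
                      "Spinosaurus", "Parasaurolophus"],
    "Favourite Prime": [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    "Favourite Color": ["red", "green", "blue", "purple", "orange",
                        "yellow", "brown", "pink", "black", "teal"],
}

# Each key becomes a column, each list supplies that column's values
df_dinos = pd.DataFrame(happy_dinos)

# Some things to play around with
print(df_dinos.head(3))                            # first three rows
print(df_dinos["Favourite Prime"].sum())           # sum of one column
print(df_dinos[df_dinos["Favourite Prime"] > 10])  # boolean row selection
print(df_dinos.sort_values("Favourite Color"))     # sort rows by a column
```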
%% Cell type:markdown id: tags:
## Task 2
<a name="task2"></a>
* Read `nest-data.csv` into a `DataFrame`; call it `df`
*The data was produced with [JUBE](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html); Pandas works **very** well together with JUBE.*
* Get to know it and play a bit with it
* Tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
%% Cell type:code id: tags:
``` python
!head -3 nest-data.csv
```
%% Output
id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
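%% Cell type:markdown id: tags:
A minimal sketch of reading the data. In the notebook, `df = pd.read_csv("nest-data.csv")` is all that is needed. To keep the cell self-contained, the sketch instead reads the header and the two rows printed above from an in-memory string via `io.StringIO`, so this `df` has only two rows.
%% Cell type:code id: tags:
``` python
import io

import pandas as pd

# Header and first two rows of nest-data.csv, copied from the output above
csv_excerpt = """\
id,Nodes,Tasks/Node,Threads/Task,Runtime Program / s,Scale,Plastic,Avg. Neuron Build Time / s,Min. Edge Build Time / s,Max. Edge Build Time / s,Min. Init. Time / s,Max. Init. Time / s,Presim. Time / s,Sim. Time / s,Virt. Memory (Sum) / kB,Local Spike Counter (Sum),Average Rate (Sum),Number of Neurons,Number of Connections,Min. Delay,Max. Delay
5,1,2,4,420.42,10,true,0.29,88.12,88.18,1.14,1.20,17.26,311.52,46560664.00,825499,7.48,112500,1265738500,1.5,1.5
5,1,4,4,200.84,10,true,0.15,46.03,46.34,0.70,1.01,7.87,142.97,46903088.00,802865,7.03,112500,1265738500,1.5,1.5
"""

# In the notebook: df = pd.read_csv("nest-data.csv")
df = pd.read_csv(io.StringIO(csv_excerpt))

# A first look at the data
print(df.shape)
print(df.columns.tolist())
print(df[["Nodes", "Tasks/Node", "Threads/Task", "Sim. Time / s"]])
print(df.describe())
```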
%% Cell type:markdown id: tags:
## Task 3
<a name="task3"></a>
* Add a column called `Virtual Processes` to the NEST data frame, holding the total number of threads across all nodes (i.e. the product of nodes, tasks per node, and threads per task)
* Remember to tell me when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
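%% Cell type:markdown id: tags:
A sketch for Task 3, assuming `df` comes from Task 2. Column arithmetic in Pandas works element-wise, so the product of three columns is itself a column. To stay self-contained, the cell builds a hypothetical stand-in for `df` that contains only the three configuration columns of the two rows shown in the Task 2 output.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical stand-in for df: configuration columns of the two rows from Task 2
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [2, 4],
    "Threads/Task": [4, 4],
})

# Element-wise product, computed row by row
df["Virtual Processes"] = df["Nodes"] * df["Tasks/Node"] * df["Threads/Task"]
print(df)
```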
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Task 4
<a name="task4"></a>
* Sort the data frame by the virtual processes
* Plot `"Presim. Time / s"` and `"Sim. Time / s"` of our data frame `df` as a function of the virtual processes
* Use a dashed red line for `"Presim. Time / s"` and a blue line for `"Sim. Time / s"` (see [API description](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))
* Don't forget to label your axes and to add a legend
* Submit when you're done: [pollev.com/aherten538](https://pollev.com/aherten538)
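%% Cell type:markdown id: tags:
One way to approach Task 4 with plain Matplotlib, sketched on a hypothetical two-row stand-in for `df` (values from the Task 2 output, deliberately in unsorted order). `sort_values` returns a new frame, so the result is reassigned. The format strings `"r--"` and `"b"` select a dashed red line and a solid blue line.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df (two rows from Task 2), in unsorted order
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [4, 2],
    "Threads/Task": [4, 4],
    "Presim. Time / s": [7.87, 17.26],
    "Sim. Time / s": [142.97, 311.52],
})
df["Virtual Processes"] = df["Nodes"] * df["Tasks/Node"] * df["Threads/Task"]

# sort_values returns a sorted copy; reassign to keep it
df = df.sort_values("Virtual Processes")

fig, ax = plt.subplots()
# "r--": dashed red line; "b": solid blue line
ax.plot(df["Virtual Processes"], df["Presim. Time / s"], "r--", label="Presim. Time / s")
ax.plot(df["Virtual Processes"], df["Sim. Time / s"], "b", label="Sim. Time / s")
ax.set_xlabel("Virtual Processes")
ax.set_ylabel("Time / s")
ax.legend()
```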
%% Cell type:markdown id: tags:
## Task 5
<a name="task5"></a>
Use the NEST data frame `df` to:
1. Make the virtual processes the index of the data frame (`.set_index()`)
2. Plot `"Presim. Time / s"` and `"Sim. Time / s"` individually
3. Plot them onto one common canvas!
4. Give them the same line colors and styles as in Task 4
5. Add a legend and any missing axis labels
* Done? Tell me! [pollev.com/aherten538](https://pollev.com/aherten538)
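%% Cell type:markdown id: tags:
A sketch of steps 1 to 5, again on a hypothetical two-row stand-in for `df` with the `Virtual Processes` column from Task 3 already present. After `set_index`, Pandas uses the index as the x values of its plots. Passing `ax=` controls which canvas a plot lands on, and `style` takes one Matplotlib format string per column.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df after Task 3 (two rows from Task 2)
df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
})

# 1. The column becomes the index, which pandas uses as x values
df = df.set_index("Virtual Processes")

# 2. Individual plots, each on its own figure
for column in ["Presim. Time / s", "Sim. Time / s"]:
    _, ax_single = plt.subplots()
    df[column].plot(ax=ax_single)
    ax_single.set_ylabel(column)

# 3.-5. Both columns on one canvas, styled as in Task 4; the legend is on by default
fig, ax = plt.subplots()
df[["Presim. Time / s", "Sim. Time / s"]].plot(ax=ax, style=["r--", "b"])
ax.set_xlabel("Virtual Processes")
ax.set_ylabel("Time / s")
```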
%% Cell type:markdown id: tags:
## Task 6
<a name="task6"></a>
* To your NEST data frame `df`, add a column with the unaccounted time (`Unaccounted Time / s`): the program runtime minus the sum of the average neuron build time, minimal edge build time, minimal initialization time, presimulation time, and simulation time.
(*I know this is technically not super correct, but it will do for our example.*)
* Create a stacked bar plot of all these time columns (including the unaccounted time, but not the program runtime) over the virtual processes
* Remember: [pollev.com/aherten538](https://pollev.com/aherten538)
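%% Cell type:markdown id: tags:
A possible solution sketch, assuming the timing columns are named as in `nest-data.csv`. The hypothetical stand-in frame holds the timing values of the two rows from the Task 2 output, indexed by their virtual processes (8 and 16). `sum(axis=1)` adds up the parts row by row, and `stacked=True` stacks the columns within each bar.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical stand-in for df: timing columns of the two rows from Task 2
df = pd.DataFrame({
    "Virtual Processes": [8, 16],
    "Runtime Program / s": [420.42, 200.84],
    "Avg. Neuron Build Time / s": [0.29, 0.15],
    "Min. Edge Build Time / s": [88.12, 46.03],
    "Min. Init. Time / s": [1.14, 0.70],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
}).set_index("Virtual Processes")

parts = [
    "Avg. Neuron Build Time / s",
    "Min. Edge Build Time / s",
    "Min. Init. Time / s",
    "Presim. Time / s",
    "Sim. Time / s",
]
# Runtime minus the row-wise sum of the measured parts
df["Unaccounted Time / s"] = df["Runtime Program / s"] - df[parts].sum(axis=1)
print(df["Unaccounted Time / s"])

# One bar per virtual-process count, time components stacked within each bar
ax = df[parts + ["Unaccounted Time / s"]].plot(kind="bar", stacked=True)
ax.set_ylabel("Time / s")
```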
%% Cell type:markdown id: tags:
## Task 7
<a name="task7"></a>
* Create a pivot table based on the NEST `df` data frame
* Let the `x` axis show the number of nodes; display the simulation time `"Sim. Time / s"` for each combination of tasks per node and threads per task
* Plot it as a bar plot
* Done? [pollev.com/aherten538](https://pollev.com/aherten538)
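%% Cell type:markdown id: tags:
A sketch of the pivot table on a hypothetical stand-in frame built from the two rows of the Task 2 output. Both runs used one node, so the table has a single row here. `pivot_table` aggregates with the mean by default, which matters when several runs share a configuration.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical stand-in for df: configuration and simulation-time columns
# of the two rows shown in the Task 2 output
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [2, 4],
    "Threads/Task": [4, 4],
    "Sim. Time / s": [311.52, 142.97],
})

# Rows (the x axis of the bar plot) are node counts; each
# (Tasks/Node, Threads/Task) combination becomes one column; aggfunc defaults to mean
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values="Sim. Time / s",
)
print(pivot)

# One group of bars per node count, one bar per configuration
ax = pivot.plot(kind="bar")
ax.set_ylabel("Sim. Time / s")
```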
%% Cell type:markdown id: tags:
## Bonus Task
<a name="taskb"></a>
* Same pivot table as before (that is, nodes on the `x` axis and columns for Tasks/Node and Threads/Task)
* But now, use `Sim. Time / s` and `Presim. Time / s` as the values to show
* Show the two values stacked on top of each other in the bar plot of the pivot table
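%% Cell type:markdown id: tags:
One possible approach for the bonus task, sketched on the same kind of hypothetical two-row stand-in. `DataFrame.plot(kind="bar")` draws either grouped bars or stacked bars, but has no combined grouped-and-stacked mode, so this sketch moves the configuration levels from the columns into the index with `stack`. Each bar then stands for one (Nodes, Tasks/Node, Threads/Task) combination, with presimulation and simulation time stacked. This is a workaround, not necessarily the intended solution.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical stand-in for df with the columns the bonus task needs
# (values of the two rows shown in the Task 2 output)
df = pd.DataFrame({
    "Nodes": [1, 1],
    "Tasks/Node": [2, 4],
    "Threads/Task": [4, 4],
    "Presim. Time / s": [17.26, 7.87],
    "Sim. Time / s": [311.52, 142.97],
})

# Same pivot table as in Task 7, but with two value columns
pivot = df.pivot_table(
    index="Nodes",
    columns=["Tasks/Node", "Threads/Task"],
    values=["Presim. Time / s", "Sim. Time / s"],
)

# Move the configuration levels into the index, leaving one column per time value;
# each row is now one (Nodes, Tasks/Node, Threads/Task) combination
stacked_times = pivot.stack(level=["Tasks/Node", "Threads/Task"])
print(stacked_times)

# stacked=True puts presimulation and simulation time on top of each other per bar
ax = stacked_times.plot(kind="bar", stacked=True)
ax.set_ylabel("Time / s")
```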
%% Cell type:markdown id: tags:
<span class="feedback">Tell me what you think about this tutorial! <a href="mailto:a.herten@fz-juelich.de">a.herten@fz-juelich.de</a></span>
Next slide: Further reading
@@ -18,7 +18,8 @@ subnotebooks: $(SUBNOTEBOOKS)
> $@
%.pdf: %.html $(DEP_PRESENTATION)
-decktape --size "1280x720" reveal $< $@
+# This needs to have artificially large paper size in order to fix bug https://github.com/astefanutti/decktape/issues/151#issuecomment-456166075
+decktape --size "2560x1440" reveal $< $@
Introduction-to-Pandas--slides.ipynb: $(MASTER_NOTEBOOK)
./notebook-task-filter.py $< --keep task --keep solution --keep onlypresentation --remove onlytask --remove onlysolution --remove nopresentation -o $@
......
img/poll-results.png
