# JSC Tutorial: Data Analysis and Plotting with Pandas

Repository for a small course held in May 2021.

The material can be found at  
http://herten1.pages.jsc.fz-juelich.de/jsc-pandas-introduction/

## Setup

One **master** Notebook is used to generate three Sub-Notebooks:

1. Slides
2. Exercises: Tasks
3. Exercises: Solutions

The slides Notebook is then converted to an HTML presentation (and also to a static PDF); all material is served to Gitlab Pages via CI.

In case you're interested in the details, read on.


### Splitting Notebooks

To keep everything in one single Notebook file and avoid dealing with diverging content, `Introducton-to-Pandas--master.ipynb` is the **master** Notebook, which contains all the information. All of it!

Cell metadata specifies if a Notebook cell should be treated specially. *Special* could be:

* A cell should end up in the presentation Notebook (default)
* A cell should end up in the tasks Notebook
* A cell should end up in the solution Notebook
* … and combinations of these

Since Notebooks are just JSON, I wrote a small parser in Python which suits my needs and added it to PyPI to be installed as a CLI tool: [https://pypi.org/project/notebook-splitter/](https://pypi.org/project/notebook-splitter/)

It works by providing *tags* of cells to `keep` and *tags* of cells to `remove`. For instance,

```bash
notebook-splitter $< --keep task --keep solution --remove nopresentation
```

would look into Notebook cells, keep those which are *tagged* `task` or `solution`, and remove those which are tagged `nopresentation`. (`$<` is Make's automatic variable for the first prerequisite, here the input Notebook.) One special tag for removal exists, `all`, which removes everything except what's marked to keep. A tag in the sense used here is a value of a JSON key in the cell's metadata; the key is `"exercise"` by default (but can be selected via `--basekey`). Example:

```json
{
    "exercise": "task"
}
// or
{
    "exercise": ["notask", "nopresentation"]
}
```

You can provide more than one tag by making them into a list.
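
To make the selection rules above more concrete, here is a minimal Python sketch of how such tag filtering could work on the Notebook JSON. It is not the actual `notebook-splitter` code: the function names `cell_tags` and `split_cells` are hypothetical, and the rule that a `keep` tag wins over a `remove` tag on the same cell is an assumption made for this illustration.

```python
def cell_tags(cell, basekey="exercise"):
    """Return the set of tags stored under `basekey` in a cell's metadata."""
    value = cell.get("metadata", {}).get(basekey, [])
    # A single tag is stored as a string, several tags as a list.
    return {value} if isinstance(value, str) else set(value)


def split_cells(notebook, keep=(), remove=(), basekey="exercise"):
    """Return a copy of `notebook` with its cells filtered by tag (hypothetical helper)."""
    keep, remove = set(keep), set(remove)
    selected = []
    for cell in notebook.get("cells", []):
        tags = cell_tags(cell, basekey)
        if tags & keep:
            # Assumption: a keep tag takes precedence over any remove tag.
            selected.append(cell)
        elif "all" in remove or tags & remove:
            # Dropped: either everything unkept goes, or a remove tag matched.
            continue
        else:
            # Cells without a matching tag pass through unchanged.
            selected.append(cell)
    return {**notebook, "cells": selected}


# A tiny in-memory Notebook, reduced to the fields used here.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Intro"]},
        {"cell_type": "code", "metadata": {"exercise": "task"}, "source": ["# TODO"]},
        {"cell_type": "code",
         "metadata": {"exercise": ["solution", "nopresentation"]},
         "source": ["x = 1"]},
        {"cell_type": "markdown",
         "metadata": {"exercise": "nopresentation"},
         "source": ["Hint"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

tasks = split_cells(notebook, keep=["task"], remove=["all"])
slides = split_cells(notebook, remove=["nopresentation"])
print([cell["source"] for cell in tasks["cells"]])   # [['# TODO']]
print([cell["source"] for cell in slides["cells"]])  # [['Intro'], ['# TODO']]
```

With `remove=["all"]`, only the cell tagged `task` survives; without `all`, untagged cells pass through and only cells tagged `nopresentation` are dropped.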

Have a look at the provided `Makefile` to see how the individual files for this tutorial have been generated.

If you think this script is useful and want to collaborate on it, let me know and I'll make it into a dedicated repository.


### Presentation from Notebooks

Via `jupyter nbconvert --to slides`, Jupyter Notebooks can be converted to HTML-based slideshows using [reveal.js](https://revealjs.com/). For each cell, one can select in the *Cell Inspector* if the cell should be a `Slide`, `Sub-Slide`, or a `Fragment`.
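
The slide type chosen in the inspector ends up in the cell's metadata under `slideshow.slide_type`, so it can also be set programmatically. The following is a small sketch of that idea on a plain cell dictionary; the helper `set_slide_type` is hypothetical and not part of this repository or of nbconvert.

```python
import json

# Slide types as stored in the Notebook JSON ("-" means: no special type).
VALID_SLIDE_TYPES = {"slide", "subslide", "fragment", "skip", "notes", "-"}


def set_slide_type(cell, slide_type):
    """Store `slide_type` in the cell's slideshow metadata (hypothetical helper)."""
    if slide_type not in VALID_SLIDE_TYPES:
        raise ValueError(f"unknown slide type: {slide_type!r}")
    # Create the nested metadata dictionaries if they do not exist yet.
    cell.setdefault("metadata", {}).setdefault("slideshow", {})["slide_type"] = slide_type
    return cell


cell = {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]}
set_slide_type(cell, "slide")
print(json.dumps(cell["metadata"]))  # {"slideshow": {"slide_type": "slide"}}
```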

This works reasonably well, but to add the Jülich design to the reveal.js HTML some more steps are needed. Those steps are partly implemented in Jan's repository for the [reveal.js Jülich Theme](https://gitlab.version.fz-juelich.de/JanMeinke/revealjstheme-juelich), which is loaded as a sub-module; one additional step is needed to make it work as a sub-module here. Everything is done in the `Makefile`.

A PDF version of the slides is generated with [`decktape`](https://github.com/astefanutti/decktape), an NPM package which uses a headless Chromium instance to generate the PDF pages. It's slow and doesn't look 100 % like the presented slides, but it's the best I could find.


### Gitlab Pages

A Gitlab Shared Runner is used to serve material to the public web page of the tutorial. The static `index.html` lives in its own (orphan) branch, `pages`. The files are copied from the repository to the public web page as specified in the CI configuration file `.gitlab-ci.yml`.

Ideally, I'd only check in the `--master.ipynb` Notebook and let the CI create all other material. But there are so many wild dependencies (NPM!!1) that I'd rather do it myself.

Because of the sub-module dependencies, the slides are also provided here as a _static_ slide bundle, copied over to the Pages index in the runner. 
