
JSC Tutorial: Data Analysis and Plotting with Pandas

Repository for a small course held in May 2021.

Material to be found at
http://herten1.pages.jsc.fz-juelich.de/jsc-pandas-introduction/

Setup

One master Notebook is used to generate three Sub-Notebooks:

  1. Slides
  2. Exercises: Tasks
  3. Exercises: Solutions

The slides Notebook is then converted to an HTML presentation (and also to a static PDF); all material is served to GitLab Pages via CI.

In case you're interested in the details, read on.

Splitting Notebooks

To keep everything in one single Notebook file and avoid dealing with diverging content, Introducton-to-Pandas--master.ipynb is the master Notebook, which contains all the information. All of it!

Cell metadata specifies if a Notebook cell should be treated specially. Special could be:

  • A cell should end up in the presentation Notebook (default)
  • A cell should end up in the tasks Notebook
  • A cell should end up in the solution Notebook
  • … and combinations of these

Since Notebooks are just JSON, I wrote a small parser in Python which suits my needs and added it to PyPI to be installed as a CLI tool: https://pypi.org/project/notebook-splitter/

It works by providing tags of cells to keep and tags of cells to remove. For instance,

notebook-splitter $< --keep task --keep solution --remove nopresentation

would look into Notebook cells, keep those which are tagged task or solution, and remove those which are tagged nopresentation. One special tag for removal exists, all, which removes everything except what's marked to keep. A tag in the sense used here is a value of a JSON key in the cell's metadata; the key is "exercise" by default (but can be selected via --basekey). Example:

{
    "exercise": "task"
}
// or
{
    "exercise": ["notask", "nopresentation"]
}

You can provide more than one tag by making them into a list.
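
The keep/remove logic described above can be sketched in a few lines of Python. This is only an illustration and not the actual notebook-splitter code: the function names cell_tags and split_cells are hypothetical, and the rule that a keep tag wins over a remove tag on the same cell is an assumption derived from the description of the all tag.

```python
def cell_tags(cell, basekey="exercise"):
    """Return a cell's tags as a set; a single string counts as one tag."""
    value = cell.get("metadata", {}).get(basekey, [])
    if isinstance(value, str):
        return {value}
    return set(value)


def split_cells(notebook, keep=(), remove=(), basekey="exercise"):
    """Return a copy of the notebook dict containing only the selected cells.

    Assumption: a cell carrying any keep tag always survives, even if it
    also carries a remove tag or if "all" is among the remove tags.
    """
    keep, remove = set(keep), set(remove)
    selected = []
    for cell in notebook["cells"]:
        tags = cell_tags(cell, basekey)
        if tags & keep:
            selected.append(cell)   # explicitly marked to keep
        elif "all" in remove or tags & remove:
            continue                # dropped by a remove tag (or by "all")
        else:
            selected.append(cell)   # untagged cells pass through
    return {**notebook, "cells": selected}
```

Since a Notebook is just JSON, the input dict could come from json.load on the .ipynb file, and the result could be written back with json.dump.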

Have a look at the provided Makefile to see how the individual files for this tutorial have been generated.

If you think this script is useful and want to collaborate on it, let me know and I'll make it into a dedicated repository.

Presentation from Notebooks

Via jupyter nbconvert --to slides, Jupyter Notebooks can be converted to HTML-based slideshows using reveal.js. For each cell, one can select in the Cell Inspector if the cell should be a Slide, Sub-Slide, or a Fragment.
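
The slide type chosen for a cell is stored in the cell's metadata under slideshow.slide_type, so it can be inspected directly in the Notebook JSON. As a minimal sketch (the helper name slide_type_counts is hypothetical), the following counts how many cells of each slide type a Notebook contains; cells without an explicit setting are counted as "-".

```python
from collections import Counter


def slide_type_counts(notebook):
    """Count cells per slide type in a Notebook dict (parsed .ipynb JSON)."""
    counts = Counter()
    for cell in notebook.get("cells", []):
        slideshow = cell.get("metadata", {}).get("slideshow", {})
        # Typical values are "slide", "subslide", and "fragment".
        counts[slideshow.get("slide_type", "-")] += 1
    return counts
```

A quick check like this can help spot cells that were never assigned a slide type before running the conversion.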

This works reasonably well, but adding the Jülich design to the reveal.js HTML needs some more steps. These steps are partly implemented in Jan's repository for the reveal.js Jülich Theme, which is loaded as a sub-module; one additional step is needed to make it work as a sub-module here. Everything is done in the Makefile.

A PDF version of the slides is generated with decktape, an NPM package which uses a headless Chromium instance to generate the PDF pages. It's slow and doesn't look 100 % like the presented slides, but it's the best I could find.

GitLab Pages

A GitLab Shared Runner is used to serve material to the public web page of the tutorial. The static index.html is to be found in its own (orphan) branch, pages. The files are copied from the repository to the public web page as indicated in the CI configuration file .gitlab-ci.yml.

Ideally, I'd only check in the --master.ipynb Notebook and let the CI create all other material. But there are so many wild dependencies (NPM!!1), I'd rather do it myself.

Because of the sub-module dependencies, the slides are also provided here as a static slide bundle, copied over to the Pages index in the runner.