Commit c0035a71 authored by Andreas Herten

Update for 2021!

```diff
@@ -17,7 +17,7 @@ LCFLAGS += --variable geometry:margin=1in
 LCFLAGS += --variable author:"Andreas Herten <a.herten@fz-juelich.de>"
 # SRC = $(wildcard *.md)
-SRC = Accounts.md JUWELS.md JURON.md Batch-Systems.md More.md
+SRC = README.md
 PDFS = $(SRC:.md=.pdf)
@@ -27,8 +27,8 @@ all: $(PDFS) all.pdf
 	sed "s/→/$$\\\rightarrow$$/" $< | $(LC) $(LCFLAGS) -o pdf/$@
 all.pdf: LCFLAGS += --toc
-all.pdf: LCFLAGS += --variable title:"GPU Eurohack 2019 User Guide"
-all.pdf: LCFLAGS += --variable abstract:"Some hints for participants and mentors of the GPU Hackathon 2019 at Jülich Supercomputing Centre. \textit{Hack Away!}"
+all.pdf: LCFLAGS += --variable title:"Helmholtz GPU Hackathon 2021 User Guide"
+all.pdf: LCFLAGS += --variable abstract:"Some hints for participants and mentors of the GPU Hackathon 2021 at Jülich Supercomputing Centre. \textit{Hack Away!}"
 all.pdf: LCFLAGS += --variable institute:"Forschungszentrum Jülich"
 all.pdf: LCFLAGS += --variable keywords:"GPU,CUDA,OpenACC,FZJ,JSC"
 all.pdf: $(SRC) $(MAKEFILE)
```
# Accounts
The GPU Hackathon will use the Jülich supercomputer *JUWELS*. As a backup, we provide access to the *JURON* machine. Both are centrally managed.
## Account Creation
User management for the supercomputers in Jülich is done centrally via the JuDOOR portal. Hackathon attendees need to sign up for a JuDOOR account and then apply to be added to the Hackathon project `training1908`. This link will let you join the project:
[https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908](https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908)
Once you are in the project, you need to agree to the usage agreements of `JUWELS`, `JUWELS GPU`, and `JURON`.
After that, you can upload your SSH public key via the »Manage SSH-keys« link. *(New to SSH? See, for example, [this help at GitHub](https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).)*
## Login
Please log in to JUWELS via SSH:
```bash
ssh name1@juwels.fz-juelich.de
```
For JURON, choose `juron.fz-juelich.de` as the hostname.
If you are using PuTTY on Windows, see [this external tutorial](https://devops.profitbricks.com/tutorials/use-ssh-keys-with-putty-on-windows/#use-existing-public-and-private-keys).
## Environment
One of the first steps after login should be to activate the environment for the GPU Hackathon using `jutil`:
```bash
jutil env activate -p training1908 -A training1908
```
To shortcut this, use
```bash
source $PROJECT_training1908/common/environment/activate.sh
```
### Tips & Trouble Shooting
* [SSH tutorial](https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys)
* Use `ssh -Y […]` or `ssh -X […]` to forward X windows to your machine
* It's good to add your SSH key to the SSH agent: call `ssh-add` and enter the key's passphrase; you won't be prompted for it again during subsequent logins
* Even easier: an SSH alias in your SSH config (`~/.ssh/config`). Add this, for example:
```
Host juwels
HostName juwels.fz-juelich.de
User name1
ForwardAgent Yes
```
Now the alias `juwels` even works with `scp` and `rsync`.
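With the alias in place, file transfers might look like this (file and directory names below are placeholders):
```bash
# Copy a results archive from JUWELS to the current local directory
scp juwels:results.tar.gz .
# Push a local source tree to your JUWELS home directory
rsync -av ./my-code/ juwels:my-code/
```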
# Batch Systems
JUWELS and JURON have different batch systems: JUWELS runs Slurm, JURON runs LSF. This document tries to summarize the basic information needed to run jobs on the systems.
## JUWELS
Documentation for JUWELS's batch system can be found [online](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html). JUWELS uses Slurm which has its own MPI launcher, called `srun`.
For the Hackathon, reservations are available for each day; please use these reservations, otherwise you will be stuck in the queue for years (JUWELS's GPU nodes are very popular).
### Reservation Overview
There are large reservations during the active working hours of the Hackathon (9:00 to 18:00) and smaller reservations during the night (18:00 to 9:00). Please see the note on the night reservations below!
| Date | Reservation Name |
|--------------------------------|-------------------------------|
| **Mon, 8 April** | `gpu-hack-2019-04-08` |
| Mon, 8 April, → Tue, 9 April | `gpu-hack-nightly-2019-04-08` |
| **Tue, 9 April** | `gpu-hack-2019-04-09` |
| Tue, 9 April, → Wed, 10 April | `gpu-hack-nightly-2019-04-09` |
| **Wed, 10 April** | `gpu-hack-2019-04-10` |
| Wed, 10 April, → Thu, 11 April | `gpu-hack-nightly-2019-04-10` |
| **Thu, 11 April** | `gpu-hack-2019-04-11` |
| Thu, 11 April, → Fri, 12 April | `gpu-hack-nightly-2019-04-11` |
| **Fri, 12 April** | `gpu-hack-2019-04-12` |
**Nightly Reservations**: The nightly reservations are set up such that the jobs queued at 18:00 of a given day are run; as soon as no more jobs using the reservation are in the queue, the reservation is released (Slurm option `PURGE_COMP`).
Use the reservations with `--reservation NAME`.
The reservation of the day can be set automatically by calling
```bash
source $PROJECT_training1908/common/environment/reservation.sh
```
It will put the name of the night-time reservation into the environment variable `$GPU_HACK_RESERVATION_NIGHTLY`. The day-time reservation is available in `$GPU_HACK_RESERVATION` and is used by Slurm automatically, since the corresponding environment variables (like `$SLURM_RESERVATION`) are set as well.
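For example, to request the nightly reservation explicitly for an interactive allocation (a sketch, assuming the script above has been sourced):
```bash
# Pick up today's reservation names, then allocate a GPU node under the nightly reservation
source $PROJECT_training1908/common/environment/reservation.sh
salloc --reservation=$GPU_HACK_RESERVATION_NIGHTLY --partition=gpus --gres=gpu:4 --time=2:00:00
```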
### Interactive Jobs
#### Allocating Resources
When running interactively, resources need to be allocated first. `salloc` is the program handling this.
```
salloc --partition=gpus --gres=gpu:4 --time=0:40:00
```
Here, a node with 4 GPUs (4 V100 devices) is allocated on the `gpus` partition for 40 minutes. All options are mandatory, except for `--time`, which defaults to 60 minutes.
Further useful options:
* `--ntasks=2`: Allocate resources for 2 tasks
* `--ntasks-per-node=2`: Start 2 tasks on each node
* `--ntasks-per-core=4`: Start 4 tasks per core
* `--cpus-per-task=2`: Allocate 2 CPUs (processors) per task
* `--cuda-mps`: Start [MPS](https://docs.nvidia.com/deploy/mps/index.html) server. Needed if multiple MPI ranks are sharing a GPU.
**Note**: A new shell is launched after allocating resources.
If you sourced the *useful variables*, partition and time do not need to be given to the `salloc` call as some defaults have been exported to Slurm environment variables.
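A combined example might look like this (a sketch; adjust the task counts to your application):
```bash
# One GPU node for 2 hours, 8 tasks sharing the 4 GPUs (hence MPS), 2 CPUs per task
salloc --partition=gpus --gres=gpu:4 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=2 --cuda-mps --time=2:00:00
```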
#### Launching Jobs
* Print host name: `srun hostname`
* Launch interactive shell: `srun --pty /bin/bash -i`
* Forward X11: `srun --pty --forward-x /bin/bash -i`; as a first step after starting an interactive compute shell, please fix your X11 environment by calling `source $PROJECT_training1908/common/environment/juwels-xforward-fix.sh` (or by invoking `x11fix` in case you sourced the *useful variables*)
Further useful options:
* `--cpu_bind=none`: Control CPU affinity (here: disable CPU binding)
### Batch Jobs
Launch batch jobs with `sbatch script.job`. An example follows.
```
#!/bin/bash -x
#SBATCH --nodes=4 # Run on 4 nodes
#SBATCH --ntasks=8 # Use 8 tasks
#SBATCH --ntasks-per-node=2 # That means: 2 tasks per node
#SBATCH --output=gpu-out.%j # STD out
#SBATCH --error=gpu-err.%j # STD err
#SBATCH --time=00:15:00 # Maximum wall time
#SBATCH --partition=gpus # Partition name
#SBATCH --reservation=gpu-hack-2019-04-10 # Reservation name
#SBATCH --gres=gpu:4 # Allocate resources
srun ./gpu-prog # Since the program uses MPI, launch it with srun
```
### Further Commands
* `sinfo`: Show status of partitions
* `squeue`: List all unfinished jobs
* `squeue -u ME`: List unfinished jobs of user ME
* `scancel ID`: Cancel a job with ID
## JURON
For additional hints on JURON's usage, see the man page on the system: `man juron`
### Interactive Jobs (`-I`)
Important commands:
* Print host name (`-I`): `bsub -I hostname`
* Open interactive pseudo-terminal shell (`-tty`): `bsub -tty -Is /bin/bash`
* With GPU resources (`-R […]`): `bsub -R "rusage[ngpus_shared=1]" -tty -Is /bin/bash`
* Forward X, e.g. for *Visual Profiler* (`-XF`): `bsub -R "rusage[ngpus_shared=1]" -I -XF nvvp`
- Trouble with this? Make sure you have done the following<a name="xftrouble"></a>
+ On your local machine, add the `id_train0XX` SSH key to your agent with `ssh-add id_train0XX`
+ Connect to JURON forwarding your SSH agent (`ssh -A […]`)
+ On JURON, verify that the system knows your `id_train0XX` key with `ssh-add -l`
+ *Hint: Have a look at the **Creating Alias** part of the `Login.{md,pdf}` section of the documentation. Much better than creating aliases in your shell.*
* Use node in exclusive mode (`-x`): `bsub -x -I hostname` (*Please keep your exclusive jobs short*)
Available queues:
* `normal`: For batch compute jobs (max. 8 nodes; max. 12 h)
* `normal.i`: For interactive jobs (max. 2 nodes; max. 4 h)
* `vis`: For visualization jobs (max. 2 nodes; max. 4 h) – these jobs will use hosts `juronc01` through `juronc04`, which run an X server and have a 10 GBit/s Ethernet connection to external networks
Further important parameters and options:
* `-n 23`: Launch 23 tasks
* `-n 4 -R span[ptile=2]`: Of the 4 launched tasks, run only 2 on the same node
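Put together, an interactive run combining these options might look like the following sketch:
```bash
# 4 tasks, at most 2 per node, with shared GPU access, run interactively
bsub -I -q normal.i -n 4 -R "span[ptile=2]" -R "rusage[ngpus_shared=1]" hostname
```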
### Batch Jobs
The same flags as above can be used. A batch file is submitted via `bsub < file.job`. Example job file:
```
#!/bin/bash
#BSUB -J lsf-tst # Job name
#BSUB -o output.o%J # Job standard output
#BSUB -n 8 # Number of tasks
#BSUB -q short # Job queue
#BSUB -R "rusage[ngpus_shared=1]" # Allocate GPUs
#BSUB -R "span[ptile=4]" # Spawn 4 processes per node
#BSUB -a openmpi
module load openmpi
mpirun hostname
```
### Further Commands
* `bhosts`: List available and busy hosts
* `bhosts -l juronc05`: List detailed information for host `juronc05`
* `bjobs`: List the user's current unfinished jobs
* `bjobs -u all`: Show all currently unfinished jobs
* `bkill ID`: Kill job with ID
# JUWELS Hackathon System Environment
There are scripts available to configure the environment for the GPU Hackathon. Use them like this (replacing `environment.sh` with the respective script):
```bash
source $PROJECT_training1908/common/environment/environment.sh
```
Available files:
* `environment.sh`: Source all of the following scripts
* `activate.sh`: Prepare the environment for accounting to `training1908` with `jutil`
* `reservation.sh`: Set the Slurm reservation for the current day by exporting responsible environment variables; it will also export the associated nightly reservation name to `$GPU_HACK_RESERVATION_NIGHTLY` to be used manually
* `useful_variables.sh`: Set some useful variables and shortcuts
- Set the Slurm environment variables to use the `gpus` partition, such that `--partition gpus` does not need to be appended for each `salloc` command
- Set the Slurm environment variables to use an allocation time of 4 h, such that `--time 4:00:00` does not need to be appended for each `salloc` command
    - Set some `$GPU_HACK_` convenience functions
- Teach the module system to use module files of `$PROJECT_training1908/common/applications`; see them with `module avail`
- Export an alias, `x11fix`, needed when working with GUI applications on JUWELS compute nodes
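With `environment.sh` sourced, a minimal interactive allocation could then look like this (a sketch relying on the defaults described above):
```bash
# Partition (gpus), time (4 h), and reservation come from the exported Slurm variables
source $PROJECT_training1908/common/environment/environment.sh
salloc --gres=gpu:4
```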
# JURON
JURON is an 18-node POWER8NVL system with a tight interconnect between CPU and GPU. Each node of JURON has two sockets with 10 ppc64le CPU cores each (adding up to a total of 160 hardware threads when including 8-fold multi-threading); each node hosts 4 Tesla P100 GPUs, where each pair of P100s is connected to one CPU socket with the fast NVLink interconnect. The P100s of a pair are also connected to each other via NVLink.
## Module System
JURON offers software via a module system.
Use `module avail` to list all configured modules. `module load NAME` loads a module with `NAME`. `module unload NAME` will unload the module again and `module purge` resets the configuration to its initial state. To list all loaded modules, use `module list`. `module key NAME` searches for `NAME` in all modules.
Software versions are included in the name of the module. Some packages (e.g. OpenMPI) are explicitly compiled with a certain compiler which should be loaded as well.
Of special interest for the Hackathon are:
* CUDA module: `module load cuda/10.0.130`
* GCC modules:
- `module load gcc/5.4.0`
- `module load gcc/6.3.0`
- `module load gcc/7.2.0`
- *GCC 4.8.5 is default*
* PGI modules:
- `module load pgi/18.4`
- `module load pgi/17.10`
* OpenMPI modules:
- `module load openmpi/3.1.3-gcc_5.4.0-cuda_10.0.130`
- `module load openmpi/2.1.2-pgi_18.4-cuda`
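A typical combination for a CUDA-aware MPI build might be loaded like this (a sketch; pick the versions your code needs):
```bash
module load gcc/5.4.0
module load cuda/10.0.130
module load openmpi/3.1.3-gcc_5.4.0-cuda_10.0.130
```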
## Batch System
JURON uses LSF as its method of running jobs on the GPU-equipped compute nodes. See the `Batch-Systems.md` file for a description.
## Useful Commands and Tools
* `numactl --hardware`: Gives an overview of the CPU/core configuration
* `taskset -c 2 ./PROG`: Will pin `PROG` on core number 2
* `nvidia-smi topo -m`: Prints the topology of GPU and CPU devices; useful for determining affinity between the two
## File System
All Jülich systems share a common file system (called *GPFS*), but you have a different `$HOME` directory on each. In addition, there are two more storage spaces available. Descriptions:
* `$HOME`: Only 5 GB available, for your most important files
* `$PROJECT`: Plenty of space for all project members to share
* `$SCRATCH`: Plenty of temporary space!
For the environment variables to map to the correct values, the project environment needs to be activated with
```bash
jutil env activate -p training1908 -A training1908
```
See also [the online description](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/UserAndProjectSpace.html?nn=2363700).
## Useful GCC Compiler Options
* `-Ofast`: Compile with optimizations for speed
* `-flto`: Enable link-time optimization
* `-mcpu=power8`: Set architecture type, register usage, and instruction scheduling parameters for POWER8
* `-mveclibabi=mass`: Use the MASS vector library for vectorization
* *Further:* IBM compiler flags for the SPEC CPU benchmarks (P7): https://www.spec.org/cpu2006/results/res2013q3/cpu2006-20130729-26110.txt
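Combined into a single compile line, this might look like the following sketch (`stream.c` is a placeholder source file; `-mveclibabi=mass` additionally requires linking against the MASS libraries and is omitted here):
```bash
# Optimized build for POWER8
gcc -Ofast -flto -mcpu=power8 stream.c -o stream
```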
# JUWELS
JUWELS is one of Jülich's [Top500 supercomputers](https://www.top500.org/system/179424). The system comprises about 2500 compute nodes, 48 of which are equipped with GPUs. Each node has two Intel Skylake CPUs, and each GPU node additionally hosts 4 NVIDIA Tesla V100 GPUs (16 GB RAM each).
The documentation of JUWELS is [available online](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUWELS/JUWELS_node.html); there is also a [Quick Start guide](https://apps.fz-juelich.de/jsc/hps/juwels/quickintro.html).
## Module System
JUWELS offers software through a module system. It is organized hierarchically, with the outermost level determined by the chosen compiler. Some software might only be available by loading a certain compiler first. A typical next hierarchical level is the MPI implementation.
`module avail` will show the available compiler entry points, of which `PGI/18.7-GCC-7.3.0` is of special interest for the Hackathon. CUDA can be loaded with `module load CUDA/9.2.88`; `module unload CUDA/9.2.88` will unload it again. `module list` lists all loaded modules and `module purge` removes them. Most of the time, the version numbers can be omitted.
To search through all available modules for `NAME`, use `module spider NAME`. If `NAME` matches an exact module, like `module spider CUDA/9.2.88`, detailed information about the module and how to load it is displayed. `module key NAME` searches for `NAME` in all module titles or descriptions.
Of special interest for the Hackathon are the following. Older versions are available in other stages, which can be enabled by calling:
```
module use /gpfs/software/juwels/otherstages
[module load Stages/Devel]
```
If the combination of module and stage you need is not available, please talk to Andreas.
* CUDA module: `module load CUDA/9.2.88`
- *Note:* `nvcc_pgc++` is available which calls `nvcc` with the PGI C++ compiler (by `-ccbin=pgc++`)
- Alternative CUDA installations are available in other stages (`module use /gpfs/software/juwels/otherstages`)
* Stage Devel-2018b: CUDA 10.0.130 (`module load Stages/Devel-2018b CUDA/10.0.130`)
* Stage 2019a: CUDA 10.1.105 (`module load Stages/2019a CUDA/10.1.105`)
* GCC module:
- `module load GCC/7.3.0`
- GCC 8.2.0 is the current default, but that is not compatible with CUDA 9
* PGI modules:
- `module load PGI/18.7-GCC-7.3.0`
- Other stages:
* Stage 2019a: PGI 19.3 (`module load Stages/2019a PGI/19.3-GCC-8.3.0`)
* MPI modules:
- `module load MVAPICH2`
+ *Note:* This should load the correct version for a given compiler automatically (GCC/CUDA: `MVAPICH2/2.3-GDR`, PGI: `MVAPICH2/2.3rc1-GDR`)
- There is an experimental OpenMPI with CUDA-support available in stage Devel-2018b: `module load Stages/Devel-2018b OpenMPI/4.0.0-cuda`
* Score-P modules:
- `module load Score-P`, only for `GCC/8.2.0` which isn't working with CUDA (TBD)
* Scalasca module:
- `module load Scalasca`, only for `GCC/8.2.0` which isn't working with CUDA (TBD)
* Vampir module:
- `module load Vampir`, only in `Stages/2018b` (TBD)
* Nsight Systems / Nsight Compute:
- Experimental modules of the two applications are available with `module load nsight-systems nsight-compute` on JUWELS
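A typical set of modules for a CUDA + MPI build in the default stage might be (a sketch using the versions listed above):
```bash
module load GCC/7.3.0
module load CUDA/9.2.88
module load MVAPICH2
```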
## Batch System
JUWELS makes the GPU-equipped compute nodes available through the Slurm batch system. See the `Batch-Systems.md` file for a description.
## File System
All Jülich systems share a common file system (called *GPFS*), but you have a different `$HOME` directory on each. In addition, there are two more storage spaces available. Descriptions:
* `$HOME`: Only 5 GB available, for your most important files
* `$PROJECT`: Plenty of space for all project members to share
* `$SCRATCH`: Plenty of temporary space!
For the environment variables to map to the correct values, the project environment needs to be activated with
```bash
jutil env activate -p training1908 -A training1908
```
See also [the online description](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/UserAndProjectSpace.html?nn=2363700).
# More…
Some useful links for further reading
* JSC Courses
- [CUDA](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Services/Documentation/presentations/presentation-cuda_table.html?nn=362392) (Introduction, Tools, Unified Memory, Matrix Multiplication (+tiled), Performance Optimization, Multi-GPU, CUDA-aware MPI, CUB)
- [OpenACC](http://www.fz-juelich.de/ias/jsc/DE/Leistungen/Dienstleistungen/Dokumentation/Praesentationen/folien-openacc_table.html?nn=364550) (Introduction, Multi-GPU, Performance Optimization, Tools, CUDA Interoperability)
- Performance metering with Score-P and Vampir and … ([PDF](https://indico-jsc.fz-juelich.de/event/8/session/4/contribution/11/material/slides/0.pdf))
* OpenACC
- OpenACC Quick Reference Guide ([PDF](http://www.openacc.org/sites/default/files/OpenACC_2.5_ref_guide_update.pdf))
- OpenACC API ([PDF](http://www.openacc.org/sites/default/files/OpenACC_2pt5.pdf))
- PGI OpenACC Getting Started Guide ([PDF](http://www.pgroup.com/doc/openacc_gs.pdf))
* CUDA
- [CUDA C Programming Guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
- [CUDA C Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
- CUDA Fortran Programming Guide and Reference ([PDF](https://www.pgroup.com/lit/whitepapers/pgicudaforug.pdf))
- CUDA Fortran Quick Reference Card ([PDF](https://www.pgroup.com/lit/literature/pgi-cuf-qrg-2014.pdf))
- CUDA Fortran Library Interfaces ([PDF](http://www.pgroup.com/doc/pgicudaint.pdf))
* NVIDIA OpenACC Resources
- [Recorded courses](https://developer.nvidia.com/openacc-courses)
- [Course from October](https://developer.nvidia.com/intro-to-openacc-course-2016)
- [OpenACC Qwiklabs](https://developer.nvidia.com/qwiklabs-signup)
* [NVIDIA Devblogs: Parallel Forall](https://devblogs.nvidia.com/parallelforall/)
* Supercomputers
- [JURECA](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JURECA/UserInfo/UserInfo_node.html)
- [JURON](https://trac.version.fz-juelich.de/hbp-pcp/wiki/Public)
* Other
- [Helmholtz GPU Hackathon Gitlab](https://gitlab.version.fz-juelich.de/gpu-hackathon/)
# Profiling with NVIDIA Tools
The CUDA Toolkit comes with two solutions for profiling an application: `nvprof`, which is a command line program, and the GUI application *NVIDIA Visual Profiler* (NVVP).
`nvprof` can be used in batch jobs or smaller interactive runs; NVVP can either import an `nvprof`-generated profile or run interactively through X forwarding.[^freeware]
On JURON, the CUDA Toolkit can be loaded with `module load cuda/10.0.130`; on JUWELS, load it with `module load CUDA`.
## Command Line: `nvprof`
For a quick overview of the GPU-invocations of an application, prefix its run with `nvprof`: `nvprof ./APP`.[^srun] `nvprof` instruments the application at run-time.
![nvprof](img/screenshot-nvprof-1--HQ.png)\
Among the many options of `nvprof` (see `nvprof --help`) is the ability to export a profile for further use in NVVP: `nvprof --export-profile FILE ./APP` will export the profile to `FILE`.
To make use of NVVP performance experiments, certain metrics need to be measured by `nvprof`: `nvprof --analysis-metrics --export-profile FILE ./APP` will export the metrics to `FILE`.
Further options of potential interest:
* `--print-gpu-trace`: Show trace of function calls
* `--openacc-profiling on`: Profile OpenACC as well (*on* by default)
* `--cpu-profiling on`: Enable some CPU profiling
* `--csv --log-file FILE`: Generate CSV output and save it to `FILE`; handy for plotting or further analysis
* `--metrics M1`: Measure only metric `M1`, one of the NVIDIA-provided metrics, which can be listed via `--query-metrics`.
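For example, to collect the full set of analysis metrics for NVVP, or to log a single metric as CSV (file names are placeholders; `achieved_occupancy` is one example metric from `--query-metrics`):
```bash
# Full analysis metrics, exported for the Visual Profiler
srun nvprof --analysis-metrics --export-profile app.prof ./APP
# One metric only, written as CSV for plotting
srun nvprof --metrics achieved_occupancy --csv --log-file metrics.csv ./APP
```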
[docs.nvidia.com/cuda/profiler-users-guide/](http://docs.nvidia.com/cuda/profiler-users-guide/)
## GUI: NVIDIA Visual Profiler
While `nvprof` can be used to collect information or display it concisely on the command line, the Visual Profiler (NVVP) can be helpful to understand an application through a timeline view and by running performance analyses.
NVVP can be launched on the system from the command line with `nvvp`, or locally after installing the CUDA Toolkit on your own machine.
![NVIDIA Visual Profiler](img/screenshot-nvvp-osx--HQ.png)\
[developer.nvidia.com/nvidia-visual-profiler](https://developer.nvidia.com/nvidia-visual-profiler)
[^freeware]: The CUDA Toolkit is freeware and can be installed on your local machine; even on a laptop without an NVIDIA GPU. This allows for downloading generated `nvprof` profiles and importing them locally or even for connecting to a remote server with NVVP.
[^srun]: Since your application might be run via a batch system, the call to `nvprof` might need to be prefixed by `srun`, as in the screenshot.
# Profiling with Score-P and Friends
Score-P allows for detailed instrumentation of an application using CPU and/or GPU. The generated profiles can be analyzed with CUBE. If OTF2 traces were created, those can be analyzed automatically with Scalasca or manually with Vampir.
## Generating Performance Reports
### Modules
Score-P is available both on JUWELS and on JURON, but with different features and configurations.
* JUWELS
- Score-P for GCC + CUDA
+ `module use /p/scratch/share/GPU-Hackathon/mf`
+ `module load Score-P/5.0-gnu-mvapich-cuda`
+ `module load CubeGUI`
- Score-P for PGI 18.4
+ `module use /p/scratch/share/GPU-Hackathon/mf`
+ `module load Score-P/5.0-pgi-mvapich-cuda`
+ `module load CubeGUI`
+ *Note: `nvcc` is currently not tested as a target compiler for `scorep`; the CUDA support might be unavailable (OpenACC should work, though)*
### How-to Score-P
#### Compilation
Score-P works easiest by prefixing the compile and link commands with `scorep`. Instead of calling `nvcc ARGS`, call `scorep nvcc ARGS`; instead of `pgfortran ARGS`, call `scorep pgfortran ARGS`.
Important flags to `scorep` for us:
* `--cuda`: Enables CUDA instrumentation
* `--openacc`: Enables OpenACC instrumentation
If you are compiling through CMake, the Score-P-wrapped compilers like `scorep-nvcc` and `scorep-g++` might be of interest to you in conjunction with `-DCMAKE_CXX_COMPILER`.
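A configure step might then look like the following sketch (build directory and project layout are placeholders; setting `SCOREP_WRAPPER=off` for the configure step keeps CMake's compiler checks un-instrumented):
```bash
# Configure a CMake build to compile and link through the Score-P compiler wrappers
SCOREP_WRAPPER=off cmake .. \
    -DCMAKE_C_COMPILER=scorep-gcc \
    -DCMAKE_CXX_COMPILER=scorep-g++
make
```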
#### Running
As soon as the compiled application is launched, the performance report is produced inside a directory named after the current time stamp (that is the default, at least). This is the input for further analyses.
Score-P can be steered during measurement by setting environment variables. See `scorep-info config-vars [--full]` for a list.
Important environment variables for us:
* `SCOREP_ENABLE_TRACING=true`: Enables tracing (`SCOREP_TOTAL_MEMORY=120MB` might be useful)
* `SCOREP_EXPERIMENT_DIRECTORY=something`: Set output directory to `something`
* `SCOREP_CUDA_ENABLE=runtime,driver,kernel`: Capture calls to CUDA runtime API, driver API, and kernels. There are more, see `scorep-info config-vars --full` for a full list
* `SCOREP_OPENACC_ENABLE=yes`: Enable measurement of OpenACC regions
For OpenACC measurements you also need to
* `export ACC_PROFLIB=/p/scratch/share/GPU-Hackathon/packages/scorep/5.0-pgi-mvapich-cuda/lib/libscorep_adapter_openacc_event.so`
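Putting the variables together, a CUDA measurement run might be prepared like this (a sketch; directory name and executable are placeholders):
```bash
export SCOREP_EXPERIMENT_DIRECTORY=scorep-cuda-profile
export SCOREP_CUDA_ENABLE=runtime,driver,kernel
export SCOREP_TOTAL_MEMORY=120MB
srun ./my-instrumented-app
```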
## Analyzing Reports
Score-P reports and traces are the basis for analyses with Cube, Scalasca, or Vampir, and for the lightweight `scorep-score` tool.
### Lightweight and Superficial: `scorep-score`
For a *quick and dirty* look at the performance data, e.g. for validating that something happened at all, `scorep-score` can be used.
```
scorep-score scorep-20170302_0919_1488442762831800/profile.cubex
```
A result for this example (STREAM benchmark) looks like this:
```
$ scorep-score -r scorep-20170302_0919_1488442762831800/profile.cubex
Estimated aggregate size of event trace: 1465 bytes
Estimated requirements for largest trace buffer (max_buf): 1465 bytes
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 4097kB
(hint: When tracing set SCOREP_TOTAL_MEMORY=4097kB to avoid intermediate flushes
or reduce requirements using USR regions filters.)
flt type max_buf[B] visits time[s] time[%] time/visit[us] region
ALL 1,464 61 1.32 100.0 21572.72 ALL
USR 1,464 61 1.32 100.0 21572.72 USR
USR 288 12 0.00 0.0 0.08 convertrate
USR 240 10 0.00 0.0 5.20 copy
USR 240 10 0.00 0.0 4.80 scale
USR 240 10 0.00 0.0 4.70 add
USR 240 10 0.00 0.0 5.10 triad
```
### Cube
Cube is the *performance report explorer for Scalasca*, a GUI application which can be launched either on JURON or JUWELS. It can also be downloaded as free software from [scalasca.org](http://www.scalasca.org/software/cube-4.x/download.html) and run locally to explore profiles.
After importing a profile, Cube looks like this:
![Cube](img/screenshot-cube--HQ.png)
### Vampir
Vampir is installed on JUWELS, available through the `Vampir` module: `module load Vampir`.
Vampir is used to study trace files; generate them with Score-P by setting the environment variable `SCOREP_ENABLE_TRACING=true` before running your program.
If the environment variable is set, a `.otf2` trace file is placed into the Score-P result directory. Open it with `vampir`.
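A minimal sequence might look like this (a sketch; the experiment directory name is an example, `traces.otf2` is the trace anchor file written by Score-P):
```bash
# Re-run the instrumented application with tracing enabled
export SCOREP_ENABLE_TRACING=true
export SCOREP_EXPERIMENT_DIRECTORY=scorep-trace-run
srun ./my-instrumented-app
# Open the resulting OTF2 trace with Vampir
vampir scorep-trace-run/traces.otf2
```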
![Vampir](img/screenshot-vampir.png)\
# Helmholtz GPU Hackathon 2021
This repository holds the documentation for the Helmholtz GPU Hackathon 2021 at Jülich Supercomputing Centre (Forschungszentrum Jülich).
Currently, the documentation is still being compiled. If you find errors or room for improvement, please file an issue!
For additional info, please write to Andreas Herten (<a.herten@fz-juelich.de>) or Filipe Guimaraes (<f.guimaraes@fz-juelich.de>) on Slack or email.
Available documents:
* [Account Creation and Login](Accounts.md)
* [JUWELS Introduction](JUWELS.md)
* [JURON Introduction](JURON.md)
* [Setting up JUWELS Environment for GPU Hackathon](Environment.md)
* [Overview of the Batch Systems](Batch-Systems.md)
* [More Information and Useful Links](More.md)
* Folder: [Previous communications](./communication/)
See the directory `./pdf/` for PDF versions of the files, for example `all.pdf`.
## Sign-Up
Please use JuDoor to sign up for our training project, training2105: [https://judoor.fz-juelich.de/projects/join/training2105](https://judoor.fz-juelich.de/projects/join/training2105)
Make sure to accept the usage agreement for JUWELS Booster.
Please upload your SSH key to the system via JuDoor. The key needs to be restricted to accept access only from specific sources, as specified through the `from` clause. Please have a look at the associated documentation ([SSH Access](https://apps.fz-juelich.de/jsc/hps/juwels/access.html) and [Key Upload](https://apps.fz-juelich.de/jsc/hps/juwels/access.html#key-upload-key-restriction)).
## JUWELS Booster
We are using JUWELS Booster for the Hackathon, a system equipped with 3600 A100 GPUs. See here for an overview of the JUWELS Booster system: [https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html](https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html)
## Access
After successfully uploading your key through JuDoor, you should be able to access JUWELS Booster via
```bash
ssh user1@juwels-booster.fz-juelich.de
```
An alternative way of accessing JUWELS Booster is through _Jupyter JSC_, JSC's Jupyter-based web portal available at [https://jupyter-jsc.fz-juelich.de](https://jupyter-jsc.fz-juelich.de). Sessions should generally be launched on the login nodes. The portal also offers _Xpra_, a convenient alternative to X forwarding that works well for running the Nsight tools!
## Environment
On the system, different directories are accessible to you. To set the environment variables associated with a project, run the following command after logging in:
```bash
jutil env activate -p training2105 -A training2105
```
This will, for example, make the directory `$PROJECT` available to use, which you can use to store data. Your `$HOME` will not be a good place for data storage, as it is severely limited! Use `$PROJECT` (or `$SCRATCH`, see documentation on [_Available File Systems_](https://apps.fz-juelich.de/jsc/hps/juwels/environment.html?highlight=scratch#available-file-systems)).
Software can be loaded into the environment via environment modules, using the `module` command. To see the available compilers (the first level of a toolchain), type `module avail`.
For JUWELS Booster, the most relevant modules are
* Compiler: `GCC` (with additional `CUDA`), `NVHPC`
* MPI: `ParaStationMPI`, `OpenMPI` (make sure to have loaded `mpi-settings/CUDA` as well)
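For example, a CUDA-aware MPI environment might be set up like this (a sketch; the module system resolves the concrete versions):
```bash
module load GCC CUDA
module load OpenMPI
module load mpi-settings/CUDA
```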
## Batch System
JUWELS Booster uses a special flavor of Slurm as its workload manager (PS_Slurm). Most of the vanilla Slurm commands are available, with some Jülich-specific additions. An overview of Slurm, including example job scripts and interactive commands, is available in the corresponding documentation: [https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html)
Please account your jobs to the `training2105` project, either by setting the corresponding environment variable with the above `jutil` command, or by manually adding `-A training2105` to your batch jobs.
Two partitions are available (see [documentation for limits](https://apps.fz-juelich.de/jsc/hps/juwels/quickintro.html#booster)):
* `booster`: most of the nodes
* `develbooster`: 10 nodes for development
For the days of the Hackathon we will put reservations in place to accelerate allocation of jobs.
X forwarding can sometimes be a bit of a challenge; please consider using _Xpra_ in your browser through Jupyter JSC!
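A minimal batch script for JUWELS Booster might look like the following sketch (program name and resource numbers are placeholders):
```bash
#!/bin/bash
#SBATCH --account=training2105
#SBATCH --partition=booster
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=01:00:00
srun ./my-gpu-prog
```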
## Etc
### Previous Documentation
More (although slightly outdated) documentation is available from the 2019 Hackathon [in the corresponding branch](https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/-/tree/2019).
### PDFs
See the directory `./pdf/` for PDF versions of the documentation, for example `all.pdf`.
Date: 2019-Mar-07 16:40
Note: Mentor-only Email
---
Dear Mentor,
Thank you for volunteering to be a mentor at the Helmholtz GPU Hackathon 2019! We are looking forward to having you!
Experience at past Hackathons shows that it's very beneficial to get the applications onto the systems well in advance – such that come Hackathon Monday we don't need to waste precious time to find the correct GCC flags on the system (or similar). In this email you find all necessary information to achieve that (plus more).
Right now, the attendees will NOT get any email from us organizers; it is YOUR responsibility to relay the information from this mail to everyone in your team. Make sure every attendee of your team went through the sign-up procedure and has access to JUWELS well before the event!
Close before the event we'll send out another email to everyone with some more recent info.
# Documentation
Documentation of the event is available at this URL: https://gitlab.version.fz-juelich.de/gpu-hackathon/doc. I'd recommend looking at the Markdown (.md) files, but in the pdf folder there are also PDF documents of the documentation.
The documentation is based on the repository of information from the last Jülich Hackathon. Although I have already added quite some new information, some parts still need updating – that will happen gradually before the Hackathon.
If you find something wrong or have a need for improvement, please file an issue in the repo (use your JuDOOR login, see below).
# Infrastructure
The Hackathon will be held on the Jülich supercomputer JUWELS, an x86 system with 48 GPU nodes, each equipped with 4 Tesla V100 cards. As a backup, we will also have access to JURON, our smaller POWER-based system.
Also see the documentation document on JUWELS: https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/blob/master/JUWELS.md.
# Accounts
Account management in Jülich is done through a new portal called JuDOOR. If you haven't, please sign up for an account there (also called our JSC LDAP account) and login. Then request access to the Hackathon project (training1908). You will get access to JUWELS, JUWELS GPU, and JURON.
Afterwards, please accept the usage agreements and upload your SSH key – all this can be done within JuDOOR. If you're done, login to the system with SSH via `ssh login1@juwels.fz-juelich.de`.
See also the documentation on Accounts: https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/blob/master/Accounts.md
# Getting Started on System
After logging in with SSH, first activate the environment for our project (this sets environment variables etc):
```bash
jutil env activate -p training1908 -A training1908
```
Now you can start developing your code on the system: Use modules from the module system (`module avail`…); compile your code on the login nodes.
To launch a GPU application, you need to make use of the GPU nodes of JUWELS. They are available via the batch system (I'd recommend using the `develgpus` partition for now); the login nodes don't have GPUs!
As soon as you launch a batch job (GPU or no GPU) you are using compute time. The project has next to no compute time allocated at this point in time, because right now we just want to test out if the application in question compiles and runs successfully. We will have ample compute time when the Hackathon happens, but until then, please consider compute time a very scarce resource.
For more info on the batch system see https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/blob/master/Batch-Systems.md#juwels; a preliminary set of interesting modules on JUWELS is available at https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/blob/master/JUWELS.md#module-system
# More
Any further questions? Please tell me!
Until then,
-Andreas
Date: 2019-Apr-02 14:40
---
Dear Attendee,
It’s not long until we will meet for a week of hacking and parallelization. We are looking forward to having you!
Here are a few last logistics items for the Helmholtz GPU Hackathon 2019!
# Arrival
* Bus: A daily Hackathon shuttle bus will commute between Jülich city and Jülich Supercomputing Centre (JSC). The bus will leave at 8:30 from Neues Rathaus Jülich ([Große Rurstraße](https://goo.gl/maps/rRQwPoU4GzR2)) and will bring you to building 16.4 of JSC/Forschungszentrum. The bus will leave JSC at 18:00, going back to Jülich. (Except on Friday: no bus is going back on Friday.)
* Campus Entry: Forschungszentrum campus is a restricted area and access is limited. The front gate reception knows about every Hackathon attendee and will have visitor badges prepared. Please bring a photo ID.
- In case you come with the shuttle bus: The bus will stop at the reception and you will have a chance to register, before the bus continues on to the Hackathon building; sometimes, reception employees even come into the bus to register each of you on-the-fly
- In case you come on your own: Please park your car at the parking spaces outside of the campus in front of the reception, register at the reception, and then continue driving to building 16.4
* Location: The Hackathon will happen in the Rotunda room of JSC’s new building 16.4 (upper floor). It’s round and silver; you can’t miss it. In case you travel by bus, you’ll be driven directly to the front door; in case you travel on your own, please have a look at the [*how to reach us*-webpage](http://www.fz-juelich.de/portal/EN/Service/Howtoreachus/_node.html) of Forschungszentrum, see especially the [Wegeleitsystem](http://www.fz-juelich.de/SharedDocs/Bilder/_common/DE/wegeleitsystem.jpg?__blob=poster).
# Systems
* WiFi: Access to the internet will be provided via Eduroam. In case your home institution does not participate in Eduroam, we will give you access to the `fzj` WiFi on-site.
* System accounts: By now, everyone should have their accounts on the supercomputing systems in Jülich (JUWELS, JURON). In case you have not, you should have gotten a reminder email. Consider this the second reminder to the second reminder.
* Installation: As your mentors surely have mentioned, it is important that you have installed your software on JUWELS before the Hackathon starts; such that we can concentrate on the important bits come Monday. Please make sure everything’s running!
* Gitlab Documentation: There’s plenty of documentation online on our Gitlab server at https://gitlab.version.fz-juelich.de/gpu-hackathon/doc and it will probably grow over the next 10 days. Make sure to search this website first in case you have questions about the systems.
# Infrastructure
* Room: We will be sitting in the round Rotunda room of JSC (16.4, upper floor). Although it’s the largest room we have, it will still be quite packed for our Hackathon. Each team will sit together in a small table group.
* Lunch: The Forschungszentrum cafeteria (»Casino«) is very close and located at our beautiful pond. Lunch is pay-on-your-own.
* Coffee (No-)Breaks: There will be JSC-sponsored coffee no-breaks during the day. No-breaks: We try to have some water, juice, and hot coffee at all Hackathon hours and supply plenty of cookies and fruit for intermediate re-fuelling. We don’t want to break your flow of work so there’s no dedicated time slot for coffee breaks. (Breaks are important, though, so go do them!)
* Social Dinner: There will be a social dinner on Tuesday evening.
If you have further questions, just send me an email.
If not, I’m keen to see you all on Monday!
-Andreas