Helmholtz GPU Hackathon 2021
This repository holds the documentation for the Helmholtz GPU Hackathon 2021 at Jülich Supercomputing Centre (Forschungszentrum Jülich).
For additional information, please contact Andreas Herten (a.herten@fz-juelich.de) or Filipe Guimaraes (f.guimaraes@fz-juelich.de) via Slack or email.
Sign-Up
Please use JuDoor to sign up for our training project, training2105: https://judoor.fz-juelich.de/projects/join/training2105
Make sure to accept the usage agreement for JUWELS Booster.
Please upload your SSH key to the system via JuDoor. The key must be restricted to accept access only from specific sources, as specified through the "from" clause. Please have a look at the associated documentation (SSH Access and Key Upload).
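As a minimal sketch, assuming an ed25519 key pair created locally with ssh-keygen -t ed25519, the hypothetical shell function below prints the public key with a from clause prepended, using the OpenSSH authorized_keys option syntax. The address range 192.0.2.0/24 is a documentation placeholder, not a JSC value; replace it with the range(s) you actually connect from and check the JuDoor documentation for the forms it accepts.

```shell
# Hypothetical helper (not part of JuDoor): prepend an OpenSSH "from" clause
# to a public key so the key is only accepted from the given sources.
#   $1: comma-separated pattern list, e.g. "192.0.2.0/24" (placeholder)
#   $2: path to the public key file, e.g. ~/.ssh/id_ed25519.pub
restrict_key() {
  # $(cat ...) drops the trailing newline of the .pub file,
  # so the result is a single line ready to paste
  printf 'from="%s" %s\n' "$1" "$(cat "$2")"
}
```

For example, restrict_key "192.0.2.0/24" ~/.ssh/id_ed25519.pub prints one line that can be pasted into the key upload form.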
JUWELS Booster
We are using JUWELS Booster for the Hackathon, a system equipped with more than 3600 A100 GPUs. An overview of the JUWELS Booster system is available here: https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html
Access
After successfully uploading your key through JuDoor, you should be able to access JUWELS Booster via the following command (replace user1 with your own user name):
ssh user1@juwels-booster.fz-juelich.de
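To avoid typing the full host name and key path every time, an OpenSSH client configuration entry can define a short alias. The sketch below is a hypothetical helper that only prints such a stanza; the alias name juwels-booster, the user name and the key path are assumptions to adjust.

```shell
# Hypothetical helper: print an OpenSSH client configuration stanza that
# defines the alias "juwels-booster".
#   $1: your user name on the system (placeholder, e.g. user1)
#   $2: path to the private key whose public part was uploaded via JuDoor
juwels_ssh_config() {
  cat <<EOF
Host juwels-booster
    HostName juwels-booster.fz-juelich.de
    User $1
    IdentityFile $2
EOF
}
```

On your own machine, juwels_ssh_config user1 ~/.ssh/id_ed25519 >> ~/.ssh/config appends the entry; afterwards ssh juwels-booster should be enough to log in.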
An alternative way of accessing JUWELS Booster is Jupyter JSC, JSC's Jupyter-based web portal, available at https://jupyter-jsc.fz-juelich.de. Sessions should generally be launched on the login nodes. The portal also offers Xpra, a browser-based alternative to X forwarding, which works well for running the graphical Nsight tools.
Environment
On the system, different directories are accessible to you. To set the environment variables for a project, run the following command after logging in:
jutil env activate -p training2105 -A training2105
This will, for example, make the directory $PROJECT available, which you can use to store data. Your $HOME is not a good place for data storage, as it is severely limited! Use $PROJECT (or $SCRATCH, see the documentation on Available File Systems) instead.
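A minimal sketch of working inside $PROJECT, assuming the jutil command above has already been run in the current shell. The function name project_workdir and the per-user subdirectory layout are hypothetical conventions, not a JSC requirement.

```shell
# Hypothetical helper: create (if needed) and enter a personal working
# directory inside $PROJECT instead of using the small $HOME.
project_workdir() {
  if [ -z "${PROJECT:-}" ]; then
    echo "PROJECT is not set; run jutil env activate first" >&2
    return 1
  fi
  # Fall back to id -un if $USER is not set in this shell
  dir="$PROJECT/${USER:-$(id -un)}"
  mkdir -p "$dir" && cd "$dir"
}
```

Because the function changes the directory of the calling shell, it has to be sourced or defined in your shell (for example in ~/.bashrc), not run as a separate script.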
Software is loaded into the environment via environment modules, using the module command. To see the available compilers (the first level of a toolchain), type module avail.
For JUWELS Booster, the most relevant modules are
- Compiler: GCC (with additional CUDA), NVHPC
- MPI: ParaStationMPI, OpenMPI (make sure to have loaded mpi-settings/CUDA as well)
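One possible way to set up a GCC-based, CUDA-aware MPI environment from the modules listed above is sketched below. It only runs on the system itself; exact module names and versions depend on the installed software stage, so confirm them with module avail (or module spider, since the modules are managed with Lmod) before relying on this sequence.

```shell
# Compiler level: GCC plus CUDA (alternative: module load NVHPC)
module load GCC CUDA
# MPI level: ParaStationMPI (or OpenMPI), together with the CUDA settings
module load ParaStationMPI mpi-settings/CUDA
# Check which modules are now active
module list
```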
Batch System
JUWELS Booster uses a special flavor of Slurm as its workload manager (PSSlurm). Most of the vanilla Slurm commands are available, with some Jülich-specific additions. An overview of Slurm, including example job scripts and interactive commands, is available in the corresponding documentation: https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html
Please charge your jobs to the training2105 project, either by setting the corresponding environment variable with the jutil command above or by manually adding -A training2105 to your batch jobs.
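A minimal batch script sketch for a single Booster node with four GPUs and one MPI task per GPU. The executable ./my_app, the time limit and the output file name are placeholders, and the module line assumes the GCC/ParaStationMPI combination from the module list; the script only works when submitted on the system.

```shell
#!/bin/bash
#SBATCH --account=training2105
#SBATCH --partition=booster
#SBATCH --nodes=1
# One task per GPU: 4 tasks and 4 GPUs per node
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=00:15:00
#SBATCH --output=job-%j.out

# Use the same modules as for building the application
module load GCC CUDA ParaStationMPI mpi-settings/CUDA

# ./my_app is a placeholder for your own executable
srun ./my_app
```

Saved as, for example, job.sh, it is submitted with sbatch job.sh; on Hackathon days, the day's reservation can be added with --reservation on the command line or as another #SBATCH line.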
Two partitions are available (see documentation for limits):
- booster: most of the nodes
- develbooster: 10 nodes for development
During the days of the Hackathon, reservations are in place to speed up job scheduling. Use the following reservation names for the respective days with the Slurm option --reservation gpu-hack-2021-dayX:
- Day 1, 15 March: gpu-hack-2021-day1
- Day 2, 22 March: gpu-hack-2021-day2
- Day 3, 23 March: gpu-hack-2021-day3
- Day 4, 24 March: gpu-hack-2021-day4
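Since the reservation name changes from day to day, a small hypothetical helper can map a date (YYYY-MM-DD, defaulting to today) to the matching name from the list of reservations and fail on any other day. The function name hack_reservation is an assumption for illustration only.

```shell
# Hypothetical helper: print the Hackathon reservation for a date
# (YYYY-MM-DD), defaulting to today; returns 1 outside the Hackathon days.
hack_reservation() {
  day="${1:-$(date +%F)}"
  case "$day" in
    2021-03-15) echo gpu-hack-2021-day1 ;;
    2021-03-22) echo gpu-hack-2021-day2 ;;
    2021-03-23) echo gpu-hack-2021-day3 ;;
    2021-03-24) echo gpu-hack-2021-day4 ;;
    *) echo "no Hackathon reservation on $day" >&2; return 1 ;;
  esac
}
```

For example, an interactive allocation on the current day's reservation could be requested with:
salloc -A training2105 -p booster -N 1 --gres=gpu:4 --time=01:00:00 --reservation="$(hack_reservation)"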
X forwarding can sometimes be a challenge; please consider using Xpra in your browser through Jupyter JSC instead!
Etc
Slides
Please see juwels-booster-intro-slides.pdf for the slides presented during the introduction to the Hackathon.
Previous Documentation
Additional (although slightly outdated) documentation from the 2019 Hackathon is available in the corresponding branch.
PDFs
See the directory ./pdf/ for PDF versions of the documentation, for example all.pdf.