This repository holds the documentation for the Helmholtz GPU Hackathon 2025 at Forschungszentrum Jülich.
For additional information, please reach out in the #cluster-support channel on Slack.
## Sign-Up
Please use JuDoor to sign up for our training project, training2508: [https://judoor.fz-juelich.de/projects/join/training2508](https://judoor.fz-juelich.de/projects/join/training2508)
Make sure to accept the usage agreement for JEDI.
Please upload your SSH key to the system via JuDoor. The key needs to be restricted to accept accesses only from a specific source, as specified through the `from` clause. Please have a look at the associated documentation ([SSH Access](https://apps.fz-juelich.de/jsc/hps/jedi/access.html) and [Key Upload](https://apps.fz-juelich.de/jsc/hps/jedi/access.html#key-upload-key-restriction)).
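As an illustration, the restricted key entry could look like the following sketch; the network range and key are placeholders, use your own values:
```bash
# Print your public key so you can paste it into JuDoor
cat ~/.ssh/id_ed25519.pub
# In JuDoor, prepend a from clause that restricts the allowed source addresses,
# e.g. (placeholder network range and key):
# from="192.0.2.0/24" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... user@laptop
```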
## HPC Systems
We are primarily using JEDI for the Hackathon, the JUPITER precursor system with 192 NVIDIA Hopper GPUs.
For the system documentation, see [https://apps.fz-juelich.de/jsc/hps/jedi/](https://apps.fz-juelich.de/jsc/hps/jedi/).
After successfully uploading your key through JuDoor, you should be able to access JEDI via
```bash
ssh user1@login.jedi.fz-juelich.de
```
The hostname for JEDI is `login.jedi.fz-juelich.de`.
An alternative way of accessing the systems is through _Jupyter JSC_, JSC's Jupyter-based web portal; a pre-configured session for the Hackathon is available at [https://jupyter.jsc.fz-juelich.de/workshops/gpuhack25](https://jupyter.jsc.fz-juelich.de/workshops/gpuhack25). Sessions should generally be launched on the login nodes. The portal also offers _Xpra_, a great alternative to X forwarding that works well for running the Nsight tools.
## Environment
On the systems, different directories are accessible to you. To set the environment variables for our project, run the following command after logging in:
```bash
jutil env activate -p training2508 -A training2508
```
This will, for example, make the directory `$PROJECT` available to use, which you can use to store data. Your `$HOME` will not be a good place for data storage, as it is severely limited! Use `$PROJECT` (or `$SCRATCH`, see documentation on [_Available File Systems_](https://apps.fz-juelich.de/jsc/hps/jedi/environment.html#available-file-systems)).
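For example, you can verify the variables and create a personal working directory below `$PROJECT`; the per-user subdirectory is just a suggested layout:
```bash
# Verify that the project environment is active
echo $PROJECT $SCRATCH
# Create and switch to a personal working directory on the project file system
mkdir -p $PROJECT/$USER
cd $PROJECT/$USER
```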
Different software can be loaded into the environment via environment modules, using the `module` command. To see the available compilers (the first level of a toolchain), type `module avail`.
The most relevant modules are
...
...
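As a sketch, loading a GPU toolchain could look like the following; the module names are illustrative, so check `module avail` and `module spider <name>` for the exact names and versions on the system:
```bash
# Inspect what is available
module avail
module spider CUDA
# Load a compiler, MPI, and CUDA (illustrative selection)
module load GCC OpenMPI CUDA
```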
## Containers
JSC supports containers through Apptainer (previously: Singularity) on the HPC systems. The details are covered in a [dedicated article in the systems documentation](https://apps.fz-juelich.de/jsc/hps/jedi/container-runtime.html). Access is subject to accepting a dedicated license agreement (because of special treatment regarding support) on JuDoor.
Once access is granted (check your `groups`), Docker containers can be imported and executed as in the following example:
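A minimal sketch, assuming you want a CUDA development container; the image name is only an example, pick whatever container your project needs:
```bash
# Convert a Docker image from a registry into an Apptainer image file (SIF)
apptainer pull cuda.sif docker://nvcr.io/nvidia/cuda:12.3.2-devel-ubuntu22.04
# Run a command in the container; --nv makes the NVIDIA devices and driver available inside
apptainer exec --nv cuda.sif nvidia-smi
```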
## Batch System

The JSC systems use a special flavor of Slurm as the workload manager (PSSlurm). Most of the vanilla Slurm commands are available, with some Jülich-specific additions. An overview of Slurm, including example job scripts and interactive commands, is available in the documentation: [https://apps.fz-juelich.de/jsc/hps/jedi/batchsystem.html](https://apps.fz-juelich.de/jsc/hps/jedi/batchsystem.html)
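For a quick interactive test, something along the following lines should work; the resource values are placeholders, so consult the linked documentation for the exact limits:
```bash
# Allocate one node with a GPU interactively ...
salloc --account=training2508 --nodes=1 --gres=gpu:1 --time=00:30:00
# ... and run a command inside the allocation
srun nvidia-smi
```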
Please account your jobs to the `training2508` project, either by setting the corresponding environment variable with the `jutil` command shown above, or by manually adding `-A training2508` to your batch jobs.
Only one partition is available on JEDI, called `all` (see [documentation for limits](https://apps.fz-juelich.de/jsc/hps/jedi/batchsystem.html)).
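A minimal batch script could look like the following sketch; the node count, GPU count, time limit, and application name (`my_app`) are placeholders to adjust for your project:
```bash
#!/bin/bash
#SBATCH --account=training2508
#SBATCH --partition=all
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --time=01:00:00
#SBATCH --output=job-%j.out

# Load the toolchain your application was built with (illustrative module names)
module load GCC OpenMPI CUDA

# Launch one MPI rank per GPU (adjust to your application)
srun --ntasks-per-node=4 ./my_app
```
Submit the script with `sbatch <script>` and check its status with `squeue --me`.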
For the days of the Hackathon, reservations will be in place to accelerate scheduling of jobs.
* Day 1: `--reservation gpuhack24`
* Day 2: `--reservation gpuhack24-2024-04-23`
* Day 3: `--reservation gpuhack24-2024-04-24`
* Day 4: `--reservation gpuhack24-2024-04-25`
* Day 5: `--reservation gpuhack24-2024-04-26`
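For example, to submit a job against the day-1 reservation (the script name is a placeholder):
```bash
sbatch --reservation gpuhack24 job.sh
```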
X-forwarding is sometimes a bit of a challenge; please consider using _Xpra_ in your browser through Jupyter JSC!
## Etc
### Previous Documentation
More (although slightly outdated) documentation is available from the 2021 Hackathon [in the corresponding JSC GitLab Hackathon documentation branch](https://gitlab.version.fz-juelich.de/gpu-hackathon/doc/-/tree/2021).
### PDFs
See the directory `./pdf/` for PDF versions of the documentation.