    Helmholtz GPU Hackathon 2023

    This repository holds the documentation for the Helmholtz GPU Hackathon 2023 at Jülich Supercomputing Centre (Forschungszentrum Jülich).

For additional info, please write to Andreas Herten (a.herten@fz-juelich.de) via Slack or email.

    Sign-Up

    Please use JuDoor to sign up for our training project, training2310: https://judoor.fz-juelich.de/projects/join/training2310

    Make sure to accept the usage agreement for JURECA DC and JUWELS Booster.

Please upload your SSH key to the system via JuDoor. The key needs to be restricted to accept access only from specific sources, specified through the from clause. Please have a look at the associated documentation (SSH Access and Key Upload).
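As a hedged example, a restricted public-key line can look like the following; the address range shown is the RFC 5737 documentation range and the key is truncated, so substitute the network or hostname pattern you will actually connect from:

from="192.0.2.0/24,*.your-institute.example" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... user@laptop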

    HPC Systems

We are primarily using JURECA DC for the Hackathon, a system with 768 NVIDIA A100 GPUs. As an optional alternative, the JUWELS Booster system, with its 3600 A100 GPUs and stronger node-to-node interconnect (4×200 Gbit/s instead of 2×200 Gbit/s on JURECA DC), can also be used. The focus should be on JURECA DC, though.

For the system documentation, see the JURECA DC and JUWELS Booster pages of the JSC user documentation.

    Access

    After successfully uploading your key through JuDoor, you should be able to access JURECA DC via

    ssh user1@jureca.fz-juelich.de

    The hostname for JUWELS Booster is juwels-booster.fz-juelich.de.
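If you connect frequently, a host alias in ~/.ssh/config saves typing. A minimal sketch, assuming your account is user1 and your key is ~/.ssh/id_ed25519 (adjust both):

Host jureca
    HostName jureca.fz-juelich.de
    User user1
    IdentityFile ~/.ssh/id_ed25519

With this in place, ssh jureca is enough; an analogous Host entry works for juwels-booster.fz-juelich.de.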

An alternative way of accessing the systems is Jupyter JSC, JSC's Jupyter-based web portal, available at https://jupyter-jsc.fz-juelich.de. Sessions should generally be launched on the login nodes. The portal also offers Xpra, a convenient alternative to X forwarding that works well for running the Nsight tools!

    Environment

On the systems, different directories are accessible to you. To set the environment variables associated with the project, call the following snippet after logging in:

    jutil env activate -p training2310 -A training2310

This will, for example, make the directory $PROJECT available, which you can use to store data. Your $HOME is not a good place for data storage, as its quota is severely limited! Use $PROJECT (or $SCRATCH; see the documentation on Available File Systems).
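A minimal sketch of using these variables after activation (the per-user subdirectory is only a convention, not a requirement):

echo $PROJECT $SCRATCH          # print the project and scratch paths
mkdir -p $PROJECT/$USER         # create a personal directory for your data
cd $PROJECT/$USER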

Additional software can be loaded into the environment via environment modules, using the module command. To see the available compilers (the first level of a toolchain), type module avail.
The most relevant modules are (a sample load sequence follows the list):

    • Compiler: GCC (with additional CUDA), NVHPC
    • MPI: ParaStationMPI, OpenMPI (make sure to have loaded MPI-settings/CUDA as well)
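A sample load sequence for a CUDA-enabled MPI toolchain, as a sketch only; the exact module names and default versions on the system may differ, so check module avail (or module spider) first:

module load GCC CUDA            # compiler toolchain plus CUDA
module load OpenMPI             # MPI implementation
module load MPI-settings/CUDA   # CUDA-aware MPI settings (needed with OpenMPI)
module list                     # verify what is loaded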

    Containers

JSC supports containers through Apptainer (previously: Singularity) on the HPC systems. The details are covered in a dedicated article in the systems documentation. Access is subject to accepting a dedicated license agreement (because of special treatment regarding support) on JuDoor.

Once access is granted (check your groups), Docker containers can be imported and executed as in the following example:

    $ apptainer pull tf.sif docker://nvcr.io/nvidia/tensorflow:20.12-tf1-py3
    $ srun -n 1 --pty apptainer exec --nv tf.sif python3 myscript.py
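During the Hackathon, the srun step would usually also carry the project account and a GPU partition (see Batch System below); a hedged variant requesting a single GPU:

$ srun -A training2310 -p dc-gpu --gres=gpu:1 -n 1 --pty apptainer exec --nv tf.sif python3 myscript.py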

    Batch System

The JSC systems use a special flavor of Slurm as the workload manager (PSSlurm). Most of the vanilla Slurm commands are available, with some Jülich-specific additions. An overview of Slurm, including example job scripts and interactive commands, is given in the corresponding documentation: https://apps.fz-juelich.de/jsc/hps/jureca/batchsystem.html

Please account your jobs to the training2310 project, either by setting the corresponding environment variable with the jutil command shown above, or by manually adding -A training2310 to your batch jobs (a sketch of a job script follows the partition list below).

    Different partitions are available (see documentation for limits):

    • dc-gpu: All GPU-equipped nodes
    • dc-gpu-devel: Some nodes available for development
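As a minimal sketch of such a job script, assuming one node with all four of its A100 GPUs and a placeholder executable (adjust tasks, time limit, and names to your application):

#!/bin/bash
#SBATCH --account=training2310      # charge the Hackathon project
#SBATCH --partition=dc-gpu          # or dc-gpu-devel for short tests
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4         # e.g. one MPI rank per GPU
#SBATCH --gres=gpu:4                # request the node's four GPUs
#SBATCH --time=00:30:00
# add --reservation=<name> once the Hackathon reservation is announced

srun ./my_gpu_app                   # placeholder executable

Submit with sbatch and check the queue with squeue -u $USER.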

For the days of the Hackathon, reservations will be in place to accelerate the scheduling of jobs. The reservation names will be announced here closer to the event.

X forwarding can sometimes be a challenge; please consider using Xpra in your browser through Jupyter JSC!

    Etc

    Previous Documentation

More (although slightly outdated) documentation is available from the 2021 Hackathon in the corresponding JSC GitLab Hackathon documentation branch.

    PDFs

See the directory ./pdf/ for a PDF version of the documentation.