diff --git a/.generatePdf.mk b/.generatePdf.mk index 0422be2fd11f80a360c6c8edad1eb20c33b85322..41efa1da5f0938bfd8aa16efc5e7ff1883bdaf3d 100755 --- a/.generatePdf.mk +++ b/.generatePdf.mk @@ -24,7 +24,7 @@ PDFS = $(SRC:.md=.pdf) all: $(PDFS) all.pdf %.pdf: %.md $(MAKEFILE) - $(LC) $(LCFLAGS) -o pdf/$@ $< + sed "s/→/$$\\\rightarrow$$/" $< | $(LC) $(LCFLAGS) -o pdf/$@ all.pdf: LCFLAGS += --toc all.pdf: LCFLAGS += --variable title:"GPU Eurohack 2019 User Guide" diff --git a/Accounts.md b/Accounts.md index 84b62079de26e20f25586debebdd1620ed2db806..514f6669d714372afd565f5cc48264291222ae72 100644 --- a/Accounts.md +++ b/Accounts.md @@ -1,12 +1,14 @@ # Accounts +The GPU Hackathon will use the Jülich supercomputer *JUWELS*. As a backup, we have prepared access to the *JURON* machine. Both are centrally managed. + ## Account Creation User management for the supercomputers in Jülich is done centrally via the JuDOOR portal. Hackathon attendees need to sign up for a JuDOOR account and then apply to be added to the Hackathon project `training1908`. This link will let you join the project: [https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908](https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908) -Once you are in the project, you need to agree to the usage agreements of `JUWELS` and `JUWELS GPU`. +Once you are in the project, you need to agree to the usage agreements of `JUWELS`, `JUWELS GPU`, and `JURON`. After that, you can upload your SSH public key via the »Manage SSH-keys« link. *(New to SSH? See for example [this help at Github](https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).)* @@ -18,11 +20,13 @@ Please log in to JUWELS via SSH ``` ssh name1@juwels.fz-juelich.de ``` +For JURON, choose `juron.fz-juelich.de` as the hostname. + In case you are using PuTTY on Windows, see [this external tutorial](https://devops.profitbricks.com/tutorials/use-ssh-keys-with-putty-on-windows/#use-existing-public-and-private-keys). ## Environment -One of the first steps after login should be to activate the environment for the GPU Hackathon using `jtuil`: +One of the first steps after login should be to activate the environment for the GPU Hackathon using `jutil`: ```bash jutil env activate -p training1908 -A training1908 diff --git a/Batch-Systems.md b/Batch-Systems.md index d3fb1f32b385136089508f901b76ea74d3a6b8f1..77ac4e4ab4c40db2b742145a20da0112e7b6683f 100644 --- a/Batch-Systems.md +++ b/Batch-Systems.md @@ -1,71 +1,33 @@ # Batch Systems -JURON and JURECA have different batch systems: JURON runs LSF, JURECA runs Slurm. This document tries to summarize the basic information needed to run jobs on the systems. +JUWELS and JURON have different batch systems: JUWELS runs Slurm, JURON runs LSF. This document tries to summarize the basic information needed to run jobs on the systems. -## JURON - -For additional hints on JURON's usage, see the man page on the system: `man juron` - -### Interactive Jobs (`-I`) - -Important commands: - -* Print host name (`-I`): `bsub -I hostname` -* Open interactive pseudo-terminal shell (`-tty`): `bsub -tty -Is /bin/bash` -* With GPU resources (`-R […]`): `bsub -R "rusage[ngpus_shared=1]" -tty -Is /bin/bash` -* Forward X, e.g. for *Visual Profiler* (`-XF`): `bsub -R "rusage[ngpus_shared=1]" -I -XF nvvp` - - Trouble with this?
Make sure you have done the following<a name="xftrouble"></a> - + On your local machine, add the `id_train0XX` SSH key to your agent with `ssh-add id_train0XX` - + Connect to JURON forwarding your SSH agent (`ssh -A […]`) - + On JURON, verify that the system knows your `id_train0XX` key with `ssh-add -l` - + *Hint: Have a look at the **Creating Alias** part of the `Login.{md,pdf}` section of the documentation. Much better than creating aliases in your shell.* -* Use node in exclusive mode (`-x`): `bsub -x -I hostname` (*Please keep your exclusive jobs short*) - -Available queues: - -* `normal`: For batch compute jobs (max. 8 nodes; max. 12 h) -* `normal.i`: For interactive jobs (max. 2 nodes; max. 4 h) -* `vis`: For visualization jobs (max. 2 nodes; max 4h) – these jobs will use hosts `juronc01` through `juronc04` which run an X Server and have a 10 GBit/s Ethernet connection to external networks - -Further important parameters and options: - -* `-n 23`: Launch 23 tasks -* `-n 4 -R span[ptile=2]`: Of the 4 launched tasks, run only 2 on the same node - -### Batch Jobs - -The same flags as above can be used. A batch file is submitted via `bsub < file.job`. Example job file: - -``` -#!/bin/bash -#BSUB -J lsf-tst # Job name -#BSUB -o output.o%J # Job standard output -#BSUB -n 8 # Number of tasks -#BSUB -q short # Job queue -#BSUB -R "rusage[ngpus_shared=1]" # Allocate GPUs -#BSUB -R "span[ptile=4]" # Spawn 4 processes per node -#BSUB -a openmpi -module load openmpi +## JUWELS -mpirun hostname -``` +Documentation for JUWELS's batch system can be found [online](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html). JUWELS uses Slurm, which has its own MPI launcher called `srun`. -### Further Commands +For the Hackathon, reservations are available for each day. Please use these reservations; otherwise you will be stuck in the queue for years (JUWELS's GPU nodes are very popular). -* `bhosts`: List available and busy hosts -* `bhosts -l juronc05`: List detailed information for host `juronc05` -* `bjobs`: List the user's current unfinished jobs -* `bjobs -u all`: Show all currently unfinished jobs -* `bkill ID`: Kill job with ID +### Reservation Overview -## JUWELS +There are large reservations during the active working hours of the Hackathon (9:00 to 18:00) and smaller reservations during the night (18:00 to 9:00). Please see the note on the nightly reservations below! -Documentation for JUWELS' batch system can be found [online](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html). +| Date | Reservation Name | +|--------------------------------|------------------------------| +| **Mon, 8 April** | `gpu-hack-2019-04-08` | +| Mon, 8 April, → Tue, 9 April | `gpu-hack-nightly-2019-04-08` | +| **Tue, 9 April** | `gpu-hack-2019-04-09` | +| Tue, 9 April, → Wed, 10 April | `gpu-hack-nightly-2019-04-09` | +| **Wed, 10 April** | `gpu-hack-2019-04-10` | +| Wed, 10 April, → Thu, 11 April | `gpu-hack-nightly-2019-04-10` | +| **Thu, 11 April** | `gpu-hack-2019-04-11` | +| Thu, 11 April, → Fri, 12 April | `gpu-hack-nightly-2019-04-11` | +| **Fri, 12 April** | `gpu-hack-2019-04-12` | -For the Hackathon, a reservation is going to be created. +**Nightly Reservations**: The nightly reservations are set up such that jobs which are already queued at 18:00 of a given day will still run; as soon as no more jobs using the reservation are in the queue, that reservation is released (Slurm option `PURGE_COMP`). -The MPI launcher on JUWELS is called `srun`. +Use the reservations with `--reservation NAME`.
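+For example, a short interactive test inside one of the day reservations could look like the following sketch (the partition name and GPU count are taken from the batch example below and may need to be adapted for your team):
+
+```bash
+# Ask for one GPU node from the day's reservation, for 30 minutes
+salloc --partition=gpus --gres=gpu:4 --nodes=1 --time=00:30:00 --reservation=gpu-hack-2019-04-10
+# Within the allocation, start programs through Slurm's launcher
+srun ./gpu-prog
+```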
### Interactive Jobs @@ -111,7 +73,7 @@ Launch batch jobs with `sbatch script.job`. An example follows. #SBATCH --error=gpu-err.%j # STD err #SBATCH --time=00:15:00 # Maximum wall time #SBATCH --partition=gpus # Partition name -#SBATCH --reservation=eurohack # Reservation name +#SBATCH --reservation=gpu-hack-2019-04-10 # Reservation name #SBATCH --gres=gpu:4 # Allocate resources @@ -124,3 +86,60 @@ srun ./gpu-prog # Single program uses MPI, launch with srun * `squeue`: List all unfinished jobs * `squeue -u ME`: List unfinished jobs of user ME * `scancel ID`: Cancel a job with ID + +## JURON + +For additional hints on JURON's usage, see the man page on the system: `man juron` + +### Interactive Jobs (`-I`) + +Important commands: + +* Print host name (`-I`): `bsub -I hostname` +* Open interactive pseudo-terminal shell (`-tty`): `bsub -tty -Is /bin/bash` +* With GPU resources (`-R […]`): `bsub -R "rusage[ngpus_shared=1]" -tty -Is /bin/bash` +* Forward X, e.g. for *Visual Profiler* (`-XF`): `bsub -R "rusage[ngpus_shared=1]" -I -XF nvvp` + - Trouble with this? Make sure you have done the following<a name="xftrouble"></a> + + On your local machine, add the `id_train0XX` SSH key to your agent with `ssh-add id_train0XX` + + Connect to JURON forwarding your SSH agent (`ssh -A […]`) + + On JURON, verify that the system knows your `id_train0XX` key with `ssh-add -l` + + *Hint: Have a look at the **Creating Alias** part of the `Login.{md,pdf}` section of the documentation. Much better than creating aliases in your shell.* +* Use node in exclusive mode (`-x`): `bsub -x -I hostname` (*Please keep your exclusive jobs short*) + +Available queues: + +* `normal`: For batch compute jobs (max. 8 nodes; max. 12 h) +* `normal.i`: For interactive jobs (max. 2 nodes; max. 4 h) +* `vis`: For visualization jobs (max. 2 nodes; max. 4 h) – these jobs will use hosts `juronc01` through `juronc04` which run an X Server and have a 10 GBit/s Ethernet connection to external networks + +Further important parameters and options: + +* `-n 23`: Launch 23 tasks +* `-n 4 -R span[ptile=2]`: Of the 4 launched tasks, run only 2 on the same node + +### Batch Jobs + +The same flags as above can be used. A batch file is submitted via `bsub < file.job`. Example job file: + +``` +#!/bin/bash +#BSUB -J lsf-tst # Job name +#BSUB -o output.o%J # Job standard output +#BSUB -n 8 # Number of tasks +#BSUB -q short # Job queue +#BSUB -R "rusage[ngpus_shared=1]" # Allocate GPUs +#BSUB -R "span[ptile=4]" # Spawn 4 processes per node +#BSUB -a openmpi + +module load openmpi + +mpirun hostname +``` + +### Further Commands + +* `bhosts`: List available and busy hosts +* `bhosts -l juronc05`: List detailed information for host `juronc05` +* `bjobs`: List the user's current unfinished jobs +* `bjobs -u all`: Show all currently unfinished jobs +* `bkill ID`: Kill job with ID diff --git a/JURON.md b/JURON.md index 4e77583bfbb346e755ecd4706bb48863d1b90480..b0fee40b68e62d9e9d6a9d3b75447495961699af 100644 --- a/JURON.md +++ b/JURON.md @@ -1,6 +1,6 @@ # JURON -JURON is a 18 node POWER8NVL system which was just evaluated in Jülich in the course of a pre-commercial procurement for the Human Brain Project. Each node of JURON has two sockets with ten CPU cores each (adding up to a total of 160 hardware threads when including eight-fold multi-threading); each node hosts four Tesla P100 GPUs, where each pair of P100s is connected to one CPU socket with the fast NVLink interconnect. Also, the P100s of a pair are connected via NVLink.
+JURON is an 18-node POWER8NVL system with a tight interconnect between CPU and GPU. Each node of JURON has two sockets with 10 ppc64le CPU cores each (adding up to a total of 160 hardware threads when including 8-fold multi-threading); each node hosts 4 Tesla P100 GPUs, where each pair of P100s is connected to one CPU socket with the fast NVLink interconnect. Also, the P100s of a pair are connected via NVLink. ## Module System @@ -12,20 +12,19 @@ Software versions are included in the name of the module. Some packages (e.g. Op Of special interest for the Hackathon are: -* CUDA module: `module load nvidia/cuda/8.0` +* CUDA module: `module load cuda/10.0.130` * GCC modules: - `module load gcc/5.4.0` - `module load gcc/6.3.0` - *GCC 4.8.5 is default* * PGI modules: - - `module load pgi/16.10` - - `module load pgi/17.1` + - `module load pgi/18.4` + - `module load pgi/17.10` * OpenMPI modules: - - `module load openmpi/1.10.2-pgi_16.10` - - `module load openmpi/1.10.2-pgi_17.1` - - `module load openmpi/2.0.2-gcc_5.4.0` + - `module load openmpi/3.1.3-gcc_5.4.0-cuda_10.0.130` + - `module load openmpi/2.1.2-pgi_18.4-cuda` * Score-P modules: - - `module load scorep/3.0-gcc5.4-ompi2.0.0-cuda8-papi5.5.0` + - TBD ## Batch System @@ -39,9 +38,19 @@ JURON uses LSF as its method of running jobs on the GPU-equipped compute nodes. ## File System -JURON und JURECA both share a common file system (*GPFS*). For working on both machines simultaneously, a dedicated folder for a build on each machine might be helpful. +All Jülich systems share a common file system (called *GPFS*), but you have a different `$HOME` directory on each. In addition, there are two more storage spaces available: -JURECA offers `$HOME` and `$WORK` for long-term and short-term (but somewhat faster) storage of data. JURON offers both locations as well, but without any difference in connection speed. +* `$HOME`: Only 5 GB available, intended for your most important files +* `$PROJECT`: Plenty of space for all project members to share +* `$SCRATCH`: Plenty of temporary space! + +For the environment variables to map to the correct values, the project environment needs to be activated with + +```bash +jutil env activate -p training1908 -A training1908 +``` + +See also [the online description](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/UserAndProjectSpace.html?nn=2363700).
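+To verify that the activation worked, you might print the resulting locations (a minimal sketch; the actual paths are user- and project-specific):
+
+```bash
+# After `jutil env activate`, these variables point to the project's storage spaces
+echo "HOME:    $HOME"
+echo "PROJECT: $PROJECT"
+echo "SCRATCH: $SCRATCH"
+```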
## Useful GCC Compiler Options diff --git a/pdf/Accounts.pdf b/pdf/Accounts.pdf index 3909701c6b5b9fd5a8470926ae7dfba3fe3459e8..9b1367bc2fdcc4b09a717237ede9a5a929a9a3e9 100644 Binary files a/pdf/Accounts.pdf and b/pdf/Accounts.pdf differ diff --git a/pdf/Batch-Systems.pdf b/pdf/Batch-Systems.pdf index 138c80656ebea2bbe5ccd86107b8ff9fb285513e..f0959343f91187b90e4b5c5401446a64a110d834 100644 Binary files a/pdf/Batch-Systems.pdf and b/pdf/Batch-Systems.pdf differ diff --git a/pdf/JURON.pdf b/pdf/JURON.pdf index 4d8bd058e9e80f4e6e42d2602e52ad9b80eb14ff..17d5d800c799fa694e7a7d9987472ce5c8382c96 100644 Binary files a/pdf/JURON.pdf and b/pdf/JURON.pdf differ diff --git a/pdf/JUWELS.pdf b/pdf/JUWELS.pdf index 175e1bfd7fe23b4bb1cfd02d0477aacde30d275d..89dadfe749628bd671bf5a7419cdb7aae9160822 100644 Binary files a/pdf/JUWELS.pdf and b/pdf/JUWELS.pdf differ diff --git a/pdf/More.pdf b/pdf/More.pdf index 30efe1465d0fc62b40a6e6f657de43c06d47ca3d..98e34633fea453e3e01ce64552853a033671113f 100644 Binary files a/pdf/More.pdf and b/pdf/More.pdf differ diff --git a/pdf/all.pdf b/pdf/all.pdf index b8a99ff1c6bd33953401e7ad813d823ccb53c9a2..a5088a98b4f857724b2fe35d14a2ac6d39ac12a8 100644 Binary files a/pdf/all.pdf and b/pdf/all.pdf differ