Commit cbbb922c authored by Andreas Herten
Add info about reservation

parent 5828cfea
The Makefile rule for building the PDFs now pipes the Markdown source through `sed`, replacing "→" arrows with LaTeX `$\rightarrow$`, before invoking the converter `$(LC)`:

    all: $(PDFS) all.pdf

    %.pdf: %.md $(MAKEFILE)
    	sed "s/→/$$\\\rightarrow$$/" $< | $(LC) $(LCFLAGS) -o pdf/$@

    all.pdf: LCFLAGS += --toc
    all.pdf: LCFLAGS += --variable title:"GPU Eurohack 2019 User Guide"
# Accounts
The GPU Hackathon will use the Jülich supercomputer *JUWELS*. As a backup, access to the *JURON* machine is prepared as well. Both systems are centrally managed.
## Account Creation
User management for the supercomputers in Jülich is done centrally via the JuDOOR portal. Hackathon attendees need to sign up for a JuDOOR account and then apply to be added to the Hackathon project `training1908`. This link will let you join the project:
[https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908](https://dspserv.zam.kfa-juelich.de/judoor/projects/join/TRAINING1908)
Once you are in the project, you need to agree to the usage agreements of `JUWELS`, `JUWELS GPU`, and `JURON`.
After that, you can upload your SSH public key via the »Manage SSH-keys« link. *(New to SSH? See, for example, [this help at GitHub](https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).)*
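If you do not have an SSH key yet, one can be generated locally along these lines (a sketch; the file name `id_train` and the comment are arbitrary choices, not a site requirement). The contents of the `.pub` file are what gets pasted into JuDOOR:

```bash
# Generate a new key pair; choose a passphrase when prompted.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_train -C "you@example.org"
# Print the public key, then copy & paste it into the »Manage SSH-keys« form.
cat ~/.ssh/id_train.pub
```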
Please log in to JUWELS via SSH:
```
ssh name1@juwels.fz-juelich.de
```
For JURON, choose `juron.fz-juelich.de` as the hostname.
If you are using PuTTY on Windows, see [this external tutorial](https://devops.profitbricks.com/tutorials/use-ssh-keys-with-putty-on-windows/#use-existing-public-and-private-keys).
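For OpenSSH users, host aliases in `~/.ssh/config` save typing. A sketch, assuming the key file generated above and the placeholder user `name1` (see also the *Creating Alias* part of the login documentation mentioned later in this guide):

```
# ~/.ssh/config — host aliases for the Hackathon machines (user and key names are placeholders)
Host juwels
    HostName juwels.fz-juelich.de
    User name1
    IdentityFile ~/.ssh/id_train
Host juron
    HostName juron.fz-juelich.de
    User name1
    IdentityFile ~/.ssh/id_train
    ForwardAgent yes   # handy for the X-forwarding notes in the JURON batch section
```

With this in place, `ssh juwels` or `ssh juron` is enough.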
## Environment
One of the first steps after login should be to activate the environment for the GPU Hackathon using `jutil`:
```bash
jutil env activate -p training1908 -A training1908
```
# Batch Systems
JUWELS and JURON have different batch systems: JUWELS runs Slurm, JURON runs LSF. This document summarizes the basic information needed to run jobs on both systems.
## JUWELS
Documentation for JUWELS's batch system can be found [online](https://apps.fz-juelich.de/jsc/hps/juwels/batchsystem.html). JUWELS uses Slurm, which has its own MPI launcher called `srun`.

For the Hackathon, reservations are available for each day; please use these reservations, otherwise you will be stuck in the queue for a very long time (JUWELS's GPU nodes are very popular).

### Reservation Overview

There are large reservations during the active working hours of the Hackathon (9:00 to 18:00) and smaller reservations during the night (18:00 to 9:00). Please see the note on the nightly reservations below!

| Date                           | Reservation Name              |
|--------------------------------|-------------------------------|
| **Mon, 8 April**               | `gpu-hack-2019-04-08`         |
| Mon, 8 April → Tue, 9 April    | `gpu-hack-nightly-2019-04-08` |
| **Tue, 9 April**               | `gpu-hack-2019-04-09`         |
| Tue, 9 April → Wed, 10 April   | `gpu-hack-nightly-2019-04-09` |
| **Wed, 10 April**              | `gpu-hack-2019-04-10`         |
| Wed, 10 April → Thu, 11 April  | `gpu-hack-nightly-2019-04-10` |
| **Thu, 11 April**              | `gpu-hack-2019-04-11`         |
| Thu, 11 April → Fri, 12 April  | `gpu-hack-nightly-2019-04-11` |
| **Fri, 12 April**              | `gpu-hack-2019-04-12`         |

**Nightly Reservations**: The nightly reservations are set up so that jobs which are scheduled at 18:00 of a given day are run; as soon as no more jobs using the reservation are in the queue, the reservation is released (Slurm option `PURGE_COMP`).

Use the reservations with `--reservation NAME`.
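A brief sketch of how a reservation name from the table is used in practice (the Wednesday reservation is picked purely as an example):

```bash
# List the currently defined reservations and their time windows:
scontrol show reservation

# Pass the reservation on the command line ...
sbatch --reservation=gpu-hack-2019-04-10 script.job

# ... or put it into the job script itself, as in the batch example below:
# #SBATCH --reservation=gpu-hack-2019-04-10
```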
### Interactive Jobs
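A minimal sketch of requesting an interactive session on a GPU node (the time limit and the use of the Wednesday reservation are assumptions; consult the batch-system documentation linked above for the full set of options):

```bash
# Allocate one GPU node interactively ...
salloc --partition=gpus --gres=gpu:4 --nodes=1 --time=01:00:00 \
       --reservation=gpu-hack-2019-04-10
# ... then start an interactive shell on the allocated node:
srun --pty /bin/bash -i
```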
### Batch Jobs
Launch batch jobs with `sbatch script.job`. An example follows (excerpt):

    #SBATCH --error=gpu-err.%j                 # STD err
    #SBATCH --time=00:15:00                    # Maximum wall time
    #SBATCH --partition=gpus                   # Partition name
    #SBATCH --reservation=gpu-hack-2019-04-10  # Reservation name
    #SBATCH --gres=gpu:4                       # Allocate resources

    srun ./gpu-prog                            # Since the program uses MPI, launch it with srun
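Since only part of the example script is shown above, here is a minimal, self-contained sketch of a complete job script using the same settings; the job name, node and task counts, and the program name `gpu-prog` are placeholders or assumptions, not prescribed values:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-hack                # Job name (placeholder)
#SBATCH --nodes=1                          # Number of nodes (assumed)
#SBATCH --ntasks-per-node=4                # One MPI task per GPU (assumed)
#SBATCH --output=gpu-out.%j                # STD out
#SBATCH --error=gpu-err.%j                 # STD err
#SBATCH --time=00:15:00                    # Maximum wall time
#SBATCH --partition=gpus                   # Partition name
#SBATCH --reservation=gpu-hack-2019-04-10  # Reservation name (see table above)
#SBATCH --gres=gpu:4                       # Allocate resources
#SBATCH --account=training1908             # Compute budget (assumption; see jutil above)

srun ./gpu-prog   # gpu-prog is a placeholder for your MPI-parallel GPU program
```

Submit it with `sbatch script.job` and check its state with `squeue -u $USER`.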
### Further Commands
* `squeue`: List all unfinished jobs
* `squeue -u ME`: List unfinished jobs of user ME
* `scancel ID`: Cancel a job with ID
## JURON
For additional hints on JURON's usage, see the man page on the system: `man juron`
### Interactive Jobs (`-I`)
Important commands:
* Print host name (`-I`): `bsub -I hostname`
* Open interactive pseudo-terminal shell (`-tty`): `bsub -tty -Is /bin/bash`
* With GPU resources (`-R […]`): `bsub -R "rusage[ngpus_shared=1]" -tty -Is /bin/bash`
* Forward X, e.g. for *Visual Profiler* (`-XF`): `bsub -R "rusage[ngpus_shared=1]" -I -XF nvvp`
- Trouble with this? Make sure you have done the following (see also the command sketch after this list)<a name="xftrouble"></a>
+ On your local machine, add the `id_train0XX` SSH key to your agent with `ssh-add id_train0XX`
+ Connect to JURON forwarding your SSH agent (`ssh -A […]`)
+ On JURON, verify that the system knows your `id_train0XX` key with `ssh-add -l`
+ *Hint: Have a look at the **Creating Alias** part of the `Login.{md,pdf}` section of the documentation. Much better than creating aliases in your shell.*
* Use node in exclusive mode (`-x`): `bsub -x -I hostname` (*Please keep your exclusive jobs short*)
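The X-forwarding troubleshooting steps above, spelled out as commands (a sketch; `id_train0XX` is the placeholder key name used above, `name1` a placeholder user, and `-X` for X forwarding is an assumption for graphical tools like `nvvp`):

```bash
# On your local machine:
ssh-add ~/.ssh/id_train0XX             # add the key to your agent
ssh -A -X name1@juron.fz-juelich.de    # forward the agent (and X) to JURON

# On JURON:
ssh-add -l                             # the id_train0XX key should be listed
```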
Available queues:
* `normal`: For batch compute jobs (max. 8 nodes; max. 12 h)
* `normal.i`: For interactive jobs (max. 2 nodes; max. 4 h)
* `vis`: For visualization jobs (max. 2 nodes; max. 4 h) – these jobs will use hosts `juronc01` through `juronc04`, which run an X server and have a 10 GBit/s Ethernet connection to external networks
Further important parameters and options:
* `-n 23`: Launch 23 tasks
* `-n 4 -R span[ptile=2]`: Of the 4 launched tasks, run only 2 on the same node
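Combining the options above, a sketch: the following requests four tasks spread two per node and runs `hostname` interactively (with `-I`, the command itself runs once on the first allocated host):

```bash
bsub -n 4 -R "span[ptile=2]" -I hostname
```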
### Batch Jobs
The same flags as above can be used. A batch file is submitted via `bsub < file.job`. Example job file:
```
#!/bin/bash
#BSUB -J lsf-tst # Job name
#BSUB -o output.o%J # Job standard output
#BSUB -n 8 # Number of tasks
#BSUB -q short # Job queue
#BSUB -R "rusage[ngpus_shared=1]" # Allocate GPUs
#BSUB -R "span[ptile=4]" # Spawn 4 processes per node
#BSUB -a openmpi
module load openmpi
mpirun hostname
```
### Further Commands
* `bhosts`: List available and busy hosts
* `bhosts -l juronc05`: List detailed information for host `juronc05`
* `bjobs`: List the user's current unfinished jobs
* `bjobs -u all`: Show all currently unfinished jobs
* `bkill ID`: Kill job with ID
# JURON
JURON is an 18-node POWER8NVL system with a tight interconnect between CPU and GPU. Each node of JURON has two sockets with 10 ppc64le CPU cores each (adding up to a total of 160 hardware threads when including 8-fold multi-threading); each node hosts 4 Tesla P100 GPUs, where each pair of P100s is connected to one CPU socket with the fast NVLink interconnect. The P100s of a pair are also connected to each other via NVLink.
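The NVLink pairing can be inspected directly on a compute node; a sketch using the interactive LSF submission described in the batch-system documentation (the `rusage` request mirrors the examples there):

```bash
# Show the GPU/CPU interconnect topology matrix on a JURON compute node
bsub -R "rusage[ngpus_shared=1]" -I nvidia-smi topo -m
```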
## Module System
Software versions are included in the name of the module.
Of special interest for the Hackathon are:

* CUDA module: `module load cuda/10.0.130`
* GCC modules:
    - `module load gcc/5.4.0`
    - `module load gcc/6.3.0`
    - *GCC 4.8.5 is the default*
* PGI modules:
    - `module load pgi/18.4`
    - `module load pgi/17.10`
* OpenMPI modules:
    - `module load openmpi/3.1.3-gcc_5.4.0-cuda_10.0.130`
    - `module load openmpi/2.1.2-pgi_18.4-cuda`
* Score-P modules:
    - TBD
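For example, a matching GCC + CUDA + MPI toolchain from the list above could be loaded like this (a sketch; that these exact versions resolve together is an assumption based on the module names):

```bash
module load gcc/5.4.0
module load cuda/10.0.130
module load openmpi/3.1.3-gcc_5.4.0-cuda_10.0.130
# Quick sanity check of the loaded compilers:
mpicc --version && nvcc --version
```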
## Batch System
JURON uses LSF as its method of running jobs on the GPU-equipped compute nodes.
## File System
All Jülich systems share a common file system (*GPFS*), but you have a different `$HOME` directory on each system. In addition, there are two more storage spaces available:

* `$HOME`: Only 5 GB available; intended for your most important files
* `$PROJECT`: Plenty of space for all project members to share
* `$SCRATCH`: Plenty of temporary space!
For the environment variables to map to the correct values, the project environment needs to be activated with
```bash
jutil env activate -p training1908 -A training1908
```
See also [the online description](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/UserAndProjectSpace.html?nn=2363700).
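After activating the project environment, the variables can be used directly; a small sketch (the directory layout below is just a suggestion):

```bash
echo $PROJECT $SCRATCH         # show where the project and scratch spaces live
mkdir -p $PROJECT/$USER/src    # shared, persistent project space
mkdir -p $SCRATCH/$USER/build  # large but temporary space for build artifacts
```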
## Useful GCC Compiler Options