Commit b7eb6c70 authored by Ilya Zhukov

Modify training projects

parent 34993f84
Pipeline #188244 passed
@@ -20,7 +20,7 @@ This is the case if
 - you have successfully [applied for computing time](https://www.fz-juelich.de/en/ias/jsc/systems/supercomputers/apply-for-computing-time) during one of our calls for project proposals and are now the principal investigator (PI) of your own project, or
 - you have gained access to a project either by being invited by the PI or project administrator (PA) or by being granted access upon requesting to join a project through JuDoor.
-We have created a computing time project for this course with a project ID of `training2334`.
+We have created a computing time project for this course with a project ID of `training2410`.
 To join the project, log in to [JuDoor](https://judoor.fz-juelich.de) and click *Join a project* under the *Projects* heading.
 Enter the project ID and, if you want to, a message to remind the PI/PA (one of the instructors) why you should be allowed to join the project.
 Afterwards the PI/PA will be automatically informed about your join request and can add you to the different systems available in the project.
...
@@ -32,7 +32,7 @@ The compute time used for one job will be accounted by the following formula:
 Jobs that run on nodes equipped with GPUs are charged in the same way.
 Independent of the usage of the GPUs the available cores on the host CPU node are taken into account.
-Detailed information of each job can be found in KontView which is accessible via the button 'show extended statistics' for each project in [JuDoor](https://judoor.fz-juelich.de/projects/training2334/).
+Detailed information of each job can be found in KontView which is accessible via the button 'show extended statistics' for each project in [JuDoor](https://judoor.fz-juelich.de/projects/training2410/).
 Alternatively, you can execute the following command on the login nodes to query your CPU quota usage: `jutil user cpuquota`.
 Further information can be found in the "Accounting" chapter of the corresponding [System Documentation][System Documentation].
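For a rough idea of what a job costs in these terms, here is a minimal back-of-the-envelope sketch. The numbers (2 nodes, 128 cores per node, 3 hours of wall-clock time) are made up for illustration and this is not the exact accounting formula quoted above:
```
$ # illustration only: a job that held 2 full nodes with 128 cores each for 3 hours
$ nodes=2; cores_per_node=128; walltime_hours=3
$ echo $(( nodes * cores_per_node * walltime_hours ))
768
```
Under these assumptions, roughly 768 core-hours would be charged to the project budget, regardless of how many of the available cores the job actually kept busy.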
...
@@ -30,10 +30,10 @@ For brevity's sake, one can also make one of the projects the "active project" a
 This can also be done through the `jutil` command:
 ```
-$ jutil env activate -p training2334 -A training2334
+$ jutil env activate -p training2410 -A training2410
 ```
-Now `training2334` is the active project.
+Now `training2410` is the active project.
 Any computational jobs will be accounted against its budget and the special file system locations associated with it can be reached through certain environment variables.
 More about that in the next section.
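Once a budget has been activated like this, the `-A` option can be omitted from the Slurm commands shown later in this material, because the active budget is charged automatically. A brief sketch (the reservation name is the one used in the hands-on sections below; replace `YYYYMMDD` as described there):
```
$ jutil env activate -p training2410 -A training2410
$ srun --reservation hands-on-YYYYMMDD hostname   # no -A needed, the active budget is charged
```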
@@ -58,22 +58,22 @@ At least two directories are created for each project:
 Data projects have access to other storage locations, e.g. the tape based `ARCHIVE` for long term storage of results.
-The path of these directories is available as the value of environment variables of the form `<directory>_<project>`, e.g. `PROJECT_training2334` or `SCRATCH_training2334`.
+The path of these directories is available as the value of environment variables of the form `<directory>_<project>`, e.g. `PROJECT_training2410` or `SCRATCH_training2410`.
 If you have activated a project in the previous section, you will also have environment variables that are just `PROJECT` and `SCRATCH` that point to the respective directories of the active project.
-Print the contents of `PROJECT_training2334` and `PROJECT`:
+Print the contents of `PROJECT_training2410` and `PROJECT`:
 ```
-$ printenv PROJECT_training2334
-/p/project/training2334
+$ printenv PROJECT_training2410
+/p/project/training2410
 $ printenv PROJECT
-/p/project/training2334
+/p/project/training2410
 ```
 Change into that directory and see what is already there:
 ```
-$ cd $PROJECT_training2334
+$ cd $PROJECT_training2410
 $ ls
 ```
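The project directory is shared by all members of the project, so it is good practice to work inside a personal subdirectory named after your user name; the CUDA examples later in this material assume exactly that layout. A minimal sketch:
```
$ mkdir -p $PROJECT_training2410/$USER
$ cd $PROJECT_training2410/$USER
```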
...
@@ -37,14 +37,14 @@ If no resources are currently allocated, `srun` can infer from its command line
 After the associated commands have been run, the resources are relinquished and running further commands will have to ask for resources again.
 This one-shot mode can be useful when you want to interactively run a few quick jobs with varying sets of resources allocated for them.
 Run the `hostname` command, which simply prints the name of the machine it is running on, to see how `srun` runs commands on nodes other than the login nodes.
-On JURECA and JUSUF, use this command (Important: do not forget to replace `YYYYMMDD`, where `YYYY`, `MM`, and `DD` are the current year, month, and day in the Gregorian calendar, e.g. `20231121`):
+On JURECA and JUSUF, use this command (Important: do not forget to replace `YYYYMMDD`, where `YYYY`, `MM`, and `DD` are the current year, month, and day in the Gregorian calendar, e.g. `20240522`):
 ```
 $ hostname
 jrlogin09.jureca
-$ srun -A training2334 --reservation hands-on-YYYYMMDD hostname
+$ srun -A training2410 --reservation hands-on-YYYYMMDD hostname
 srun: job 3472578 queued and waiting for resources
 srun: job 3472578 has been allocated resources
 jrc0454
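If you are unsure whether the course reservation exists on the machine you are logged in to, or how its name is spelled, you can ask Slurm directly; `scontrol` is a standard Slurm command and not specific to this course:
```
$ scontrol show reservation | grep hands-on
```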
@@ -58,7 +58,7 @@ To submit to JUWELS Cluster, you want to be logged in to the Cluster login nodes
 ```
 $ hostname
 jwlogin02.juwels
-$ srun -A training2334 --reservation hands-on-cluster-YYYYMMDD hostname
+$ srun -A training2410 --reservation hands-on-cluster-YYYYMMDD hostname
 srun: job 9792359 queued and waiting for resources
 srun: job 9792359 has been allocated resources
 jwc06n213.juwels
@@ -69,7 +69,7 @@ To submit to JUWELS Booster, you want to be logged in to the Booster login nodes
 ```
 $ hostname
 jwlogin24.juwels
-$ srun -A training2334 --reservation hands-on-booster-YYYYMMDD --gres gpu:4 hostname
+$ srun -A training2410 --reservation hands-on-booster-YYYYMMDD --gres gpu:4 hostname
 srun: job 4575092 queued and waiting for resources
 srun: job 4575092 has been allocated resources
 jwb0053.juwels
@@ -85,7 +85,7 @@ $ srun <srun options...> <program> <program options...>
 Above we have seen four `srun` options:
-- `-A` (short for `--account`) to charge the resources consumed by the computation to the budget allotted to this course (if you have used `jutil env activate -A training2334` earlier on, you do not need this).
+- `-A` (short for `--account`) to charge the resources consumed by the computation to the budget allotted to this course (if you have used `jutil env activate -A training2410` earlier on, you do not need this).
 :::info
@@ -115,7 +115,7 @@ For the `<program>` we used `hostname` with no arguments of its own.
 To run more parallel instances of a program, increase the number of Slurm *tasks* using the `-n` option to `srun`:
 ```
-$ srun --label -A training2334 --reservation hands-on-cluster-YYYYMMDD -n 10 hostname
+$ srun --label -A training2410 --reservation hands-on-cluster-YYYYMMDD -n 10 hostname
 srun: job 3472812 queued and waiting for resources
 srun: job 3472812 has been allocated resources
 8: jwc00n002.juwels
@@ -141,7 +141,7 @@ Note also the `--label` option to `srun` which prefixes every line of output by
 Running more tasks than will fit on a single node will allocate two nodes and split the tasks between nodes:
 ```
-$ srun --label -A training2334 --reservation hands-on-cluster-YYYYMMDD -n 100 hostname
+$ srun --label -A training2410 --reservation hands-on-cluster-YYYYMMDD -n 100 hostname
 srun: job 3473040 queued and waiting for resources
 srun: job 3473040 has been allocated resources
 0: jwc00n007.juwels
@@ -157,7 +157,7 @@ Running over multiple nodes without intending to is also likely to degrade perfo
 You can now also use `srun` to run the `hellompi` program introduced in the previous section on deploying custom software:
 ```
-$ srun -A training2334 --reservation hands-on-cluster-YYYYMMDD -n 5 ./hellompi
+$ srun -A training2410 --reservation hands-on-cluster-YYYYMMDD -n 5 ./hellompi
 srun: job 3471349 queued and waiting for resources
 srun: job 3471349 has been allocated resources
 hello from process 4 of 5
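While a job is queued or running, you can check its state with `squeue`, another standard Slurm command that is not specific to this course:
```
$ squeue -u $USER   # list your own pending and running jobs
```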
@@ -197,7 +197,7 @@ However, since the number of CPU cores is always rounded up to the next multiple
 Using the `-N` command line argument, you can request a number of nodes from the resource manager (remember to specify `--gres gpu:4` for JUWELS Booster):
 ```
-$ salloc -A training2334 --reservation hands-on-cluster-YYYYMMDD -N 1
+$ salloc -A training2410 --reservation hands-on-cluster-YYYYMMDD -N 1
 salloc: Pending job allocation 3475519
 salloc: job 3475519 queued and waiting for resources
 salloc: job 3475519 has been allocated resources
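`salloc` then drops you into a shell (typically still on the login node) in which the allocation is active; `srun` commands issued from that shell run inside the allocation and no longer need the `-A` or `--reservation` options, and leaving the shell releases the nodes. A brief sketch of such a session, assuming the allocation above:
```
$ srun -n 4 hostname   # runs on the allocated compute node
$ exit                 # relinquish the allocation
```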
@@ -272,7 +272,7 @@ And enter the following script:
 ```sh
 #!/bin/bash
-#SBATCH --account=training2334
+#SBATCH --account=training2410
 #SBATCH --reservation=hands-on-cluster-YYYYMMDD
 #SBATCH --nodes=2
 #SBATCH --cpus-per-task=1
@@ -337,7 +337,7 @@ By default, Slurm assumes that the processes you create are single threaded and
 Allocate a node for playing around with this mechanism:
 ```
-$ salloc -A training2334 --reservation hands-on-cluster-YYYYMMDD -N 1
+$ salloc -A training2410 --reservation hands-on-cluster-YYYYMMDD -N 1
 salloc: Pending job allocation 3499694
 salloc: job 3499694 queued and waiting for resources
 salloc: job 3499694 has been allocated resources
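Inside this allocation you can experiment with `--cpus-per-task`. One way to make its effect visible, sketched here with the standard Linux tool `taskset` (the exact core numbers you see will depend on the node and on Slurm's binding settings), is to print each task's CPU affinity:
```
$ srun -n 2 --cpus-per-task=4 bash -c 'taskset -cp $$'
```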
...
@@ -18,9 +18,9 @@ The samples directory of the CUDA installation has a number of example codes you
 ```
 $ module load NVHPC ParaStationMPI MPI-settings/CUDA
-$ cd $PROJECT_training2334/$USER
+$ cd $PROJECT_training2410/$USER
 $ git clone https://github.com/NVIDIA/cuda-samples.git
-$ cd $PROJECT_training2334/$USER/cuda-samples/Samples/0_Introduction/simpleMPI
+$ cd $PROJECT_training2410/$USER/cuda-samples/Samples/0_Introduction/simpleMPI
 $ make
 /p/software/jurecadc/stages/2024/software/psmpi/5.9.2-1-NVHPC-23.7-CUDA-12/bin/mpicxx -I../../../Common -o simpleMPI_mpi.o -c simpleMPI.cpp
 /p/software/jurecadc/stages/2024/software/CUDA/12/bin/nvcc -ccbin g++ -I../../../Common -m64 --threads 0 --std=c++11 -Xcompiler -fPIE -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -o simpleMPI.o -c simpleMPI.cu
@@ -34,7 +34,7 @@ There should now be an executable called `simpleMPI` inside the `simpleMPI` dire
 To run the program, use `srun` like before:
 ```
-$ srun -A training2334 -p <gpu partition> --gres gpu:4 -N 1 -n 4 ./simpleMPI
+$ srun -A training2410 -p <gpu partition> --gres gpu:4 -N 1 -n 4 ./simpleMPI
 [...]
 Running on 4 nodes
 Average of square roots is: 0.667305
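The `<gpu partition>` placeholder has to be replaced with the name of an actual GPU partition on the system you are using; which partitions exist can be listed with the standard Slurm command `sinfo`, for example:
```
$ sinfo -s   # one-line summary per partition
```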
@@ -61,7 +61,7 @@ After logging into the compute node, through `sgoto`, we show the usage of the G
 Afterwards we log out from the compute node, bring the backgrounded `srun` command to the foreground with `fg`, and cancel it by hitting `CTRL-C` a couple of times until the normal command line is available.
 ```
-$ srun -N 1 -n 1 -t 00:10:00 -A training2334 -p develbooster --gres=gpu:4 sleep 600 &
+$ srun -N 1 -n 1 -t 00:10:00 -A training2410 -p develbooster --gres=gpu:4 sleep 600 &
 [1] 25114
 srun: job 5535332 queued and waiting for resources
 srun: job 5535332 has been allocated resources
@@ -102,7 +102,7 @@ Thu May 12 08:49:34 2022
 $ exit
 logout
 $ fg
-srun -N 1 -n 1 -t 00:10:00 -A training2334 -p develbooster --gres=gpu:4 sleep 500
+srun -N 1 -n 1 -t 00:10:00 -A training2410 -p develbooster --gres=gpu:4 sleep 500
 ^Csrun: sending Ctrl-C to StepId=5535332.0
 srun: forcing job termination
 srun: Job step aborted: Waiting up to 6 seconds for job step to finish.
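If you have put several commands into the background like this, the shell built-in `jobs` lists them together with the job numbers that `fg` accepts (e.g. `fg %1`):
```
$ jobs
```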
@@ -121,7 +121,7 @@ Let us investigate further on this with a practical example.
 First, we prepare a device query example.
 ```
-$ cd $PROJECT_training2334/$USER/cuda-samples/Samples/1_Utilities/deviceQueryDrv/
+$ cd $PROJECT_training2410/$USER/cuda-samples/Samples/1_Utilities/deviceQueryDrv/
 $ make
 /p/software/jurecadc/stages/2024/software/CUDA/12/bin/nvcc -ccbin g++ -I../../../Common -m64 --threads 0 --std=c++11 -gencode arch=compute_50,code=compute_50 -o deviceQueryDrv.o -c deviceQueryDrv.cpp
 /p/software/jurecadc/stages/2024/software/CUDA/12/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_50,code=compute_50 -o deviceQueryDrv deviceQueryDrv.o -L/p/software/jurecadc/stages/2024/software/CUDA/12/lib64/stubs -lcuda
@@ -140,7 +140,7 @@ The following sbatch script `gpuAffinityTest.sbatch` written for the JUWELS Boos
 #SBATCH --time=00:01:00
 #SBATCH --partition=develbooster
 #SBATCH --gres=gpu:4
-#SBATCH -A training2334
+#SBATCH -A training2410
 module load CUDA NVHPC ParaStationMPI MPI-settings/CUDA
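A common way to give every MPI rank its own GPU, which is what a GPU affinity test probes, is a small wrapper script that maps the node-local task ID to one GPU before starting the actual program. The following is only a sketch of that idea; the file name `select_gpu.sh` is made up here and this is not the content of `gpuAffinityTest.sbatch`:
```
#!/bin/bash
# Hypothetical wrapper: expose exactly one GPU to each task, chosen by the
# node-local task ID that Slurm sets for every task it launches.
export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"
```
It would then be used as, for instance, `srun -n 4 ./select_gpu.sh ./deviceQueryDrv`.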
@@ -328,7 +328,7 @@ It is worth mentioning that you should use the same modules for compilation w
 ```
 $ module load NVHPC CUDA OpenMPI
 $ mpicxx -O0 -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart -lcuda mpiBroadcasting.cpp
-$ srun -N 2 -n 8 -t 01:00:00 -A training2334 -p booster --gres=gpu:4 ./a.out
+$ srun -N 2 -n 8 -t 01:00:00 -A training2410 -p booster --gres=gpu:4 ./a.out
 Broadcasting to all host memories took 4.526835 seconds.
 Broadcasting to all GPUs took 7.481972 seconds with intermediate copy to host memory.
 Broadcasting to all GPUs took 2.625439 seconds.
...