For this tutorial, you already have a JSC account and SSH keys ready, so we will skip these steps. To access JURECA-DC, we simply `ssh` into it:

```
ssh jureca
```
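The `ssh jureca` shortcut assumes a matching host alias in your SSH configuration. A minimal sketch of such an entry in `~/.ssh/config` could look like the one below; the username and key file are placeholders, and you should double-check the login host name against the JSC documentation.

```
Host jureca
    HostName jureca.fz-juelich.de
    User your_jsc_username
    IdentityFile ~/.ssh/your_private_key
```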
You can follow some basic instructions [here](https://gitlab.jsc.fz-juelich.de/detect/detect_z03_z04/irtg_hpc/-/wikis/Hands-on:-first-contact-with-HPC#login-to-jsc-hpc-system-jureca-dc).

# 2. Get SERGHEI from a repository
When you first log in, your current directory is your home directory. You can go to the project directory with

```
cd /p/project/training2226
```
This is the directory where all of our project's resources should be located. It has a much larger storage quota than your home directory. To keep things tidy in here, first create a directory for yourself, named with your user name.
```
mkdir $USER
```
`mkdir` creates a new directory named after the value of the environment variable `USER`, which holds your username. You can of course type your username directly.

Let's now move into this new directory.
```
cd $USER
```

To get the source code, simply **clone** the repository here.
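A sketch of the clone step, assuming the public SERGHEI repository on GitLab (use the repository URL provided for the training if it differs):

```bash
git clone https://gitlab.com/serghei-model/serghei.git
```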
You can now navigate to the SERGHEI root directory with `cd serghei`.

Check the path to the current directory using the `pwd` command. This should return the absolute path to the local working copy of the repository (i.e., the `serghei` directory), which should look something like
```
/p/project/training2226/username1/serghei
```
A persistent way of setting the `SERGHEIPATH` environment variable is to define it in your local `.bash_profile` file. Edit the `.bash_profile` file in your home folder

`vim ~/.bash_profile`

and include a new line which sets the path to `serghei` (which you can copy from the `pwd` output).
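For example, assuming your `pwd` output matches the path shown above, the addition could look like this:

```bash
# new line in ~/.bash_profile (adapt the path to your own user directory)
export SERGHEIPATH=/p/project/training2226/username1/serghei

# then reload the profile in your current shell and verify the variable
source ~/.bash_profile
echo $SERGHEIPATH
```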
If all goes well, we should now have the environment ready.

### Getting Kokkos

Kokkos is available on GitHub at [https://github.com/kokkos/kokkos.git](https://github.com/kokkos/kokkos.git). To clone the Kokkos repository, follow an analogous procedure to what was done to clone SERGHEI. Make sure that Kokkos is cloned inside the `serghei` directory.
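For instance, assuming `SERGHEIPATH` is set as above, the clone could look like:

```bash
cd $SERGHEIPATH
git clone https://github.com/kokkos/kokkos.git
```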
Usually dependencies need to be built on the system (so that they are available locally as a binary, or through a module). For SERGHEI, Kokkos does not need to be built beforehand. It will be built when we build SERGHEI, if the path to Kokkos is properly set in the next step.

1. Navigate to the SERGHEI source directory:

```
cd $SERGHEIPATH/src
```

2. Open the `Makefile`. This is a script which controls how SERGHEI is compiled. Note that there are some definitions early on. In particular, `KOKKOS_PATH` and `KOKKOS_SRC_PATH` are defined in terms of `HOME`. Note that the `Makefile` assumes that `KOKKOS_PATH` lies within the `HOME` path. `HOME` is an environment variable which contains the path to your home directory. Exit the file and check what your `HOME` environment variable contains.
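For example, you can print it with the snippet below. If your Kokkos clone does not live under your home directory (in this tutorial it sits inside `serghei`), adjust the Kokkos path definitions in the `Makefile` accordingly.

```bash
# print the home directory path that the Makefile will pick up by default
echo $HOME
```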
3. Now we will use `make` to compile and build. Running `make` reads the `Makefile`, which you can see exists in this directory. If you try to run `make` in a different directory which does not contain a `Makefile`, you will get an error. `make` supports parallel jobs, so that the compilation is faster.
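For example, a parallel build with eight jobs (the job count is an arbitrary choice) could look like:

```bash
make -j 8
```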
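SERGHEI takes an input directory, an output directory, and a thread count (the same argument pattern as the `srun` line in the sbatch script below). A quick test on a single CPU could look like this sketch, where the case directory is a placeholder for the small test case used in the training:

```bash
cd /path/to/your/test/case        # placeholder: directory that contains the input/ folder
$SERGHEIPATH/bin/serghei ./input/ ./output/ 1
```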
The command above should run a dam break simulation for 10 seconds.

If something goes wrong, check the paths, make sure you are in the right directory, and make sure that `serghei` was properly built (from the previous step).

**IMPORTANT**: Bear in mind that this is **not the correct** way to use the HPC system. By running the command above we ran a command on the ***login node*** of the HPC system, which is not meant to run simulations. We have only done this to quickly test whether our build was OK, with an extremely small simulation, on a single CPU. To properly run a simulation we will use `slurm` and `sbatch` scripts in the next part of this tutorial.

# 6. Run a case using sbatch
A minimal `sbatch` script for SERGHEI looks like the one below.

```bash
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=##how_many?##
#SBATCH --nodes=1
#SBATCH --partition=##partition_name##
#SBATCH --reservation=##reservation_name##

##### until here, we have configured the HPC resources #####
##### now we configure some additional goodies #####

rm -rf $casePath/$OUTDIR

## launch the job in the HPC system
srun $SERGHEIPATH/bin/serghei $casePath/input/ $casePath/$OUTDIR/ $OMP_NUM_THREADS
```
The first block (with the `SBATCH` keyword) informs the system of the HPC resources we want to use. It requires an account name against which the compute time will be billed, a maximum job time, how many tasks we want per node, how many nodes we want and how many CPUs we wish to use per task. Finally, since the HPC system is divided into **partitions** of nodes, we must specify which one we will use.

For the problem we will run here, we will use:

- the account `training2226`
- the `dc-cpu` partition
- all of the CPUs in a single node (e.g., 64 in JURECA-DC)
- only one task, as all of our computational domain will be on the same node
- our training project's **reservation**, that is, nodes that are reserved for us during the training and not available to other users; we will use `training2226-cpu`

Configure the sbatch script with this information and save it in some reasonable place where you can find it (you can call it something like `my_sbatch_script.job`). Remember to also update the `casePath` in the script to where the `input` folder lies. Hint: the `casePath` should not include the `input` itself (i.e., it is the parent directory of `input`).
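For reference, a fully configured script could look like the sketch below; the wall time, `casePath`, and output directory name are placeholder assumptions that you should adapt to your case.

```bash
#!/bin/bash
#SBATCH --account=training2226
#SBATCH --time=00:10:00                 # placeholder wall time, adjust to your case
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64              # all CPUs of a single JURECA-DC node
#SBATCH --nodes=1
#SBATCH --partition=dc-cpu
#SBATCH --reservation=training2226-cpu

##### until here, we have configured the HPC resources #####
##### now we configure some additional goodies #####

casePath=/p/project/training2226/$USER/my_case   # placeholder: parent directory of the input/ folder
OUTDIR=output                                    # placeholder: name of the output directory
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK      # one OpenMP thread per requested CPU

rm -rf $casePath/$OUTDIR                         # start from a clean output directory
mkdir -p $casePath/$OUTDIR                       # recreate it (harmless if SERGHEI creates it itself)

## launch the job in the HPC system
srun $SERGHEIPATH/bin/serghei $casePath/input/ $casePath/$OUTDIR/ $OMP_NUM_THREADS
```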
Finally, we can submit this script to launch the job.

```bash
sbatch my_sbatch_script.job
```
You can check the status of your job with `squeue -u $USER`, which uses the `USER` environment variable (effectively, your username) to query the system for your active jobs.

If things go well, you should get an output folder and inside it a `log.out` file. You will also get a `slurm` job report showing everything that happened behind the scenes on the compute nodes and was not shown in your terminal. Inspecting the contents of this file will also show whether the run was successful or ended with an error. If you had errors, try troubleshooting through them.
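For example, assuming the default Slurm output file name, you can inspect the job report from the directory where you submitted the job:

```bash
cat slurm-*.out   # the job report is named slurm-<jobid>.out by default
```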
We will use this sbatch script as a base for further tutorials, so remember where you keep it. You can later create copies of it.