# Fiddling with CPU Mask for GPU/non-GPU Processes
Task: Utilize most cores of JUWELS Booster by associating a core with GPU affinity with the remaining cores (without GPU affinity) within a rank. There should be 4 ranks per node; in each rank, 1 GPU process is launched and the remaining cores are given to the CPU-only process.
This repo contains examples for generating some more advanced CPU masks for JUWELS Booster, along with associated tools. While these are made for JUWELS Booster, they can easily be ported to any other system.
## Usage
Launch `split_mask.sh` with 2 arguments. The first argument is the process which should not run on the GPU, the second argument is the process to be run on the GPU. The according CPU masks are set, and `CUDA_VISIBLE_DEVICES` is set as well. If only 1 argument is provided, it will be taken for both cases. No argument or >2 arguments will result in the masks being printed (for debugging).
## Associated Tools
* `mask-tools.py`: Maybe the most relevant tool here. A Python CLI app with two sub-commands to a) generate a hexadecimal mask for given CPU cores (also accepting ranges), and b) generate a binary representation of a hexadecimal mask to quickly study the mask. See `mask-tools.py --help` for help and usage info.
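The core idea behind both sub-commands can be sketched in a few lines of Python. This is an illustration only; the actual CLI of `mask-tools.py` and its output formatting may differ:

```python
def cores_to_hexmask(spec):
    """Convert a core list like "0-2,5" into a hexadecimal CPU mask.

    Sketch of the idea behind mask-tools.py's first sub-command;
    the real tool's interface may differ.
    """
    mask = 0
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            for core in range(lo, hi + 1):
                mask |= 1 << core  # set one bit per core in the range
        else:
            mask |= 1 << int(part)
    return hex(mask)

def hexmask_to_binary(hexmask):
    """Render a hex mask as a binary string to study which cores are set."""
    return bin(int(hexmask, 16))

print(cores_to_hexmask("0-2,5"))   # cores 0, 1, 2, and 5 -> 0x27
print(hexmask_to_binary("0x27"))   # -> 0b100111
```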
* `omp_id.c` / `Makefile`: Simple C program that prints the MPI rank, master thread ID, and OpenMP-parallel thread IDs; compile with the `Makefile`.
* `combine-masks.py`: Takes CPU hex masks for individual NUMA domains and combines them pairwise.
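Combining per-domain masks pairwise boils down to a bitwise OR of adjacent masks. A minimal sketch (the mask values are illustrative and not taken from the repo):

```python
def combine_pairwise(masks):
    """OR adjacent NUMA-domain hex masks pairwise: (m0|m1), (m2|m3), ...

    Sketch of the idea behind combine-masks.py; the real tool's
    interface may differ.
    """
    combined = []
    for i in range(0, len(masks), 2):
        combined.append(hex(int(masks[i], 16) | int(masks[i + 1], 16)))
    return combined

# Example: four single-domain masks covering 4 cores each
print(combine_pairwise(["0xf", "0xf0", "0xf00", "0xf000"]))
# -> ['0xff', '0xff00']
```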
* `process_info.sh`: Similar to `omp_id.c`, but prints the GPU ID instead of OpenMP thread IDs.
* `get_close_gpu.sh`: Helper script needed by Example 2; holds a list of GPUs with affinity to each NUMA domain.
## Example 1
**Task:** 4 MPI ranks on a node, with 1 rank spanning 3 NUMA domains. Each process first launches a GPU kernel before opening up OMP parallel regions to keep the CPU cores busy. The GPU-dispatching master thread needs to be launched from a core with GPU affinity.
**Strategy:** Use `OMP_PLACES` to provide an explicit list of cores to the application. `OMP_PLACES` is a list of cores, as retrieved by `numactl -s`, but re-sorted such that the first core has GPU affinity.
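The re-sorting step can be illustrated as follows. This is a sketch only: the core numbers and the GPU-affine core are assumptions for illustration, and the real wrapper derives them from the `numactl -s` output:

```python
def gpu_core_first(cores, gpu_core):
    """Reorder a core list so the GPU-affine core comes first,
    suitable for building an explicit OMP_PLACES list.
    """
    rest = [c for c in cores if c != gpu_core]
    return [gpu_core] + rest

cores = list(range(0, 6))          # cores of the rank's NUMA domains (assumed)
places = gpu_core_first(cores, 3)  # assume core 3 has GPU affinity
print(",".join(f"{{{c}}}" for c in places))
# OMP_PLACES-style string: {3},{0},{1},{2},{4},{5}
```

With this value exported as `OMP_PLACES`, the master thread (place 0) lands on the GPU-affine core, and the parallel worker threads fill the remaining places.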
**Usage:** Insert `put_to_first_core.sh` as a wrapper before the application, after Slurm.
## Example 2
_Very similar but not identical to Example 1; note that this example was created well before Example 1._
**Task:** Utilize most cores of JUWELS Booster by associating a core with GPU affinity with the remaining cores (without GPU affinity) within a rank. There should be 4 ranks per node; in each rank, 1 GPU process is launched and the remaining cores are given to the CPU-only process.
**Usage:** Launch `split_mask.sh` with 2 arguments. The first argument is the process which should not run on the GPU, the second argument is the process to be run on the GPU. The according CPU masks are set, and `CUDA_VISIBLE_DEVICES` is set as well. If only 1 argument is provided, it will be taken for both cases. No argument or >2 arguments will result in the masks being printed (for debugging).
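A typical invocation under Slurm might look like the following. This is a sketch only: the binary names are placeholders, and the exact `srun` flags must be adapted to your job script:

```shell
# 4 ranks per node; split_mask.sh sets the CPU masks and
# CUDA_VISIBLE_DEVICES before launching the two processes
# (binary names below are placeholders).
srun --ntasks-per-node=4 ./split_mask.sh ./cpu_only_binary ./gpu_binary
```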