|
|
[[_TOC_]]
|
|
|
|
|
|
# Usage
|
|
|
LinkTest has to be started in parallel with an even number of processes, for example using `srun --ntasks 2 linktest`. You can control its execution via the following command-line arguments:
|
|
|
|
|
|
`-h` or `--help`: Prints a help message similar to the following:
|
|
|
You can check the usage via `linktest -h` (even without srun), which should look similar to this
|
... | ... | @@ -12,7 +12,7 @@ Usage : linktest [options] |
|
|
Possible options (default values in parathesis):
|
|
|
|
|
|
-h/--help Print this help message and exit
|
|
|
-v/--version Print LinkTest version and exit
|
|
|
-m/--mode VAL Transport Layer to be used [REQUIRED]*
|
|
|
-w/--num-warmup-messages VAL Number of warm-up messages [REQUIRED]
|
|
|
-n/--num-messages VAL Number of messages [REQUIRED]
|
... | ... | @@ -21,8 +21,8 @@ Possible options (default values in parathesis): |
|
|
--no-sion-file Do not write data to sion file (0)
|
|
|
--parallel-sion-file Write data SION file in parallel (0)
|
|
|
--num-slowest VAL Number of slowest pairs to be retested (10)
|
|
|
--min-iterations VAL LinkTest repeats for at least <min_iterations> (1)
|
|
|
--min-runtime VAL LinkTest runs for at least <min_runtime> seconds communication time (0.0)
|
|
|
--memory_buffer_allocator VAL Allocator type for memory (DEFAULT)
|
|
|
--all-to-all Additionally perform MPI all-to-all tests (0)
|
|
|
--unidirectional Perform unidirectional tests (0)
|
... | ... | @@ -41,15 +41,15 @@ Possible options (default values in parathesis): |
|
|
Alternatively to --mode, the transport layer can be defined by using linktest.LAYER
|
|
|
or setting environment variable LINKTEST_VCLUSTER_IMPL
|
|
|
```
|
|
|
where `<<<VERSION>>>` is the three-part version of the LinkTest executable and `<<<SUPPORTED COMMUNICATION APIs>>>` is a list of supported communication APIs/layers. This option supersedes all others. When LinkTest is executed with this command-line option, it does not need to be run in parallel.
|
|
|
|
|
|
`-v` or `--version`: Prints the following version information:
|
|
|
```
|
|
|
FZJ Linktest (<<<VERSION>>>)
|
|
|
```
|
|
|
where `<<<VERSION>>>` is the three-part version of the LinkTest executable. As with the `-h` or `--help` option, LinkTest does not need to be run in parallel with this option. This option supersedes all others aside from the `-h` or `--help` option.
|
|
|
|
|
|
`-m` or `--mode`: Specifies that the following ASCII string indicates the communication API to use for testing. Alternatively, the communication API can be extracted from the extension of the LinkTest executable name or from the `LINKTEST_VCLUSTER_IMPL` environment variable. When multiple ways of specifying the communication API are used, `-m` or `--mode` supersedes the LinkTest executable extension, which in turn supersedes the `LINKTEST_VCLUSTER_IMPL` environment variable.
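The precedence described above can be sketched as a small shell function. This is only an illustration of the documented order: `resolve_mode` is a hypothetical helper, not part of LinkTest, and the values `mpi` and `tcp` are taken from the run examples in this page.

```BASH
# Hypothetical helper mirroring the documented precedence:
# --mode > executable extension > LINKTEST_VCLUSTER_IMPL.
resolve_mode() {
    cli_mode="$1"              # value given to -m/--mode, may be empty
    exe_base="${2##*/}"        # executable name without directory part
    case "${exe_base}" in
        *.*) exe_mode="${exe_base##*.}" ;;   # e.g. linktest.mpi -> mpi
        *)   exe_mode="" ;;
    esac
    if [ -n "${cli_mode}" ]; then
        echo "${cli_mode}"
    elif [ -n "${exe_mode}" ]; then
        echo "${exe_mode}"
    else
        echo "${LINKTEST_VCLUSTER_IMPL:-}"
    fi
}

LINKTEST_VCLUSTER_IMPL=tcp
resolve_mode "mpi" "./linktest.tcp"   # command-line option wins
resolve_mode ""    "./linktest.mpi"   # extension wins over the variable
resolve_mode ""    "./linktest"       # falls back to the variable
```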
|
|
|
|
|
|
`-w` or `--num-warmup-messages`: Specifies that the following integer indicates the number of warm-up messages used to warm up a connection before testing it. Unless help or version information is being printed, this command-line argument is required.
|
|
|
|
... | ... | @@ -67,7 +67,7 @@ where `<<<VERSION>>>` is the three part version of Linktest executable. Like the |
|
|
|
|
|
`--min-iterations`: Specifies that the following integer indicates the minimum number of times the LinkTest benchmark should be repeated. If it is not one, the writing of SION files is disabled. This command-line argument is useful to apply a communication load to the system.
|
|
|
|
|
|
`--min-runtime`: Specifies that the following floating-point number indicates the minimum number of seconds of communication time for which LinkTest should repeat itself. If non-zero, the writing of SION files is disabled. This command-line argument is useful to apply a communication load to the system.
|
|
|
|
|
|
`--memory_buffer_allocator`: Specifies that the following string indicates the memory buffer allocator type to be used for allocating the memory buffers for sending and receiving data. The following options are permitted:
|
|
|
|
... | ... | @@ -79,7 +79,7 @@ where `<<<VERSION>>>` is the three part version of Linktest executable. Like the |
|
|
| `POSIX_aligned-memory_allocator` | Uses `posix_memalign` to allocate buffers on a page boundary. |
|
|
|
| `CUDA_memory_allocator` | Uses CUDA `memalloc` to allocate memory on the GPU. |
|
|
|
|
|
|
`--all-to-all`: Specifies that all-to-all testing should be done before and after the main LinkTest test if the used communication API is MPI.
|
|
|
|
|
|
`--unidirectional`: Specifies that testing should occur unidirectionally instead of semidirectionally, which is the default. Currently only the MPI communication API is supported.
|
|
|
|
... | ... | @@ -170,7 +170,7 @@ To perform a bisection bandwidth test, in which the parallel bandwidth between t |
|
|
`--unidirectional` causes LinkTest to test connections unidirectionally in parallel. Testing semidirectionally or bidirectionally does not ensure that communication occurs unidirectionally between the two halves at any given point in time. `--bidirectional` can be used with the understanding that the tests at no point guarantee a certain communication pattern and direction between the two bisecting halves, because the individual communications cannot be sufficiently synchronized for this. For `--semidirectional` we have seen that the communication organizes itself such that communication on a given link occurs in one direction, but the direction in which any given link communicates at any given time is random.
|
|
|
|
|
|
## Usage of TCP Communication API Without miniPMI
|
|
|
LinkTest can be configured to test MPI or TCP without the miniPMI library. In the case of MPI no additional work is necessary, aside from executing with `mpiexec` or the like, and LinkTest can be used as above. When testing TCP communication without the miniPMI library, the cluster configuration needs to be specified explicitly via the following four environment variables: `LINKTEST_TCP_SIZE`, `LINKTEST_TCP_RANK`, `LINKTEST_TCP_IPADDR_<<<RANK>>>` and `LINKTEST_TCP_PORT_<<<RANK>>>`.
|
|
|
|
|
|
`LINKTEST_TCP_SIZE`: An integer indicating the number of tasks to be used for the test.
|
|
|
|
... | ... | @@ -178,11 +178,11 @@ Linktest can be configured to test MPI or TCP without the miniPMI library. In th |
|
|
|
|
|
`LINKTEST_TCP_IPADDR_<<<RANK>>>`: The IP address of rank `<<<RANK>>>`, where `<<<RANK>>>` is the eight-digit zero-filled integer rank to which the environment variable corresponds.
|
|
|
|
|
|
`LINKTEST_TCP_PORT_<<<RANK>>>`: The communication port to use for rank `<<<RANK>>>`, where `<<<RANK>>>` is the eight-digit zero-filled integer rank to which the environment variable corresponds. Note that it is imperative that these ports are free on the respective machines. LinkTest will not test this, nor will it port-scan to find free ports and communicate them to the partners. Setting free ports is the user's responsibility.
|
|
|
|
|
|
For a given task, `LINKTEST_TCP_SIZE` and `LINKTEST_TCP_RANK` must be specified. `LINKTEST_TCP_IPADDR_<<<RANK>>>` and `LINKTEST_TCP_PORT_<<<RANK>>>` must also be specified for all other tasks.
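To make the eight-digit zero-filled naming concrete, the following minimal sketch builds the variable names for a two-task setup. The IP addresses (taken from the reserved documentation range) and the base port are placeholders chosen for illustration only, not values LinkTest prescribes.

```BASH
# Hypothetical two-task setup; addresses and base port are placeholders.
size=2
base_port=50000
i=0
for addr in 192.0.2.10 192.0.2.11; do
    rank_tag=$(printf '%08d' "${i}")     # eight-digit, zero-filled rank
    export "LINKTEST_TCP_IPADDR_${rank_tag}=${addr}"
    export "LINKTEST_TCP_PORT_${rank_tag}=$((base_port + i))"
    i=$((i + 1))
done
export LINKTEST_TCP_SIZE="${size}"
export LINKTEST_TCP_RANK=0               # each task sets its own rank here
```

On the second machine the same variables would be exported with `LINKTEST_TCP_RANK=1`.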
|
|
|
|
|
|
With the cluster environment configured in this way, LinkTest can be executed as usual. Below is an example of how to configure this cluster environment given a host-name list, which in this case is queried via a SLURM environment variable under the assumption that this script is submitted via SLURM and that there is one task per node:
|
|
|
```BASH
|
|
|
# 1. List of Host Names
|
|
|
hosts=($(scontrol show hostnames ${SLURM_JOB_NODELIST} | paste -s -d " "))
|
... | ... | @@ -203,18 +203,18 @@ for i in $(seq 0 $((${#hosts[@]}-1))); do |
|
|
export LINKTEST_TCP_PORT_${task}=$((${base_port}+${i}));
|
|
|
done
|
|
|
|
|
|
# 4. Execute LinkTest
|
|
|
linktest --mode tcp --num-warmup-messages 10 --num-messages 1000 --size-messages 1024 --output tcp.sion;
|
|
|
```
|
|
|
|
|
|
# JSC Run Examples
|
|
|
|
|
|
**LinkTest on 2048 nodes, 1 task per node, message size 16 MiB, 2 warmup messages and 4 messages for measurement:**
|
|
|
```
|
|
|
xenv -L GCC -L CUDA -L ParaStationMPI -L SIONlib salloc -N 2048 srun -n 2048 ./linktest --mode mpi --num-warmup-messages 2 --num-messages 4 --size-messages $((16*1024*1024))
|
|
|
```
|
|
|
|
|
|
**LinkTest on 936 nodes, 4 tasks per node (one per GPU) using device memory:**
|
|
|
```
|
|
|
xenv -L GCC -L CUDA -L ParaStationMPI -L SIONlib salloc -N 936 srun -n 3744 ./linktest --mode mpi --num-warmup-messages 2 --num-messages 4 --size-messages $((16*1024*1024)) --use-gpus
|
|
|
```
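The numbers in the two commands above follow from simple arithmetic: the task count is nodes times tasks per node, and the message size is passed in bytes. The sketch below recomputes them for the GPU example and checks that the task count is even, as LinkTest requires. The variable names are illustrative only.

```BASH
nodes=936
tasks_per_node=4                          # one task per GPU
ntasks=$((nodes * tasks_per_node))        # value passed to srun -n
msg_bytes=$((16 * 1024 * 1024))           # 16 MiB expressed in bytes

# LinkTest needs an even number of processes.
if [ $((ntasks % 2)) -ne 0 ]; then
    echo "task count ${ntasks} is odd" >&2
fi
echo "srun -n ${ntasks} ./linktest --mode mpi --size-messages ${msg_bytes}"
```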
|
... | ... | @@ -229,7 +229,7 @@ xenv -L GCC -L CUDA -L ParaStationMPI -L SIONlib salloc -N 936 srun -n 3744 ./li |
|
|
xenv -L GCC -L CUDA -L ParaStationMPI -L SIONlib salloc -N 936 srun -n 3744 ./linktest --mode mpi --num-warmup-messages 2 --num-messages 4 --size-messages $((16*1024*1024)) --use-gpus --bisect
|
|
|
```
|
|
|
|
|
|
**LinkTest on JUSUF (MPI through UCP)**
|
|
|
|
|
|
```
|
|
|
$ xenv -L GCC -L CUDA -L ParaStationMPI \
|
... | ... | @@ -242,12 +242,12 @@ $ xenv -L GCC -L CUDA -L ParaStationMPI \ |
|
|
```
|
|
|
|
|
|
# Output
|
|
|
LinkTest writes measurement results to stdout and monitoring information to stderr. Additionally, by default a binary file in SION format is produced that contains detailed measurement data. These files are often quite sparse and can therefore be compressed very efficiently if needed.
|
|
|
|
|
|
## stdout
|
|
|
The stdout output starts with the settings that were given for this run:
|
|
|
```
|
|
|
-------------------- LinkTest Args -------------------------
|
|
|
Virtual-Cluster Implementation: mpi
|
|
|
Message length: 1024 B
|
|
|
Number of Messages: 1000
|
... | ... | @@ -259,7 +259,7 @@ serial test only: No |
|
|
max serial retest: 2
|
|
|
write protocol (SION): Yes, funneled
|
|
|
output file: "linktest_mpi_2nx4c.sion"
|
|
|
------------------------------------------------------------
|
|
|
```
|
|
|
followed by the main benchmark cycle:
|
|
|
```
|
... | ... | @@ -290,7 +290,7 @@ RESULT: Min Time: 433.63310397 ns ( 2.199 GiB/s) |
|
|
RESULT: Max Time: 4.62629204 us ( 211.090 MiB/s)
|
|
|
RESULT: Avg Time: 2.25120053 us ( 433.796 MiB/s)
|
|
|
```
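The bandwidth figures in the `RESULT` lines appear consistent with dividing the message length by the measured time. The sketch below reproduces the `Max Time` bandwidth under that assumption, using the 1024 B message length from the settings block and 1 MiB = 1048576 B. It is a plausibility check on the sample output, not a description of LinkTest's internal computation.

```BASH
# Assumption: bandwidth = message length / time, 1 MiB = 1048576 B.
msg_bytes=1024                 # "Message length" from the settings block
max_time_us=4.62629204         # "RESULT: Max Time" in microseconds
bw_mib=$(LC_ALL=C awk -v b="${msg_bytes}" -v t="${max_time_us}" \
    'BEGIN { printf "%.3f", b / (t * 1e-6) / 1048576 }')
echo "${bw_mib} MiB/s"         # prints 211.090 MiB/s
```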
|
|
|
At the end the slowest connections are retested in serial, which ensures that LinkTest places no additional stress on the system aside from the stress required to measure the connection. This is useful to see if the poor performance of a given connection may be due to the load LinkTest places on the system, for example the interconnects, or if the connection is just bad, for example due to a badly seated connection.
|
|
|
```
|
|
|
0: PINGPONG 3 <-> 6: 1st: 4.62629 us ( 211.0897 MiB/s) 2nd: 3.89782 us ( 250.5408 MiB/s)
|
|
|
1: PINGPONG 2 <-> 5: 1st: 4.20862 us ( 232.0387 MiB/s) 2nd: 3.17407 us ( 307.6689 MiB/s)
|
... | ... | @@ -327,9 +327,9 @@ timings[000] [sionclose] t= 403.51134 us |
|
|
timings[000] [all] t= 312.74890 ms
|
|
|
```
|
|
|
## SION Files
|
|
|
Unless turned off, LinkTest will, by default, also generate a binary SION file, whose default name is `pingpong_results_bin.sion`. This file contains the LinkTest measurements, a list of the involved hosts, as well as the options passed to LinkTest when it was executed.
|
|
|
|
|
|
If `--no-sion-file` is specified as a command-line option when executing LinkTest then no SION file is generated. If `--parallel-sion-file` is specified as a command-line option when executing LinkTest then the output SION file, if enabled, will be written out in parallel. This speeds up the output to file systems that support parallel access. The name of the output SION file can be changed via the command-line argument `-o` or `--output` followed by a space and the name of the file.
|
|
|
|
|
|
### SION File Defragmentation
|
|
|
The format of these SION files is optimized for parallel access which causes them to be very sparse. You can compress the SION files as follows:
|
... | ... | |