[linktest_jusuf_ucp_64nx1c_512B_3Warm.pdf](https://gitlab.jsc.fz-juelich.de/cstao-public/linktest/uploads/5fb8f26d40d4de586e8407313663d462/linktest_jusuf_ucp_64nx1c_512B_3Warm.pdf)

# Testing Inter- & Intra-CPU Communication Performance For A Single Node: An AMD EPYC 7742 Case Study

LinkTest can be used not only to test connections between computers, or nodes in an HPC setting, but also to benchmark the intra-node connectivity between cores. In this short case study we look at two JURECA-DC cluster nodes, each equipped with two 64-core AMD EPYC 7742 CPUs (128 cores per node) and sixteen 32 GiB DDR4-3200 memory modules (512 GiB per node). Our goal is to benchmark the communication performance between the cores within these nodes.

LinkTest is well suited for this, as it benchmarks communication between tasks, and each task runs on a CPU core. To measure core-to-core performance, we need to pin the LinkTest tasks to physical CPU cores when executing LinkTest. With SLURM's `srun` this can be done using the `` --cpu-bind=map_cpu:`seq -s, 0 127` `` command-line argument, which binds task 0 to CPU 0, task 1 to CPU 1, and so on up to CPU 127, i.e. one task per physical core of the two 64-core CPUs. Pinning tasks to physical cores ensures that we test connectivity only between physical cores, and not between logical cores provided by simultaneous multi-threading (SMT).

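
To make the pinning argument concrete, the Python sketch below reproduces the CPU list that `` `seq -s, 0 127` `` expands to (`0,1,...,127`); SLURM's `map_cpu` then binds the i-th task to the i-th ID in that list. The helper names are hypothetical, and the numbering it assumes (the first hardware thread of each of the 128 physical cores is numbered 0-127, its SMT sibling is that ID + 128) is a common Linux layout, not something stated above; check it on the real node, e.g. with `lscpu --extended`, before relying on it.

```python
# Sketch: what `seq -s, 0 127` expands to inside --cpu-bind=map_cpu:<list>.
# ASSUMPTION (verify with `lscpu --extended`): 2 sockets x 64 physical cores,
# physical cores numbered 0-127, SMT siblings numbered 128-255.
SOCKETS = 2
CORES_PER_SOCKET = 64
PHYSICAL_CORES = SOCKETS * CORES_PER_SOCKET  # 128


def map_cpu_list(num_tasks: int = PHYSICAL_CORES) -> str:
    """Comma-separated CPU IDs 0..num_tasks-1, like `seq -s, 0 <num_tasks-1>`."""
    if not 0 < num_tasks <= PHYSICAL_CORES:
        raise ValueError("at most one task per physical core (128 tasks)")
    return ",".join(str(cpu) for cpu in range(num_tasks))


def smt_sibling(cpu: int) -> int:
    """Other logical CPU on the same physical core (under the assumed numbering)."""
    return (cpu + PHYSICAL_CORES) % (2 * PHYSICAL_CORES)


if __name__ == "__main__":
    cpu_ids = [int(c) for c in map_cpu_list().split(",")]
    # Task i is bound to cpu_ids[i]; no two tasks end up on the same physical core.
    assert not any(smt_sibling(c) in cpu_ids for c in cpu_ids)
    print(f"--cpu-bind=map_cpu:{map_cpu_list(8)},...")
```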
With the correct pinning in place, we can now test the core-to-core connectivity within a node. For reliable numbers, however, it is imperative to use a sufficient number of messages, including warm-up messages. We used 2000 timed messages and 200 warm-up messages, each with a size of 1 MiB. Testing was performed with ParaStation MPI 5.4.10-1, which uses local memory for transfers when possible.

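
The warm-up idea can be sketched as follows: warm-up transfers are executed but not timed, and the bandwidth is computed over the timed messages only. This is a generic Python illustration, not LinkTest's implementation; `send_message` is a hypothetical stand-in for one MPI transfer, and `clock` can be replaced by a fake clock for testing. With the settings above, each timed series moves 2000 × 1 MiB = 2000 MiB (about 1.95 GiB), preceded by 200 MiB of warm-up traffic, and 128 tasks form 128 × 127 / 2 = 8128 distinct core pairs.

```python
import time
from typing import Callable

MIB = 1024 * 1024


def measure_bandwidth(
    send_message: Callable[[int], None],  # hypothetical: performs one transfer
    num_messages: int = 2000,
    num_warmup: int = 200,
    message_size: int = MIB,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Average bandwidth in bytes/s over the timed messages only."""
    # Warm-up: run transfers (e.g. to trigger lazy connection or buffer setup)
    # without timing them.
    for _ in range(num_warmup):
        send_message(message_size)
    # Timed phase: only these transfers enter the result.
    start = clock()
    for _ in range(num_messages):
        send_message(message_size)
    elapsed = clock() - start
    return num_messages * message_size / elapsed
```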
We see the same behavior for communication between the quadrants of the two different CPUs, where certain quadrant pairs communicate faster than others. Two quadrants appear to have consistently worse inter-CPU performance than the other two. This may again be related to the I/O die.

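
To check an observation like this numerically, a core-to-core bandwidth matrix can be averaged per quadrant pair, turning the block structure of the report into an 8 × 8 table (4 quadrants per CPU, 2 CPUs). The Python sketch below assumes a hypothetical matrix `bw` in which `bw[i][j]` is the bandwidth from CPU ID i to CPU ID j, and assumes that each quadrant consists of 16 consecutive CPU IDs (64 cores / 4 quadrants), with socket 0 holding IDs 0-63. Neither the data layout nor the numbering is taken from LinkTest; verify the numbering with `lscpu --extended`.

```python
from statistics import mean

# ASSUMPTION: 16 consecutive CPU IDs per quadrant (64 cores / 4 quadrants).
CORES_PER_QUADRANT = 16


def quadrant(cpu: int) -> int:
    """Global quadrant index: 0-3 on socket 0, 4-7 on socket 1 (assumed numbering)."""
    return cpu // CORES_PER_QUADRANT


def quadrant_means(bw: list[list[float]]) -> list[list[float]]:
    """Mean off-diagonal bandwidth for every (sending quadrant, receiving quadrant)."""
    n_quadrants = -(-len(bw) // CORES_PER_QUADRANT)  # ceiling division
    buckets = [[[] for _ in range(n_quadrants)] for _ in range(n_quadrants)]
    for i, row in enumerate(bw):
        for j, value in enumerate(row):
            if i != j:  # skip self-communication on the diagonal
                buckets[quadrant(i)][quadrant(j)].append(value)
    # NaN marks a bucket without any measurement.
    return [[mean(b) if b else float("nan") for b in row] for row in buckets]
```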
The second page of the report shows the same LinkTest run on a different, identically configured node. We see the same results, demonstrating that they are reproducible on different CPUs of the same model.

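
The agreement between two nodes can also be expressed as a number. The Python sketch below assumes two hypothetical N × N bandwidth matrices `node_a` and `node_b` in the same units, computes the relative deviation of every off-diagonal entry and reports the median and the maximum. Which deviation still counts as "the same result" is a judgment call, not a LinkTest criterion.

```python
from statistics import median


def relative_deviations(node_a: list[list[float]], node_b: list[list[float]]) -> list[float]:
    """|a - b| / max(a, b) for every off-diagonal entry of two equal-size matrices."""
    if len(node_a) != len(node_b):
        raise ValueError("matrices must have the same size")
    devs = []
    for i, (row_a, row_b) in enumerate(zip(node_a, node_b)):
        for j, (a, b) in enumerate(zip(row_a, row_b)):
            if i != j and max(a, b) > 0:  # skip the diagonal and empty entries
                devs.append(abs(a - b) / max(a, b))
    return devs


def summarize(node_a: list[list[float]], node_b: list[list[float]]) -> dict[str, float]:
    """Median and maximum relative deviation between the two nodes."""
    devs = relative_deviations(node_a, node_b)
    return {"median": median(devs), "max": max(devs)}
```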
This information is useful from an optimization standpoint, as it suggests that communication is best kept within a CCX. If that is not possible, note that certain quadrant-to-quadrant communication paths are faster than others.

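
When placing communicating tasks, it helps to know which topology level two cores share. The Python sketch below classifies a pair of CPU IDs as sharing a CCX, a quadrant, a socket, or none of these, and groups IDs by CCX. It assumes a hypothetical contiguous numbering: four cores per CCX (on Zen 2, four cores share an L3 cache), 16 per quadrant and 64 per socket. Real numbering can differ, so check it with `lscpu --extended` first.

```python
# ASSUMPTION: contiguous CPU IDs; 4 per CCX, 16 per quadrant, 64 per socket.
CORES_PER_CCX = 4
CORES_PER_QUADRANT = 16
CORES_PER_SOCKET = 64


def locality(cpu_a: int, cpu_b: int) -> str:
    """Return the closest topology level that both cores share."""
    if cpu_a // CORES_PER_CCX == cpu_b // CORES_PER_CCX:
        return "same CCX"
    if cpu_a // CORES_PER_QUADRANT == cpu_b // CORES_PER_QUADRANT:
        return "same quadrant"
    if cpu_a // CORES_PER_SOCKET == cpu_b // CORES_PER_SOCKET:
        return "same socket"
    return "different sockets"


def ccx_groups(num_cpus: int = 128) -> list[list[int]]:
    """Group CPU IDs by CCX, e.g. to place tightly coupled tasks together."""
    return [list(range(start, min(start + CORES_PER_CCX, num_cpus)))
            for start in range(0, num_cpus, CORES_PER_CCX)]
```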
In conclusion, this case study demonstrates that LinkTest can be used to benchmark inter- and intra-CPU communication between cores. Benchmarking CPUs in this fashion can help optimize software for specific architectures by showing how best to communicate within a CPU.

# Difference Between Uni-, Bi- & Semi-Directional Reports

The testing method you choose in LinkTest determines the results you get. It is up to you, the user, to interpret these results and to understand how the testing methodology affects them.

When the network topology between two nodes is isotropic, which is a fancy way of saying that the two nodes can communicate at the same speed in both directions, uni-directional and semi-directional testing yield results that do not differ much. Bi-directional testing results may differ, but we will get back to that later.

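
Whether a link is isotropic can be checked directly from measured data by comparing the two directions of every pair. The Python sketch below assumes a hypothetical matrix `bw` in which `bw[i][j]` is the measured bandwidth from task i to task j; how a particular LinkTest report lays out its data is not assumed here, and the 5 % tolerance is an arbitrary example value.

```python
def anisotropic_pairs(bw: list[list[float]], rel_tol: float = 0.05) -> list[tuple[int, int, float]]:
    """Return (i, j, slower/faster) for pairs whose two directions differ by > rel_tol."""
    flagged = []
    n = len(bw)
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            forward, backward = bw[i][j], bw[j][i]
            slower, faster = min(forward, backward), max(forward, backward)
            if faster > 0 and (faster - slower) / faster > rel_tol:
                flagged.append((i, j, slower / faster))
    return flagged
```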