With the correct pinning we can now test the core-to-core connectivity on a node. For reliable numbers, however, it is imperative to use a sufficient number of messages, including warm-up messages. We chose to use 2000 messages for testing and 200 for warm-up, with a message size of 1 MiB. Testing was performed with ParaStation MPI 5.4.10-1, which uses local memory for transfers when possible.
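
To illustrate why the warm-up messages matter, here is a minimal Python sketch of how per-message timings could be post-processed: the first 200 timings are discarded and the bandwidth is computed from the remaining 1 MiB messages. The function `bandwidth_from_timings` and its per-message timing input are hypothetical; they are not Linktest's interface or output format.

```python
# Parameters quoted in the text; the helper below is a hypothetical
# illustration of post-processing, not part of Linktest.
MESSAGE_SIZE_BYTES = 1 << 20  # 1 MiB per message
NUM_WARMUP = 200              # warm-up messages whose timings are discarded
NUM_MEASURED = 2000           # messages used for the reported numbers


def bandwidth_from_timings(timings_s, message_size=MESSAGE_SIZE_BYTES,
                           num_warmup=NUM_WARMUP):
    """Return bandwidth in GiB/s from per-message transfer times in seconds.

    The first ``num_warmup`` timings are dropped so that one-off costs
    (page faults, cold caches, connection setup) do not bias the result.
    """
    measured = timings_s[num_warmup:]
    if not measured:
        raise ValueError("no timings left after discarding warm-up messages")
    total_bytes = message_size * len(measured)
    total_time = sum(measured)
    return total_bytes / total_time / (1 << 30)


# Synthetic example: 200 slow warm-up messages, then 2000 messages that each
# move 1 MiB at 10 GiB/s (1 MiB / 10 GiB/s = 1/10240 s per message).
timings = [1e-3] * NUM_WARMUP + [1 / 10240] * NUM_MEASURED
print(bandwidth_from_timings(timings))                # ≈ 10.0
print(bandwidth_from_timings(timings, num_warmup=0))  # ≈ 5.43
```

Keeping the warm-up timings in the average pulls the synthetic figure from about 10 GiB/s down to roughly 5.4 GiB/s, which is the kind of distortion the warm-up phase is meant to avoid.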
The results of this test for two different, identically configured nodes can be seen in this report: [JURECA-DC_AMD-EPYC-7742.pdf](uploads/15b18b5e70bef06406ca25d33e6e8766/JURECA-DC_AMD-EPYC-7742.pdf). A newer test using unidirectional MPI and taking the median across 64 CPUs yields less noisy results: [JURECA-DC_AMD-EPYC-7742_MedianCorrected.pdf](uploads/fbf7c1470f2edb0616f9fa1c9500143b/JURECA-DC_AMD-EPYC-7742_MedianCorrected.pdf). Note that in this report the central purple blocks along the diagonal come from a single test sending 8192 messages. This report is shown below:
![JURECA-DC_AMD-EPYC-7742-MedianCorrected](uploads/e3ec65939074355d2dab0e2b3d5f8a6f/JURECA-DC_AMD-EPYC-7742-MedianCorrected.png)
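
The median correction can be sketched as an element-wise median over several core-to-core matrices, for example one matrix per measured CPU. This minimal Python sketch assumes each run is stored as a list of rows, where entry `[i][j]` is the value measured between cores `i` and `j`; that layout and the name `elementwise_median` are hypothetical, not Linktest's file format.

```python
from statistics import median


def elementwise_median(matrices):
    """Combine equally sized core-to-core matrices by taking, for every
    (i, j) pair, the median of the values measured in all runs."""
    if not matrices:
        raise ValueError("need at least one matrix")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all matrices must have the same shape")
    return [[median([m[i][j] for m in matrices]) for j in range(cols)]
            for i in range(rows)]


# Three runs of a 2-core matrix; the third contains one noisy outlier (90).
runs = [
    [[0, 10], [10, 0]],
    [[0, 11], [11, 0]],
    [[0, 90], [12, 0]],
]
print(elementwise_median(runs))  # [[0, 11], [11, 0]]
```

The median is used rather than the mean because a single disturbed run (e.g. OS jitter on one CPU) barely moves it: the outlier above would drag the mean of entry `[0][1]` to 37, while the median stays at 11.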
To understand these results, we need to understand how the AMD EPYC 7742 CPUs are built internally. These CPUs are 64-bit, 64-core x86 server microprocessors based on the Zen 2 microarchitecture. Their compute dies are fabricated using TSMC's 7 nm process, while the I/O die is fabricated using GlobalFoundries' 14 nm process. They were first introduced in 2019. They have a base clock speed of 2.25 GHz, which can boost up to 3.4 GHz on a single core. The processors support two-way simultaneous multithreading (SMT), hence the need for pinning above.
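
In Zen 2, each EPYC 7742 consists of eight compute dies (CCDs), each holding two core complexes (CCXs) of four cores that share an L3 cache. The Python sketch below maps a logical CPU id to its position in that hierarchy and reports which level two CPUs share. It rests on loudly stated assumptions: a dual-socket node, cores numbered contiguously per CCX and CCD, and SMT siblings numbered with an offset equal to the number of physical cores. The actual numbering depends on firmware and kernel, so verify it with e.g. `lscpu -e` or hwloc's `lstopo` before relying on it. `locate` and `shared_level` are hypothetical helpers.

```python
from typing import NamedTuple

# Zen 2 layout of one EPYC 7742: 8 CCDs x 2 CCXs x 4 cores = 64 cores.
CORES_PER_CCX = 4
CCXS_PER_CCD = 2
CCDS_PER_SOCKET = 8
CORES_PER_SOCKET = CORES_PER_CCX * CCXS_PER_CCD * CCDS_PER_SOCKET  # 64
SOCKETS = 2  # assumption: dual-socket node
PHYSICAL_CORES = SOCKETS * CORES_PER_SOCKET  # 128


class CoreLocation(NamedTuple):
    socket: int
    ccd: int         # CCD index within the socket
    ccx: int         # CCX index within the CCD (0 or 1)
    core: int        # physical core index within the node
    smt_thread: int  # 0 or 1


def locate(logical_cpu: int) -> CoreLocation:
    """Map a logical CPU id to its place in the (assumed) topology."""
    if not 0 <= logical_cpu < 2 * PHYSICAL_CORES:
        raise ValueError(f"logical CPU {logical_cpu} out of range")
    # Assumption: CPU n and n + PHYSICAL_CORES are SMT siblings.
    smt_thread, core = divmod(logical_cpu, PHYSICAL_CORES)
    socket, in_socket = divmod(core, CORES_PER_SOCKET)
    ccd, in_ccd = divmod(in_socket, CORES_PER_CCX * CCXS_PER_CCD)
    ccx = in_ccd // CORES_PER_CCX
    return CoreLocation(socket, ccd, ccx, core, smt_thread)


def shared_level(a: int, b: int) -> str:
    """Describe the closest topology level shared by two logical CPUs."""
    la, lb = locate(a), locate(b)
    if la.core == lb.core:
        return "same core (SMT siblings)"
    if la.socket != lb.socket:
        return "different sockets"
    if la.ccd != lb.ccd:
        return "same socket, different CCD"
    if la.ccx != lb.ccx:
        return "same CCD, different CCX (separate L3)"
    return "same CCX (shared L3)"


for pair in [(0, 3), (0, 4), (0, 8), (0, 64), (0, 128)]:
    print(pair, shared_level(*pair))
```

Under these assumptions, pinning two ranks to CPUs 0 and 128 would place them on the same physical core, which is exactly the situation the pinning above is meant to avoid when measuring core-to-core connectivity.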