# All-to-All Testing

All-to-all communication is a common communication pattern in which every task exchanges data with every other task. LinkTest supports testing this pattern only for MPI. For an all-to-all test, only the time it took for all tasks to finish communicating with one another is reported. To turn on all-to-all testing in conjunction with MPI testing, specify the `--alltoall` command-line option.

Please note that exactly how all-to-all communication is carried out depends on the MPI implementation used. There are a variety of performant algorithms for all-to-all communication, each with advantages and drawbacks.

...

An application programming interface (API) facilitates the interaction between different pieces of software, which potentially run on disparate machines. APIs allow for communication between software and, by extension, between different compute devices.

# Benchmarking

Benchmarking is the act of collecting data in order to compare things. LinkTest benchmarks communication APIs and the associated hardware by measuring how long it takes for a message to be sent back and forth between two tasks. This time can then be compared with the time it takes the same message to be sent back and forth between a different pair of tasks, or using a different communication API.

# Bidirectional Testing

Not to be confused with bisection testing. In bidirectional testing, messages are sent between tasks asynchronously. Normally LinkTest benchmarks communication times by sending a message from one task in a pair to the other, after which the other task sends the same message back. In bidirectional testing both tasks send messages to each other at the same time, so neither task waits on the other before sending its message. Such communication places a higher load on the connection between the two tasks, but is also commonly faster because neither task has to wait on the other. Bidirectional testing can be turned on by specifying the `--bidirectional` command-line option.

# Bisection Testing

Not to be confused with bidirectional testing. In bisection testing a population is split into two halves and the communication between the two halves is benchmarked. LinkTest splits the set of tasks into two halves, iterates over all possible pairs whose members come from different halves, and times the back-and-forth communication of each pair for a given message size. This is useful, for example, for testing cell-to-cell communication performance in hierarchically routed network topologies. LinkTest tests bisecting halves of tasks when the `--bisection` command-line option is specified.

Please note that the halves are determined at the beginning of testing and are never changed. As such, a given configuration always results in the same split of tasks into halves. If you wish to have different tasks associated with the two halves, the task order needs to be changed. This is ideally done when submitting the parallel job for LinkTest.

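The pairing described above can be sketched in a few lines of Python. This is only an illustration: the function name `bisection_pairs` is hypothetical, and splitting by task index into a lower and an upper half is an assumption, since the text only states that the split is fixed at the start of testing.

```python
from itertools import product


def bisection_pairs(num_tasks):
    """Return every (a, b) pair with a in the first half and b in the second.

    Assumption: tasks are numbered 0..num_tasks-1 and split by index.
    """
    half = num_tasks // 2
    first_half = range(0, half)
    second_half = range(half, num_tasks)
    return list(product(first_half, second_half))


# With four tasks, each task in {0, 1} is paired with each task in {2, 3}.
print(bisection_pairs(4))
```

Because the split depends only on the task numbering, reordering the tasks at job submission is what changes which tasks end up in which half.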
# Buffer Randomization

LinkTest can randomize buffers before transferring them. Some newer communication APIs compress messages on the fly. Randomizing the buffer ensures that the time measured is the time it takes to transfer the actual message, not the time it takes to send a compressed version of it.

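Why randomization defeats on-the-fly compression can be seen with a small experiment. The sketch below uses Python's `zlib` purely as a stand-in for whatever compression a communication API might apply; the buffer size and variable names are illustrative assumptions, not LinkTest internals.

```python
import os
import zlib

SIZE = 1 << 16  # 64 KiB message buffer (illustrative size)

constant_buffer = bytes(SIZE)      # all zero bytes, highly compressible
random_buffer = os.urandom(SIZE)   # random bytes, essentially incompressible

constant_compressed = len(zlib.compress(constant_buffer))
random_compressed = len(zlib.compress(random_buffer))

print(f"constant buffer: {SIZE} -> {constant_compressed} bytes")
print(f"random buffer:   {SIZE} -> {random_compressed} bytes")
```

The constant buffer shrinks to a tiny fraction of its size, while the random buffer stays at roughly its original size, so a transfer of the random buffer has to move the full message.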
# Checking Memory-Buffer Content

LinkTest can check its buffers after a connection has been tested. This ensures that the correct information has been transferred, i.e. that the communication API delivered the correct buffer to the receiver and did not modify the sending buffer. The check iterates through the buffer and verifies each byte.

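A byte-by-byte check of this kind might look as follows. This is a minimal sketch, not LinkTest's implementation; the function name `first_mismatch` and the convention of returning `-1` on success are assumptions made for the example.

```python
def first_mismatch(expected, actual):
    """Return the index of the first differing byte, or -1 if the buffers match."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        # One buffer is a prefix of the other: they differ where the shorter ends.
        return min(len(expected), len(actual))
    return -1


sent = bytes(range(16))
received = bytearray(sent)
print(first_mismatch(sent, received))  # buffers are identical here

received[5] ^= 0xFF  # simulate a corrupted byte
print(first_mismatch(sent, received))
```

Running the same check on the sending buffer against its original content covers the second requirement, that the sender's data was not modified.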
# Communication API

Communication APIs facilitate the communication between different computers by abstracting the necessary underlying hardware commands into easy-to-use, portable instructions that work on a host of different machines. A classic example is MPI.

LinkTest can test and benchmark different communication APIs. The communication API that LinkTest uses can be controlled via the `--mode` command-line option. Alternatively, it can be specified by appending it as a suffix to the name of the LinkTest executable, for example `linktest.mpi`, or via the `LINKTEST_VCLUSTER_IMPL` environment variable.

LinkTest supports the following communication APIs:

| API | Environment Variable | Default | Description |
| ----- | -------- | ------- | ----------- |
| ... | ... | ... | ... |
| `ucp` | `HAVE_UCP` | Enabled | UCX |
| `tcp` | `HAVE_TCP` | Enabled | TCP sockets |

Note that when LinkTest is installed, support for each communication API is selected by setting the corresponding environment variable to `1` to install it or `0` to not install it. As such, a given LinkTest executable may not support all of the listed communication APIs. By default all communication APIs are enabled; however, such a build rarely succeeds, as most platforms do not support all communication APIs due to a lack of the relevant hardware.

# Communication Time

The communication time is the time from when a message is ready to be sent until it has arrived at the recipient and a receipt has been returned. LinkTest measures two-way communication times: the time from when the message is ready to be sent until that message has been sent back and a receipt has been returned. This is in contrast to the one-way communication time, which is the time from when the message is ready to be sent until a receipt is received confirming that the message has been successfully delivered. If bidirectional testing is used, both communication partners send their identical messages at the same time and timing ends when a partner receives a receipt.

In a gross oversimplification, the communication time consists of two parts: the latency and the transit time. The latency is the time from the message being ready to be sent until sending actually begins. During this time, for example, the connection used to transmit the message is initialized. The transit time is the time it then takes the message to get from its origin to its destination and for a receipt to travel back to the origin confirming that the message has been successfully received.

For small message sizes the communication time is dominated by the latency. For large message sizes the communication time is dominated by the transit time, which depends on the communication bandwidth. To benchmark communication latency, message sizes should therefore be as small as possible. Ideally this would be 0, but messages must have a non-zero size, so a message size of 1 byte should be used. To benchmark transit times, and indirectly bandwidth, message sizes should be as large as feasible, which ensures that the latency plays a vanishing role in the communication time. The sizes should be as large as feasible rather than as large as possible because the duration of the benchmark grows with the message size, and overly large messages might make LinkTest take too long. This is often the case when testing connections serially.

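The simplified two-part model above can be written as time = latency + size / bandwidth. The sketch below evaluates it for a few message sizes; the function name and the latency and bandwidth values are illustrative assumptions, not measurements of any real system.

```python
def communication_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simplified model: fixed latency plus size-dependent transit time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY = 2e-6     # 2 microseconds (illustrative value)
BANDWIDTH = 10e9   # 10 GB/s (illustrative value)

for size in (1, 1024, 1024**2, 1024**3):
    total = communication_time(size, LATENCY, BANDWIDTH)
    latency_share = LATENCY / total
    print(f"{size:>12} B: {total:.3e} s, latency share {latency_share:.1%}")
```

For the 1-byte message almost the entire time is latency, whereas for the 1 GiB message the latency share is negligible, which is why the two extremes are used to benchmark latency and bandwidth respectively.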
# CPU RAM

The Random Access Memory (RAM) associated with the Central Processing Units (CPUs) of a system. This is usually the main RAM, and it is the RAM LinkTest uses by default to store its messages. However, the dedicated on-card RAM of NVIDIA Graphics Processing Units (GPUs) can also be used via CUDA. Turning on the `--use-gpus` option enables this. Note that LinkTest does not keep track of which GPU the memory was pinned to; it does not even keep track of which CPU a given LinkTest task is executed on. This is the responsibility of whoever executes the LinkTest benchmark.

# GPU RAM

The Random Access Memory (RAM) associated with a Graphics Processing Unit (GPU) of a system. This is usually not the main RAM of the system, which is associated with the Central Processing Units (CPUs). LinkTest uses the main RAM by default to store its messages. For NVIDIA GPUs, however, the GPU RAM can also be used to store the LinkTest messages via CUDA. Turning on the `--use-gpus` option enables this. Pinning LinkTest tasks to specific GPUs is required for this. LinkTest does not keep track of which GPU the memory was pinned to; it does not even keep track of which CPU a given LinkTest task is executed on. This is the responsibility of whoever executes the LinkTest benchmark.

# Latency

The time it takes before an action can be executed. For LinkTest this is the time between a message being ready to be sent and sending actually beginning.

For the relationship between latency, transit time, and message size see [Communication Time](#communication-time).

# Message Size

The message size refers to the size in bytes of the messages that LinkTest uses to benchmark communication. For the relationship between latency, transit time, and message size see [Communication Time](#communication-time). Note that many communication APIs only support message sizes up to 2 GiB. For 32-bit MPI implementations the cumulative size of all messages is restricted to less than 2 GiB in total.

# Memory-Buffer Allocator

LinkTest can allocate its memory buffers, which store the messages to be sent and the messages received, using a variety of allocators. Currently there are four options: 1) memory-aligned malloc, 2) pinned memory-map, 3) POSIX memory-aligned malloc and 4) CUDA malloc. Memory-aligned malloc uses the C++ function `std::aligned_alloc()`. Pinned memory-map uses the POSIX function `mmap()`. POSIX memory-aligned malloc uses the POSIX function `posix_memalign()`. CUDA malloc uses the CUDA memory allocator to allocate memory on GPUs; this is the only option for allocating memory on GPUs.

# Mode

The `--mode` command-line option defines which communication API LinkTest benchmarks. As a shorthand, `-m` can be used. See [Communication API](#communication-api) for a list of supported communication APIs.

# Multiple Buffers

For the unidirectional MPI case (see [Unidirectional Testing](#unidirectional-testing)), LinkTest is able to use multiple buffers to send and receive messages. If fewer buffers are available than there are messages to be sent, LinkTest cycles through the buffers. The idea behind this is to avoid cache thrashing due to multiple accesses of the buffers during transfer. This can improve speed; however, using a single buffer that can be kept in the CPU cache is often more performant.

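Cycling through a limited number of buffers amounts to a modulo on the message index. The following sketch assumes a simple round-robin order; the helper name `buffer_for_message` is hypothetical and the actual order LinkTest uses is not stated here.

```python
def buffer_for_message(message_index, num_buffers):
    """Pick the buffer for a message by cycling through the available buffers."""
    if num_buffers <= 0:
        raise ValueError("at least one buffer is required")
    return message_index % num_buffers


# Seven messages sent with three buffers: the buffers are reused in turn.
schedule = [buffer_for_message(i, 3) for i in range(7)]
print(schedule)  # [0, 1, 2, 0, 1, 2, 0]
```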
# Number of Messages

LinkTest benchmarks communication by repeating it many times. The number of times the back-and-forth sending of messages is repeated for timing purposes is controlled via the `--num-messages` command-line argument. The final reported times are the average time it took for the message to be sent back and forth.

# Number of Warm-Up Messages

LinkTest warms up connections by testing them multiple times before timing begins. Essentially, the same actions as during timing are performed several times beforehand. This is done because connections first need to be initialized, which means that sending a message for the first time often takes longer than sending it again shortly afterwards. During the first time, or sometimes the first couple of times, that a message is sent over a network, the network optimizes itself for the transmission of that message, i.e. it becomes primed for it. As such, it often makes sense to include at least one warm-up message before benchmarking a connection. For small message sizes more should be used; 3 to 5 work well. In LinkTest the number of warm-up messages is set via the `--num-warmup-messages` command-line argument.

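The interplay of warm-up repetitions and timed repetitions can be sketched generically. This is not LinkTest code: `timed_average` and `action` are hypothetical names, and `action` stands in for one back-and-forth message exchange.

```python
import time


def timed_average(action, num_warmup_messages, num_messages):
    """Run untimed warm-up repetitions, then return the average timed duration."""
    for _ in range(num_warmup_messages):
        action()  # warm-up: same work as the timed loop, but not measured
    start = time.perf_counter()
    for _ in range(num_messages):
        action()
    elapsed = time.perf_counter() - start
    return elapsed / num_messages


# Stand-in workload; a real benchmark would exchange a message here.
average = timed_average(lambda: sum(range(1000)), num_warmup_messages=3, num_messages=100)
print(f"average time per repetition: {average:.3e} s")
```

Only the second loop contributes to the reported average, so one-off initialization costs incurred during the first few repetitions do not distort the result.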
# Ping-Pong Testing

Ping-pong tests are a standard tool for network operators. They can be thought of as an extension of the `ping` command, which performs a ping test to check whether the machine at a given address is reachable. In a ping test a message is sent from an origin to a destination, and the time is measured at the origin until a receipt from the destination arrives. Ping-pong tests extend this by timing at the origin until the original message has been received back again, which is the pong in ping-pong testing. In bidirectional testing the origin and destination send their messages concurrently, see [Bidirectional Testing](#bidirectional-testing).

...

B sends N messages of size S to A

A takes the time t2 after the last receive finished

A writes the average time (t2-t1)/(2N) to the SION file

In the matrix seen in LinkTest reports this time corresponds to the entry in column A, row B.

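The averaging step can be illustrated with a short calculation. The sketch assumes that `t1` is the time task A takes before the first send, which the visible steps imply but do not show; the function name and the numbers are illustrative only.

```python
def average_message_time(t1, t2, num_messages):
    """Average one-way time: N messages travelled in each direction between t1 and t2."""
    if num_messages <= 0:
        raise ValueError("num_messages must be positive")
    return (t2 - t1) / (2 * num_messages)


# Illustrative values: 1000 messages each way took half a second in total.
average = average_message_time(t1=10.0, t2=10.5, num_messages=1000)
print(f"average time per message: {average:.3e} s")
```

Dividing by 2N rather than N accounts for the fact that every one of the N repetitions contains two transfers, one in each direction.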
# Randomizing Testing Order

Although LinkTest by default tests the connection between a given task and all other tasks, results may depend on the order in which the testing is performed. The `--randomize` command-line option causes the testing order to be randomly shuffled, which means that consecutive LinkTest runs with this option will likely test physical connections in a different order.

# Serial Testing

By default LinkTest tests as many connections as possible in parallel. This, however, can cause tests to interfere with one another. Such interference is sometimes desired, for example, if real-world network performance under a sustained network load is to be tested. In other cases peak performance without the interference of other parallel tests is desired. In that case serial testing is used, in which each connection between a pair of tasks is tested individually. This effectively serializes the test and causes it to take significantly longer. Serial testing can be turned on in LinkTest by using the `--serial-testing` command-line option.

# Serial Retesting

Some of the connections tested by LinkTest will perform worse than others. By default LinkTest retests some of the worst connections serially. This is to determine whether the poor performance is due to conflicts with other parallel connections or with other processes running in parallel on the same node or CPU. If the times for a serially retested connection improve to the expected values, that indicates that there was some type of conflict during the main measurement. If the performance of a connection does not improve as expected after serial retesting, that is a good indication that something might be wrong with the connection.

The number of connections to be serially retested can be controlled via the `--num-slowest` command-line argument, followed by a positive integer giving the number of worst connections to retest serially.

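Selecting the worst connections for retesting is a simple ranking by measured time. The sketch below is an assumption about how such a selection could look; the function name, the dictionary layout, and the timing values are all made up for illustration.

```python
def slowest_connections(times, num_slowest):
    """Return the num_slowest (source, destination) pairs with the largest times.

    times maps (source, destination) task pairs to measured times in seconds.
    """
    ranked = sorted(times.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:num_slowest]]


measured = {(0, 1): 1.0e-6, (0, 2): 9.0e-6, (1, 2): 1.2e-6, (2, 0): 4.0e-6}
print(slowest_connections(measured, 2))  # [(0, 2), (2, 0)]
```

The pairs returned here would then be measured again one at a time, and the new times compared with the original ones.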
# Stress Testing

LinkTest can be used to apply a nearly continuous communication load to stress a network. This is useful to see how stable a network remains under a continuous load.

LinkTest can be configured for stress testing using two command-line arguments. `--min-iterations` followed by a positive integer specifies the minimum number of times the main test of LinkTest is repeated. `--min-runtime` followed by an integer specifies the minimum length of time for which LinkTest should keep repeating the main test. LinkTest only stops repeating the main test once both conditions are satisfied. If only one is specified, LinkTest only checks against that one.

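The stopping rule, repeat until both the minimum iteration count and the minimum runtime are reached, can be sketched as a loop. This is an illustration under assumptions: the function and parameter names are hypothetical, the runtime is taken to be in seconds, and an unspecified limit is modelled as zero so that it is trivially satisfied.

```python
import time


def run_stress_test(main_test, min_iterations=0, min_runtime=0.0, clock=time.monotonic):
    """Repeat main_test until both minimums are met; return the iteration count."""
    start = clock()
    iterations = 0
    while True:
        main_test()
        iterations += 1
        enough_iterations = iterations >= min_iterations
        enough_runtime = clock() - start >= min_runtime
        if enough_iterations and enough_runtime:
            return iterations


# Only the iteration limit is given, so the runtime condition never holds it back.
print(run_stress_test(lambda: None, min_iterations=3))  # 3
```

Passing the clock as a parameter is a design choice for the sketch only: it allows the loop to be exercised with a fake clock instead of waiting for real time to pass.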
# Transit Time

Transit time is the time it takes for an object to go from its origin to its destination. During this travel period the object is said to be in transit. For communication, this is the time the message is in transit.

For the relationship between latency, transit time, and message size see [Communication Time](#communication-time).

# Transport Layer

In the OSI model a transport layer is a conceptual division of the methods and protocols related to the transport of information, generally in terms of bytes. In LinkTest it defines the API (the aforementioned methods and protocols) used to communicate data between, or within, systems. The term is generally used in connection with the choice of communication API that LinkTest should test, which is controlled via the `--mode` option. It should not, however, be confused with the communication API used for testing. The transport layer is an abstract concept. LinkTest uses the communication API for the actual establishment and testing of connections.

# Unidirectional Testing

When using the MPI communication API, LinkTest can also test unidirectionally. In this case, after timing begins, messages are sent back to back from the sending host to the receiving host. Once the receiving host has received all messages, it sends a receipt back to the sender, which then stops timing. The advantage of this test is that it is sensitive to the case where the bandwidth between two partners is not isotropic, i.e. the bandwidth depends on the send direction.