# Glossary

## All-to-All Testing

All-to-all communication is a common communication pattern in which every task exchanges data with every other task. Linktest supports testing this pattern only for MPI. For an all-to-all test, only the time it took for all tasks to finish communicating with each other is returned. To turn on all-to-all testing in conjunction with MPI testing, specify the `--alltoall` command-line option.

Please note that exactly how the all-to-all communication is carried out depends on the MPI implementation used. There are a variety of performant algorithms for all-to-all communication, each with advantages and drawbacks.
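
To make the pattern concrete, the following minimal sketch simulates the data movement of an all-to-all exchange in plain Python, without MPI. The function name `all_to_all` and the nested-list layout are assumptions made for illustration; the sketch says nothing about which algorithm a given MPI implementation uses internally.

```python
def all_to_all(send_buffers):
    """Simulate an all-to-all exchange among len(send_buffers) tasks.

    send_buffers[src][dst] is the block that task src sends to task dst.
    The result holds, for every task dst, the blocks it received,
    ordered by sending task: result[dst][src].
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each task needs exactly one block per task")
    # Every task receives one block from every task, which is a transpose.
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


if __name__ == "__main__":
    # Three hypothetical tasks, each labelling its blocks "src->dst".
    send = [[f"{src}->{dst}" for dst in range(3)] for src in range(3)]
    for rank, received in enumerate(all_to_all(send)):
        print(rank, received)
```

Each task ends up holding exactly one block from every task, which is why the cost of the pattern grows quickly with the number of tasks.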

## API

An application programming interface (API) facilitates the interaction between different pieces of software, which potentially run on disparate machines. APIs allow software to communicate and, by extension, allow different compute devices to communicate.

## Benchmarking

The act of collecting data in order to compare things. Linktest benchmarks communication APIs and the associated hardware by measuring how long it takes for a message to be sent back and forth between two tasks. This time can then be compared with the time it takes for the same message to be sent back and forth between a different pair of tasks, or with a different communication API.

## Mode

The `--mode` command-line option defines which communication API Linktest benchmarks. `-m` can be used as a shorthand. See Communication API for a list of supported communication APIs.

## Bidirectional Testing

Not to be confused with bisection testing. In bidirectional testing messages are sent between tasks asynchronously. Normally Linktest benchmarks communication times by sending a message from one task in a pair to the other, after which the other task sends the same message back. In bidirectional testing both tasks send their messages to each other at the same time, so neither task waits on the other before sending. Such communication is more taxing on the connection between the two tasks, but it is also commonly faster because neither task has to wait on the other. Bidirectional testing can be turned on by specifying the `--bidirectional` command-line option.
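
The difference between the two patterns can be sketched with a local socket pair and threads standing in for two tasks. This is only an illustration under stated assumptions: `ping_pong`, `bidirectional`, `recv_exact`, and `MESSAGE` are hypothetical names, and Linktest itself uses the selected communication API rather than Python sockets.

```python
import socket
import threading

# Kept small so that a send never blocks on a full socket buffer.
MESSAGE = b"x" * 1024


def recv_exact(sock, size):
    """Read exactly size bytes from sock."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ping_pong(a, b):
    """Default pattern: a sends, b waits for the message and sends it back."""
    def echo():
        b.sendall(recv_exact(b, len(MESSAGE)))

    worker = threading.Thread(target=echo)
    worker.start()
    a.sendall(MESSAGE)
    reply = recv_exact(a, len(MESSAGE))
    worker.join()
    return reply


def bidirectional(a, b):
    """Bidirectional pattern: both ends send first and only then receive."""
    received = {}

    def exchange(name, sock):
        sock.sendall(MESSAGE)
        received[name] = recv_exact(sock, len(MESSAGE))

    workers = [
        threading.Thread(target=exchange, args=(name, sock))
        for name, sock in (("a", a), ("b", b))
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return received
```

In `ping_pong` the second task cannot send until it has received, whereas in `bidirectional` neither side waits before sending. With messages larger than the socket buffer the second pattern would need non-blocking sends to avoid both sides stalling.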

## Bisection Testing

Not to be confused with bidirectional testing. In bisection testing the set of tasks is split into two halves and the communication between the two halves is benchmarked. Linktest does this by iterating over all possible pairs of tasks whose members come from different halves and timing their back-and-forth communication for a given message size. This is useful, for example, for testing cell-to-cell communication performance in hierarchically routed network topologies. Linktest tests bisecting halves of tasks when the `--bisection` command-line option is specified.

Please note that the halves are determined at the beginning of testing and never change. As such, a given configuration always results in the same split of tasks into halves. If you wish to have different tasks associated with the two halves, the task order needs to be changed. TODO: Does this work with the `--mix` option?
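
The pairing across halves can be sketched as follows. How Linktest actually assigns tasks to halves is not stated here, so the split by task order (lower ranks in one half, higher ranks in the other) and the function name `bisection_pairs` are assumptions for illustration only.

```python
from itertools import product


def bisection_pairs(num_tasks):
    """Return every (a, b) pair with a in the first half and b in the second.

    Assumption: the halves follow task order. With an odd task count
    the second half receives the extra task.
    """
    half = num_tasks // 2
    first_half = range(half)
    second_half = range(half, num_tasks)
    # Only pairs that cross the bisection are tested.
    return list(product(first_half, second_half))


if __name__ == "__main__":
    print(bisection_pairs(4))
```

Because the split depends only on the task order, the same configuration always yields the same pairs; changing which tasks land in which half requires reordering the tasks.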

## Communication API

Communication APIs facilitate communication between different computers by abstracting the necessary underlying hardware commands into easy-to-use, portable instructions that work on a host of different machines. A classic example is MPI.

Linktest can test and benchmark different communication APIs. The communication API that Linktest uses can be selected via the `--mode` command-line option. Alternatively, it can be specified by appending it as a suffix to the Linktest executable name, for example `linktest.mpi`, or via the `LINKTEST_VCLUSTER_IMPL` environment variable.

Linktest supports the following communication APIs:

| API | Env. Variable | Default | Description |
| --- | ------------- | ------- | ----------- |
| `mpi` | `HAVE_MPI` | Enabled | MPI |
| `ibverbs` | `HAVE_IBVERBS` | Enabled | Verbs-based implementation |
| `psm2` | `HAVE_PSM2` | Enabled | PSM2 (Omni-Path) |
| `cuda` | `HAVE_CUDA` | Enabled | NVLink (node-internal) |
| `ucp` | `HAVE_UCP` | Enabled | UCX |
| `tcp` | `HAVE_TCP` | Enabled | TCP sockets |

Note that during Linktest installation only the desired supported communication APIs are installed. As such, a given Linktest executable may not support all of the listed communication APIs. By default, all communication APIs are supported after installation.

## Communication Time

The communication time in Linktest is the time from when a message is ready to be sent until it has arrived at the recipient and a receipt has been returned. Linktest tests two-way communication times: the time from the message being ready to be sent until that message has been returned and a receipt has been sent. This is in contrast to the one-way communication time, which is the time from the message being ready to be sent until a receipt is received confirming that the message was delivered successfully. If bidirectional testing is used, both communication partners send their identical messages at the same time and timing ends when a partner receives a receipt.

As a gross oversimplification, the communication time consists of two parts: the latency and the transit time. The latency is the time from the message being ready to be sent until sending actually begins. During this time, for example, the connection used to transmit the message is initialized. The transit time is the time it then takes for the message to get from its origin to its destination and for a receipt confirming successful delivery to travel back to the origin.

For small message sizes the communication time is dominated by the latency. For large message sizes it is dominated by the transit time, which depends on the communication bandwidth. To benchmark communication latency, the message size should therefore be as small as possible. Ideally this would be 0, but messages must have a non-zero size, so 1 should be used. To benchmark transit times, and indirectly bandwidth, the message size should be as large as feasible, which ensures that the latency plays a vanishing role in the communication time. Why as large as feasible and not as large as possible? As the message size increases, the benchmark also takes longer, and message sizes that are too large might make Linktest take too long.
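
The oversimplified two-part model can be written down as a short sketch. It assumes that the transit time is simply the message size divided by a constant bandwidth and ignores the time for the receipt; the function names and the example latency and bandwidth values are hypothetical and not measurements of any real system.

```python
def communication_time(message_size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the communication time as latency plus transit time."""
    transit_s = message_size_bytes / bandwidth_bytes_per_s
    return latency_s + transit_s


def latency_share(message_size_bytes, latency_s, bandwidth_bytes_per_s):
    """Return the fraction of the communication time spent on latency."""
    total_s = communication_time(
        message_size_bytes, latency_s, bandwidth_bytes_per_s
    )
    return latency_s / total_s


if __name__ == "__main__":
    latency_s = 1e-6         # hypothetical: 1 microsecond
    bandwidth = 10e9         # hypothetical: 10 GB/s
    for size in (1, 1024, 2**20, 2**30):
        share = latency_share(size, latency_s, bandwidth)
        print(f"{size:>10} bytes: latency share {share:.6f}")
```

Under these assumed values a 1-byte message is almost entirely latency, while for a 1 GiB message the latency contributes a negligible fraction, which matches the guidance on choosing message sizes.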

## Latency

The time it takes before an action can be executed. For Linktest this is the time between a message being ready to be sent and sending actually beginning.

For the relationship between latency, transit time, and message size see `[Communication Time](Glossary#Communication Time)`. TODO: Link

## Message Size

The message size refers to the size, in bytes, of the messages Linktest uses to benchmark communication. The relationship between message size and communication/transit time is complex.

For the relationship between latency, transit time, and message size see Communication Time. TODO: Link

## Number Of Messages

Linktest benchmarks communication by repeating each exchange many times. The number of times the back-and-forth sending of a message is repeated for timing purposes is controlled via the `--num-messages` command-line argument. The final returned times are the average time it took for the message to be sent back and forth.

## Number of Warm-Up Messages

Linktest warms up connections by testing them multiple times before timing begins. Essentially, the same actions as during timing are performed several times beforehand. This is often done because connections first need to be initialized, which means that sending a message for the first time often takes longer than sending it again shortly afterwards. The first time, or sometimes the first few times, a message is sent over a network, the network optimizes itself for the transmission of the message, i.e. it becomes primed for this message. As such, it often makes sense to include at least one warm-up message before benchmarking a connection. For small message sizes more should be used; 3-5 work well. In Linktest the number of warm-up messages must be specified via the `--num-warmup-messages` command-line argument.
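
The interplay of warm-up messages and timed messages can be sketched as a generic timing loop. This is not Linktest's implementation: `average_round_trip` and the `round_trip` callable are hypothetical stand-ins for one back-and-forth exchange.

```python
import time


def average_round_trip(round_trip, num_messages, num_warmup_messages=0):
    """Return the mean wall-clock time of round_trip() in seconds.

    round_trip is a hypothetical callable performing one back-and-forth
    exchange. The warm-up calls run first and are excluded from timing.
    """
    if num_messages < 1:
        raise ValueError("num_messages must be at least 1")
    for _ in range(num_warmup_messages):
        round_trip()  # primes the connection, not timed
    start = time.perf_counter()
    for _ in range(num_messages):
        round_trip()
    elapsed = time.perf_counter() - start
    return elapsed / num_messages


if __name__ == "__main__":
    # A dummy exchange that just sleeps for one millisecond.
    mean = average_round_trip(lambda: time.sleep(0.001), 10, 3)
    print(f"mean time per exchange: {mean:.6f} s")
```

Timing the whole loop once and dividing by the number of messages keeps the overhead of the clock calls out of the individual exchanges.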

## Serial Testing

By default Linktest tests as many connections as possible in parallel. This, however, can cause tests to interfere with each other. Such interference is sometimes desired, for example if real-world network performance under a sustained network load is to be tested. In other cases peak performance without interference from other parallel tests is desired. In this case serial testing is used, in which each connection between a pair of tasks is tested individually. This effectively serializes the test and causes it to take significantly longer. Serial testing can be turned on in Linktest by using the `--serial` command-line option.
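
Why serial testing takes longer can be seen by comparing the number of steps in the two schedules. The sketch below uses the well-known circle (round-robin) method to group pairs into rounds of disjoint pairs. It is an assumption for illustration, not Linktest's actual scheduling, and it treats pairs as unordered; the function names are hypothetical.

```python
def parallel_rounds(num_tasks):
    """Group all task pairs into rounds of disjoint pairs (circle method)."""
    if num_tasks < 2 or num_tasks % 2:
        raise ValueError("this sketch needs an even number of tasks >= 2")
    ranks = list(range(num_tasks))
    half = num_tasks // 2
    rounds = []
    for _ in range(num_tasks - 1):
        # Pair the i-th rank from the front with the i-th from the back.
        pairs = [tuple(sorted((ranks[i], ranks[-1 - i]))) for i in range(half)]
        rounds.append(pairs)
        # Keep the first rank fixed and rotate the others by one position.
        ranks = [ranks[0], ranks[-1]] + ranks[1:-1]
    return rounds


def serial_steps(num_tasks):
    """List every pair on its own: one pair is tested per step."""
    return [(a, b) for a in range(num_tasks) for b in range(a + 1, num_tasks)]


if __name__ == "__main__":
    n = 8
    print("parallel steps:", len(parallel_rounds(n)))
    print("serial steps:  ", len(serial_steps(n)))
```

For `n` tasks the parallel schedule in this sketch needs `n - 1` steps, while the serial one needs `n * (n - 1) / 2`, so the gap widens as the number of tasks grows.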

## Transit time

Transit time is the time it takes for an object to go from its origin to its destination. During this travel period the object is said to be in transit. For communication, this is the time the message is in transit.

For the relationship between latency, transit time, and message size see Communication Time. TODO: Link

## Transport Layer

In the OSI model a transport layer is a conceptual division of the methods and protocols related to the transport of information, generally in terms of bytes. In Linktest it defines the API (the aforementioned methods and protocols) used to communicate data between, or within, systems. It is generally used when referring to which communication API Linktest should test, which is controlled via the `--mode` option. It should not, however, be confused with the communication API used for testing. The transport layer is an abstract concept; Linktest uses the communication API for the actual establishment and testing of connections.