LinkTest Version 2.1.16
LinkTest SION files consist of three components:
- A File Header
- Data Chunks Per Rank
- A File Footer
The SION file starts with the File Header, which in turn starts with a SIONlib header. N Data Chunks follow, one for each rank, enumerated starting from rank 0. Each Data Chunk is terminated by an "END_BLOCK" statement. After the last Data Chunk comes the File Footer.
The structure is:
File Header
Data Chunk 0
Data Chunk 1
Data Chunk 2
...
Data Chunk N-1
File Footer
In the following we will describe the binary structure of each of the different components of the file.
File Header
Name | ID | Start [B] | End [B] | Size [B] | Fixed Value | Description |
---|---|---|---|---|---|---|
SIONlib Header | | 0 | 0+a | a | | SIONlib file header. |
LinkTest ID String | | 0+a | 5+a | 5 | "LKTST" | LinkTest ASCII-character tag identifying the start of the LinkTest portion of the SION file. |
LinkTest Major Version | | 5+a | 9+a | 4 | 2 | 32-bit unsigned integer identifying the LinkTest major version number. |
LinkTest Minor Version | | 9+a | 13+a | 4 | 1 | 32-bit unsigned integer identifying the LinkTest minor version number. |
LinkTest Patch Level | | 13+a | 17+a | 4 | 16 | 32-bit unsigned integer identifying the LinkTest patch-level number. |
LinkTest GitHash | | 17+a | 58+a | 41 | | 41-byte null-terminated ASCII-character git hash identifying the commit used to generate the data. |
LinkTest Mode Length | b | 58+a | 62+a | 4 | | 32-bit unsigned integer specifying the length of the following string indicating the transport layer/mode/virtual-cluster implementation used for the LinkTest run. |
LinkTest Mode | | 62+a | 62+a+b | b | | Null-terminated ASCII character array identifying the transport layer/mode/virtual-cluster implementation used for the LinkTest run. |
All-To-All Flag | c | 62+a+b | 63+a+b | 1 | | 8-bit integer that if non-zero indicates that all-to-all testing was performed. |
Bidirectional Flag | | 63+a+b | 64+a+b | 1 | | 8-bit integer that if non-zero indicates that bidirectional testing was performed. |
Unidirectional Flag | | 64+a+b | 65+a+b | 1 | | 8-bit integer that if non-zero indicates that unidirectional testing was performed. |
Bisection Flag | | 65+a+b | 66+a+b | 1 | | 8-bit integer that if non-zero indicates that bisection testing was performed. |
Step-Randomization Flag | | 66+a+b | 67+a+b | 1 | | 8-bit integer that if non-zero indicates that the step order was randomized during testing. |
Serial Flag | | 67+a+b | 68+a+b | 1 | | 8-bit integer that if non-zero indicates that serialized testing was performed. |
No-SION-File Flag | | 68+a+b | 69+a+b | 1 | 0 | 8-bit integer that if non-zero indicates that no SION file was written. |
Parallel-SION Flag | | 69+a+b | 70+a+b | 1 | | 8-bit integer that if non-zero indicates that the SION file was written in parallel. |
GPU-Memory Flag | | 70+a+b | 71+a+b | 1 | | 8-bit integer that if non-zero indicates that messages were stored in GPU RAM instead of CPU RAM. |
Multi-Buffer Flag | | 71+a+b | 72+a+b | 1 | | 8-bit integer that if non-zero indicates that messages were stored in multiple memory buffers. |
Randomized-Buffer Flag | | 72+a+b | 73+a+b | 1 | | 8-bit integer that if non-zero indicates that message buffers were randomized. |
Check-Buffer Flag | | 73+a+b | 74+a+b | 1 | | 8-bit integer that if non-zero indicates that message buffers were checked at the end of a step. |
Memory Allocator Type | | 74+a+b | 75+a+b | 1 | | 8-bit integer that indicates the type of memory allocator used to allocate the buffers for the messages. |
Number Of Messages | | 75+a+b | 83+a+b | 8 | | 64-bit unsigned integer indicating the number of messages passed between partners during measurements. |
Message Size | | 83+a+b | 91+a+b | 8 | | 64-bit unsigned integer indicating the size of the messages passed between partners during measurements and warm up. |
Num. Warm-up Messages | | 91+a+b | 99+a+b | 8 | | 64-bit unsigned integer indicating the number of warm-up messages passed between partners during measurements. |
Collect P-Num | | 99+a+b | 107+a+b | 8 | | Deprecated. |
Num. Serial Retests | d | 107+a+b | 115+a+b | 8 | | 64-bit unsigned integer indicating the number of serial retests of the worst connections. |
Num. Multiple Buffers | | 115+a+b | 123+a+b | 8 | | 64-bit unsigned integer indicating the number of buffers used to store messages in a rolling fashion. |
Buf. Randomization Seed | | 123+a+b | 131+a+b | 8 | | The 64-bit seed used for buffer randomization. |
Num. Randomized Tasks | | 131+a+b | 139+a+b | 8 | | The number of test iterations with randomized tasks. |
Task Randomization Seed | | 139+a+b | 147+a+b | 8 | | The 64-bit seed used for task randomization. |
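The layout above can be read with a few fixed-offset unpack calls. The following is a minimal sketch, not the LinkTest reader itself. It assumes that the caller has already skipped the SIONlib header (so the buffer starts at offset a), that values are little-endian, and that the flags, seeds and the task count are unsigned; none of this is stated in the table. The function and field names are hypothetical.

```python
import struct

# Assumption: little-endian byte order; the table above does not specify it.
BYTE_ORDER = "<"

# Hypothetical field names, in the order the table lists the 1-byte fields.
FLAG_NAMES = [
    "all_to_all", "bidirectional", "unidirectional", "bisection",
    "step_randomization", "serial", "no_sion_file", "parallel_sion",
    "gpu_memory", "multi_buffer", "randomized_buffer", "check_buffer",
    "memory_allocator_type",
]
# Hypothetical field names, in the order the table lists the 8-byte fields.
COUNTER_NAMES = [
    "num_messages", "message_size", "num_warmup_messages", "collect_pnum",
    "num_serial_retests", "num_multi_buffers", "buffer_seed",
    "num_randomized_tasks", "task_seed",
]


def parse_linktest_header(buf):
    """Parse the LinkTest part of the File Header.

    `buf` must start right after the SIONlib header (offset a in the table).
    Returns the parsed fields and the number of bytes consumed (147 + b).
    """
    if buf[0:5] != b"LKTST":
        raise ValueError("missing LKTST tag")
    major, minor, patch = struct.unpack_from(BYTE_ORDER + "3I", buf, 5)
    # 41-byte field; keep only the text before the null terminator.
    git_hash = buf[17:58].split(b"\0", 1)[0].decode("ascii")
    (mode_len,) = struct.unpack_from(BYTE_ORDER + "I", buf, 58)  # b
    mode = buf[62:62 + mode_len].split(b"\0", 1)[0].decode("ascii")
    pos = 62 + mode_len
    flags = struct.unpack_from(BYTE_ORDER + "13B", buf, pos)
    pos += 13
    counters = struct.unpack_from(BYTE_ORDER + "9Q", buf, pos)
    pos += 9 * 8
    header = {"version": (major, minor, patch),
              "git_hash": git_hash, "mode": mode}
    header.update(zip(FLAG_NAMES, flags))
    header.update(zip(COUNTER_NAMES, counters))
    return header, pos
```

The returned position is where the first Data Chunk header would be expected if no SIONlib padding intervenes, which is itself an assumption that should be checked against SIONlib.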
Data Chunk
Each Data Chunk consists of a small header, which indicates where the rank ran. This is followed by the main data section, which contains the timing data and, in the case of rank 0, additional aggregated data. Finally, a tiny footer indicates the end of the chunk.
If multiple permutations of randomized ranks were tested, their data is stored back-to-back inside the Data Chunk. Given M permutations, the Data Chunk for a rank looks like:
Data Chunk Header
Data Chunk Data - Permutation 1
Data Chunk Data - Permutation 2
...
Data Chunk Data - Permutation M
Data Chunk Footer
Data Chunk Header
Each Data Chunk starts with a small header indicating where the rank ran. For non-zero ranks this header begins with the "LKTST" ID string.
Rank 0
Name | ID | Start [B] | End [B] | Size [B] | Description |
---|---|---|---|---|---|
Hostname Length | e | 0 | 4 | 4 | Length of the hostname on which the rank ran. |
Hostname | | 4 | 4+e | e | The null-terminated ASCII hostname on which the rank ran. |
Core ID | | 4+e | 8+e | 4 | 32-bit integer indicating the core on which the rank ran. |
Non-Zero Rank
Name | ID | Start [B] | End [B] | Size [B] | Description |
---|---|---|---|---|---|
LinkTest ID String | | 0 | 5 | 5 | LinkTest "LKTST" ASCII-character tag. |
Hostname Length | e | 5 | 9 | 4 | Length of the hostname on which the rank ran. |
Hostname | | 9 | 9+e | e | The null-terminated ASCII hostname on which the rank ran. |
Core ID | | 9+e | 13+e | 4 | 32-bit integer indicating the core on which the rank ran. |
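The two header variants differ only in the leading "LKTST" tag, so one routine can handle both. The sketch below is illustrative only. It assumes little-endian values, an unsigned 32-bit hostname length and a signed 32-bit core ID, none of which the tables state. The function name is hypothetical.

```python
import struct


def parse_chunk_header(buf, rank, byte_order="<"):
    """Parse a Data Chunk header; `buf` starts at the first byte of the chunk.

    Returns (hostname, core_id, bytes_consumed).
    Assumption: little-endian, unsigned length, signed core ID.
    """
    pos = 0
    if rank != 0:
        # Non-zero ranks carry the LinkTest tag before the hostname length.
        if buf[0:5] != b"LKTST":
            raise ValueError("missing LKTST tag in chunk header")
        pos = 5
    (host_len,) = struct.unpack_from(byte_order + "I", buf, pos)
    pos += 4
    # The stored hostname is null-terminated; drop the terminator.
    hostname = buf[pos:pos + host_len].split(b"\0", 1)[0].decode("ascii")
    pos += host_len
    (core_id,) = struct.unpack_from(byte_order + "i", buf, pos)
    pos += 4
    return hostname, core_id, pos
```

For rank 0 the consumed size is 8 plus the hostname length, for other ranks 13 plus the hostname length, matching the End column of the last row in each table.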
Data Chunk Data
After the Data Chunk header comes the recorded timing data, with rank 0 including additional aggregated data. If multiple randomized permutations of the ranks are used, the recorded timing data of the permutations follow one another directly. We first present the table for rank 0 and then the table for the other ranks.
Rank 0
Name | Start [B] | End [B] | Size [B] | Logical Expression | Description |
---|---|---|---|---|---|
Start Time | 0 | 32 | 32 | | Null-terminated ASCII character array stating the time when this test started. |
Minimum Time | 32 | 40 | 8 | | Minimum recorded time for the test as a double-precision number. |
Average Time | 40 | 48 | 8 | | Average recorded time for the test as a double-precision number. |
Maximum Time | 48 | 56 | 8 | | Maximum recorded time for the test as a double-precision number. |
All-To-All Minimum Time | 56 | 56+1f | f | If c: f=8 else f=0 | Minimum recorded all-to-all time for the test as a double-precision number. |
All-To-All Average Time | 56+1f | 56+2f | f | | Average recorded all-to-all time for the test as a double-precision number. |
All-To-All Maximum Time | 56+2f | 56+3f | f | | Maximum recorded all-to-all time for the test as a double-precision number. |
Timing Data | 56+3f | 56+3f+8(N-1) | 8(N-1) | | Recorded timing data for the rank as an array of double-precision numbers. |
Access Pattern | 56+3f+8(N-1) | 56+3f+16(N-1) | 8(N-1) | | Access pattern for generating the recorded data for the rank as an array of 64-bit unsigned integers. |
All-To-All Timing | 56+3f+16(N-1) | 56+4f+16(N-1) | f | | Recorded all-to-all time for the rank as a double-precision number. |
Retested Slow Timings | 56+4f+16(N-1) | 56+4f+16(N-1)+8d | 8d | | Timings for the retested slowest connections as an array of double precision numbers. |
Slowest Timings | 56+4f+16(N-1)+8d | 56+4f+16(N-1)+16d | 8d | | Timings of the slowest connections as an array of double precision numbers. |
Slow Sending Partners | 56+4f+16(N-1)+16d | 56+4f+16(N-1)+24d | 8d | | Rank of the partner that initiated the connections for the slow timings. |
Slow Receiving Partners | 56+4f+16(N-1)+24d | 56+4f+16(N-1)+32d | 8d | | Rank of the partner that was on the receiving end of the connections for the slow timings. |
End Time | 56+4f+16(N-1)+32d | 88+4f+16(N-1)+32d | 32 | | Null-terminated ASCII character array stating the time when this test finished. |
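Because every offset in the rank 0 table is a running sum of the preceding sizes, the offsets can be generated rather than typed out. The sketch below computes the start and end of each field for one permutation from N (number of ranks), the All-To-All Flag and the number of serial retests d. The function and field names are hypothetical and chosen only for illustration.

```python
def rank0_data_layout(n_ranks, all_to_all, n_retests):
    """Return ({field: (start, end)}, total_size) for rank 0 data.

    Offsets are relative to the start of one permutation's data section.
    """
    f = 8 if all_to_all else 0      # size of each optional all-to-all field
    per_partner = 8 * (n_ranks - 1)  # one 8-byte value per other rank
    per_retest = 8 * n_retests       # one 8-byte value per retest
    fields = [
        ("start_time", 32),
        ("min_time", 8),
        ("avg_time", 8),
        ("max_time", 8),
        ("a2a_min_time", f),
        ("a2a_avg_time", f),
        ("a2a_max_time", f),
        ("timing_data", per_partner),
        ("access_pattern", per_partner),
        ("a2a_timing", f),
        ("retested_slow_timings", per_retest),
        ("slowest_timings", per_retest),
        ("slow_sending_partners", per_retest),
        ("slow_receiving_partners", per_retest),
        ("end_time", 32),
    ]
    layout = {}
    pos = 0
    for name, size in fields:
        layout[name] = (pos, pos + size)
        pos += size
    return layout, pos
```

The returned total equals 88+4f+16(N-1)+32d, the End value of the last table row, which gives a quick consistency check when reading a file.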
Non-Zero Rank
Name | Start [B] | End [B] | Size [B] | Logical Expression | Description |
---|---|---|---|---|---|
Timing Data | 0 | 8(N-1) | 8(N-1) | | Recorded timing data for the rank as an array of double-precision numbers. |
Access Pattern | 8(N-1) | 16(N-1) | 8(N-1) | | Access pattern for generating the recorded data for the rank as an array of 64-bit unsigned integers. |
All-To-All Timing | 16(N-1) | 16(N-1)+g | g | If c: g=8 else g=0 | Recorded all-to-all time for the rank as a double-precision number. |
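The data section of a non-zero rank is short enough to decode in one pass. The following sketch assumes little-endian IEEE 754 doubles and little-endian 64-bit unsigned integers, since the byte order is not stated in the table. The function name is hypothetical.

```python
import struct


def parse_nonzero_rank_data(buf, n_ranks, all_to_all, byte_order="<"):
    """Decode one permutation's data for a non-zero rank.

    Returns (timings, access_pattern, all_to_all_time, bytes_consumed);
    all_to_all_time is None when the All-To-All Flag is zero.
    """
    n = n_ranks - 1  # one entry per partner rank
    timings = list(struct.unpack_from(f"{byte_order}{n}d", buf, 0))
    pattern = list(struct.unpack_from(f"{byte_order}{n}Q", buf, 8 * n))
    pos = 16 * n
    a2a_time = None
    if all_to_all:
        (a2a_time,) = struct.unpack_from(byte_order + "d", buf, pos)
        pos += 8
    return timings, pattern, a2a_time, pos
```

With M permutations the function would be called M times, each time on the buffer advanced by the previously returned size.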
Data Chunk Footer
Each Data Chunk ends with an "END_BLOCK" statement for alignment purposes.
Name | Start [B] | End [B] | Size [B] | Description |
---|---|---|---|---|
End Block | 0 | 9 | 9 | "END_BLOCK" statement indicating the end of a Data Chunk. |
After the Data Chunk Footer there may be unused data until the end of the SIONlib chunk. See the SIONlib documentation for how to determine the SIONlib chunk size.
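A reader can use the footer as a sanity check that the header and data sections were decoded with the right sizes. The sketch below only verifies the 9-byte marker at the expected position. It does not deal with the unused bytes that may follow, because their extent depends on the SIONlib chunk size. The function name is hypothetical.

```python
END_BLOCK = b"END_BLOCK"  # 9-byte marker from the footer table


def check_chunk_footer(buf, pos):
    """Verify that the Data Chunk Footer starts at `pos`.

    Returns the position just past the marker, or raises ValueError if the
    bytes at `pos` are not the END_BLOCK marker.
    """
    end = pos + len(END_BLOCK)
    if buf[pos:end] != END_BLOCK:
        raise ValueError(f"expected END_BLOCK at byte {pos}")
    return end
```

A mismatch here most likely means that one of the variable-size fields earlier in the chunk was read with a wrong length.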
File Footer
The end of the file is used by SIONlib to store data inside a footer. LinkTest does not store any data in a footer.