For Version 2.1.17
For older releases see here.
LinkTest SION files consist of three components:
- A File Header
- Data Chunks Per Rank
- A File Footer
The SION file starts with the File Header, which in turn starts with a SIONlib header. N Data Chunks follow, one for each rank, enumerated from rank 0. Each is terminated by an "END_BLOCK" statement. After all Data Chunks, a File Footer closes the file.
The structure is:
File Header
Data Chunk 0
Data Chunk 1
Data Chunk 2
...
Data Chunk N-1
File Footer
In the following we will describe the binary structure of each of the different components of the file.
File Header
Name | ID | Start [B] | End [B] | Size [B] | Fixed Value | Description |
---|---|---|---|---|---|---|
SIONlib Header | | 0 | 0+a | a | | SIONlib file header. The exact size depends on the SIONlib configuration. |
LinkTest ID String | | 0+a | 8+a | 8 | "LinkTest" | ASCII-character tag identifying the start of the LinkTest portion of the SION file. |
LinkTest Major Version | | 8+a | 12+a | 4 | 2 | 32-bit unsigned integer identifying the LinkTest major version number. |
LinkTest Minor Version | | 12+a | 16+a | 4 | 1 | 32-bit unsigned integer identifying the LinkTest minor version number. |
LinkTest Patch Level | | 16+a | 20+a | 4 | 17 | 32-bit unsigned integer identifying the LinkTest patch-level number. |
LinkTest GitHash | | 20+a | 61+a | 41 | | 41-byte ASCII-character git hash identifying the commit used to generate the data. |
LinkTest Mode Length | b | 61+a | 65+a | 4 | | 32-bit unsigned integer specifying the length of the following string, which indicates the transport layer/mode/virtual-cluster implementation used for the LinkTest run. |
LinkTest Mode | | 65+a | 65+a+b | b | | Null-terminated ASCII character array identifying the transport layer/mode/virtual-cluster implementation used for the LinkTest run. |
All-To-All Flag | c | 65+a+b | 66+a+b | 1 | | 8-bit integer that if non-zero indicates that all-to-all testing was performed. |
Bidirectional Flag | | 66+a+b | 67+a+b | 1 | | 8-bit integer that if non-zero indicates that bidirectional testing was performed. |
Unidirectional Flag | | 67+a+b | 68+a+b | 1 | | 8-bit integer that if non-zero indicates that unidirectional testing was performed. |
Bisection Flag | | 68+a+b | 69+a+b | 1 | | 8-bit integer that if non-zero indicates that bisection testing was performed. |
Step-Randomization Flag | | 69+a+b | 70+a+b | 1 | | 8-bit integer that if non-zero indicates that the step order was randomized during testing. |
Serial Flag | | 70+a+b | 71+a+b | 1 | | 8-bit integer that if non-zero indicates that serialized testing was performed. |
No-SION-File Flag | | 71+a+b | 72+a+b | 1 | 0 | 8-bit integer that if non-zero indicates that no SION file was written. |
Parallel-SION Flag | | 72+a+b | 73+a+b | 1 | | 8-bit integer that if non-zero indicates that the SION file was written in parallel. |
GPU-Memory Flag | | 73+a+b | 74+a+b | 1 | | 8-bit integer that if non-zero indicates that messages were stored in GPU RAM instead of CPU RAM. |
Multi-Buffer Flag | | 74+a+b | 75+a+b | 1 | | 8-bit integer that if non-zero indicates that messages were stored in multiple memory buffers. |
Randomized-Buffer Flag | | 75+a+b | 76+a+b | 1 | | 8-bit integer that if non-zero indicates that message buffers were randomized. |
Check-Buffer Flag | | 76+a+b | 77+a+b | 1 | | 8-bit integer that if non-zero indicates that message buffers were checked at the end of a step. |
Memory Allocator Type | | 77+a+b | 78+a+b | 1 | | 8-bit integer that indicates the type of memory allocator used to allocate the buffers for the messages. |
Number Of Messages | | 78+a+b | 86+a+b | 8 | | 64-bit unsigned integer indicating the number of messages passed between partners during measurements. |
Message Size | | 86+a+b | 94+a+b | 8 | | 64-bit unsigned integer indicating the size of the messages passed between partners during measurements and warm up. |
Num. Warm-up Messages | | 94+a+b | 102+a+b | 8 | | 64-bit unsigned integer indicating the number of warm-up messages passed between partners during measurements. |
Num. Serial Retests | d | 102+a+b | 110+a+b | 8 | | 64-bit unsigned integer indicating the number of serial retests of the worst connections. |
Num. Multiple Buffers | | 110+a+b | 118+a+b | 8 | | 64-bit unsigned integer indicating the number of buffers used to store messages in a rolling fashion. |
Buf. Randomization Seed | | 118+a+b | 126+a+b | 8 | | The 64-bit seed used for buffer randomization. |
Num. Randomized Tasks | | 126+a+b | 134+a+b | 8 | | The number of test iterations with randomized tasks. |
Task Randomization Seed | | 134+a+b | 142+a+b | 8 | | The 64-bit seed used for task randomization. |
End-Of-Header ID | | 142+a+b | 152+a+b | 10 | "END_HEADER" | 10-byte ASCII character array identifying the end of the header. |
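The table above can be turned into a small parser. The following is a minimal sketch, not LinkTest's own reader. It assumes the bytes have already been extracted from the SION container and start at the LinkTest ID String (offset 0+a), that values are little-endian, and that the 8-bit fields and the seeds are unsigned; none of these are stated by the format description. The function and field names are hypothetical.

```python
import struct

# Assumption: little-endian byte order; the format description does not state it.
BYTE_ORDER = "<"

# Hypothetical field names, listed in table order.
FLAG_NAMES = [
    "all_to_all", "bidirectional", "unidirectional", "bisection",
    "step_randomization", "serial", "no_sion_file", "parallel_sion",
    "gpu_memory", "multi_buffer", "randomized_buffer", "check_buffer",
    "memory_allocator_type",
]
COUNTER_NAMES = [
    "num_messages", "message_size", "num_warmup_messages",
    "num_serial_retests", "num_multiple_buffers", "buffer_seed",
    "num_randomized_tasks", "task_seed",
]


def parse_linktest_header(buf):
    """Parse the LinkTest part of the file header.

    `buf` must start at the LinkTest ID String, i.e. directly after the
    SIONlib header. Returns (fields, bytes_consumed).
    """
    if buf[0:8] != b"LinkTest":
        raise ValueError("missing LinkTest ID string")
    major, minor, patch = struct.unpack_from(BYTE_ORDER + "3I", buf, 8)
    git_hash = buf[20:61].rstrip(b"\0").decode("ascii")
    (mode_len,) = struct.unpack_from(BYTE_ORDER + "I", buf, 61)
    pos = 65
    mode = buf[pos:pos + mode_len].rstrip(b"\0").decode("ascii")
    pos += mode_len
    # Thirteen consecutive 8-bit fields (flags plus allocator type).
    flags = struct.unpack_from(BYTE_ORDER + "13B", buf, pos)
    pos += 13
    # Eight consecutive 64-bit fields (counts and seeds).
    counters = struct.unpack_from(BYTE_ORDER + "8Q", buf, pos)
    pos += 64
    if buf[pos:pos + 10] != b"END_HEADER":
        raise ValueError("missing END_HEADER marker")
    pos += 10
    fields = {"version": (major, minor, patch), "git_hash": git_hash, "mode": mode}
    fields.update(zip(FLAG_NAMES, flags))
    fields.update(zip(COUNTER_NAMES, counters))
    return fields, pos
```

The returned byte count equals 152+b relative to the start of the LinkTest ID String, which gives a quick consistency check against the End column of the last row.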
Data Chunk
Each Data Chunk consists of a small header, which indicates where the rank ran. This is followed by the main data section, which contains the timing data and, in the case of rank 0, additional aggregated data. Finally, a tiny footer indicates the end of the chunk.
If multiple permutations of randomized ranks were tested, their data is stored back-to-back inside the Data Chunk. Given M permutations, the Data Chunk for a rank would look like:
Data Chunk Header
Data Chunk Data - Permutation 1
Data Chunk Data - Permutation 2
...
Data Chunk Data - Permutation M
Data Chunk Footer
Data-Chunk Header
Each Data Chunk starts with a small header indicating where the rank ran.
Rank 0
Name | ID | Start [B] | End [B] | Size [B] | Description |
---|---|---|---|---|---|
Hostname Length | e | 0 | 4 | 4 | Length of the hostname on which the rank ran. |
Hostname | | 4 | 4+e | e | The null-terminated ASCII hostname on which the rank ran. |
Core ID | | 4+e | 8+e | 4 | 32-bit integer indicating the core on which the rank ran. |
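Reading this header amounts to one length-prefixed string followed by one integer. A minimal sketch follows, assuming little-endian byte order, an unsigned 32-bit hostname length, and a signed 32-bit core ID (the description only says "32-bit integer"). The function name is hypothetical.

```python
import struct

# Assumption: little-endian byte order; the format description does not state it.
BYTE_ORDER = "<"


def parse_chunk_header(buf, offset=0):
    """Return (hostname, core_id, next_offset) for a Data Chunk header."""
    # 4-byte hostname length, assumed unsigned.
    (host_len,) = struct.unpack_from(BYTE_ORDER + "I", buf, offset)
    start = offset + 4
    # The hostname is null-terminated, so strip trailing null bytes.
    hostname = buf[start:start + host_len].rstrip(b"\0").decode("ascii")
    # 4-byte core ID, assumed signed.
    (core_id,) = struct.unpack_from(BYTE_ORDER + "i", buf, start + host_len)
    return hostname, core_id, start + host_len + 4
```

The returned `next_offset` points at the first byte of the Data Chunk data.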
Data Chunk Data
After the Data Chunk header comes the recorded timing data, with rank 0 including additional aggregated data at the start. If multiple randomized permutations of the ranks are used, the recorded timing data of each permutation follows directly after the previous one. Let us first present the additional data for rank 0.
Rank 0 Additional Data-Chunk Header
Name | Start [B] | End [B] | Size [B] | Logical Expression | Description |
---|---|---|---|---|---|
Start Time | 0 | 32 | 32 | | Null-terminated ASCII character array stating the time when this test started. |
Minimum Time | 32 | 40 | 8 | | Minimum recorded time for the test as a double-precision number. |
Average Time | 40 | 48 | 8 | | Average recorded time for the test as a double-precision number. |
Maximum Time | 48 | 56 | 8 | | Maximum recorded time for the test as a double-precision number. |
All-To-All Minimum Time | 56 | 56+1f | f | If c: f=8 else f=0 | Minimum recorded all-to-all time for the test as a double-precision number. |
All-To-All Average Time | 56+1f | 56+2f | f | | Average recorded all-to-all time for the test as a double-precision number. |
All-To-All Maximum Time | 56+2f | 56+3f | f | | Maximum recorded all-to-all time for the test as a double-precision number. |
Retested Slow Timings | 56+3f | 56+3f+8d | 8d | | Timings for the retested slowest connections as an array of double-precision numbers. |
Slowest Timings | 56+3f+8d | 56+3f+16d | 8d | | Timings of the slowest connections as an array of double-precision numbers. |
Slow Sending Partners | 56+3f+16d | 56+3f+24d | 8d | | Ranks of the partners that initiated the connections for the slow timings. |
Slow Receiving Partners | 56+3f+24d | 56+3f+32d | 8d | | Ranks of the partners that were on the receiving end of the connections for the slow timings. |
End Time | 56+3f+32d | 88+3f+32d | 32 | | Null-terminated ASCII character array stating the time when this test finished. |
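Because the offsets depend on c (the All-To-All Flag) and d (Num. Serial Retests) from the File Header, it can help to compute them rather than read them off the table. The sketch below walks the fields in table order and accumulates their sizes. The function and field names are hypothetical.

```python
def rank0_header_layout(all_to_all, num_serial_retests):
    """Return ({field: (start, end)}, total_size) for the rank 0 extra header.

    `all_to_all` is the All-To-All Flag (c) and `num_serial_retests` is
    Num. Serial Retests (d), both taken from the File Header.
    """
    f = 8 if all_to_all else 0  # all-to-all fields exist only if c is set
    d = num_serial_retests
    sizes = [
        ("start_time", 32),
        ("min_time", 8),
        ("avg_time", 8),
        ("max_time", 8),
        ("a2a_min_time", f),
        ("a2a_avg_time", f),
        ("a2a_max_time", f),
        ("retested_slow_timings", 8 * d),
        ("slowest_timings", 8 * d),
        ("slow_sending_partners", 8 * d),
        ("slow_receiving_partners", 8 * d),
        ("end_time", 32),
    ]
    layout = {}
    pos = 0
    for name, size in sizes:
        layout[name] = (pos, pos + size)
        pos += size
    return layout, pos
```

For example, with all-to-all testing enabled and two serial retests, the End Time field starts at 56 + 3*8 + 32*2 = 144 and the whole block is 176 bytes long.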
Data-Chunk Data
For each rank the data-chunk data looks as follows (for rank 0 this follows after the above header):
Name | Start [B] | End [B] | Size [B] | Fixed Value | Description |
---|---|---|---|---|---|
Timing Data | 0 | 8(N-1) | 8(N-1) | | Recorded timing data for the rank as an array of double-precision numbers. |
Access Pattern | 8(N-1) | 16(N-1) | 8(N-1) | | Access pattern for generating the recorded data for the rank as an array of 64-bit unsigned integers. |
All-To-All Timing | 16(N-1) | 16(N-1)+f | f | | Recorded all-to-all time for the rank as a double-precision number. |
This is repeated for every iteration.
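One iteration of this layout can be decoded as sketched below. The sketch assumes little-endian byte order, takes N to be the number of ranks (one Data Chunk per rank), and treats the All-To-All Timing as present only when the All-To-All Flag is set (f=8). The function name is hypothetical.

```python
import struct

# Assumption: little-endian byte order; the format description does not state it.
BYTE_ORDER = "<"


def parse_rank_data(buf, num_ranks, all_to_all, offset=0):
    """Decode one iteration of per-rank data starting at `offset`.

    Returns (timings, access_pattern, all_to_all_time, next_offset).
    `all_to_all_time` is None when no all-to-all test was performed.
    """
    n = num_ranks - 1  # one entry per partner rank
    timings = struct.unpack_from(f"{BYTE_ORDER}{n}d", buf, offset)
    pattern = struct.unpack_from(f"{BYTE_ORDER}{n}Q", buf, offset + 8 * n)
    pos = offset + 16 * n
    a2a_time = None
    if all_to_all:
        (a2a_time,) = struct.unpack_from(BYTE_ORDER + "d", buf, pos)
        pos += 8
    return list(timings), list(pattern), a2a_time, pos
```

Calling the function repeatedly, each time passing the previous `next_offset`, steps through consecutive iterations.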
Data Chunk Footer
Each Data Chunk ends with an "END_BLOCK" statement for alignment purposes.
Name | Start [B] | End [B] | Size [B] | Fixed Value | Description |
---|---|---|---|---|---|
End Block | 0 | 9 | 9 | "END_BLOCK" | "END_BLOCK" statement indicating the end of a Data Chunk. |
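Putting the pieces together, the expected position of the "END_BLOCK" marker can be computed and checked. The sketch below covers only ranks other than rank 0, since rank 0 carries additional aggregated data. The number of permutations M is passed in by the caller, because how M is derived from the File Header is not covered here. The function names are hypothetical.

```python
END_BLOCK = b"END_BLOCK"


def chunk_size_nonzero_rank(hostname_len, num_ranks, num_permutations, all_to_all):
    """Expected Data Chunk size in bytes for a rank other than rank 0."""
    f = 8 if all_to_all else 0
    header = 4 + hostname_len + 4  # hostname length, hostname, core ID
    per_permutation = 16 * (num_ranks - 1) + f  # timings, pattern, all-to-all
    return header + num_permutations * per_permutation + len(END_BLOCK)


def has_end_block(chunk, expected_size):
    """Check that the marker sits at the end of the expected chunk extent."""
    start = expected_size - len(END_BLOCK)
    return start >= 0 and chunk[start:expected_size] == END_BLOCK
```

A mismatch usually means that one of the inputs (hostname length, rank count, permutation count, or the all-to-all flag) was read incorrectly.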
File Footer
The end of the file is used by SIONlib to store data inside a footer. LinkTest does not store any data in a footer.