# My SION files are HUGE! What can I do?

If you write out your SION files in parallel, which produces fragmented SION files, see [SION File Defragmentation](Usage#sion-file-defragmentation). Otherwise, only lossless compression can help you further.

Note that you can still load defragmented SION files into Python, see [LinkTest Python Reader](LinkTest-Python-Reader), and they can still be used to generate reports, see [LinkTest Report](LinkTest-Report). SION files that have been compacted by defragmentation can be compressed further using any lossless compression tool. The resulting compressed file can no longer be loaded into Python, so no reports can be generated from it until it has been decompressed.
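Any lossless compression tool will do. Here is a minimal sketch that stays within Python's standard library. It uses the `lzma` module, the format used by `xz`, to compress a defragmented SION file and to restore it before it is loaded again. The helper names `compress_sion` and `decompress_sion` are hypothetical and not part of LinkTest.

```python
import lzma
import shutil
from pathlib import Path


def compress_sion(path: Path) -> Path:
    """Losslessly compress a (defragmented) SION file to <name>.xz.

    The original file is left untouched; delete it yourself once you have
    verified the compressed copy.
    """
    compressed = path.with_name(path.name + ".xz")
    with open(path, "rb") as src, lzma.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return compressed


def decompress_sion(compressed: Path) -> Path:
    """Restore the original SION file so that it can be loaded into Python again."""
    restored = compressed.with_suffix("")  # "file.sion.xz" -> "file.sion"
    with lzma.open(compressed, "rb") as src, open(restored, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return restored
```

For example, `compress_sion(Path("linktest_defrag.sion"))` (file name made up) would write `linktest_defrag.sion.xz` next to the original. Calling `decompress_sion` on that file reverses this before you run the reader or the report tool. How much space is saved depends on your data.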
# In LinkTest Report (the Python tool) I cannot read the indexed-image tick labels. Can I increase the font size?

You may be able to increase the font size indirectly. The font size is limited by two factors: the number of ticks, which limits the vertical height available for the text of each tick label, and the maximum length of the tick labels. See the `--downsampling_factor_matrix_ticks` option in [the LinkTest Report options](LinkTest-Report#options) to learn how to reduce the number of ticks plotted, which increases the vertical space allocated to each tick label. See the `--domain` option in [the LinkTest Report options](LinkTest-Report#options) to learn how domains, or any other string, can be removed from the tick labels. Shorter tick labels allow larger font sizes for a given maximum tick-label width.
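The two levers can be illustrated in plain Python. This is a hypothetical sketch of the idea behind `--downsampling_factor_matrix_ticks` and `--domain`, not LinkTest Report's implementation. The helper names and host names are made up.

```python
def downsample_ticks(labels, factor):
    """Keep every `factor`-th tick label; return (positions, labels)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    positions = list(range(0, len(labels), factor))
    return positions, [labels[i] for i in positions]


def strip_domain(labels, domain):
    """Remove a domain suffix (or any other substring) from every label."""
    return [label.replace(domain, "") for label in labels]


# Made-up host names for 8 ranks.
hosts = [f"node{i:03d}.cluster.example.org" for i in range(8)]
positions, shown = downsample_ticks(strip_domain(hosts, ".cluster.example.org"), 2)
print(positions)  # [0, 2, 4, 6]
print(shown)      # ['node000', 'node002', 'node004', 'node006']
```

In MatPlotLib, such positions and labels would be passed to `Axes.set_yticks` and `Axes.set_yticklabels`. Halving the number of labels roughly doubles the vertical space available to each one, and the shortened labels need less horizontal space.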
# LinkTest Report (the Python tool) takes too long. Can it go faster?

Probably. Here are a few things that will speed up report generation:

1. Defragment the SION file before using it to generate a report, see [SION File Defragmentation](Usage#sion-file-defragmentation). This speeds up loading the data into Python. However, if you only plan to generate one report, this is likely not worth it, as the time gained when making the report is lost while defragmenting the SION file.

2. Use the `--downsampling_factor_matrix_ticks` option, see [the LinkTest Report options](LinkTest-Report#options). Plotting tick labels in MatPlotLib is very slow, so reducing the number of tick labels to plot also speeds up report generation. As an added bonus, the tick labels may also become larger, making them easier to read.

3. If you tend to cancel processes early because they seem to hang without producing any command-line output, use the `verbose` option to see timing information for the individual stages of report generation.

4. Use a newer version of Python or MatPlotLib. The report tool was originally developed for Python 3.8.5 and MatPlotLib 3.3.1. Upgrading to MatPlotLib 3.3.4 made a 2-minute run using a defragmented SION file approximately 15% faster, and upgrading to Python 3.9.0 cut the time to just above 1 minute. The bottleneck is mostly the slow MatPlotLib back end used to generate the plots, which is optimized for quality, not performance. Profiling indicates that for larger SION files, 500 MiB and above after defragmentation, the MatPlotLib back end accounts for about 80% of the compute time of the report.

5. Use the supplied pingponganalysis tools. These create PostScript files that can be converted to PDF. Generating a PDF report comparable to the 2-minute report mentioned above takes only about 5 to 10 seconds. Please note that the pingponganalysis tools are only kept up-to-date with the current version of LinkTest.

6. Read the SION files directly into Python and inspect the data there using the [LinkTest Python Reader](LinkTest-Python-Reader). This is no substitute for a nice, easy-to-read report, but it gives you the flexibility to look at the data in more depth or to produce figures that better fit your needs.
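Items 1 and 4 describe trade-offs that can be checked with a quick back-of-the-envelope calculation. The sketch below is our own illustration, not part of LinkTest. `reports_to_break_even` encodes the defragmentation trade-off from item 1, and `max_speedup` is Amdahl's law applied to the roughly 80% back-end share from item 4. The example timings are made up, so use numbers measured on your own system.

```python
import math


def reports_to_break_even(defrag_time, load_fragmented, load_defragmented):
    """Smallest number of reports for which defragmenting first saves time (item 1).

    Defragmentation pays off once n * (load_fragmented - load_defragmented)
    exceeds defrag_time. All times in seconds, measured on your own system.
    """
    saving_per_report = load_fragmented - load_defragmented
    if saving_per_report <= 0:
        return math.inf  # defragmenting never pays off
    return math.floor(defrag_time / saving_per_report) + 1


def max_speedup(fraction_sped_up, factor=math.inf):
    """Amdahl's law: overall speedup if only a fraction of the run gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)


# Made-up timings: 60 s to defragment, 50 s vs. 20 s to load per report.
print(reports_to_break_even(60, 50, 20))  # 3

# If the MatPlotLib back end takes ~80% of the time (item 4), making everything
# else, i.e. the remaining ~20%, infinitely fast gives at best about 1.25x.
print(max_speedup(0.2))
```

The second number is why faster Python code alone only helps so much. Most of the remaining gain has to come from a faster back end (item 4) or from plotting less (item 2).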
# The colourbar in the report extends outside its bounding box! How can I fix this?

TLDR: Update MatPlotLib and Python.

...

|G|$`10^9`$|Gi|$`2^{30}`$|1.074|
|T|$`10^{12}`$|Ti|$`2^{40}`$|1.100|

A common problem with these unit prefixes is that they are equated with metric prefixes. However, for larger units the difference between the prefixes grows substantially, as shown in the fifth column of the table, which gives the ratio of the binary prefix value to the corresponding metric prefix value. Lesson learned: do not equate these prefixes! A 7.4% difference may not seem like a lot, but when benchmarking connections it can be the difference between the value you expect and the one LinkTest returns.
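The fifth column of the table can be recomputed with a few lines of Python. The same code also shows how to convert a value from one convention to the other. This is a generic sketch, and the 10 GiB/s figure is an arbitrary example, not a LinkTest result.

```python
def binary_to_metric_ratio(power):
    """Ratio 2**(10*power) / 10**(3*power): 1 -> Ki/k, 2 -> Mi/M, 3 -> Gi/G, 4 -> Ti/T."""
    return 2 ** (10 * power) / 10 ** (3 * power)


prefixes = [("k", "Ki"), ("M", "Mi"), ("G", "Gi"), ("T", "Ti")]
for power, (metric, binary) in enumerate(prefixes, start=1):
    print(f"{binary}/{metric}: {binary_to_metric_ratio(power):.3f}")
# prints the ratios 1.024, 1.049, 1.074 and 1.100 on four lines

# Converting a made-up bandwidth of 10 GiB/s to GB/s (about 10.74 GB/s):
gb_per_s = 10 * binary_to_metric_ratio(3)
```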
# I am running a latency test and the first row in my timings matrix is much slower than the others. What can I do?

TLDR: You likely forgot to use warm-up messages.

This depends on what you want to measure. Most systems in a computer operate on an on-demand basis to conserve resources. That means that connections are only established, and the relevant devices only initialized, the first time they are used. For a default LinkTest run without randomization of the test order, the first row in the timing matrix corresponds to the first connections that were tested. This means that the relevant connections and associated hardware, like the required interconnects, had to be initialized during these measurements, which takes time. In the default scenario the other rows in the timing matrix are measured later, when the hardware is already initialized, and as a result their latency is only affected by the initialization of the individual connections.

This means that if you want to approximate the latency due to the start-up of the relevant devices, you can take the first row and subtract from each of its elements the average of the corresponding column, excluding the first element of that column.

If we want our timing information not to be corrupted by these start-up times for the connections and associated hardware, then we have two options:

1. Turn on all-to-all testing if you are using the MPI communication API. All-to-all testing occurs before and after the main test. Since all-to-all testing uses a subset of the connections, it is likely that all relevant hardware and some of the to-be-tested connections are initialized by it. This, however, is not a perfect solution.

2. Use warm-up messages. Using a non-zero number of warm-up messages causes connections to be pretested before they are timed; this also applies to all-to-all testing. As such, timing only occurs on initialized connections. This fully removes the effect of the first row exhibiting slower timings and will, in general, improve timings across the board.

Note that LinkTest does not have the ability to explicitly pre-initialise hardware or connections.
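The start-up latency approximation described in this answer, and the reason warm-up messages help, can be sketched in plain Python. This is a toy illustration under our own assumptions, not LinkTest code. The timing matrix is assumed to be a list of rows (for example, extracted with the LinkTest Python Reader, whose API is not shown here) with untested pairs stored as NaN. `LazyConnection` and `timed_latency` are hypothetical stand-ins for a connection that is initialized on first use.

```python
import math


def startup_latency_estimate(timings):
    """Approximate the start-up latency contained in the first row of a timing matrix.

    For each column j, the mean of the column without its first element is
    subtracted from timings[0][j]. NaN entries, assumed here to mark untested
    pairs such as self-connections, are skipped.
    """
    estimate = []
    for j, first in enumerate(timings[0]):
        rest = [row[j] for row in timings[1:] if not math.isnan(row[j])]
        if math.isnan(first) or not rest:
            estimate.append(math.nan)  # nothing to compare against
        else:
            estimate.append(first - sum(rest) / len(rest))
    return estimate


class LazyConnection:
    """Toy connection that pays a one-off set-up cost on its first message."""

    def __init__(self, setup_cost, transfer_cost):
        self.setup_cost = setup_cost
        self.transfer_cost = transfer_cost
        self.initialized = False

    def send(self):
        cost = self.transfer_cost
        if not self.initialized:
            cost += self.setup_cost  # paid only once, on first use
            self.initialized = True
        return cost


def timed_latency(conn, warmup_messages, timed_messages):
    """Send untimed warm-up messages first, then average the timed ones."""
    for _ in range(warmup_messages):
        conn.send()  # warm-up: result deliberately discarded
    return sum(conn.send() for _ in range(timed_messages)) / timed_messages


nan = math.nan
timings = [  # made-up latencies in microseconds; diagonal untested
    [nan, 9.0, 10.0],
    [2.0, nan, 3.0],
    [2.0, 4.0, nan],
]
print(startup_latency_estimate(timings))  # [nan, 5.0, 7.0]

print(timed_latency(LazyConnection(100.0, 1.0), 0, 10))  # 11.0
print(timed_latency(LazyConnection(100.0, 1.0), 1, 10))  # 1.0
```

Without warm-up, the one-off set-up cost is averaged into the timed result. A single warm-up message absorbs it, so only the steady-state cost is measured.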
# My LinkTest timing matrix has a checkerboard pattern when running multiple tasks per node. What can I do?

TLDR: Change your process pinning.

This is an artifact of your process pinning. Due to the way in which modern CPUs are constructed, certain CPU cores have faster access to certain hardware devices, and hence faster connections to the CPUs of other nodes, than other cores. This commonly manifests itself as a checkerboard pattern in the timing matrix. The checkerboard pattern can commonly be avoided by reorganizing the rows and columns, which can be achieved by changing the process pinning when LinkTest is executed. For more information on how this is done, please see the documentation for the tools you use to execute LinkTest in parallel, for example `mpiexec` or `srun`.
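A toy example shows why reorganizing rows and columns helps. The sketch below is only an illustration under a made-up model, not LinkTest code. The model assumes two tasks per node, with every odd rank pinned to a core that has slower access to the network adapter. The checkerboard is then just an interleaving of "near" and "far" ranks, and it turns into contiguous blocks once the ranks are grouped. Changing the pinning when LinkTest is launched aims to produce such an ordering directly.

```python
def reorder(matrix, order):
    """Return the matrix with rows and columns permuted by `order`."""
    return [[matrix[i][j] for j in order] for i in order]


# Toy model (assumption): even ranks sit on a core close to the network
# adapter, odd ranks on a core that adds 1 time unit per end of a connection.
n = 4
penalty = [rank % 2 for rank in range(n)]
timings = [[1.0 + penalty[i] + penalty[j] for j in range(n)] for i in range(n)]
for row in timings:
    print(row)  # alternating values: a checkerboard

# Group the "near" ranks first, then the "far" ranks.
order = sorted(range(n), key=lambda rank: penalty[rank])
for row in reorder(timings, order):
    print(row)  # contiguous blocks instead of a checkerboard
```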
# How can I generate an animated GIF of multiple indexed-images from the Python reports, e.g. for presentations?

This can be done using [ImageMagick](https://imagemagick.org/index.php), a command-line image-manipulation tool. The basic idea is to combine the PDF reports that you want to include in the GIF into one PDF, for example using `pdfunite`, a tool that is commonly available on many Linux distributions. Then process the PDF using ImageMagick as follows: