Using PRNGs for for permuting tasks is insufficient for a large number of tasks.

As we are planning to allow users to average LinkTest results over multiple runs that permute tasks/rank to get a better averages that depend less on the exact test configuration, with respect to which tasks are tested in parallel together in a step, I have had a look into how good our randomization engine needs to be. The answer is so good that for moderate cluster sizes, 100 nodes or more, we will not be able to permute with sufficient statistical uniformity to avoid bias effects.

For shuffling it is suggested to have a bit width of at least roundUp(log2(n!-1)) for high-quality PRNGs (van Staveren, 2000). For a 52-card shuffle this means the PRNG state size for a perfect PRNG should have at least bit width of 226 bits. Its is suggested that the bit-size be several magnitudes greater. For 52-card card-games bit sizes of 100 000 or more are used. For 100 nodes a perfect PRNG therefore requires a bit width of 526. I have created a table below for bit widths (b) of perfect PRNGs for set sizes (n) that are a power of to 10:

n	b
1	1
10	4
100	527
1 000	8 559

For n= 10 000 I was no longer bothered to do fix-point arithmetic (the numbers are too big for floating-point arithmetic using 64 bits), based on a lazy second-order polynomial regression the values should exceed 100 000. If we take a bit width 3 magnitudes larger because our PRNG is not perfect we end up with a bit width in excess of 100 000 000, approximately 100 MiB. Note this is an upper bound, as the growth rate of b is approximately logarithmic as n tends to infinity, see https://www.wolframalpha.com/input?i=D%5BLog%5B2%2CGamma%5B1+%2B+x%5D%5D%2Cx%5D. This kind of state is going to quickly become unwieldy. For one we will need to use fixed-point arithmetic to compute the value, which is slow and we will need to calculate the fixed point factorial which is slow. Then we will need a good implementation of PRNGs for arbitrary state sizes. I do not think that implementing this is a realistic option. Therefore we need to allow the user to determine their own patterns, this way we can also leave it up to the user to generate sufficiently uncorrelated permutations. Generating these is not something we can provide in the near future.

So what happens if we just ignore this, well in that case we will run into situations in which the state size of our PRNG will no longer be able to cover the entire space of all possible permutation, any sort of uniformity in the permutation selection would have broken down long before this point. This means that for a given n it could be possible that, in the worst case, the only permutations that we can generate are integer shifts of each other, which for our testing would be akin to changing the order in which we tests steps. This means that if there are artifacts in the generated data due to the test configuration then no matter what seed the user specifies to LinkTest always the same artifacts will be present as effectively always the same test is performed. For what value of n this will occur is not readily predictable, but it highlights that our permutations could easily become biased.

Our Mersenne Twister 19937 is therefore barely good enough for permuting 100 elements, but is insufficient for permuting 1 000+ elements. It can barely theoretically cover the permutation space of 2000 elements, which for a perfect PRNG requires a bit width of 19 119. Furthermore using 32-bit seeds we can only initiate it at best with a meager 32-bit subset of the 19937-bit wide state space. This means that if we sequentially query form it from one iteration to the next, all but the first permutation may not be able to be matched to a seed for the purposes of reproducibility. They can be matched to a seed plus iteration. To circumvent this we could allow users to specify a state.

This makes it all the more important that we allow users to use their own user-defined communication patterns.

Here is some good reading:

van Staveren 2000 -> https://sater.home.xs4all.nl/doc.html
https://peteroupc.github.io/random.html#Shuffling
Generating Random Permutations by Jörg Arndt (2009) -> https://maths-people.anu.edu.au/~brent/pd/Arndt-thesis.pdf
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Edited Mar 02, 2022 by Max Holicki

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information