Implemented feature to group processes according to common hostnames

assigned to @mueller24

After checking the mapWithBisection algorithm, I am pretty sure it does not match the picture you drew in the wiki. The algorithm is a round robin where (sticking with your 16-rank example) rank 0 is paired with ranks 15, 8, 9, 10, ..., 14, rank 1 with 14, 15, 8, 9, ..., and so on. It is unclear where rank 0 is in the picture, but none of the ranks show this round-robin behaviour. How did you arrive at the matching in your picture? Did you use DEBUG_PRINT_COMMUNICATION_SCHEME?

I would also argue that it is unclear how these features should interact, so for the sake of clarity they should not be used together for now. My suggestion is therefore that I prohibit combining bisection with groupByHostname before merging.

How is it not a round robin?

Let us map processes to ranks:

Red     0 ->  0
Red     1 ->  1
Red     2 ->  2
Red     3 ->  3
Green   0 ->  4
Green   1 ->  5
Green   2 ->  6
Green   3 ->  7
Blue    0 ->  8
Blue    1 ->  9
Blue    2 -> 10
Blue    3 -> 11
Magenta 0 -> 12
Magenta 1 -> 13
Magenta 2 -> 14
Magenta 3 -> 15
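A minimal sketch of the rank assignment in the table above, assuming (this thread does not confirm it) that grouping simply gives processes on the same host contiguous ranks, with hosts ordered by first appearance. `group_by_hostname` is a hypothetical helper, not the project's API; the example deliberately interleaves the hosts to show that grouping still yields the table.

```python
def group_by_hostname(hostnames):
    """Hypothetical helper: assign contiguous global ranks per hostname.

    hostnames[p] is the host of process p. Returns a dict mapping
    (hostname, local_index) -> global rank.
    """
    counts = {}  # hostname -> number of processes; dicts keep first-appearance order
    for host in hostnames:
        counts[host] = counts.get(host, 0) + 1

    mapping = {}
    rank = 0
    for host, count in counts.items():
        for local_index in range(count):
            mapping[(host, local_index)] = rank
            rank += 1
    return mapping


hosts = ["Red", "Green", "Blue", "Magenta"] * 4  # processes placed round-robin over hosts
for (host, local_index), rank in group_by_hostname(hosts).items():
    print(f"{host:<7} {local_index} -> {rank:2d}")
```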

Let us map connection colors to steps:

1 -> black
2 -> red
3 -> dark green
4 -> blue
5 -> magenta
6 -> cyan
7 -> yellow
8 -> green

With this mapping, the processes connect to partners in the following order:

0 -> 15, 8, 9,10,11,12,13,14
1 -> 14,15, 8, 9,10,11,12,13
...

Is this not what you describe?
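The pairing orders listed above fit a simple formula. This is a sketch reconstructed from those two lists for 16 ranks, not taken from the mapWithBisection source; `bisection_partner` is a hypothetical name, and steps are 0-based here (step 0 is the first entry of each list).

```python
def bisection_partner(rank, step, n):
    """Partner of `rank` at 0-based `step` when n ranks (n even) are split
    into a lower half [0, n/2) and an upper half [n/2, n).

    Reconstructed from the listed orders (0 -> 15, 8, 9, ...; 1 -> 14, 15, 8, ...).
    """
    h = n // 2
    if rank < h:
        # Lower-half ranks walk through the upper half, each offset by its rank.
        return h + (step - 1 - rank) % h
    # Upper-half ranks: invert the lower-half formula so the pairing is symmetric.
    return (step - 1 - (rank - h)) % h


n = 16
for rank in range(2):
    print(rank, "->", [bisection_partner(rank, s, n) for s in range(n // 2)])
```

Each lower-half rank meets every upper-half rank exactly once over n/2 steps, which is the round-robin behaviour discussed here.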

If you want to see how I made the picture, open it in a text editor. I have included the Python script that generates the paths at the top of the file.

Why would you prohibit using the features together? They work together as intended. If you combine bisection with hostname grouping, the hostname groups are split into bisecting halves, and these are then iterated through. When testing one group against another, all possible process pairs are iterated over, although not all possible permutations (these are partially covered by randomization if n is not too large). I made sure that all these features would work together.
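A sketch of how the combination described above could behave, under stated assumptions: all groups have the same size, groups are paired across the two halves with the same round robin as single ranks, and within a group pair process i of the lower group meets process (i + shift) mod group_size of the upper group. All names are hypothetical and the randomization mentioned above is not modelled. The point is that every cross-half process pair occurs, while only group_size of the group_size! possible one-to-one assignments per group pair are used.

```python
def bisection_partner(rank, step, n):
    """Round-robin partner across two halves (reconstructed, hypothetical)."""
    h = n // 2
    if rank < h:
        return h + (step - 1 - rank) % h
    return (step - 1 - (rank - h)) % h


def grouped_bisection_rounds(num_groups, group_size):
    """Yield one list of (lower_rank, upper_rank) pairs per round.

    Assumes rank = group * group_size + local_index and an even num_groups.
    """
    h = num_groups // 2
    for group_step in range(h):
        for shift in range(group_size):  # cyclic shifts, not all permutations
            pairs = []
            for ga in range(h):
                gb = bisection_partner(ga, group_step, num_groups)
                for i in range(group_size):
                    a = ga * group_size + i
                    b = gb * group_size + (i + shift) % group_size
                    pairs.append((a, b))
            yield pairs


rounds = list(grouped_bisection_rounds(4, 4))  # 4 hosts with 4 processes each
seen = {pair for rnd in rounds for pair in rnd}
print(len(rounds), "rounds,", len(seen), "distinct cross-half pairs")
```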

The question is: what is "the intention"? Nobody ever requested this feature (grouping + bisection). You implemented it so that the two would work together, but little thought was spent on how it should work. What is the use case, and what do we want to measure with it? Once that is defined, we can decide whether our implementation is suitable.

Thanks for the clarification on the mapping. I tried to map ranks around the clock, which did not work, and that is why I thought this could not be correct. With your mapping, however, it fits my expectations. It is maybe a bit odd that we start counting clockwise and then jump back to green counterclockwise, but OK.

I think it is best to merge this as is. Users who request both bisection and grouping should know what they want. I will improve the documentation regarding 1-factor vs. round-robin matching and the resulting effects.
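For contrast when documenting 1-factor vs. round-robin matching: the textbook 1-factorization of the complete graph (the circle method) pairs every rank with every other rank exactly once in n - 1 rounds. This is the standard construction, not necessarily the one the project uses; `one_factor_rounds` is a hypothetical name.

```python
def one_factor_rounds(n):
    """Circle-method 1-factorization of the complete graph on n ranks (n even).

    Returns n - 1 rounds; in each round every rank has exactly one partner,
    and over all rounds every pair of ranks meets exactly once.
    """
    if n % 2:
        raise ValueError("n must be even")
    m = n - 1  # ranks 0..m-1 rotate around a circle, rank m stays fixed
    rounds = []
    for r in range(m):
        pairs = [(r, m)]  # the rank at position r meets the fixed rank
        for k in range(1, n // 2):
            # Ranks symmetric around r on the circle are paired.
            pairs.append(((r + k) % m, (r - k) % m))
        rounds.append(pairs)
    return rounds


rounds = one_factor_rounds(16)
print(len(rounds), "rounds")
```

For 16 ranks this gives 15 rounds covering all 120 pairs, whereas the bisection round robin listed earlier needs only 8 rounds but only ever pairs ranks from opposite halves (64 pairs).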

approved this merge request

Yes, you are right regarding the intention; I gave it little thought. I implemented it so that bisection behaves the same for groups of processes as for individual processes. I thought this made the most sense, but I agree that no one asked for this feature. I just found it odd to make the hostname-grouping and bisection features mutually exclusive. Thanks for merging.
