Timings are run twice and half are discarded
When Linktest performs ping-pong benchmarks, it executes the send and receive operations needed to generate the timings twice. This is visible in benchmark.cc, where the main parallel benchmark resides:
427: static int linktest_benchmark_work_pingpong_parallel(LinktestBenchmark& bench,
428:                                                      int partner, int sign,
429:                                                      double* time)
430: {
431:     double tmp1, tmp2;
432:
433:     bench.barrier();
434:
435:     auto from = (sign < 0) ? partner : bench.rank();
436:     auto to = (sign < 0) ? bench.rank() : partner;
437:
438:     auto ret = bench.kernel(from, to, &tmp1);
439:     if (unlikely(ret))
440:         return 1;
441:
442:     bench.barrier();
443:
444:     ret = bench.kernel(to, from, &tmp2);
444:     if (unlikely(ret))
445:         return 1;
446:
447:     *time = (sign > 0) ? tmp1 : tmp2;
448:
449:     bench.barrier();
450:
451:     return 0;
452: }
Here we see that the benchmark kernel bench.kernel is executed twice, with sender and recipient exchanged. Note that half of the timing data is discarded in line 447. @frings2 Do you know why we do this? Is this for load balancing?
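To make the role assignment easier to follow, here is a minimal standalone sketch of the from/to selection in the excerpt above. `Roles` and `pingpong_roles` are hypothetical names that are not part of Linktest; only the two ternary expressions and the `sign > 0` selection are taken from the excerpt.

```cpp
// Hypothetical helper, not Linktest code: records who initiates in each
// of the two kernel calls and which temporary this rank keeps.
struct Roles {
    int phase1_from;  // sender in the first bench.kernel(from, to, &tmp1)
    int phase1_to;    // receiver in the first call
    int phase2_from;  // sender in the second bench.kernel(to, from, &tmp2)
    int phase2_to;    // receiver in the second call
    int kept_phase;   // 1 if tmp1 is stored in *time, 2 if tmp2 is stored
};

Roles pingpong_roles(int rank, int partner, int sign) {
    // Same expressions as in the excerpt, with bench.rank() replaced by `rank`.
    int from = (sign < 0) ? partner : rank;
    int to   = (sign < 0) ? rank : partner;
    // The second call swaps the arguments, so `to` becomes the sender.
    return Roles{from, to, to, from, (sign > 0) ? 1 : 2};
}
```

Assuming the two partners are called with opposite signs (the caller is not shown here, so this is an assumption), `pingpong_roles(0, 1, +1)` and `pingpong_roles(1, 0, -1)` both resolve the first call to 0 -> 1 and the second to 1 -> 0. Rank 0 keeps tmp1 and rank 1 keeps tmp2, so each rank keeps the call in which it is the sender.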
Internally, bench.kernel calls either kpingpong or kbipingpong, which performs a ping-pong test by sending a message to another party and receiving one back. This is what is timed, i.e. the time to repeatedly send and receive a message. Here is an example from the UCX implementation:
int VirtualClusterUCP::kpingpong(int from, int to, MemoryBuffer& buf,
                                 int num_msg, double* timing)
{
    static const int PING = 100;
    static const int PONG = 101;

    double tv = 0;

    buf.fill();

    if(rank() == from) {
        tv = walltime();

        for (auto i = 0; i < num_msg; ++i) {
            auto ret = kpingpong_send(to, buf, PING);
            if (unlikely(ret))
                return 1;

            ret = kpingpong_recv(to, buf, PONG);
            if (unlikely(ret))
                return 1;
        }

        tv = walltime() - tv;
        tv = tv / (2.0*num_msg);
    } else if(rank() == to) {
        for (auto i = 0; i < num_msg; ++i) {
            auto ret = kpingpong_recv(from, buf, PING);
            if (unlikely(ret))
                return 1;

            ret = kpingpong_send(from, buf, PONG);
            if (unlikely(ret))
                return 1;
        }
    } else {
        error("Invalid rank.");
        return 1;
    }

    if(timing)
        *timing = tv;

    return 0;
}
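The following toy model combines the two excerpts to show what each rank ends up with in tmp1 and tmp2. It assumes every kernel behaves like the kpingpong excerpt, where only the `from` rank measures a time and the other rank leaves tv at 0; kbipingpong and the other backends are not shown here and may differ. `one_way_time`, `toy_kernel`, `toy_parallel_pingpong`, and the timing values are made up for illustration and are not part of Linktest.

```cpp
// Average one-way time per message, mirroring `tv / (2.0*num_msg)`:
// every loop iteration is one round trip, i.e. two messages.
double one_way_time(double elapsed, int num_msg) {
    return elapsed / (2.0 * num_msg);
}

// Toy stand-in for bench.kernel. As in the kpingpong excerpt, only the
// sending rank (`from`) reports a time; every other rank reports 0.
// `measured` is an invented value, not a real measurement.
double toy_kernel(int self, int from, double measured) {
    return (self == from) ? measured : 0.0;
}

struct PhaseTimes {
    double tmp1;  // result of the first kernel call on this rank
    double tmp2;  // result of the second kernel call on this rank
    double kept;  // value that would be written to *time
};

// Mirrors the control flow of linktest_benchmark_work_pingpong_parallel
// without barriers or error handling.
PhaseTimes toy_parallel_pingpong(int self, int partner, int sign,
                                 double t_phase1, double t_phase2) {
    int from = (sign < 0) ? partner : self;
    int to   = (sign < 0) ? self : partner;
    double tmp1 = toy_kernel(self, from, t_phase1);  // kernel(from, to, &tmp1)
    double tmp2 = toy_kernel(self, to, t_phase2);    // kernel(to, from, &tmp2)
    double kept = (sign > 0) ? tmp1 : tmp2;
    return PhaseTimes{tmp1, tmp2, kept};
}
```

Under that assumption, `toy_parallel_pingpong(0, 1, +1, 3.0, 5.0)` yields tmp1 = 3.0 and tmp2 = 0.0 and keeps 3.0, while the partner call `toy_parallel_pingpong(1, 0, -1, 3.0, 5.0)` yields tmp1 = 0.0 and tmp2 = 5.0 and keeps 5.0. In this model the value dropped on each rank is the 0 from the call in which it was not the sender, so it may be worth checking whether the second call is what provides the timing for the reverse direction before removing it.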
Is this what we want? We are effectively running every measurement twice for no apparent reason. We could probably cut the benchmark runtime by up to 50% by correcting this. @frings2 what do you think?