Why do we fill our memory buffers right after creating them, only to fill them again later in the kernels?
I think we can remove the fills from the kernels. I am just unsure whether that works correctly with device (GPU) memory, because the GPU context may not be available yet when the buffers are allocated. It may also be that the fill is required to establish the memory mapping (e.g. if pages only get mapped once they are first written).
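To illustrate the memory-mapping point, here is a minimal Linux-only sketch. It is not our MemoryBuffer code, and the helper names `resident_pages`, `first_touch_demo` and `TouchResult` are made up for this example. It maps anonymous memory with `mmap` and uses `mincore` to count resident pages before and after a `memset` fill. On a typical Linux system the pages only become resident once the fill writes them, which is one possible reason a fill right after allocation matters. Whether our allocator behaves like this, and what happens for GPU device memory, is not covered here.

```cpp
// Linux-only sketch (assumption: anonymous mmap memory, not our allocator).
// Pages of a fresh anonymous mapping are normally not backed by physical
// memory until they are first written ("first touch").
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct TouchResult {
    std::size_t pages;            // pages in the mapping
    std::size_t resident_before;  // resident pages right after mmap
    std::size_t resident_after;   // resident pages after the fill
};

static std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Count the pages in [addr, addr + len) that mincore() reports as resident.
static std::size_t resident_pages(void* addr, std::size_t len) {
    const std::size_t page = page_size();
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0) return 0;  // treat errors as "none"
    std::size_t n = 0;
    for (unsigned char v : vec) n += (v & 1u);  // bit 0 = page is resident
    return n;
}

// Map `pages` anonymous pages, then fill them, sampling residency around the fill.
static TouchResult first_touch_demo(std::size_t pages) {
    const std::size_t len = pages * page_size();
    void* buf = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return TouchResult{pages, 0, 0};
    TouchResult r{pages, resident_pages(buf, len), 0};
    std::memset(buf, 0xAB, len);  // the "fill": writes every page once
    r.resident_after = resident_pages(buf, len);
    munmap(buf, len);
    return r;
}
```

Calling `first_touch_demo(64)` and comparing `resident_before` with `resident_after` shows whether the fill is what actually maps the pages on the machine at hand.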
See the buffer setup (lines 829-845):
auto N = args->len_msg + cl->kpingpong_minimal_buffer_overhead();
buf1.reset(new MemoryBuffer(alloc.get(), N, 4096));
buf1->fill(); // TODO: Why do we refill later?
if (args->do_bidir) {
    buf2.reset(new MemoryBuffer(alloc.get(), N, 4096));
    buf2->fill(); // TODO: Why do we refill later?
}
if (args->do_unidir) {
    buf2.reset(new MemoryBuffer(alloc.get(), 0, 4096));
    // buf2->fill(); No fill necessary, buffers are already refilled in kernels?
    // TODO: Maybe remove buffer refills from kernels?
}
if (args->do_alltoall) { // All-to-All
    N = args->len_msg * cl->size() + cl->kpingpong_minimal_buffer_overhead();
    buf_all2all.reset(new MemoryBuffer(alloc.get(), N, 4096));
    buf_all2all->fill(); // TODO: Why do we refill later?
}
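One possible direction, sketched under assumptions: if the kernels refill because an earlier run overwrote the buffer (for example with received data), a buffer could track whether its contents still hold the known pattern and refill only when needed. `LazyFillBuffer`, `ensure_filled`, `mark_dirty` and `check` are hypothetical names, not part of our MemoryBuffer API, and the sketch only handles host memory.

```cpp
// Hypothetical stand-in for MemoryBuffer (host memory only, no allocator).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class LazyFillBuffer {
public:
    explicit LazyFillBuffer(std::size_t n) : data_(n) {}

    // Write the reference pattern and remember that the contents are known.
    void fill() {
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = static_cast<std::uint8_t>(i & 0xFF);
        valid_ = true;
        ++fill_count_;
    }

    // Called by a kernel before use: fills only if the buffer was never
    // filled or has been overwritten since the last fill.
    void ensure_filled() {
        if (!valid_) fill();
    }

    // Called after a receive/write has overwritten the pattern.
    void mark_dirty() { valid_ = false; }

    // True if every byte still matches the reference pattern.
    bool check() const {
        for (std::size_t i = 0; i < data_.size(); ++i)
            if (data_[i] != static_cast<std::uint8_t>(i & 0xFF)) return false;
        return true;
    }

    std::uint8_t* data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }
    int fill_count() const { return fill_count_; }

private:
    std::vector<std::uint8_t> data_;
    bool valid_ = false;
    int fill_count_ = 0;
};
```

If the kernels called `ensure_filled()` instead of filling unconditionally, the `fill()` calls right after allocation could be dropped, and the first fill would happen inside the kernel, presumably after the GPU context exists. For device memory, `fill()` would need a device-side equivalent (e.g. `cudaMemset` if the backend is CUDA); that part is not shown.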