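linktest in CUDA mode (-m cuda) fails on JUWELS: GpuContext::singleton() returns nullptr during init, and the first ping-pong step then aborts on the gpuctx_ assertion in vcluster_cuda.cc:324.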
qrun --ntasks=2 ./linktest -m cuda -n 100 -w 10 -s 1024
srun: job 3568986 queued and waiting for resources
srun: job 3568986 has been allocated resources
p_Init(r1): unsupported PMI version received: version=2, subversion=0
p_Init(r0): unsupported PMI version received: version=2, subversion=0
[vcluster_cuda.cc in init:290] warning: GpuContext::singleton() returned nullptr. Retrying later.
[linktest.cc in main:92] info: System string = "generic"
[vcluster_cuda.cc in init:290] warning: GpuContext::singleton() returned nullptr. Retrying later.
[benchmark.cc in benchmark:908] info: Using PinnedMmapAllocator
timings[000] [first sync] t= 7.41007 us
task[000000] on jwc00n002.juwels ( 0) mem= 56.1016 kiB
task[000001] on jwc00n002.juwels ( 1) mem= 56.1016 kiB
timings[000] [mapping] t= 23.35105 us
timings[000] [randvec] t= 195.57774 ns
PE00000: psum=4 pasum=4 do_mix=0
timings[000] [getpart] t= 6.42380 us
linktest: vcluster_cuda.cc:324: int linktest::cuda::VirtualClusterImpl::kpingpong(int, int, MemoryBuffer&, int, double*): Assertion `gpuctx_' failed.
------------------- Linktest Args ------------------------
Virtual-Cluster Implementation: cuda
Message length: 1024 B
Number of Messages: 100
Number of Messages. (Warmup): 10
Communication Pattern: Semidirectional End to End
use gpus: No
mixing pe order: No
serial test only: No
max serial retest: 2
write protocol (SION): Yes, funneled
output file: "pingpong_results_bin.sion"
----------------------------------------------------------
Starting Test of all connections:
---------------------------------
linktest: vcluster_cuda.cc:324: int linktest::cuda::VirtualClusterImpl::kpingpong(int, int, MemoryBuffer&, int, double*): Assertion `gpuctx_' failed.
Parallel PingPong for step 1: srun: error: jwc00n002: tasks 0-1: Aborted (core dumped)