Linktest with OpenMPI fails in all modes except MPI on JUSUF
On JUSUF, OpenMPI does not work with MINIPMI. Linktest in MPI mode still works, but all other communication modes fail during initialization.
Here is the error:
[pmi.cc in linktest_minipmi_context_borrow:45] error: minipmi_initialize() failed.
[vcluster_ucp.cc in init:522] error: linktest_minipmi_context_borrow() failed.
[linktest.cc in main:86] error: Failed to initialize communication ops.
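The failure happens at startup. minipmi_initialize() fails, so Linktest cannot set up its communication layer and exits before any messages are exchanged.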
Here is a sample command:
srun --ntasks=8 --distribution=block:block:block:pack Compile/benchmark/linktest --mode ibverbs --num-warmup-messages 2 --num-messages 4 --size-messages 1024 --num-slowest 2 --output linktest_ibverbs_2nx4c.sion --no-sion-file
The following modules were loaded:
module load Stages/2020
module load GCC/9.3.0
module load OpenMPI/4.1.0rc1
module load CUDA/11.0
module load SIONlib/1.7.6