Collective writes larger than the chunk size deadlock
Reported by Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk):
So each process makes two calls to a write_vector method that then calls sion_coll_fwrite(rhs.data(), sizeof(T), rhs.size(), _sid), where T is either unsigned int or double. At present, it seems that writing the string data and the element ids causes no problem; only when the element data (the vectors of doubles) are also written do some processes fail to return from sion_coll_fwrite.
And later:
Just to let you know, I have fixed my problem. It was caused by specifying an inadequate chunk size in the call to sion_paropen_mpi.
This probably can and should be detected and reported as an error.
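A minimal sketch of such a check, assuming the relevant limit is the chunk size given to sion_paropen_mpi and that it should be compared against the byte count (size * nitems) of a single sion_coll_fwrite call. The helper name check_coll_write_fits is hypothetical and not part of the SIONlib API; it only illustrates rejecting an oversized collective write up front instead of letting some processes hang.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a SIONlib function): returns 0 if a collective
 * write of nitems elements of size bytes each fits in one chunk of
 * chunksize bytes, and -1 otherwise (including invalid or overflowing input). */
int check_coll_write_fits(size_t size, size_t nitems, int64_t chunksize)
{
    if (chunksize <= 0) {
        return -1; /* a non-positive chunk size cannot hold any data */
    }
    /* Guard against size_t overflow before computing the byte count. */
    if (size != 0 && nitems > SIZE_MAX / size) {
        return -1;
    }
    uint64_t bytes = (uint64_t)size * (uint64_t)nitems;
    if (bytes > (uint64_t)chunksize) {
        /* Report the problem instead of entering the collective write. */
        fprintf(stderr,
                "collective write of %llu bytes exceeds chunk size %lld\n",
                (unsigned long long)bytes, (long long)chunksize);
        return -1;
    }
    return 0;
}
```

For example, with a 1024-byte chunk and 8-byte doubles, a call writing 100 doubles (800 bytes) passes, while one writing 200 doubles (1600 bytes) is rejected with a message rather than risking the hang described in the report.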