fi_cq_tagged_entry array should live until the message is properly handled
When we increase MAX_ENTRIES_PER_POLL to 10, the behavior becomes very unstable. Most of the time it hangs, but sometimes it shows this error:
Pool_Manager:0 1 CQ-H-0-0 (nid04278 32709) 36161189407044: mstro_pm_handle_msg(pool_manager.c:2725) Failed to unpack the message
[E:comm] Pool_Manager:0 1 CQ-H-0-0 (nid04278 32709) 36161189417248: mstro_ofi__handle_cq_entry__recv(ofi.c:1409) Error handling incoming message, dropping it: 6 (invalid pool protocol message)
This error comes from https://gitlab.jsc.fz-juelich.de/maestro/maestro-core/-/blob/105-memory-leak-in-pm-message-envelope-handling/maestro/pool_manager.c#L2724, which is reached when msg == NULL.
This is because entry[MAX_ENTRIES_PER_POLL] is allocated as a local variable in https://gitlab.jsc.fz-juelich.de/maestro/maestro-core/-/blob/105-memory-leak-in-pm-message-envelope-handling/maestro/ofi.c#L2966, so its contents can be overwritten once the function returns.
Therefore, this array needs to outlive the message handling.
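One possible way to get that lifetime is sketched below, assuming the handler runs after the polling function has returned. This is not the actual maestro-core code: `struct cq_entry` is a stand-in for `struct fi_cq_tagged_entry` so the sketch builds without libfabric, and `pending_msg`, `enqueue_copy`, `poll_once` and `dequeue` are hypothetical names. The idea is to copy each completion entry by value into heap storage owned by the deferred work item, instead of passing a pointer into the local array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES_PER_POLL 10

/* Stand-in for struct fi_cq_tagged_entry (assumption: same field layout). */
struct cq_entry {
  void    *op_context;
  uint64_t flags;
  size_t   len;
  void    *buf;
  uint64_t data;
  uint64_t tag;
};

/* Hypothetical deferred work item that owns a private copy of the entry. */
struct pending_msg {
  struct cq_entry     entry; /* copied by value, not a pointer into the poll array */
  struct pending_msg *next;
};

/* Hypothetical FIFO of messages waiting to be handled. */
static struct pending_msg *queue_head = NULL;
static struct pending_msg *queue_tail = NULL;

/* Copy one entry to the heap and append it to the queue; 0 on success. */
static int enqueue_copy(const struct cq_entry *e) {
  struct pending_msg *m = malloc(sizeof *m);
  if (m == NULL) return -1;
  m->entry = *e;
  m->next = NULL;
  if (queue_tail != NULL) queue_tail->next = m; else queue_head = m;
  queue_tail = m;
  return 0;
}

/* Polling step. The memcpy stands in for reading completions from the CQ.
 * The local array may be reused freely after return, because only the
 * heap copies are handed to the handler. */
static int poll_once(const struct cq_entry *completed, size_t n) {
  struct cq_entry entry[MAX_ENTRIES_PER_POLL];
  assert(n <= MAX_ENTRIES_PER_POLL);
  memcpy(entry, completed, n * sizeof entry[0]);
  for (size_t i = 0; i < n; i++) {
    if (enqueue_copy(&entry[i]) != 0) return -1;
  }
  return (int)n;
}

/* Handler side: pop the oldest entry into *out and free its storage.
 * Returns 0 on success, -1 if the queue is empty. */
static int dequeue(struct cq_entry *out) {
  struct pending_msg *m = queue_head;
  if (m == NULL) return -1;
  *out = m->entry;
  queue_head = m->next;
  if (queue_head == NULL) queue_tail = NULL;
  free(m);
  return 0;
}
```

Note that this only protects the entry itself. The receive buffer that the entry's `buf` field points to needs its own lifetime guarantee until the message has been unpacked.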