More pervasive buffering
Recently, Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk), in a follow-up discussion to his report on integrating SIONlib into Nektar++, reported that he did not see a performance improvement when switching from individual I/O to collective I/O.
His benchmark case is characterised by a moderately large number of tasks (6144) writing a relatively low volume of data: 25 MB in total, i.e. around 4 KB per task, spread out over around 25 write calls per task, or roughly 160 bytes per write call on average. He uses a file system with a 64 KiB block size and has set the chunk size equal to the block size. For collective I/O, he uses 32 collectors with 191 senders each.
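For concreteness, a minimal sketch of this write pattern using SIONlib's collective MPI interface might look as follows. Error handling is omitted; the file name bench.sion is invented, the sizes are taken from the description above, and the collector grouping is assumed to be configured externally (e.g. via the SION_COLLSIZE environment variable, one group of 192 tasks per collector):

```c
#include <mpi.h>
#include <stdio.h>
#include "sion.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int        numFiles  = 1;       /* one shared container file     */
    sion_int64 chunksize = 65536;   /* chunk size set equal to ...   */
    sion_int32 fsblksize = 65536;   /* ... the 64 KiB fs block size  */
    int        globalrank;
    MPI_Comm   lComm = MPI_COMM_WORLD;
    FILE      *fp = NULL;

    MPI_Comm_rank(MPI_COMM_WORLD, &globalrank);

    /* assumption: collector layout (32 collectors, 191 senders each)
       is configured in the environment, e.g. SION_COLLSIZE=192 */
    int sid = sion_paropen_mpi("bench.sion", "bw", &numFiles,
                               MPI_COMM_WORLD, &lComm,
                               &chunksize, &fsblksize,
                               &globalrank, &fp, NULL);

    char record[160] = {0};         /* ~160 bytes per write call */
    for (int i = 0; i < 25; i++)    /* ~25 write calls per task  */
        sion_coll_fwrite(record, 1, sizeof(record), sid);

    sion_parclose_mpi(sid);
    MPI_Finalize();
    return 0;
}
```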
When using collective I/O as it is currently implemented, this constellation results in sub-optimal behaviour: for every sion_coll_fwrite, each collector task has to write 192 pieces of data, each smaller than a file system block, to 192 different file system blocks, with a seek operation between consecutive writes.
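The collector-side access pattern is roughly the following (an illustrative sketch in plain stdio, not SIONlib source; the constants mirror the numbers above):

```c
#include <stdio.h>

enum { NTASKS = 192, CHUNK = 65536, REC = 160 };

/* one small write per task chunk; consecutive tasks' chunks live in
   different file system blocks, hence a seek before every write */
static void collector_flush(FILE *fp, const char data[NTASKS][REC],
                            long base_offset)
{
    for (int t = 0; t < NTASKS; t++) {
        fseek(fp, base_offset + (long)t * CHUNK, SEEK_SET);
        fwrite(data[t], 1, REC, fp);   /* 160 B into a 64 KiB block */
    }
}
```

Each of the 192 writes deposits only 160 bytes into a different 64 KiB block, so almost the entire cost is seek and block read-modify-write overhead rather than payload transfer.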
As a workaround, Michael currently uses merge mode, which touches fewer file system blocks and skips the seek operations. This improves write performance by a factor of 10 but complicates reading the data later on.
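For contrast, a sketch of what merge mode effectively does on the collector (again illustrative, not SIONlib source): the 192 records are packed back to back and written with a single contiguous call, which at about 30 KiB fits into a single file system block.

```c
#include <stdio.h>
#include <string.h>

enum { NTASKS = 192, REC = 160 };

static void collector_flush_merged(FILE *fp,
                                   const char data[NTASKS][REC],
                                   long base_offset)
{
    char packed[NTASKS * REC];              /* 30720 B, under one block */
    for (int t = 0; t < NTASKS; t++)
        memcpy(packed + t * REC, data[t], REC);
    fseek(fp, base_offset, SEEK_SET);
    fwrite(packed, 1, sizeof(packed), fp);  /* one contiguous write */
}
```

The price is that a task's records are no longer at fixed per-chunk offsets, which is what makes reading the data back more involved.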
A different workaround, proposed by Wolfgang, would be to use chunk sizes smaller than the file system block size in normal merge mode, in combination with application-side buffering.
As an improvement, SIONlib should probably offer its own buffering in collective mode, beyond what the ANSI C stdio functions provide. This buffering must be per task/file part. One open question is whether these buffers should reside on the collectors or on each individual task (buffer sends vs. buffer writes).
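A hypothetical sketch of such per-task buffering (all names are invented for illustration; this is not existing SIONlib code): small writes accumulate in a task-local buffer, typically sized like the chunk, and only full buffers are passed on, either sent to the collector or written directly.

```c
#include <string.h>

typedef struct {
    char  *buf;      /* one buffer per task/file part */
    size_t cap;      /* e.g. the chunk size */
    size_t used;
} sion_task_buffer;  /* hypothetical name */

/* flush_cb stands for whatever drains the buffer: an MPI send to the
   collector ("buffer sends") or a direct write ("buffer writes");
   which side holds the buffer is exactly the open question above */
static int buffered_write(sion_task_buffer *b, const void *data, size_t n,
                          int (*flush_cb)(const void *, size_t))
{
    if (b->used + n > b->cap && b->used > 0) {  /* drain to keep order */
        if (flush_cb(b->buf, b->used) != 0)
            return -1;
        b->used = 0;
    }
    if (n > b->cap)                     /* oversized record: write through */
        return flush_cb(data, n);
    memcpy(b->buf + b->used, data, n);  /* small writes just append */
    b->used += n;
    return 0;
}
```

The trade-off, presumably, is memory versus traffic: task-side buffers cost one chunk-sized buffer per task but also batch the sends to the collectors, while collector-side buffers keep the senders lean but still incur a message per small write.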