SIONlib issueshttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues2021-01-25T11:43:35+01:00https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/173More pervasive buffering2021-01-25T11:43:35+01:00Benedikt Steinbuschb.steinbusch@fz-juelich.deMore pervasive bufferingRecently, Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk), in a follow-up discussion to his [report on integrating SIONlib in Nektar++](http://exaflow-project.eu/index.php/news/25-nektar-aorta-test-case-on-archer-improving-i-o-effici...Recently, Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk), in a follow-up discussion to his [report on integrating SIONlib in Nektar++](http://exaflow-project.eu/index.php/news/25-nektar-aorta-test-case-on-archer-improving-i-o-efficiency-for-exaflow-use-cases), reported that he did not see improvements in performance when switching from individual I/O to collective I/O.
His benchmark case is characterised by a moderately large number of tasks (6144) writing a relatively low volume of data – 25 MB in total, so around 4 KB per task – spread out over around 25 write calls per task, i.e. around 160 bytes per write call on average. He uses a file system with 65 KiB block size and he has set the chunk size equal to the block size, for collective I/O, he uses 32 collectors with 191 senders each.
When using collective I/O as it is currently implemented, this constellation results in sub-optimal behaviour. For every `sion_coll_fwrite`, the collector tasks have to write 192 pieces of data, each smaller than a file system block to 192 different file system blocks with a seek operation between all writes.
As a workaround, Michael currently uses merge mode which touches fewer file system blocks and skips the seek operations. This improves write performance by a factor of 10, but complicates reading the data later on.
A different workaround, proposed by Wolfgang, would be to use chunk sizes smaller than fs block sizes in normal merge mode, in combination with application side buffering.
As an improvement, SIONlib should probably offer its own buffering beyond what the ANSI functions do in collective mode. This buffering must be per task/file part. One open question is, whether these buffers should reside on the collectors or on each individual task (buffer sends vs. buffer writes).https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/172`_sion_free_filedesc_all_localranks` is dead code but should probably be call...2021-08-25T15:20:29+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.de`_sion_free_filedesc_all_localranks` is dead code but should probably be called somewhereWhile working on #157 (dead code removal), I did not remove `_sion_free_filedesc_all_localranks`, because the corresponding allocation function is indeed called and the `all_localranks` field is used. Find the right spot to insert calls ...While working on #157 (dead code removal), I did not remove `_sion_free_filedesc_all_localranks`, because the corresponding allocation function is indeed called and the `all_localranks` field is used. Find the right spot to insert calls to `_sion_free_filedesc_all_localranks`.https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/171Public functions without tests2021-08-25T15:20:31+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.dePublic functions without testsThese symbols were not removed while working on #157 (dead code removal), because they appear in public headers and are not prefix by `_`.
```
sion_free_io_info
sion_set_fp_closed
sion_get_dfile
sion_isdebug
sion_debug_off
sion_generic...These symbols were not removed while working on #157 (dead code removal), because they appear in public headers and are not prefix by `_`.
```
sion_free_io_info
sion_set_fp_closed
sion_get_dfile
sion_isdebug
sion_debug_off
sion_generic_register_get_multi_filename_cb
```
They should probably be called from the test suite.https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/170_sion_errorprint(_, _SION_ERROR_RETURN, ...) without returning2019-12-02T16:03:43+01:00Benedikt Steinbuschb.steinbusch@fz-juelich.de_sion_errorprint(_, _SION_ERROR_RETURN, ...) without returningAs a reaction to #169, I dug out my knowledge of regexen and searched the code for additional instances of `_sion_errorprint(_, _SION_ERROR_RETURN, ...)` that do not obviously return. Here is a list:
- [src/parlib/sion_generic_buddy.c@2...As a reaction to #169, I dug out my knowledge of regexen and searched the code for additional instances of `_sion_errorprint(_, _SION_ERROR_RETURN, ...)` that do not obviously return. Here is a list:
- [src/parlib/sion_generic_buddy.c@2173#L795](src/parlib/sion_generic_buddy.c@2173#L795)
- [src/parlib/sion_generic_buddy.c@2173#L897](src/parlib/sion_generic_buddy.c@2173#L897)
- [src/parlib/sion_generic_collective.c@2173#L188](src/parlib/sion_generic_collective.c@2173#L188)
- [src/parlib/sion_ompi_coll_cb_gen.c@2173#L158](src/parlib/sion_ompi_coll_cb_gen.c@2173#L158)
- [src/parlib/sion_ompi_coll_cb_gen.c@2173#L218](src/parlib/sion_ompi_coll_cb_gen.c@2173#L218)
- [src/parlib/sion_ompi_coll_cb_gen.c@2173#L364](src/parlib/sion_ompi_coll_cb_gen.c@2173#L364)
- [src/parlib/sion_ompi_coll_cb_gen.c@2173#L419](src/parlib/sion_ompi_coll_cb_gen.c@2173#L419)
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/169Collective writes larger than the chunk size deadlock2021-08-25T15:20:30+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deCollective writes larger than the chunk size deadlockReported by Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk):
> So each process makes two calls to a `write_vector` method that then calls `sion_coll_fwrite(rhs.data(), sizeof(T), rhs.size(), _sid)`, where `T` is either `unsigned int...Reported by Michael Bareford of EPCC (m.bareford@epcc.ed.ac.uk):
> So each process makes two calls to a `write_vector` method that then calls `sion_coll_fwrite(rhs.data(), sizeof(T), rhs.size(), _sid)`, where `T` is either `unsigned int` or `double`. At present, it seems that writing the string data and the element ids causes no problem, only when the element data (the vectors of `double`s) are also written do some processes fail to return from `sion_coll_fwrite`.
And later:
> Just to let you know, I have fixed my problem. It was caused by specifying an inadequate chunk size in the call to sion_paropen_mpi.
This probably can and should be detected and reported as an error.https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/168Dead code in sion_paropen_mpi2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deDead code in sion_paropen_mpiTest coverage has revealed parts of `sion_paropen_mpi` to be dead code, namely the block starting here: [src/parlib/sion_mpi_gen.c@2172#L192.](src/parlib/sion_mpi_gen.c@2172#L192.) The problem is that `numFiles <= 0` is impossible at tha...Test coverage has revealed parts of `sion_paropen_mpi` to be dead code, namely the block starting here: [src/parlib/sion_mpi_gen.c@2172#L192.](src/parlib/sion_mpi_gen.c@2172#L192.) The problem is that `numFiles <= 0` is impossible at that point. If `numFiles <= 0` when entering the function, the `true` branch is taken here [src/parlib/sion_mpi_gen.c@2172#L129](src/parlib/sion_mpi_gen.c@2172#L129) and `numFiles` is set to a positive number by `_sion_get_info_from_splitted_comm_mpi`.
Things seem to be working without this code though, so is this a bug or legitimately dead code that can be removed?Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/167sion_swap's aflag argument has different meaning for in-place vs. disjunct bu...2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.desion_swap's aflag argument has different meaning for in-place vs. disjunct buffersSIONlib's byte swap function, `sion_swap`, has the following documentation and signature:
```
/**
* perform byte-order swapping for arrays
* n units of size byte are swapped
* if and only if aflag==0
* in-place swapping (target==sou...SIONlib's byte swap function, `sion_swap`, has the following documentation and signature:
```
/**
* perform byte-order swapping for arrays
* n units of size byte are swapped
* if and only if aflag==0
* in-place swapping (target==source) is allowed
* if target != source, the buffers must not overlap
*/
void sion_swap(void* target, void* source, int size, int n, int aflag);
```
And there seems to be some confusion about the `aflag` argument. Different pieces of documentation seem to have different things to say about it:
1. The documentation comment (this is also part of the Doxygen documentation) above the function definition in `sion_convert.c` was recently changed (see r2136) to invert the meaning of `aflag` from what it had been all the way since r49, namely `aflag == 1` means swap bytes.
2. Curiously, there is another documentation comment (this one does not go in the Doxygen output) introduced in `sion.h` (nowadays it resides in `sion_common.h`) all the way back in r43 which claims `aflag == 0` means swap bytes, contradicting the original comment in `sion_convert.c`.
3. The training material used in the PATC parallel I/O course in the last two years sides with the `sion_common.h` comment in saying `aflag == 0` means swap bytes.
4. This is not really documentation, but all internal uses use `filedesc->swapbytes` in place of `aflag`, which again supports the first interpretation, `aflag == 1` means swap bytes. The Mandelbrot test supports this further, calling the variable it uses in place of `aflag` `doswap`.
So which meaning is true and whence the confusion? Well, it turns out that both are true, because the meaning of `aflag` changes, depending on whether `target == source`. The internals of `sion_swap` actually look like this:
```
void sion_swap(void* target, void* source, int size, int n, int aflag) {
if (target == source) {
if (!aflag) { // Note the negation operator
// do nothing
} else {
// swap in place
}
} else {
if (aflag) { // Note the absence of the negation operator
// memcpy from source to target (no swapping)
} else {
// swap bytes while copying
}
}
}
```
So, at the moment, all pieces of documentation do not fully describe the behavior of `sion_swap`. I see three ways to fixing this:
1. Make `aflag == 1` mean swap bytes in all cases and adapt all documentation. **Pro:** small internal code changes, since they all fall into the _in-place_ case which already matches this. **Contra:** this meaning contradicts the PATC training material.
2. Make `aflag == 0` mean swap bytes in all cases. **Pro:** matches the PATC training material. **Contra:** more internal code changes and contradicts the Doxygen documentation.
3. Keep the behavior as it is and adapt the documentation to explain both cases. **Pro:** no code changes. **Contra:** convoluted logic.Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/166File header prefix field unused?2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deFile header prefix field unused?Digging deeper into #160 and #165, I cannot seem to find a place in SIONlib where the prefix field in the `_sion_filedesc` struct is actually used. It is written to files and read back in, it is printed by siondump, but the file names fo...Digging deeper into #160 and #165, I cannot seem to find a place in SIONlib where the prefix field in the `_sion_filedesc` struct is actually used. It is written to files and read back in, it is printed by siondump, but the file names for multi-files all seem to be calculated from the file name arguments passed to the various open functions.
Am I missing something or is this field superfluous and might be dropped in the next file format?
2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/165_sion_read_header_fix_part does not sanitize input from file2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.de_sion_read_header_fix_part does not sanitize input from file`_sion_read_header_fix_part` allocates a buffer `lprefix` of length `SION_FILENAME_LENGTH` into which it reads `SION_FILENAME_LENGTH` bytes from the prefix field of the file header. Next, `lprefix` is duplicated using `strdup` without en...`_sion_read_header_fix_part` allocates a buffer `lprefix` of length `SION_FILENAME_LENGTH` into which it reads `SION_FILENAME_LENGTH` bytes from the prefix field of the file header. Next, `lprefix` is duplicated using `strdup` without ensuring that `lprefix` actually contains a null byte. This is a potential buffer overflow. Possible fixes:
- make sure `lprefix` contains a null byte,
- use `strndup`,
- do not duplicate `lprefix`, possibly wasting up to `SION_FILENAME_LENGTH - 1` bytes.Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/164Fortran Interface forces unnecessary export of ANSI filepointer2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deFortran Interface forces unnecessary export of ANSI filepointerThe C part of the Fortran interface passes a non-null filepointer argument to SIONlib's open functions, causing the `fileptr_exported` flag to be set. However, the filepointer value is never actually exported to the Fortran side, because...The C part of the Fortran interface passes a non-null filepointer argument to SIONlib's open functions, causing the `fileptr_exported` flag to be set. However, the filepointer value is never actually exported to the Fortran side, because it cannot be used there. This causes unnecessary overhead for synchronising the file poisition (see `_sion_update_fileposition` and all of its callers).Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/163Incorrect intent declarations in Fortran bindings2019-12-02T16:03:43+01:00Benedikt Steinbuschb.steinbusch@fz-juelich.deIncorrect intent declarations in Fortran bindingsAnke Kreuzer recently reported a segmentation fault originating in a call to `fsion_paropen_mpi`.
> Hallo Benedikt,
>
> also der Code rund um das FSION_PAROPEN_MPI sieht so aus aus:
>
> fsblksize = -1
> chunksize = INT(16+nt*(96+...Anke Kreuzer recently reported a segmentation fault originating in a call to `fsion_paropen_mpi`.
> Hallo Benedikt,
>
> also der Code rund um das FSION_PAROPEN_MPI sieht so aus aus:
>
> fsblksize = -1
> chunksize = INT(16+nt*(96+120*ndl),KIND=8)
>
> IF(myid==0) THEN
> WRITE(*,*) " Before create_Com"
> ENDIF
> color = crank/24
> key = modulo(crank,24);
> ! WRITE(*,*) "My id: ",myid , "color: ", color, "key: ", key
>
> CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, myid,
> & lcomm, ierr)
> IF(myid==0) THEN
> WRITE(*,*) " After create_Com, before SION_PAROPEN_MPI"
> ENDIF
>
> So, bis hier hin laeuft das Programm durch und dann kommt:
>
> CALL FSION_PAROPEN_MPI('NAM_CPs',
> & 'bw',-1,icomm0,lcomm,
> & chunksize, fsblksize, myid, newfname, sid)
>
> IF(myid==0) THEN
> WRITE(*,*) " After paropen"
> ENDIF
>
> Und diese letzte Ausgabe wird nie ausgegeben, da vorher der Segmentation
> fault passiert.
The problem is that SIONlib tries to return the number of files it actually opened (based on the number of distinct local communicators) in the `nfiles` argument. However, the special value `-1` is specified as an integer literal and as the dummy argument `nfiles` is declared with `intent(in)` is probably placed in a read-only part of memory.
A quick inspection reveals that the `intent` declarations on the Fortran subroutine and the argument const-ness on the C function that is called underneath differ in several places.
```
subroutine fsion_paropen_mpi(fname,file_mode,nfiles,fgcomm,flcomm,chunksizes,fsblksize,&
& globalrank,newfname,sid)
implicit none
character(len=*), intent(in) :: fname
character(len=*), intent(in) :: file_mode
integer, intent(in) :: nfiles
integer, intent(in) :: fgcomm
integer, intent(inout) :: flcomm
integer*8, intent(inout) :: chunksizes
integer*4, intent(inout) :: fsblksize
integer, intent(in) :: globalrank
character(len=*), intent(out) :: newfname
integer, intent(out) :: sid
call fsion_paropen_mpi_c(fname,file_mode,nfiles,fgcomm,flcomm,chunksizes,fsblksize,&
& globalrank,newfname,sid)
end subroutine fsion_paropen_mpi
```
```
void fsion_paropen_mpi_c(char *fname,
char *file_mode,
int *numFiles,
MPI_Fint * fgComm,
MPI_Fint * flComm,
sion_int64 *chunksize,
sion_int32 *fsblksize,
int *globalrank,
char *newfname,
int *sid,
int fname_len,
int file_mode_len,
int newfname_len);
```https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/162Incorrect error handling for POSIX read and write2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deIncorrect error handling for POSIX read and writePOSIX `read` and `write` return a value of the _signed_ type `ssize_t`, in particular, both return `-1` on error. Neither `_sion_file_read_posix` nor `_sion_file_write_posix` check for this special value, assuming that like ANSI `fread` ...POSIX `read` and `write` return a value of the _signed_ type `ssize_t`, in particular, both return `-1` on error. Neither `_sion_file_read_posix` nor `_sion_file_write_posix` check for this special value, assuming that like ANSI `fread` and `fwrite` only short counts will be returned and retry the operation. What makes things worse is that the return value of `read` and `write` are assigned to `bread` and `bwrote`, both of which are of _unsigned_ type `size_t`, meaning an `ssize_t` value `-1` will be converted to `SIZE_MAX` which makes the arithmetic in the latter part of the retry loops adventurous (`dataptr += bread` comes to mind).2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/161Simplistic error handling for ANSI read and write2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deSimplistic error handling for ANSI read and write`fwrite` and `fread` return a short count in case of an error (or when end-of-file is reached while reading). Both `_sion_file_write_ansi` and `_sion_file_read_ansi` ignore these return values that signal errors and retry the operation u...`fwrite` and `fread` return a short count in case of an error (or when end-of-file is reached while reading). Both `_sion_file_write_ansi` and `_sion_file_read_ansi` ignore these return values that signal errors and retry the operation until the requested number of bytes has been written. In cases where an error does not solve itself over time, this will result in infinite loops.2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/160Buffer overflow in sion_generic_paropen2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deBuffer overflow in sion_generic_paropen`sion_generic_paropen` allocates a buffer `lprefix` of fixed length `SION_FILENAME_LENGTH` and then uses `strcpy` to copy the user-provided argument `fname` into `lprefix` without checking the length of `fname`.`sion_generic_paropen` allocates a buffer `lprefix` of fixed length `SION_FILENAME_LENGTH` and then uses `strcpy` to copy the user-provided argument `fname` into `lprefix` without checking the length of `fname`.2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/159Finish integration of hide-chunks / continuous write2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deFinish integration of hide-chunks / continuous writeDa die Problembeschreibung bisher (glaube ich) nur als E-Mail existiert erstelle ich mal noch ein Ticket und zitiere besagte E-Mail:
> Der Hase liegt in diesem Fall in r2050 begraben. Die Änderungen, die durch die ‘hide-chunks’ Branch e...Da die Problembeschreibung bisher (glaube ich) nur als E-Mail existiert erstelle ich mal noch ein Ticket und zitiere besagte E-Mail:
> Der Hase liegt in diesem Fall in r2050 begraben. Die Änderungen, die durch die ‘hide-chunks’ Branch eingeführt wurden enthalten einen Fehler in `sion_fread`. Diese ruft nun, statt `_sion_file_read`, die Funktion `_sion_read_multi_chunk`. `_sion_read_multi_chunk` zählt das Feld `currentpos` des `_sion_filedesc` Typen hoch (sion_internal.c:1308). Da die Funktion `_sion_file_read` dies nicht tut, enthält `sion_fread` entsprechende Anweisungen (sion_common.c:642). Diese sind in ‘hide-chunks’ nicht entfernt worden, `currentpos` wird also nun beim Lesen doppelt hochgezählt. `_sion_update_fileposition` hilft in diesem Fall auch nicht, da das C++ Interface den File Pointer nicht herausreicht (sion_internal.c:1002).
>
> In `test_cxxwr__1.cpp` werden insgesamt 16 Byte geschrieben, zwei 4 Byte `int` und ein 8 Byte `double`. Wenn diese anschließend wieder eingelesen werden entwickelt sich `currentpos` wie folgt: N (Startwert) -> N + 8 -> N + 16. Nach dem Einlesen des zweiten Integers steht `currentpos` auf dem Ende der Datei. Beim Einlesen des Double Werts schlägt `sion_feof` an, und es wird tatsächlich nichts eingelesen (sion_common.c:630). An dieser Stelle sollte das C++ Interface vermutlich Alarm schlagen, tut es aber nicht.
>
> Dass der Test in Kays Testumgebung gut geht liegt daran, dass sein Compiler `double_data` die selbe Stelle im Stack zuweist wie `z` und der Speicher nicht initialisiert wird. Der Compiler auf der Workstation gibt den beiden Variablen unterschiedliche Stellen im Stack, dieser wird durch `sion_fread` nicht überschrieben (besser gesagt initialisiert) und der Test schlägt Fehl, da der Speicher nicht zufällig den Wert 2 enthält.
>
> Erstaunlich ist, dass nur dieser Test fehlschlägt. Es sieht so aus, als würden andere Tests:
>
> - nicht `sion_fread` verwenden oder
> - nicht mehrere `sion_fread` hintereinander verwenden oder
> - beim letzten `sion_fread` noch nicht hinter dem Ende der Datei zu stehen oder
> - die gelesenen Daten nicht mit in die Ausgabe packen und den Rückgabewert von `sion_fread` nicht zu prüfen oder
> - in r2050 fälschlich mit angepasst worden zu sein.2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/158Drop support for historic architectures2021-05-05T08:10:07+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deDrop support for historic architecturesThere is still support for BG/L and BG/P in SIONlib. Can support for these architectures be dropped? What about others?There is still support for BG/L and BG/P in SIONlib. Can support for these architectures be dropped? What about others?2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/157Dead code cleanup2019-09-25T16:07:28+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deDead code cleanupThe following functions have declarations and definitions but are never actually called. Can they be removed?
```
_sion_cache_init
_sion_cache_destroy
sion_print_time_stamp
_sion_ompicol_size_of_dtype
_sion_set_fd
_sion_ptr2fd
_sion_fd_...The following functions have declarations and definitions but are never actually called. Can they be removed?
```
_sion_cache_init
_sion_cache_destroy
sion_print_time_stamp
_sion_ompicol_size_of_dtype
_sion_set_fd
_sion_ptr2fd
_sion_fd_size
_sion_get_numfiles_from_file_mpi
_sion_get_filenumber_from_files_mpi
(_)sion_free_io_info
sion_set_fp_closed
(_)sion_set_second_fp
(_)sion_unset_second_fp
sion_optimize_fp_buffer
sion_flush
_sion_fileptrflags_to_str
sion_get_dfile
sion_isdebug
sion_debug_off
sion_generic_register_get_multi_filename_cb
_sion_free_filedesc_all_localranks
```2.0.0Benedikt Steinbuschb.steinbusch@fz-juelich.deBenedikt Steinbuschb.steinbusch@fz-juelich.dehttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/156sion_paropen_mapped_mpi with incomplete list of globalranks2017-04-03T17:38:41+02:00Wolfgang Fringssion_paropen_mapped_mpi with incomplete list of globalrankssion_paropen_mapped_mpi should support cases where not all globalranks are used (e.g. for FTI head nodes). Especially, when rank 0 is missing, SIONlib abort with an unclear message sion_paropen_mapped_mpi should support cases where not all globalranks are used (e.g. for FTI head nodes). Especially, when rank 0 is missing, SIONlib abort with an unclear message v1.6Wolfgang FringsWolfgang Fringshttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/155Fehler in sion_generic_register_nam_restore_file_cb2017-02-09T14:03:16+01:00Kay ThustFehler in sion_generic_register_nam_restore_file_cbDEEP-ERWolfgang FringsWolfgang Fringshttps://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/154SIONlib 2.0 API changes2021-08-25T15:20:29+02:00Benedikt Steinbuschb.steinbusch@fz-juelich.deSIONlib 2.0 API changes# `open` functions
## Current API
Some parts of the current API combine many functions into one. This makes it easy to write down advanced operations concisely, but arguably makes it harder than necessary for the user to get started do...# `open` functions
## Current API
Some parts of the current API combine many functions into one. This makes it easy to write down advanced operations concisely, but arguably makes it harder than necessary for the user to get started doing easy things at first. Take for example the following function
```
int sion_paropen_mpi(
const char *fname,
const char *file_mode,
int *numFiles,
MPI_Comm gComm,
const MPI_Comm *lComm,
sion_int64 *chunksize,
sion_int32 *fsblksize,
int *globalrank,
FILE **fileptr,
char **newfname
)
```
For each of these 10 function arguments, the user has to find out:
* the argument `intent`:
* some are `in` parameters,
* some are `out` parameters,
* some are both,
* some are marked incorrectly;
* whether he has to supply a value or not:
* some of the arguments are mandatory,
* some are optional with different special values to signal the absence of a value (e.g. `NULL` vs `-1`);
* does it interact with other arguments:
* some arguments have intricate relationships to one another, where they can only be used in certain combinations (e.g. `numFiles` and `lComm`).
## Proposed Changes
In the branch `DEV-BS-API2` I propose some structural changes to the API that I hope simplify the getting started scenario by deferring some of these questions to a later point during the development of a project that wants to make use of SIONlib. To that end, I have introduced a type `sion_mpi_options` (see https://trac.version.fz-juelich.de/SIONlib/browser/branches/DEV-BS-API2/src/parlib/sion_mpi.h#L94) that encapsulates the argument list of `sion_paropen_mpi`. It is initialized using a function that takes all arguments that are strictly necessary to open a file (except for rank, which I recently found out is an unnecessary `out` argument):
```
sion_mpi_options sion_mpi_options_new(
const char *filename,
sion_open_mode mode,
MPI_Comm communicator,
int rank // not strictly necessary, see ticket #153
); // ignoring rank, that is three parameters to fopen's two
```
All other members are initialised to sensible default values, i.e. `fsblksize` is detected from the file system, `chunksize` is chosen accordingly (a "correct" chunksize is no longer necessary with the `hide-chunks` changes) and the `file_number` is set to 1...
Setter functions are used to override default values, e.g. `sion_mpi_options_set_chunksize` or `sion_mpi_options_set_multifile_number`.
Once the user has modified the options object to his satisfaction, it is used as the sole argument of `sion_mpi_paropen_from_options`. This design is inspired by the Builder pattern (https://en.wikipedia.org/wiki/Builder_pattern)
Along with these changes to the `open` function, an exhaustive list of getter functions for file properties needs to be offered, to query things like `newfname` in cases where these are of interest to the user.
### An Alternative
Alternatively, the responsibilities could be split slightly differently, encapsulating only the optional arguments in the `sion_mpi_options` type. The open function would then look like this:
```
int sion_paropen_mpi_with_options(
const char *filename,
sion_open_mode mode,
MPI_Comm communicator,
const sion_mpi_options *options
);
```
And the user can pass in `NULL` for `options` if he is fine with the defaults and thus does not have to come into contact with the `sion_mpi_options` type and associated functions until he needs them. There could also be an even simpler function to do just this:
```
int sion_paropen_mpi(const char *filename, sion_open_mode mode, MPI_Comm communicator) {
sion_paropen_mpi_with_options(filename, mode, communicator, NULL);
}
```
## Pro
All in all, these changes help answer or defer the questions mentioned earlier:
* **Argument intent:** When opening a file, all arguments have intent `in`. Dynamic properties of open files can later be queried by the user through getter functions (in case he is interested).
* **Optional arguments:** Mandatory arguments have to be provided. Optional arguments are hidden inside the `sion_mpi_options` type. Instead of having to find out what special value signals the absence of an argument, the user can simply ignore them.
* **Argument interactions:** The setter functions of the `sion_mpi_options` type can be used to enforce constraints between arguments, see `sion_mpi_options_set_multifile_{number|communicator}`.
## Contra
Although these changes can make it easier to get into working with SIONlib at first by hiding some of the complexity, it does in turn hurt the visibility of some advanced mechanisms. E.g., in the current API, the `file_number` argument explicitly appears in the function signature of `sion_mpi_paropen` alerting the user to the presence of this feature, whereas in the proposed API, a user can successfully open a file and use it without learning about the `multifile` mechanism and as a consequence miss potential performance improvements.
## Open Questions
* Can this mechanism replace all arguments of `open`, especially those that are currently encoded in the mode string?
* Does this mechanism work for the other `open` flavors (serial, OpenMP, hybrid)?
* ...?
# `seek` and file position in general
## Current API
Currently, a location inside a SIONlib container is identified by a triple of (`rank`, `blocknr`, `position_in_block`).
```
int sion_seek(
int sid,
int rank,
int blocknr,
sion_int64 position_in_block
)
```
Once the continuous write work has been completed, the model of a logical file in a SIONlib container can be changed from a sequence of chunks each with individual fill amount to a sequence of `n-1` chunks containing `chunksize` bytes and a last chunk containing the remaining `size(logical file) - (n - 1) * chunksize` bytes. In this picture the block numbers can be de-emphasized in favour of absolute offsets into the logical file (which are straightforward to calculate) or relative offsets from the current file pointer position. This would make SIONlib logical files and their API more similar to the naive task-local files they are trying to replace.
## Proposed Changes
The names and documentation of the function arguments could be changed, renaming the `blocknr` argument to `whence` as found on conventional `seek` functions and more emphasis would be placed on the special values `SION_ABSOLUTE_POS` and `SION_END_POS` as well as another special value `SION_SEEK_RELATIVE` that allows searching relative to the current position, but across block boundaries. The `position_in_block` argument would be renamed to `position`.
### A step further
Alternatively, to further de-emphasize the concept of blocks, the option to seek by blocks could be removed from `sion_seek` altogether and be moved to a different function e.g. `sion_seek_block`. This simplifies `sion_seek` for use cases where block boundaries do not matter. Furthermore, this enables a more versatile family of seek by block operations, such as relative seek by block (e.g. seek to the beginning of the next block, after writing a header of metadata at the beginning of a file).
### Yet further
The interface of `sion_seek` could further be simplified by removing the `rank` parameter (which, with the changes above, would make the interface as good as identical to a conventional `seek` function). This functionality could be moved to a separate function, e.g. `sion_seek_logical_file`. This simplifies common use cases where opening a SIONlib container means opening a single logical file inside of it, e.g. the (non-mapped) `sion_paropen...` functions or `sion_open_rank` where seeking by rank is not possible anyways.
## Open Questions
- Which alternative should we decide on?2.0.0