SIONlib issues
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues
Updated: 2019-12-02T16:03:57+01:00

**#121 Support time-varying task-mappings**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/121
Author: Kay Thust · Updated: 2019-12-02T16:03:57+01:00

Enhancement of the SIONlib meta-data structure and the functionality of the underlying parallel software layer to support time-varying task-mappings.

Milestone: DEEP-ER

**#119 Compatibility for OmpSs**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/119
Author: Kay Thust · Updated: 2019-12-02T16:03:57+01:00

Integration of an additional parallel API driver layer for OmpSs. This will extend the list of currently supported parallel paradigms (MPI, OMP, hybrid).

Milestone: DEEP-ER

**#155 Bug in `sion_generic_register_nam_restore_file_cb`**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/155
Author: Kay Thust · Updated: 2017-02-09T14:03:16+01:00

Milestone: DEEP-ER · Assignee: Wolfgang Frings

**#138 Buddy checkpointing hangs when main file is missing**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/138
Author: Kay Thust · Updated: 2016-07-05T10:38:32+02:00

`test_fitest_1` does not pass TEST H, so it is deactivated and the log is adapted to make it pass (see [1990]).

Milestone: DEEP-ER · Assignee: Wolfgang Frings

**#210 Duplication of information in generic API**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/210
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

`sion_generic_paropen` and `sion_generic_paropen_mapped` both have a "commgroup" argument ("global communicator") as well as two arguments `grank` ("global rank of process / calling task") and `gsize` ("size of global communicator"). The information contained in the latter two should somehow be contained in the first, and indeed, every specific API in SIONlib (MPI, OpenMP, hybrid) has fields in its "commgroup" structure that contain exactly the values passed in `grank` and `gsize`. Is it necessary to keep all three arguments, or should the generic interface be extended to allow the user to register callbacks for inspecting the "commgroup" for rank and size on the generic level?

Milestone: 2.0.0

**#212 Make API creation foolproof**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/212
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

`sion_generic_create_api` creates an API that is in a completely invalid state, and the various `sion_generic_register...` functions each register a single callback function, leaving the API in an invalid state until all necessary callbacks have been registered. Why not have `sion_generic_create_api` take all necessary callbacks as arguments and create a valid API descriptor in a single step?
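A minimal C sketch of the single-step creation suggested above. Every name in it (`generic_callbacks`, `generic_api`, `generic_api_create` and the four callback types) is hypothetical and not part of SIONlib's actual generic API; the rank and size callbacks only echo the idea from #210. The point is that the constructor returns either a complete descriptor or none at all:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback signatures, NOT SIONlib's real generic API. */
typedef int (*barrier_fn)(void *commgroup);
typedef int (*bcast_fn)(void *data, void *commgroup, int count, int root);
typedef int (*rank_fn)(const void *commgroup);
typedef int (*size_fn)(const void *commgroup);

/* All required callbacks, handed over together at creation time. */
typedef struct {
    barrier_fn barrier;
    bcast_fn bcast;
    rank_fn get_rank;
    size_fn get_size;
} generic_callbacks;

typedef struct {
    char name[64];
    generic_callbacks cb;
} generic_api;

/* Returns a fully initialised descriptor, or NULL if any required callback
 * is missing (or allocation fails). A half-built API never escapes. */
generic_api *generic_api_create(const char *name, const generic_callbacks *cb)
{
    if (name == NULL || cb == NULL || cb->barrier == NULL || cb->bcast == NULL ||
        cb->get_rank == NULL || cb->get_size == NULL) {
        return NULL;
    }
    generic_api *api = malloc(sizeof *api);
    if (api == NULL) {
        return NULL;
    }
    snprintf(api->name, sizeof api->name, "%s", name);
    api->cb = *cb;
    return api;
}

/* Trivial callbacks for a single task, used only to exercise the constructor. */
static int serial_barrier(void *commgroup) { (void)commgroup; return 0; }
static int serial_bcast(void *data, void *commgroup, int count, int root)
{
    (void)data; (void)commgroup; (void)count; (void)root;
    return 0;
}
static int serial_rank(const void *commgroup) { (void)commgroup; return 0; }
static int serial_size(const void *commgroup) { (void)commgroup; return 1; }
```

Filling a `generic_callbacks` with all four functions yields a usable descriptor; leaving `bcast` as `NULL` makes `generic_api_create` return `NULL` instead of a partially registered API.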
As there seem to be different levels of capability that require fewer or more callbacks to be defined, there could be several `create` functions, one per level, or a single `create` function that accepts `NULL` for optional callbacks and sets the correct capability level.

Milestone: 2.0.0

**#213 Deprecate or remove sion_get_current_position etc.**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/213
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:29+02:00

These functions expose internal fields; remove them in favor of `sion_tell` (and possibly more functions yet to be implemented).

Milestone: 2.0.0-rc.4

**#203 Adapt Python bindings to 2.0 interface**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/203
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2019-12-02T16:03:44+01:00

Otherwise, disable them for now and adapt them after the 2.0.0 release.

Milestone: 2.0.0

**#204 Document all new open functions and the accompanying option constructors and setters**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/204
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:29+02:00

Milestone: 2.0.0-rc.4

**#205 Write a guide for the 1.0 to 2.0 transition**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/205
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

Document the following:
- changes in open functions
- option structs for advanced use cases
- split up seek function
- reading and writing "without" chunks

Milestone: 2.0.0-rc.4

**#201 Delete 1.X open functions, remove _with_options suffix from 2.x open functions**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/201
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:29+02:00

Milestone: 2.0.0-rc.4

**#202 Re-design and implement Fortran bindings**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/202
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

Otherwise, disable them for now and design and implement them after the 2.0.0 release.

Milestone: 2.0.0

**#207 Adapt all examples to new interface**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/207
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

Or move the examples to a separate repository and adapt them after the 2.0.0 release.

Milestone: 2.0.0

**#206 Design and implement forward-compatibility CLI utility**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/206
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

Reads a SIONlib container with file format 6 and writes a file format 5 container.

Milestone: 2.0.0

**#189 Collective I/O and mapped mode, (how) does it work?**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/189
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:28+02:00

`_sion_paropen_mapped_generic` suggests that opening files in mapped mode for collective I/O is at least partially supported:
- in write mode, the `usecoll` field of the file descriptor struct is set according to the file mode flags, but `_sion_calculate_startpointers_collective` is not called and thus the `collector` and `collsize` fields of the file descriptor are not set to correct values,
- in read mode, `_sion_calculate_startpointers_collective` **is** called (but `_sion_calculate_startpointers_collective_merge` is not) so the `collector` and `collsize` fields are populated with values and later on broadcast to all processes.
However, it is not clear:
- whether this works at the moment (there are no tests exercising this combination),
- how this is supposed to work (see below).
Several aspects of collective I/O seem to not have been translated from the original picture (normal `paropen`) to the mapped picture (`paropen_mapped`):
- `_sion_calculate_startpointers_collective` determines the identity of collectors (their rank) and the size of collective groups according to the number of logical files rather than the number of actual processes, but
- e.g. `_sion_mpi_gather_process_cb` (which is used underneath `sion_coll_fwrite`) determines the identity of collectors according to `MPI_Comm_rank` and also
- `_sion_mpi_gather_process_cb` expects collectors to receive messages from `collsize - 1` processes with rank numbers `rank_collector + 1, ..., rank_collector + collsize - 1`
which cannot work in all scenarios, e.g. if there are fewer processes than logical files.

Milestone: 2.0.0

**#191 Move documentation of `sionconfig` / build process to its own page**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/191
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:30+02:00

Currently the Doxygen documentation contains a short paragraph about `sionconfig` and how to use SIONlib from a user's code on the page "Installation, debugging and error messages". This should be expanded and moved to its own page.

Milestone: 2.0.0

**#180 Backward compatibility for continuous read/write**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/180
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2021-08-25T15:20:29+02:00

From [SionVersionTwo](SionVersionTwo):
- Backward
  - Old format should be readable
  - Both, serial and parallel
- Forward
  - External tool (CLI)
For reading old files: have to keep chunksizes array(s) in file descriptor.
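A C sketch of why per-block size information matters for reading. It assumes a hypothetical array `bytes_in_block` that records how many bytes one logical file's chunk holds in each block (old files may leave chunks partially filled; zero entries are simply skipped), and that a new, continuously written file fills every chunk completely before moving on to the next block. Under those assumptions, `synthesize_block_sizes` illustrates the second option of the design question that follows: building the same array on open so that one read path serves both formats. None of these names are SIONlib functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps a position in one logical file to (block index, offset in that block's
 * chunk) by walking the hypothetical per-block byte counts. Empty chunks are
 * skipped. Returns 0 on success, -1 if pos lies past the end of the file. */
int locate(const uint64_t *bytes_in_block, size_t nblocks, uint64_t pos,
           size_t *block, uint64_t *offset)
{
    for (size_t b = 0; b < nblocks; ++b) {
        if (pos < bytes_in_block[b]) {
            *block = b;
            *offset = pos;
            return 0;
        }
        pos -= bytes_in_block[b];
    }
    return -1;
}

/* Builds bytes_in_block for a file of `total` bytes whose chunks (each
 * `chunksize` bytes) are filled back to back. Returns the number of blocks
 * used, or -1 if chunksize is 0 or more than max_blocks would be needed. */
long synthesize_block_sizes(uint64_t total, uint64_t chunksize,
                            uint64_t *bytes_in_block, size_t max_blocks)
{
    if (chunksize == 0) {
        return -1;
    }
    size_t n = 0;
    while (total > 0) {
        if (n == max_blocks) {
            return -1;
        }
        uint64_t take = total < chunksize ? total : chunksize;
        bytes_in_block[n++] = take;
        total -= take;
    }
    return (long)n;
}
```

For a 250-byte logical file with 100-byte chunks, the synthesized array is `{100, 100, 50}`, and byte 230 lands in block 2 at offset 30, exactly as `locate` would find it in an old file with the same per-block counts.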
Design question: two read functions (one for old files, one for new files), or only the old-file read functions, synthesizing the chunksizes arrays on open for new files?

Milestone: 2.0.0

**#177 Simplify `globalranks` vs. `ranks`?**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/177
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2019-12-02T16:03:44+01:00

A SIONlib file container contains an arbitrary number `n` of logical files. Each of these `n` files has two identifying numbers attached to it:
- an implicit one in the range `0...n-1`, `siondump` calls this "Task",
- an explicit one that is freely assigned by the user, called "globalrank".
Depending on how the file was opened (e.g. serial open vs. mapped open) either the implicit or explicit rank numbers are used to refer to a logical file when doing a `sion_seek`.
- This is confusing. Why are there two rank numbers for a logical file? Why does `sion_seek` behave differently, depending on how a file was opened?
- The current implementation is incomplete. The serial open functions do not perform any validation of the `globalranks` making it possible to create files that trip up a later mapped open (see e.g. #156 and #174).
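One way the serial open path could guard against such files, sketched in C under the assumption that `globalranks[i]` holds the explicit number of the logical file with implicit index `i`: reject duplicate `globalranks` up front and translate between the two numberings in exactly one place. The function names are hypothetical, and it is an assumption that uniqueness is the property a later mapped open relies on (the failures in #156 and #174 are not described here).

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if no explicit globalrank occurs twice, 0 if one does,
 * and -1 if memory for the check cannot be allocated. */
int globalranks_unique(const int *globalranks, size_t n)
{
    if (n < 2) {
        return 1;
    }
    int *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        sorted[i] = globalranks[i];
    }
    /* After sorting, any duplicates are adjacent. */
    qsort(sorted, n, sizeof *sorted, cmp_int);
    int unique = 1;
    for (size_t i = 1; i < n; ++i) {
        if (sorted[i] == sorted[i - 1]) {
            unique = 0;
            break;
        }
    }
    free(sorted);
    return unique;
}

/* Maps an explicit globalrank to its implicit index 0..n-1, or -1 if absent. */
long implicit_index(const int *globalranks, size_t n, int globalrank)
{
    for (size_t i = 0; i < n; ++i) {
        if (globalranks[i] == globalrank) {
            return (long)i;
        }
    }
    return -1;
}
```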
Is the additional identifying number really needed, or can the mechanism be simplified?

Milestone: 2.0.0

**#176 Consistent terminology for upcoming publications and improved documentation**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/176
Author: Benedikt Steinbusch (b.steinbusch@fz-juelich.de) · Updated: 2019-12-02T16:03:43+01:00

For the upcoming version 2.0 release of SIONlib, it would be nice to have reasonably comprehensive documentation of at least the user-facing API. Care should be taken to either rethink the terminology that is currently in use or make it consistent. This effort should also extend to upcoming publications and training courses, which ideally will use terminology consistent with the API documentation.
This ticket is meant as a place to collect and discuss suggestions for future terminology.
# Parallel execution streams
Different technologies for parallel programming provide differently named software abstractions with different semantics for parallel execution streams. SIONlib via its different specific APIs (MPI, OpenMP, hybrid) has to fit into these naming schemes and ideally provide a single abstract name for this concept. Here is how these terms are defined by the various technologies and how I suggest they should be used by SIONlib.
## Process
A parallel program that uses MPI consists of a number of parallel _processes_. In the simplest case all _processes_ run the same program (possibly taking different branches inside the program) and are all started at the beginning of the computation. This picture can be more complicated via mechanisms such as MPMD and dynamic process management.
In SIONlib documentation this term should be used mostly in the documentation of the MPI API to refer to MPI processes.
## Rank
In MPI, a _rank_ is a number that is used to identify a _process_ in the context of a specific communicator (or more precisely within a group). There is, in general, no unique one-to-one map between _rank_ numbers and _processes_. The two terms are often used interchangeably, but should not be.
The term _rank_ should only be used to describe exactly a number that identifies an MPI _process_ in a specific group, mostly as part of the documentation of function signatures in the MPI API.
## Thread
A parallel program using OpenMP can use a varying number of _threads_ (officially "OpenMP threads") to run different parts of the program in parallel.
In SIONlib documentation this term should be used mostly in documentation of the OpenMP API and the hybrid API to refer to this concept from OpenMP.
## Thread number
OpenMP groups _threads_ into teams. Each _thread_ in a team is assigned an integer _thread number_ from `0` to `team size - 1`. This is a similar concept to _rank_ numbers in MPI.
## Task
OpenMP uses the word _task_ to refer to a mechanism for encapsulating a block of code and associated data (essentially a closure) that can be run either completely independently from other tasks or according to a set of user-defined constraints such as _task_ dependencies or explicit _task_ synchronisation operations.
SIONlib does not directly interact with this concept from OpenMP and so _task_ might be a candidate for a generic abstract term that can be used to refer to the API specific terms _process_ and _thread_ (and possibly others in the future). _Task_ is also the abstract term used by the in-progress paper.
## Alternative generic term
It might not be ideal to use _task_ as the generic abstract term for the concept of a parallel execution stream, because it collides with the OpenMP terminology. However, that is probably also the case for all alternatives. _Parallel execution stream_ is probably too verbose. Other suggestions?
## Groups, Teams, ...
Both MPI (groups) and OpenMP (teams) have terms to refer to sets of _processes_/_threads_. SIONlib documentation probably needs a term for this as well, but maybe it is used rarely enough that _set of tasks_ (or whatever the abstract term ends up being) is fine.
# Components of SIONlib file containers
The SIONlib file format is quite intricate and consists of a number of components nested inside one another. These components all have their own names. In addition, the file format is influenced by file system mechanisms that have to fit into the terminology.
## Container
A _SIONlib container_ is the object stored on the file system to hold both the application data and SIONlib metadata to describe the application data layout.
### Implementation aspects
A _container_ is stored in one or more _physical files_. Each _physical file_ contains two blocks of metadata, one at the beginning and one at the end, and between the two a number of _blocks_ containing _chunks_.
### User facing aspects
A single _container_ contains a sequence of _logical files_.
## Logical file
This is a user facing concept that is currently variously referred to as _task_ or _rank_. It describes the concatenation of all _chunks_ in a _physical file_ that together form a single logical part of the file that can be addressed via the `rank` argument of `sion_seek()` or selected for opening via the `globalranks` argument of `sion_paropen_mapped_XXX()`.
The terminology currently in use is probably historically motivated by the fact that there was a one to one mapping between file _parts_ and _tasks_ / _processes_ (or _ranks_). This particular concept should probably be renamed going forward to avoid confusion with terminology from the previous section. The in-progress paper consistently uses the term _logical (task-local) file_.
## Physical file
Describes an actual file on the file system. A single _container_ that is opened through a single SIONlib `...open...` function might span several _physical files_ on disk.
This term is used consistently by the in-progress paper.
## Chunks
These are the smallest components of a _physical file_ that make up a _logical file_. All _chunks_ that belong to the same _logical file_ have the same (individual per _logical file_) size. In previous versions of SIONlib the _chunk_ size was an upper bound on the amount of data that could be transferred from/to a file in a single read/write function call. This limitation will be lifted once the work on continuous write is completed.
This term is currently used consistently and should remain unchanged.
## Blocks
_Physical files_ are a sequence of _blocks_. For every _logical file_, a _block_ contains as many (possibly empty) _chunks_ as will fit into a _file system block_, but at least one, which might also span several _file system blocks_. So _blocks_ themselves are sequences of _chunks_, possibly interspersed with padding to prevent sharing of _file system blocks_.
This term is currently used consistently and should remain unchanged.
## File system blocks
Unit of storage of the underlying file system that has to be modified in a read-modify-write cycle. Sharing of _file system blocks_ between _tasks_ should be avoided. Thus for every _logical file_ the first _chunk_ in every _block_ is aligned to the start of a _file system block_. In effect, no _file system block_ will ever contain data that belongs to more than one _logical file_.
This term is used in a way that is consistent with its definition outside of SIONlib. There is no need to change that.
_Caution:_ The previous description of _chunk_ alignment ignores collective mode where data from different _logical files_ that are handled by the same collector might coexist in the same _file system block_.
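The layout described in the _Blocks_ and _File system blocks_ sections can be turned into offset arithmetic. The C sketch below is a simplification under stated assumptions, not SIONlib's actual code: a metadata block of `meta_size` bytes at the start of the physical file, padded to a whole file system block; per block, each logical file gets as many chunks as fit into one file system block (at least one), padded up to whole file system blocks; collective mode, per the caution above, is ignored. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds x up to the next multiple of `multiple` (multiple > 0). */
static uint64_t round_up(uint64_t x, uint64_t multiple)
{
    return (x + multiple - 1) / multiple * multiple;
}

/* Chunks of one logical file per block: as many as fit into one file system
 * block, but at least one (which may then span several fs blocks).
 * chunksize and fsblksize must both be > 0. */
uint64_t chunks_per_block(uint64_t chunksize, uint64_t fsblksize)
{
    uint64_t n = fsblksize / chunksize;
    return n > 0 ? n : 1;
}

/* Bytes reserved for one logical file in each block, padded to whole fs
 * blocks so that no fs block is shared between logical files. */
uint64_t region_size(uint64_t chunksize, uint64_t fsblksize)
{
    return round_up(chunks_per_block(chunksize, fsblksize) * chunksize, fsblksize);
}

/* Physical offset of chunk c (0 <= c < chunks_per_block) in block b of
 * logical file i, given the chunk size of every logical file. */
uint64_t chunk_offset(const uint64_t *chunksizes, size_t nfiles, uint64_t fsblksize,
                      uint64_t meta_size, size_t i, uint64_t b, uint64_t c)
{
    uint64_t block_size = 0; /* size of one whole block, all logical files */
    uint64_t before_i = 0;   /* bytes of logical files 0..i-1 in a block */
    for (size_t j = 0; j < nfiles; ++j) {
        uint64_t r = region_size(chunksizes[j], fsblksize);
        if (j < i) {
            before_i += r;
        }
        block_size += r;
    }
    return round_up(meta_size, fsblksize) + b * block_size + before_i
           + c * chunksizes[i];
}
```

With 4096-byte file system blocks, chunk sizes of 1000 and 5000 bytes and 100 bytes of leading metadata, the first logical file gets four chunks per block inside one file system block, while the second gets a single chunk padded to two file system blocks; the second logical file's chunk in block 0 therefore starts at byte 8192, which is aligned to a file system block as required.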
## Range
This term is currently used in the description of the SIONlib file format to signify a sequence of _file system blocks_.
## Multi-file
This term is currently used to refer to the collection of _physical files_ that form a single _container_. For me, the distinction between _container_ and _multi-file_ is not completely clear.
## SION file(?)
This term is currently used to describe a SIONlib container and should be replaced.

Milestone: 2.0.0

**#134 Add convenience functions to access filedesc members**
https://gitlab.jsc.fz-juelich.de/cstao-public/SIONlib/SIONlib/-/issues/134
Author: Kay Thust · Updated: 2021-08-25T15:20:29+02:00

E.g. `sion_get_chunksize`.
Alternatively, fix `sion_get_locations`.

Milestone: 2.0.0
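A C sketch of the accessor style this issue asks for: callers get read-only, bounds-checked functions instead of reaching into descriptor fields. The struct and function below are hypothetical stand-ins; SIONlib's real file descriptor has many more members, and `get_chunksize` is not meant to be the signature of the `sion_get_chunksize` named above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified descriptor. In a library, this definition
 * would live in a private source file, and users would only see the
 * incomplete type `struct filedesc;` in the public header. */
struct filedesc {
    int nfiles;          /* number of logical files */
    int64_t *chunksizes; /* chunk size per logical file */
};

/* Returns the chunk size of logical file `file`, or -1 if `fd` is NULL
 * or `file` is out of range. */
int64_t get_chunksize(const struct filedesc *fd, int file)
{
    if (fd == NULL || file < 0 || file >= fd->nfiles) {
        return -1;
    }
    return fd->chunksizes[file];
}
```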