Consistent terminology for upcoming publications and improved documentation
For the upcoming version 2.0 release of SIONlib, it would be nice to have reasonably comprehensive documentation of at least the user-facing API. The terminology currently in use should either be rethought or made consistent. This effort should also extend to upcoming publications and training courses, which ideally will use terminology consistent with the API documentation.
This ticket is meant as a place to collect and discuss suggestions for future terminology.
Parallel execution streams
Different technologies for parallel programming provide differently named software abstractions with different semantics for parallel execution streams. SIONlib via its different specific APIs (MPI, OpenMP, hybrid) has to fit into these naming schemes and ideally provide a single abstract name for this concept. Here is how these terms are defined by the various technologies and how I suggest they should be used by SIONlib.
Process
A parallel program that uses MPI consists of a number of parallel processes. In the simplest case all processes run the same program (possibly taking different branches inside the program) and are all started at the beginning of the computation. This picture can be more complicated via mechanisms such as MPMD and dynamic process management.
In SIONlib documentation this term should be used mostly in the documentation of the MPI API to refer to MPI processes.
Rank
In MPI, a rank is a number that is used to identify a process in the context of a specific communicator (or, more precisely, within a group). In general, there is no unique one-to-one mapping between rank numbers and processes: the same process can have different ranks in different groups. The terms rank and process are often used interchangeably, but should not be.
The term rank should only be used to describe exactly a number that identifies an MPI process in a specific group, mostly as part of the documentation of function signatures in the MPI API.
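The distinction between a process and its rank can be illustrated with a small model. This is only a sketch: processes are represented by plain strings, a group by an ordered list, and the helper `rank_of` is a hypothetical name that belongs to neither MPI nor SIONlib.

```python
# Illustrative model only: a process is a string, a group is an ordered list
# of processes. None of these names come from MPI or SIONlib.

def rank_of(process, group):
    """Return the rank of `process` within `group` (its position in the list)."""
    return group.index(process)


world = ["p0", "p1", "p2", "p3"]  # e.g. the group behind a world communicator
odd = ["p1", "p3"]                # a sub-group containing two of the processes

# The same process has a different rank in each group ...
world_rank = rank_of("p3", world)
odd_rank = rank_of("p3", odd)

# ... and the same rank number names different processes in different groups.
rank1_in_world = world[1]
rank1_in_odd = odd[1]
```

A rank therefore only identifies a process once the group is known, which is why the documentation should not use the two words as synonyms.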
Thread
A parallel program using OpenMP can use a varying number of threads (officially "OpenMP threads") to run different parts of the program in parallel.
In SIONlib documentation this term should be used mostly in documentation of the OpenMP API and the hybrid API to refer to this concept from OpenMP.
Thread number
OpenMP groups threads into teams. Each thread in a team is assigned an integer thread number from 0 to team size - 1. This is a similar concept to rank numbers in MPI.
Task
OpenMP uses the word task to refer to a mechanism for encapsulating a block of code and associated data (essentially a closure) that can be run either completely independently of other tasks or according to a set of user-defined constraints such as task dependencies or explicit task synchronisation operations.
SIONlib does not directly interact with this concept from OpenMP and so task might be a candidate for a generic abstract term that can be used to refer to the API specific terms process and thread (and possibly others in the future). Task is also the abstract term used by the in-progress paper.
Alternative generic term
It might not be ideal to use task as the generic abstract term for the concept of a parallel execution stream, because it collides with the OpenMP terminology. However, that is probably also true of all alternatives. Parallel execution stream is probably too verbose. Other suggestions?
Groups, Teams, ...
Both MPI (Groups) and OpenMP (Teams) have terms to refer to sets of processes/threads. SIONlib documentation probably needs a term for this as well, but maybe it is used rarely enough that "set of tasks" (or whatever the abstract term ends up being) is fine.
Components of SIONlib file containers
The SIONlib file format is quite intricate and consists of a number of components nested inside one another. These components all have their own names. In addition, the file format is influenced by file system mechanisms that have to fit into the terminology.
Container
A SIONlib container is the object stored on the file system to hold both the application data and SIONlib metadata to describe the application data layout.
Implementation aspects
A container is stored in one or more physical files. Each physical file contains two blocks of metadata, one at the beginning and one at the end, with a number of blocks containing chunks between the two.
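The nesting described above can be written down as a small data model. This is a sketch for discussing terminology only; all class and field names are hypothetical and say nothing about the actual on-disk format or the SIONlib API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    data: bytes = b""          # application data; a chunk may be empty


@dataclass
class Block:
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class PhysicalFile:
    leading_metadata: bytes    # metadata block at the beginning of the file
    blocks: List[Block]        # blocks of chunks between the two metadata blocks
    trailing_metadata: bytes   # metadata block at the end of the file


@dataclass
class Container:
    physical_files: List[PhysicalFile]  # one or more physical files


def count_chunks(container: Container) -> int:
    """Count all chunks in all blocks of all physical files of a container."""
    return sum(
        len(block.chunks)
        for physical_file in container.physical_files
        for block in physical_file.blocks
    )


def make_file() -> PhysicalFile:
    """Build one physical file with two blocks of two chunks each."""
    blocks = [Block([Chunk(b"a"), Chunk(b"b")]), Block([Chunk(b"c"), Chunk()])]
    return PhysicalFile(b"head", blocks, b"tail")


# A container spread over two physical files.
container = Container([make_file(), make_file()])
total = count_chunks(container)
```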
User facing aspects
A single container contains a sequence of logical files.
Logical file
This is a user facing concept that is currently variously referred to as task or rank. It describes the concatenation of all chunks in a physical file that together form a single logical part of the file that can be addressed via the rank argument of sion_seek() or selected for opening via the globalranks argument of sion_paropen_mapped_XXX().
The terminology currently in use is probably historically motivated by the fact that there was a one-to-one mapping between file parts and tasks / processes (or ranks). This particular concept should probably be renamed going forward to avoid confusion with the terminology from the previous section. The in-progress paper consistently uses the term logical (task-local) file.
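To make the idea of a logical file as a concatenation of chunks concrete, here is a minimal sketch. It assumes, for simplicity, that every chunk of the logical file has the same size and is filled completely before the next one is used; the function `locate` is a hypothetical helper, not part of SIONlib.

```python
def locate(position, chunk_size):
    """Map a byte position in a logical file to (chunk index, offset in chunk).

    Assumes all chunks of the logical file have the same size and are filled
    in order, which is a simplification made for this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if position < 0:
        raise ValueError("position must not be negative")
    return divmod(position, chunk_size)


# The chunks of one logical file, in the order in which they appear in the
# physical file (the last chunk is only partially filled).
chunks = [b"abcd", b"efgh", b"ij"]

# The logical file is the concatenation of its chunks.
logical_file = b"".join(chunks)

# Byte 5 of the logical file lives in the second chunk, at offset 1.
chunk_index, offset = locate(5, chunk_size=4)
byte_via_chunks = chunks[chunk_index][offset:offset + 1]
byte_via_logical_file = logical_file[5:6]
```

The user only ever sees `logical_file`; the split into chunks is an implementation detail of the container.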
Physical file
Describes an actual file on the file system. A single container that is opened through a single SIONlib ...open... function might span several physical files on disk.
This term is used consistently by the in-progress paper.
Chunks
These are the smallest components of a physical file that make up a logical file. All chunks that belong to the same logical file have the same (individual per logical file) size. In previous versions of SIONlib the chunk size was an upper bound on the amount of data that could be transferred from/to a file in a single read/write function call. This limitation will be lifted once the work on continuous write is completed.
This term is currently used consistently and should remain unchanged.
Blocks
Physical files are a sequence of blocks. For every logical file, a block contains as many (possibly empty) chunks as will fit into a file system block, but at least one chunk, which may itself span several file system blocks. Blocks themselves are thus sequences of chunks, possibly interspersed with padding to prevent sharing of file system blocks.
This term is currently used consistently and should remain unchanged.
File system blocks
Unit of storage of the underlying file system that has to be modified in a read-modify-write cycle. Sharing of file system blocks between tasks should be avoided. Thus for every logical file the first chunk in every block is aligned to the start of a file system block. In effect, no file system block will ever contain data that belongs to more than one logical file.
This term is used in a way that is consistent with its definition outside of SIONlib. There is no need to change that.
Caution: The previous description of chunk alignment ignores collective mode where data from different logical files that are handled by the same collector might coexist in the same file system block.
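Leaving collective mode aside, the alignment rule can be sketched as follows. The model assumes one chunk of positive size per logical file in each block and computes offsets relative to the start of the block; `align_up`, `block_layout` and `fs_blocks_touched` are hypothetical helper names, not SIONlib functions.

```python
def align_up(size, fs_block_size):
    """Round `size` up to the next multiple of `fs_block_size`."""
    return -(-size // fs_block_size) * fs_block_size


def block_layout(chunk_sizes, fs_block_size):
    """Return (chunk start offsets, total block size) for one block.

    Each chunk starts on a file system block boundary; the space between the
    end of a chunk and the next boundary is padding.
    """
    offsets = []
    cursor = 0
    for size in chunk_sizes:
        offsets.append(cursor)
        cursor += align_up(size, fs_block_size)
    return offsets, cursor


def fs_blocks_touched(offset, size, fs_block_size):
    """Return the set of file system block indices covered by a chunk."""
    if size == 0:
        return set()
    first = offset // fs_block_size
    last = (offset + size - 1) // fs_block_size
    return set(range(first, last + 1))


FS_BLOCK_SIZE = 4096
chunk_sizes = [1000, 5000, 4096]  # one chunk size per logical file

offsets, block_size = block_layout(chunk_sizes, FS_BLOCK_SIZE)
touched = [
    fs_blocks_touched(offset, size, FS_BLOCK_SIZE)
    for offset, size in zip(offsets, chunk_sizes)
]
```

With these sizes the three chunks start at offsets 0, 4096 and 12288, and the sets in `touched` are pairwise disjoint, so no file system block holds data from more than one logical file.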
Range
This term is currently used in the description of the SIONlib file format to signify a sequence of file system blocks.
Multi-file
This term is currently used to refer to the collection of physical files that form a single container. For me, the distinction between container and multi-file is not completely clear.
SION file(?)
This term is currently used to describe a SIONlib container and should be replaced.