The library aims to provide an easy way to include dynamic domain-based load balancing
into particle based simulation codes. The library is developed in the Simulation Laboratory Molecular Systems of the Juelich Supercomputing Centre at Forschungszentrum Juelich.
It includes or is going to include several load-balancing schemes. The following list gives an overview about these schemes and short explanations ([o]: scheme fully included, [w]: implementation ongoing, [p]: implementation planned)
- [o] Tensor-Product:
The work on all processes is reduced over the cartesian
planes in the systems. This work is then equalized by
adjusting the borders of the cartesian planes.
- [o] Staggered-grid:
A 3-step hierarchical approach is applied, where:
(i) work over the cartesian planes is reduced, before the borders of these
planes are adjusted; (ii) in each of the cartesian planes the work is
reduced for each cartesian column. These columns are then adjusted to each
other to homogenize the work in each column; (iii) the work between
neighboring domains in each column is adjusted. Each adjustment is done
locally with the neighboring planes, columns or domains by adjusting the
adjacent boundaries.
- [w] Topological Mesh:
In contrast to the previous methods this method adjusts
domains not by moving boundaries but vertices, i.e. corner points, of
domains. For each vertex a force, based on the differences
in work of the neighboring domains, is computed and the
vertex is shifted in a way to equalize the work between these
neighboring domains.
- [w] Voronoi Mesh:
Similar to the topological mesh method, this method computes a
force, based on work differences. In contrast to the topological mesh
method, the force acts on a Voronoi point rather than a vertex, i.e. a
point defining a Voronoi cell, which describes the domain. Consequently,
the number of neighbors is not a conserved quantity, i.e. the topology
may change over time. ALL uses the Voro++ library published by the
Lawrance Berkeley Laboratory for the generation of the Voronoi mesh.
- [o] Histogram-based Staggered-grid:
Resulting in the same grid, as the staggered-grid scheme, this scheme uses
the cumulative work function in each of the three cartesian directions in
order to generate this grid. Using histograms and the previously defined
distribution of process domains in a cartesian grid, this scheme generates
in three steps a staggered-grid result, in which the work is distributed as
evenly as the resolution of the underlying histogram allows. In constrast to
the previously mentioned schemes this scheme depends on a global exchange of
work between processes.
- [p] Orthogonal Recursive Bisection
Comparable to the Histogram-based Staggered-grid scheme, this scheme uses
cumulative work functions evaluated by histograms in order to find a new
distribution of workload in a hierarchical manner. In constrast to the
Histogram-based Staggered-grid scheme, the subdivision of domains is not
based on a cartesian division of processes, but on the prime factorization
of the total number of processes. During each step, the current slice of the
system is distributed into smaller sub-slices along the longest edge, that
roughly contain the same workload. The number of sub-slices in then determined
by the corresponding prime number of that step, i.e. when trying to establish
a distribution for 24 (3 * 2 * 2 * 2) processes, the bisection would use four
steps. In the first step the system would be subdivided into 3 subdomains along
the largest edge, each containing roughly the same workload. Each of these subdomains
then is divided into two subsubdomains indepenently. This procedure is then repeated
for all the prime numbers in the prime factorisation.
Installation & Requirements:
Base requirements:
C++11
MPI
CMake 3.14+
Optional requirements:
Fortran 2003
Fortran 2008 (if the more modern Fortran MPI interface is to be used)