Commit 65c196b6 authored by Stephan Schulz

move information from README to sphinx documentation

parent c7b0a5bd
1 merge request: !8 Refactor
Pipeline #54077 failed
README.md

load balancing into particle based simulation codes. The library is
developed in the Simulation Laboratory Molecular Systems of the Jülich
Supercomputing Centre at Forschungszentrum Jülich.

Only a brief summary is given here; more information, such as a
detailed API description, examples and further details on the load
balancing methods, can be found in the
[official documentation](docs/index.rst).

## Installation and Requirements
Optional requirements:

   `$ALL_ROOT_DIR`.
2. Create the build directory `$ALL_BUILD_DIR` some place else.
3. Call `cmake -S "$ALL_ROOT_DIR" -B "$ALL_BUILD_DIR"` to set up the
   installation. To use a specific compiler and Boost installation
   use: `CC=gcc CXX=g++ BOOST_ROOT=$BOOST_DIR cmake [...]`.
4. To build and install the library then run: `cmake --build
   "$ALL_BUILD_DIR"` and `cmake --install "$ALL_BUILD_DIR"`.
   Afterwards, the built examples and library files are placed in
   `$ALL_INSTALL_DIR`.
<!-- vim: set tw=72 et ts=4 spell spelllang=en_us ft=markdown: -->
docs/Examples.rst

.. _examples:

Examples
========

In the ``/examples`` sub directory several C++ and Fortran example
programs are placed.

``ALL_test``
------------
MPI C++ code that generates a particle distribution over the domains. At
program start the domains form an orthogonal decomposition of the cubic
3d system, with every domain having the same volume. Depending on the
cartesian coordinates of a domain in this decomposition, a number of
points is created on that domain; a polynomial formula is used for the
number of points, creating a non-uniform point distribution. Within each
domain, the points are distributed uniformly. The number of points
within a domain serves as the estimate of the work load in this program.
There is no particle movement included in the current version of the
example code; therefore particle communication only takes place due to
the shift of domain borders.
The program creates three output files in its basic version:
``minmax.dat``
  Columns
    ``<iteration count> <W_min/W_avg> <W_max/W_avg>
    <(W_max-W_min)/(W_max+W_min)>``
  Explanation
    To give a comparable metric for the success of the load balancing
    procedure, the minimum and maximum work loads in the system are
    given relative to the average work load of the system. The last
    column is an indicator for the inequality of work in the system; in
    a perfectly balanced system this value should be equal to zero (a
    numerical example follows the list).
``stddev.txt``
  Columns
    ``<iteration count> <std_dev>``
  Explanation
    The standard deviation of the work load over all domains.
``domain_data.dat``
  Columns
    ``<rank> <x_min> <x_max> <y_min> <y_max> <z_min> <z_max> <W>``
  Explanation
    There are two blocks of rows: the first block is the starting
    configuration of the domains, which should be a uniform grid of
    orthogonal domains; the second block is the configuration after 200
    load balancing steps. Each line gives the MPI rank of the domain,
    its extent in each cartesian direction, and its work load (the
    number of points).
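As a numerical example for ``minmax.dat``: with W_min = 8, W_max = 12
and W_avg = 10, a line would read ``<iteration> 0.8 1.2 0.2``, since
(12 - 8)/(12 + 8) = 0.2.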
If the library is compiled with VTK support, ``ALL_test`` also creates a
VTK based output of the domains in each iteration step. This output will
be placed in a ``vtk_outline`` sub directory, which is created if it does
not already exist. The resulting output can be visualized with tools like
ParaView or VisIt.
``ALL_test_f`` and ``ALL_test_f_obj``
-------------------------------------
These Fortran examples provide a more basic version of the test program
``ALL_test``; their main goal is to show the functionality of the
Fortran interface. The code creates a basic uniform orthogonal domain
decomposition and an inhomogeneous particle distribution over it. Only
one load balancing step is executed, and the program prints out the
domain distribution of the start configuration and of the final
configuration.
``ALL_Staggered`` and ``ALL_Staggered_f``
-----------------------------------------
These create a very simple load balancing situation with fixed load and
domains placed along the z direction. The C++ and Fortran versions are
functionally the same, so the differences between the interfaces can be
checked.
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/Install.rst

.. _install:

Installation
============
Requirements
------------
Base requirements
- C++11 capable compiler
- MPI support
- CMake v.3.14 or higher
Optional requirements
- Fortran 2003 capable compiler (Fortran interface)
- Fortran 2008 capable compiler (Usage of the ``mpi_f08`` interface)
- VTK 7.1 or higher (Domain output)
- Boost testing utilities
- Doxygen and Sphinx with ``breathe`` (Documentation)
Installation
------------
1. Clone the library from
``https://gitlab.version.fz-juelich.de/SLMS/loadbalancing`` into
``$ALL_ROOT_DIR``.
2. Create the build directory ``$ALL_BUILD_DIR`` some place else.
3. Call ``cmake -S "$ALL_ROOT_DIR" -B "$ALL_BUILD_DIR"`` to set up the
   installation. Compile time features (listed below) should be set
   here. To use a specific compiler and Boost installation use: ``CC=gcc
   CXX=g++ BOOST_ROOT=$BOOST_DIR cmake [...]``.
4. To build and install the library then run: ``cmake --build
"$ALL_BUILD_DIR"`` and ``cmake --install "$ALL_BUILD_DIR"``. Afterwards,
the built examples and library files are placed in ``$ALL_INSTALL_DIR``.
Compile time features
*********************
``-DCMAKE_INSTALL_PREFIX=$ALL_INSTALL_DIR`` (default: system dependent)
  Install into ``$ALL_INSTALL_DIR`` after compiling.
``-DCM_ALL_VTK_OUTPUT=ON`` (default: ``OFF``)
  Enable VTK output of domains. Requires VTK 7.1 or higher.
``-DCM_ALL_FORTRAN=ON`` (default: ``OFF``)
  Enable the Fortran interface. Requires a Fortran 2003 capable compiler.
``-DCM_ALL_USE_F08=ON`` (default: ``OFF``)
  Enable usage of the ``mpi_f08`` module for MPI. Requires a Fortran
  2008 capable compiler and a compatible MPI installation.
``-DCM_ALL_FORTRAN_ERROR_ABORT=ON`` (default: ``OFF``)
  Abort execution on any error when using the Fortran interface instead
  of setting the error number and leaving error handling to the user.
``-DCM_ALL_VORONOI=ON`` (default: ``OFF``)
  Enable the Voronoi mesh method and subsequently the compilation and
  linkage of Voro++.
``-DCM_ALL_TESTING=ON`` (default: ``OFF``)
  Turn on unit and feature tests. The tests can be run from the build
  directory with ``ctest``. Requires the Boost test utilities.
``-DCMAKE_BUILD_TYPE=Debug`` (default: ``Release``)
  Enable library internal debugging features. Using ``DebugWithOpt``
  also turns on some optimizations.
``-DCM_ALL_DEBUG=ON`` (default: ``OFF``)
  Enable self consistency checks within the library itself.
``-DVTK_DIR=$VTK_DIR`` (default: system dependent)
  Path to an explicit VTK installation. Make sure that VTK is compiled
  with the same MPI as the library and subsequently your code.
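As a sketch, a configuration that installs into a custom prefix with the
Fortran interface and the tests enabled would combine these options as
``cmake -S "$ALL_ROOT_DIR" -B "$ALL_BUILD_DIR"
-DCMAKE_INSTALL_PREFIX=$ALL_INSTALL_DIR -DCM_ALL_FORTRAN=ON
-DCM_ALL_TESTING=ON``, followed by the build and install commands from
step 4.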
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/Usage.rst

.. _usage:

Usage
=====
The library is presented as a single load balancing object ``ALL::ALL``
from ``ALL.hpp``. All classes are encapsulated in the ``ALL`` namespace.
Simple (and not so simple) examples are available in the ``/examples``
sub directory. The simple examples are also documented at length.

Errors are treated as exceptions: anything thrown is a (sub)class of
``ALL::CustomException``. Likely candidates that may throw exceptions
are ``balance``, ``setup`` and ``printVTKoutlines``.
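A minimal sketch of this error handling, assuming a fully set up
balancer ``all`` and that ``ALL::CustomException`` exposes its
description via a ``what()`` member (as is conventional for exception
types), could look like:

.. code-block:: c++

   #include <iostream>
   #include <mpi.h>

   try {
       all.balance();  // may throw a subclass of ALL::CustomException
   } catch (const ALL::CustomException &e) {
       // assumption: what() returns a description of the error
       std::cerr << "ALL error: " << e.what() << std::endl;
       MPI_Abort(MPI_COMM_WORLD, 1);  // fail consistently on all ranks
   }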
ALL object
----------
The ALL object can be constructed with

.. code-block:: c++

   ALL::ALL<T,W> (
       const ALL::LB_t method,
       const int dimension,
       const T gamma)
where ``T`` is the type of the boundaries and ``W`` the type of the
work, both usually ``float`` or ``double``. The load balancing method
must be one of ``ALL::LB_t::TENSOR``, ``...STAGGERED``,
``...FORCEBASED``, ``...VORONOI`` or ``...HISTOGRAM``, where the Voronoi
method must be enabled at compile time. There is also a second form
where the initial domain vertices are already provided:
.. code-block:: c++

   ALL::ALL<T,W> (
       const ALL::LB_t method,
       const int dimension,
       const std::vector<ALL::Point<T>> &initialDomain,
       const T gamma)
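For illustration, a sketch of constructing a three-dimensional balancer
with ``double`` boundaries and work using the staggered grid method (the
``gamma`` value here is only an arbitrary placeholder):

.. code-block:: c++

   #include <ALL.hpp>

   // 3d staggered-grid balancer; T = W = double,
   // gamma = 4.0 is a placeholder value for this sketch
   ALL::ALL<double, double> balancer(ALL::LB_t::STAGGERED, 3, 4.0);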
Point object
------------
The domain boundaries are given and retrieved as a vector of
``ALL::Point`` objects. These can be constructed via

.. code-block:: c++

   ALL::Point<T> ( const int dimension )
   ALL::Point<T> ( const int dimension, const T *values )
   ALL::Point<T> ( const std::vector<T> &values )

and their elements can be accessed through the ``[]`` operator like a
``std::vector``.
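A short sketch of building the two vertices of an orthogonal unit-cube
domain (lower left front and upper right back corner) and accessing an
element:

.. code-block:: c++

   #include <vector>

   std::vector<ALL::Point<double>> vertices = {
       ALL::Point<double>(std::vector<double>{0.0, 0.0, 0.0}),
       ALL::Point<double>(std::vector<double>{1.0, 1.0, 1.0})};
   double zMax = vertices[1][2];  // element access like a std::vector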
Set up
------
A few additional parameters must be set before the domains can be
balanced. Exactly which ones need to be set depends on the method used,
but in general the following are necessary:

.. code-block:: c++

   void ALL::ALL<T,W>::setVertices ( std::vector<ALL::Point<T>> &vertices )
   void ALL::ALL<T,W>::setCommunicator ( MPI_Comm comm )
   void ALL::ALL<T,W>::setWork ( const W work )
If the given communicator is not a cartesian communicator, the process
grid parameters must also be set beforehand (!) using

.. code-block:: c++

   void ALL::ALL<T,W>::setProcGridParams (
       const std::vector<int> &location,
       const std::vector<int> &size)
with the location of the current process and the number of processes in
each direction.

To better trace the current domain, an integer tag, which is used in the
domain output, can be provided with

.. code-block:: c++

   void ALL::ALL<T,W>::setProcTag ( int tag )
and an observed minimal domain size in each direction can be set using

.. code-block:: c++

   void ALL::ALL<T,W>::setMinDomainSize (
       const std::vector<T> &minSize )

Once these parameters are set, call

.. code-block:: c++

   void ALL::ALL<T,W>::setup()

so that the internal set up routines are run.
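Continuing the sketches above, a complete set up might read as follows;
``localWork``, ``location`` and ``size`` are placeholders assumed to be
provided by the calling code, and the communicator is assumed not to be
cartesian:

.. code-block:: c++

   ALL::ALL<double, double> balancer(ALL::LB_t::STAGGERED, 3, 4.0);

   balancer.setVertices(vertices);             // two corner points, see above
   balancer.setCommunicator(MPI_COMM_WORLD);
   balancer.setWork(localWork);                // e.g. local particle count
   balancer.setProcGridParams(location, size); // needed: non-cartesian comm
   balancer.setup();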
Balancing
---------
To create the new domain vertices, make sure the current vertices are
set using ``setVertices`` and the work is set using ``setWork``, then
call

.. code-block:: c++

   void ALL::ALL<T,W>::balance()

The new vertices can then be retrieved using

.. code-block:: c++

   std::vector<ALL::Point<T>> &ALL::ALL<T,W>::getVertices ()

and the new neighbors with

.. code-block:: c++

   std::vector<int> &ALL::ALL<T,W>::getNeighbors ()
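Combined with the set up sketch above, one load balancing step inside a
simulation loop could then be sketched as:

.. code-block:: c++

   balancer.setVertices(vertices);  // current domain corners
   balancer.setWork(localWork);     // current local work
   balancer.balance();

   vertices = balancer.getVertices();                    // new corners
   std::vector<int> neighbors = balancer.getNeighbors(); // new neighbor ranks
   // the calling code now migrates particles that left the local domain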
Special considerations for the Fortran interface
------------------------------------------------
The Fortran interface exists in two versions. The first is of the form

.. code-block:: fortran

   ALL_balance(all_object)

where the ALL object of ``type(ALL_t)`` is given as the first argument
to each function. The second is more object oriented, with the functions
bound to the object itself, so the alternative call would be

.. code-block:: fortran

   all_object%balance()
The function names themselves are similar: instead of camel case they
are all lowercase, except the leading ``ALL_``, with underscores between
words. So ``setProcGridParams`` becomes ``ALL_set_proc_grid_parms``. For
more usage info please check the Fortran example programs.
One important difference is the handling of MPI types, which differs
between the MPI Fortran 2008 interface as given by the ``mpi_f08``
module and previous interfaces. In previous interfaces all MPI types are
``INTEGER``\ s, but with the ``mpi_f08`` module the communicator is of
``type(MPI_Comm)``. To allow using ``ALL_set_communicator`` directly
with the communicator used in the user's application, build the library
with the Fortran 2008 features enabled; this communicator type is then
expected.
Retrieving information from the balancer is also different: most getters
return (a reference to) an object, whereas the Fortran subroutines set
the values of their arguments. As an example

.. code-block:: c++

   int work = all.getWork();

becomes

.. code-block:: fortran

   integer(c_int) :: work
   call ALL_get_work(work) !or
   !call all%get_work(work)
Since there is no exception handling in Fortran, the error handling is
borrowed from libc: there are additional functions that retrieve and
reset an error number, as well as an error message describing the error.
These must be queried and handled by the user. An example of this usage
is shown in the ``ALL_Staggered_f`` example when printing the VTK
outlines.
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/index.rst

Welcome to the documentation of ALL

   :caption: Contents:
   :includehidden:

   Install.rst
   Usage.rst
   Examples.rst
   api/index.rst
   methods/index.rst

The full (developer) documentation of the source code as provided by
doxygen can be found `here <../doxygen/index.html>`_.

Indices and tables
==================
docs/methods/Histogram.rst

.. _histogramDetails:

Histogram method
================
The histogram-based method differs from the first two methods in that it
is a global method, requiring three distinct steps for a single
adjustment. In each of these steps the following takes place: a partial
histogram of the workload, e.g. the number of particles, is created
along one direction on each domain; these are supplied to the method,
which computes a global histogram. From this global histogram a
distribution function is created, which is used to compute the optimal
(possible) width of the domains in that direction. For the second and
third steps the computation of the global histograms and distribution
functions takes place in the subsystems resulting from the previous
step. The result is the best distribution of domains achievable with the
staggered-grid method, at the cost of a global exchange of work. Due to
this global adjustment and the need for frequent updates, the method is
not well suited to highly dynamic systems. On the other hand the method
is well suited for static problems, e.g. grid-based simulations.
Note: Currently the order of dimensions is: z-y-x.
Required number of vertices
- two, one describing the lower left front point and one describing the
upper right back point of the domain
Additional requirements
- partial histogram created over the workload on the local domain in
the direction of the current correction step
Advantages
- supplies an optimal distribution of domains (restricted by width of
bins used for the histogram)
- only three steps needed to acquire result
Disadvantages
- expensive in cost of communication, due to global shifts
- requires preparation of histogram and shift of work between each of
the three correction steps
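The partial histogram mentioned above is not computed by the library
itself. A sketch of how the calling code might build one over particle
positions along the current correction direction (here z), assuming a
histogram layout of ``zMin``, ``binWidth`` and ``nBins`` agreed on by
all processes, with one particle counting as one unit of work:

.. code-block:: c++

   #include <cmath>
   #include <vector>

   // Hypothetical helper: count particles per bin along one direction.
   std::vector<double> partialHistogram(const std::vector<double> &zCoords,
                                        double zMin, double binWidth,
                                        int nBins) {
       std::vector<double> hist(nBins, 0.0);
       for (double z : zCoords) {
           int bin = static_cast<int>(std::floor((z - zMin) / binWidth));
           if (bin >= 0 && bin < nBins)
               hist[bin] += 1.0;  // one particle = one unit of work
       }
       return hist;
   }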
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/methods/Staggered.rst

.. _staggeredDetails:

Staggered method
================
The staggered grid approach is a hierarchical one. In a first step the
work of all domains sharing the same cartesian coordinate with respect
to the highest dimension (z in three dimensions) is collected. Then, as
in the TENSOR strategy, the layer width in this dimension is adjusted
based on a comparison of the collected work with the collected work of
the neighboring domains in the same dimension. In a second step each of
these planes is divided into a set of columns, where all domains share
the same cartesian coordinate in the next lower dimension (y in three
dimensions). For each of these columns the procedure described before is
repeated, i.e. the work is collected and the width of the columns
adjusted accordingly. Finally, in the last step, the work of individual
domains is compared to direct neighbors and the width in the last
dimension (x in three dimensions) is adjusted. This leads to a staggered
grid that is much better suited to describe inhomogeneous work
distributions than the TENSOR strategy.
Required number of vertices
- two, one describing the lower left front point and one describing the
upper right back point of the domain
Advantages
- very good equalization results for the work load
- maintains orthogonal domains
Disadvantages
- changes topology of the domains and requires adjustment of neighbor
relations
- communication pattern in the calling code might require adjustment to
deal with changing neighbors
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/methods/Tensor.rst

.. _tensorDetails:

Tensor method
=============
In order to equalize the load of individual domains, the assumption is
made that this can be achieved by equalizing the work in each cartesian
direction: the work of all domains having the same coordinate in a
cartesian direction is collected, and the width of all these domains in
this direction is adjusted by comparing this collective work with the
collective work of the neighboring domains. This is done independently
for each cartesian direction in the system.
Required number of vertices
- two, one describing the lower left front point and one describing the
upper right back point of the domain
Advantages

- topology of the system is maintained (orthogonal domains, neighbor
  relations)
- no update of the neighbor relations is required in the calling code
- if a code can already deal with orthogonal domains, only small
  changes are expected to include this strategy

Disadvantages

- due to the comparison of collective work loads in cartesian layers
  and the restrictions resulting from this construction, the final
  result might be a sub-optimal domain distribution
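As an illustration of the plane-wise collection of work described above
(purely a sketch, not part of the ALL API), the collective work of all
domains sharing a z coordinate could be obtained via a communicator
split; ``zCoord`` and ``localWork`` are placeholders for the process'
cartesian z coordinate and its local work:

.. code-block:: c++

   #include <mpi.h>

   // Sketch: sum the work of all domains in the same z layer.
   double layerWork(MPI_Comm comm, int zCoord, double localWork) {
       int rank;
       MPI_Comm_rank(comm, &rank);
       MPI_Comm layer;
       MPI_Comm_split(comm, zCoord, rank, &layer);  // one comm per z layer
       double total = 0.0;
       MPI_Allreduce(&localWork, &total, 1, MPI_DOUBLE, MPI_SUM, layer);
       MPI_Comm_free(&layer);
       return total;
   }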
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us:
docs/methods/index.rst

The Loadbalancing Methods
=========================

.. toctree::
   :hidden:

   Tensor.rst
   Staggered.rst
   Histogram.rst

A short overview of the methods is given below; more details can be
found on the respective pages.
:ref:`Tensor product<tensorDetails>`
  The work on all processes is reduced over the cartesian planes in the
  system. This work is then equalized by adjusting the borders of the
  cartesian planes.
:ref:`Staggered grid<staggeredDetails>`
  A 3-step hierarchical approach is applied, where:

  1. work over the cartesian planes is reduced, before the borders of
     these planes are adjusted
  2. in each of the cartesian planes the work is reduced for each
     cartesian column. These columns are then adjusted relative to each
     other to homogenize the work in each column
  3. the work between neighboring domains in each column is adjusted.

  Each adjustment is done locally with the neighboring planes, columns
  or domains by adjusting the adjacent boundaries.
:ref:`Topological mesh<topologicalDetails>` (WIP)
  In contrast to the previous methods, this method adjusts domains not
  by moving boundaries but by moving vertices, i.e. corner points, of
  domains. For each vertex a force, based on the differences in work of
  the neighboring domains, is computed, and the vertex is shifted so as
  to equalize the work between these neighboring domains.
:ref:`Voronoi mesh<voronoiDetails>` (WIP)
  Similar to the topological mesh method, this method computes a force
  based on work differences. In contrast to the topological mesh method,
  the force acts on a Voronoi point rather than a vertex, i.e. on a
  point defining a Voronoi cell, which describes the domain.
  Consequently, the number of neighbors is not a conserved quantity,
  i.e. the topology may change over time. ALL uses the Voro++ library
  published by the Lawrence Berkeley National Laboratory for the
  generation of the Voronoi mesh.
:ref:`Histogram based staggered grid<histogramDetails>`
  Resulting in the same kind of grid as the staggered grid scheme, this
  scheme uses the cumulative work function in each of the three
  cartesian directions in order to generate the grid. Using histograms
  and the previously defined distribution of process domains in a
  cartesian grid, this scheme generates in three steps a staggered-grid
  result, in which the work is distributed as evenly as the resolution
  of the underlying histogram allows. In contrast to the previously
  mentioned schemes this scheme depends on a global exchange of work
  between processes.
:ref:`Orthogonal recursive bisection<orthogonalDetails>` (Planned)
  Comparable to the histogram based staggered grid scheme, this scheme
  uses cumulative work functions evaluated by histograms in order to
  find a new distribution of workload in a hierarchical manner. In
  contrast to the histogram based staggered grid scheme, the subdivision
  of domains is not based on a cartesian division of processes, but on
  the prime factorization of the total number of processes. During each
  step, the current slice of the system is divided along its longest
  edge into smaller sub-slices that roughly contain the same workload.
  The number of sub-slices is determined by the corresponding prime
  number of that step, e.g. when trying to establish a distribution for
  24 (3 * 2 * 2 * 2) processes, the bisection would use four steps. In
  the first step the system would be subdivided into 3 sub domains along
  the largest edge, each containing roughly the same workload. Each of
  these sub domains is then independently divided into two smaller sub
  domains. This procedure is then repeated for all the prime numbers in
  the prime factorization.
.. vim: et sw=2 ts=2 tw=74 spell spelllang=en_us: