Mamba: Managed Abstracted Memory Array

A library-based programming model for C, C++ and Fortran based on Managed Abstract Memory Arrays, aiming to deliver simplified and efficient use of diverse memory systems to application developers in a performance-portable way. MAMBA arrays exploit a unified memory interface that abstracts memory across traditional memory devices, accelerators and storage. The library aims to achieve good performance portability with an easy-to-use approach that requires minimal code intrusion. See docs/MambaIntroduction-vX.Y.Z.pdf for an extended introduction, with accompanying slide decks.

How to build and run

# if the loop analysis module is required, run ./autogen.sh instead of autoreconf -i
autoreconf -i
mkdir build
cd build
../configure [--prefix=/path/to/install/dir]                   \
             [--enable-discovery[=yes|no|default]]             \
             [--with-fortran]                                  \
             [--with-fortran-ISO-bindings-includedir=/p/a/t/h] \
             [--enable-embedded]                               \
             [--enable-cuda[=yes|no|<arch>]]                   \
             [--enable-hip-rocm[=yes|no]]                      \
             [--enable-opencl[=yes|no]]                        \
             [--with-opencl=/path/to/opencl/install]           \
             [--enable-pmem[=yes|no]]                          \
             [--with-memkind=/path/to/libmemkind/install]      \
             [--with-numa[=/path/to/libnuma/install]]          \
             [--with-loop-analysis]                            \
             [--with-cost-model[=/path/to/costmodel/install]]  \
             [--with-sicm=/path/to/sicm/install]               \
             [--with-umpire=/path/to/umpire/install]           \
             [--with-jemalloc=/path/to/jemalloc/install]       \
             [--with-jemalloc-prefix=<prefix>]
make
make check-tests
make check-examples
make install   # optional

Configure

autogen.sh

Only required when using --with-loop-analysis. This fetches and updates the Mamba loop analysis dependencies as git submodules; it is optional if you have already cloned the repository recursively using git clone --recursive, in which case you may use autoreconf -i instead.
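
As a rough sketch (the repository URL and checkout directory are placeholders, and autogen.sh is assumed to be run from the source root), the two equivalent workflows look like this:

# Option 1: clone recursively so the loop-analysis submodules are already present
git clone --recursive <mamba-repository-url> mamba
cd mamba
autoreconf -i

# Option 2: clone normally and let autogen.sh fetch and update the submodules
git clone <mamba-repository-url> mamba
cd mamba
./autogen.sh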

--prefix

Set the directory prefix for make install.

--enable-discovery

Enable discovery mode, where Mamba uses hwloc to analyse the memory topology and construct a set of appropriate memory spaces during initialisation. This requires hwloc>=2.0 to be installed. The default behaviour is to look for a suitable version of hwloc and enable discovery if found; otherwise discovery is disabled and a warning message is issued at configure time.

--with-fortran

Build the Fortran Mamba library.

--with-fortran-ISO-bindings-includedir

Specify a non-standard path to the location of ISO_Fortran_binding.h, used for the C/Fortran ISO bindings (required for the Fortran build).

--enable-embedded

Enable embedded support, generating libtool convenience libraries so that the library and its dependencies can easily be imported into your own project.

--enable-cuda

Enable CUDA support in the memory manager. The configure script lists all pkg-config module files containing the sub-string 'cuda' and tests each until one provides the requested support.

--enable-hip-rocm

Enable HIP support for AMD devices (via ROCm) in the memory manager. We use hipconfig to determine appropriate CFLAGS; see the Common Issues section for information on passing additional hipcc flags.

--enable-opencl

Enable OpenCL support, currently tested on AMD and NVIDIA GPU devices and Xilinx FPGA devices.

--with-opencl

Provide a non-standard path to your OpenCL installation.

--enable-pmem

Enable persistent memory support, such as Intel Optane non-volatile DIMMs. Requires the memkind library.

--with-memkind

Build with libmemkind support, which allows HBM (e.g. Intel KNL MCDRAM) and persistent memory allocation (e.g. Intel Optane NV-DIMMs). Disabled by default.

--with-numa

Build with libnuma support for numa-aware memory spaces.

--with-loop-analysis

Build with loop analysis features. The loop analysis module depends on external loop analysis libraries; during autogen, the appropriate libraries are downloaded as git submodules. This also introduces a dependency on LLVM; if you have trouble building the loopanalyzer library, refer to the build instructions in the loopanalyzer repository. If you have previously built without this option, you will also need to run make clean. To test the support libraries, make check will run tests for all dependencies integrated into the Mamba build system.

--with-cost-model

Build with cost model library support for automatic tile sizing features.

--with-sicm

Experimental external library support. Allows underlying memory allocation using the LANL/SICM memory manager.

--with-umpire

Experimental external library support. Allows underlying memory allocation using the LLNL/Umpire memory manager.

--with-jemalloc and --with-jemalloc-prefix

Allows underlying memory allocation using the jemalloc malloc implementation. The default prefix of the jemalloc function namespace is je_.

Additional options

To change the compiler used, set CC=..., CXX=... and/or FTN=... during configure.

Cray (CCE)

On a Cray system, it is typical to use the compiler wrappers to manage the compilation environment correctly:

./configure CC=cc CXX=CC FTN=ftn ...

GNU

Add -std=gnu11 to get the C11 standard with GNU extensions, required for the POSIX pthread lock structures.

./configure CFLAGS="-std=gnu11" ...

Configuration variables

Additional compile-time and run-time variables can be set to better adapt the default behaviour to your usage.

Compile-time

The following variables can be set at compile time (or at configure time by adding them to CPPFLAGS). To set a value, use the format -D<name>=<value>; see the sketch after this list.

  • MMB_LOG_LEVEL: Compile-time max log level cut-off, default MMB_LOG_DEBUG
  • MMB_CONFIG_PROVIDER_DEFAULT: Default memory provider to use to allocate memory when none is requested. Default: MMB_NATIVE.
  • MMB_CONFIG_STRATEGY_DEFAULT: Default memory allocation strategy to use when none is requested. Default: MMB_STRATEGY_NONE.
  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT: Default execution context to use when allocating and copying memory to/from GPUs. Default: MMB_GPU_CUDA.
  • MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME: Name of the environment variable read when setting the default provider. Default: MMB_CONFIG_PROVIDER_DEFAULT.
  • MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME: Name of the environment variable read when setting the default strategy. Default: MMB_CONFIG_STRATEGY_DEFAULT.
  • MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME: Name of the environment variable read when setting the default interface name. Default: MMB_CONFIG_INTERFACE_NAME_DEFAULT.
  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME: Name of the environment variable read when setting the default execution context for the GPU. Default: MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT.
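
For example, a minimal sketch of overriding a few of these defaults at configure time (the values shown are simply the documented defaults, used here for illustration):

# Sketch: override compile-time defaults via CPPFLAGS at configure time
../configure CPPFLAGS="-DMMB_CONFIG_PROVIDER_DEFAULT=MMB_NATIVE \
                       -DMMB_CONFIG_STRATEGY_DEFAULT=MMB_STRATEGY_NONE \
                       -DMMB_LOG_LEVEL=MMB_LOG_DEBUG" ...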

The following variables can be set in the environment at compile time to modify compilation behaviour; use the format export <name>=<value> or ./configure <name>=<value>.

  • MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS: Extra flags to pass to hipcc compiler during compilation of .hip files.

Run-time

The following variables can be set in the environment at run time to modify some of the compile-time defined behaviours. These variables are read only once, during library initialisation.

  • MMB_CONFIG_PROVIDER_DEFAULT
  • MMB_CONFIG_STRATEGY_DEFAULT
  • MMB_CONFIG_INTERFACE_NAME_DEFAULT
  • MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT

These variables default to the compile-time values. Their names can be changed at compile time by setting MMB_CONFIG_PROVIDER_DEFAULT_ENV_NAME, MMB_CONFIG_STRATEGY_DEFAULT_ENV_NAME, MMB_CONFIG_INTERFACE_NAME_DEFAULT_ENV_NAME and MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT_ENV_NAME respectively. For simplicity, MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT also accepts NONE as a valid choice. An example is sketched below.

The following variable can modify the log level at run time, up to the maximum compile-time cut-off, and overrides the log level set via the API.

  • MMB_LOG_LEVEL: Run-time log level setting; it cannot exceed the maximum cut-off defined at compile time.
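
A minimal sketch of overriding these settings in the environment before launching an application (the values are the documented defaults plus the NONE option mentioned above; 1d_array_copy stands in for any Mamba program):

# Sketch: override run-time defaults before launching a Mamba application
export MMB_CONFIG_PROVIDER_DEFAULT=MMB_NATIVE
export MMB_CONFIG_STRATEGY_DEFAULT=MMB_STRATEGY_NONE
export MMB_CONFIG_EXECUTION_CONTEXT_GPU_DEFAULT=NONE
./1d_array_copy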

Examples

Examples are found in mamba/build/examples/ or /path/to/install/dir/examples. Most examples are provided in both C and Fortran (those marked "C only" have no Fortran version); each is briefly described here with instructions on use.

1d_array_copy

This shows the construction, tiled initialisation, and copy of a 1d mamba array to another 1d mamba array with matching layout and size, with full error checking.

Source file: examples/c/1d_array_copy.c | examples/fortran/1d_array_copy.f90

Usage: ./1d_array_copy | ./1d_array_copy_f

1d_array_copy_wrapped

The same as 1d_array_copy but using arrays constructed from existing user pointers.

Source file: examples/c/1d_array_copy_wrapped.c | examples/fortran/1d_array_copy_wrapped.f90

Usage: ./1d_array_copy_wrapped | ./1d_array_copy_wrapped_f

tile_duplicate

This shows construction of a 1d array, tiling, duplication and merging of tiles.

Source file: examples/c/tile_duplicate.c

Usage: ./tile_duplicate

matrix_multiply

This demonstrates a tiled matrix multiply using 3 mamba arrays constructed on top of pre-initialised (with random or identity values) matrix buffers.

Source file: examples/c/matrix_multiply.c

Usage: (all args optional): ./matrix_multiply -v (for verbose mode) -t N (for tile size NxN) -m N (for matrix size NxN) -i (use identity for matrix B)
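
For instance, a typical invocation might look like this (the sizes are chosen arbitrarily for illustration):

# Multiply two 512x512 matrices using 64x64 tiles, identity matrix B, verbose output
./matrix_multiply -v -m 512 -t 64 -i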

matrix_multiply_cuda (C only)

This demonstrates a tiled matrix multiply using multiple mamba arrays constructed on top of pre-initialised (with random or identity values) matrix buffers. This example also presents how to allocate and use memory on different memory devices (DRAM, GPU, HBM, ...), how to copy from one memory tier to another, and how to use different strategies and/or different memory providers.

This example works the same as the matrix_multiply.c example, except that it requires extra steps to pass the data to the actual kernel (in addition to allocating the data in GPU memory, the tiling information needs to be forwarded as well). The CUDA file only deals with this forwarding (the packing is done in examples/c/matrix_multiply_cuda.c). For now the tiles are not executed in parallel; this is a work in progress.

Source files: examples/c/matrix_multiply_cuda.c, examples/c/matrix_multiply_cuda_ker.cu, examples/c/matrix_multiply_cuda.h

Usage: (all args optional): ./matrix_multiply_cuda -v (for verbose mode) -t N (for tile size NxN) -m N (for matrix size NxN) -i (use identity for matrix B)

loop description (C only)

This example demonstrates describing a loop with the loop description interface, followed by PET/ISL based polyhedral analysis of the loop with dependence computation. The loop description, auxiliary analysis information and calculated loop dependencies are output to the terminal.

Source files: examples/c/loop_description.c

Usage: ./loop_description

report_mem_state (C only)

This example shows the output of the function mmb_dump_memory_state, which dumps the current state of the memory system, as retained by the MAMBA Memory Manager, to the FILE * given as a parameter.

Source file: examples/c/report_mem_state.c

Usage: ./report_mem_state

Common Issues

C standard

If you force standard conformance with e.g. -std=c11, you may also need to pass something like -D_XOPEN_SOURCE=500 to get the required POSIX features. Alternatively, use -std=gnu11.
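
For example (a sketch; the exact feature-test macro level your platform needs may differ):

# Strict C11 plus an explicit POSIX feature-test macro ...
./configure CFLAGS="-std=c11" CPPFLAGS="-D_XOPEN_SOURCE=500" ...
# ... or simply allow GNU extensions
./configure CFLAGS="-std=gnu11" ...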

HIP ROCM Support

If you see the following error:

.../hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

You may not have the appropriate HIP ARCH definitions set during compilation. This can occur, for example, when compiling on a login node without GPUs attached. If you have appropriate environment/module resolution for this, use that; otherwise you can forward extra cpp arguments to the hipcc compiler during the Mamba build via the following environment variable, which you need to export prior to configuration:

// Valid for AMD mi60, export before configure
export MMB_CONFIG_HIPCC_EXTRA_CPPFLAGS="-D__HIP_ARCH_GFX906__=1 --cuda-gpu-arch=gfx906"

To check, you can run hipcc --cxxflags and look for something like the above. Setting HIPCC_VERBOSE=7 will additionally provide verbose output from the hipcc compiler.

Furthermore, discovery of AMD GPUs via hwloc is currently not able to find the available memory size, and so memory spaces created automatically during discovery will be of unlimited size (i.e. limited by hip runtime, rather than Mamba).

CUDA

If you see the following error:

no kernel image is available for execution on the device.

You may be using the wrong CUDA architecture for the GPU device available on your node. You can change the architecture used by setting it on your configure line with ./configure --enable-cuda=<arch>. The default architecture is sm_60. If this value is too high for your device, you may want to try sm_30.
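
For example, to target an NVIDIA V100 (compute capability 7.0; substitute the architecture matching your own GPU):

# Sketch: build for a specific CUDA architecture
./configure --enable-cuda=sm_70 ...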

OpenCL/FPGA

The buffer_copy_opencl example (run automatically during make check or make check-examples) will try to build a kernel at run time; on most FPGA platforms OpenCL does not have access to a compiler, so this will likely fail. To run this example, you must build a bitstream for your specific FPGA that matches the example kernel in examples/c/buffer_copy_opencl.c, and export the path to this bitstream via the environment variable MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY before running the example.
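
For example (the bitstream path is a placeholder for whatever your FPGA toolchain produced for the example kernel):

# Point the example at a pre-built bitstream matching the example kernel
export MMB_CONFIG_BUFFER_COPY_OPENCL_BINARY=/path/to/buffer_copy_bitstream
./buffer_copy_opencl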