Workflow applications using Maestro Core offer CDOs to, and demand CDOs from,
the :ref:`Maestro Pool<cdo_management>`, a conceptual entity that represents
the set of resources contributed to Maestro, in particular the set of offered
CDOs. One typical use of Maestro Core is to bypass persistent storage for
application coupling, by way of the Maestro Pool.
:ref:`CDOs<cdo>` are the basic currency of Maestro Core: they contain data and
metadata, including user-defined metadata as well as data semantics such as
data layout information and other information on data usage that the transfer
scheduler may take advantage of.
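
As a minimal sketch of this offer/demand pattern, the C fragment below shows a
producer declaring, sealing and offering a CDO, with the matching consumer-side
demand summarized in a comment. The call and attribute-key names are written
from the concepts above; treat the exact signatures as assumptions to be
checked against the Maestro Core headers.

.. code-block:: c

   #include <maestro.h>
   #include <stdint.h>

   int main(void) {
     /* Join the workflow as the "producer" component (signature simplified). */
     mstro_init("my-workflow", "producer", 0);

     /* Declare a CDO by name, attach the data it should carry, then seal it
        so that its attributes become immutable. */
     double field[1024] = {0};
     int64_t size = sizeof(field);
     mstro_cdo cdo;
     mstro_cdo_declare("results/field-step-0", MSTRO_ATTR_DEFAULT, &cdo);
     mstro_cdo_attribute_set(cdo, ".maestro.core.cdo.raw-ptr", field, false);
     mstro_cdo_attribute_set(cdo, ".maestro.core.cdo.scope.local-size", &size, true);
     mstro_cdo_seal(cdo);

     /* Offer the CDO to the Maestro Pool; consumers may now demand it by name. */
     mstro_cdo_offer(cdo);

     /* A consumer would declare the same CDO name, seal it, and call
        mstro_cdo_demand(cdo), which returns once the data has been
        transported to the consumer's memory. */

     /* Withdraw once no consumer should rely on the CDO any more, then clean up. */
     mstro_cdo_withdraw(cdo);
     mstro_cdo_dispose(cdo);
     mstro_finalize();
     return 0;
   }
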
A minimal Maestro-enabled multi-app setup consists of one producer application,
one consumer application and one Pool Manager application; the latter ships
with Maestro Core and can alternatively be invoked via the Maestro Core API.
The :ref:`Pool Manager<pm>` is an application provided by Maestro Core that
handles networking and transport scheduling, and propagates pool events for
inspection.
.. image:: img/maestro_core_components.png
:alt: Maestro core components overview
:ref:`Events<events>` allow for higher-level control of the Maestro-enabled
workflow, and for interfacing with many useful components. In particular, they
allow the implementation of data-driven Workflow Managers or, more generally,
Execution Frameworks: such frameworks can schedule jobs based on CDO
availability and location, both of which are reported through events.
:ref:`Data management<cdo_management>` (and :ref:`memory management<mamba>` as
well) is then effectively delegated by Execution Frameworks and applications to
Maestro Core.
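
As a sketch of how an Execution Framework might react to CDO availability, the
fragment below subscribes to offer events matching a selector and polls for
them. The selector query string, event-kind constant and subscription calls are
written from the event/selector concepts described here and should be treated
as assumptions to verify against the Maestro Core API.

.. code-block:: c

   #include <maestro.h>
   #include <stddef.h>

   /* Called after mstro_init(); sketch of an event-driven scheduling loop. */
   static void watch_offers(void)
   {
     /* Build a selector matching the CDOs of interest (query syntax illustrative). */
     mstro_cdo_selector sel;
     mstro_cdo_selector_create(NULL, NULL, "(has .maestro.core.cdo.name)", &sel);

     /* Subscribe to offer events for matching CDOs. */
     mstro_subscription sub;
     mstro_subscribe(sel, MSTRO_POOL_EVENT_OFFER, MSTRO_SUBSCRIPTION_OPTS_DEFAULT, &sub);
     mstro_cdo_selector_dispose(sel);

     /* Poll for pending events; each event says which CDO became available and
        where, so a Workflow Manager could schedule a consumer job accordingly. */
     mstro_pool_event events;
     mstro_subscription_poll(sub, &events);
     for (mstro_pool_event e = events; e != NULL; e = e->next) {
       /* inspect e and dispatch work */
     }
     mstro_pool_event_dispose(events);
     mstro_subscription_dispose(sub);
   }
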
As an example of useful workflow components, staging and saving selected CDOs
may be implemented by users via Librarian and Archiver components; the latter
relies on events to be notified when CDOs relevant for archiving become
available. A keep-alive proxy may prevent CDOs from being withdrawn from the
pool before they have been handled by all the components that need them.
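
A bare-bones Archiver step along these lines is sketched below; the way the CDO
name is obtained from the event payload is a placeholder, and the call names
are assumptions consistent with the offer/demand sketch above.

.. code-block:: c

   #include <maestro.h>

   /* Archive one CDO whose name was learned from a pool event (sketch only). */
   static void archive_cdo(const char *name_from_event)
   {
     mstro_cdo cdo;
     mstro_cdo_declare(name_from_event, MSTRO_ATTR_DEFAULT, &cdo);
     mstro_cdo_seal(cdo);
     mstro_cdo_demand(cdo);   /* returns once the CDO data is locally available */

     /* ... write the CDO's data and user metadata to the archive of choice ... */

     mstro_cdo_dispose(cdo);  /* done; the pool no longer needs to keep it for us */
   }
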
Maestro-enabled components natively produce logs as well as a telemetry report
at finalize time; nonetheless, events make it possible to implement more
specific telemetry/logging components, with :ref:`selectors<events>` allowing
:ref:`cherry-picking<story_cherry>` of events and CDOs.
CDOs convey a great deal of data and memory semantics.
Data usage information, such as CDO dependencies or an eager transfer policy
*(both yet to be implemented)*, helps the scheduler and thus overall
performance.
Data layout information lets Maestro transform a CDO offered in a given data
layout L1 (say row-major) into the demanded CDO on the consumer side, which may
have a layout L2 (say column-major). This allows users to abstract away data
layout, especially if tools or source-to-source editing fill the data layout
attribute (semi-)automatically. Data layouts may be distributed *(WIP)* and
Maestro Core *will soon* handle the redistribution in compliance with producer
and consumer layout and distribution scheme requirements.
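
As a sketch of how layout information might be attached to a CDO, the fragment
below sets layout attributes before sealing; the attribute keys and the integer
encoding of the layout order are illustrative assumptions, not the actual
Maestro Core attribute schema.

.. code-block:: c

   #include <maestro.h>
   #include <stdint.h>

   /* Declare a CDO and annotate it with (hypothetical) layout attributes. */
   static void offer_with_layout(void)
   {
     mstro_cdo cdo;
     mstro_cdo_declare("sim/temperature-grid", MSTRO_ATTR_DEFAULT, &cdo);

     int64_t ndims = 2;
     int64_t order = 0;   /* 0 = row-major in this sketch; a consumer may demand column-major */
     mstro_cdo_attribute_set(cdo, ".maestro.core.cdo.layout.ndims", &ndims, true);
     mstro_cdo_attribute_set(cdo, ".maestro.core.cdo.layout.order", &order, true);

     mstro_cdo_seal(cdo);  /* attributes are fixed from here on */
     mstro_cdo_offer(cdo); /* Maestro can now transform layouts to match demands */
   }
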
.. image:: img/core_data_object.png
:alt: Anatomy of the Core Data Object
CDO location is expressed as a so-called Memory Object, which may refer to any
of a number of memory and storage layers (DRAM, GDRAM, parallel filesystem,
object store, etc.) and their associated access methods (pointer, path, object
ID, etc.). Maestro Core uses this information to handle transport between
layers and nodes.
Collections:

- Group
- Selector (get/set, select, subscribe)
- User metadata
Demos
-----
A few demonstrators ship with Maestro Core; they are compiled and run via
.. code-block:: shell
make check
- Local multi-threaded setup ``demo_mvp_d3_2``: configurable with a few parameters such as the number of producer, consumer and archiver threads.
- Multi-application setup ``check_pm_interlock``: the interlock demo is a minimal workflow in which the Pool Manager and two applications exchange a CDO under Pool Manager supervision, using filesystem and object-store transports; RDMA is used if those transports are unavailable.