    NPB3.3-MZ (Multi-Zone) MPI+OmpSs Versions
    -----------------------------------------
    
    For details, see
       http://www.nas.nasa.gov/Research/Reports/Techreports/2003/
       nas-03-010-abstract.html
    
    NPB Development Team
       npb@nas.nasa.gov
    
    
    Compilation
    ------------
    
    1. copy one of the 'make.def' examples in config/NAS.samples or the
       file config/make.def.template to config/make.def
    
    2. modify/specify the compiler and compilation flags in config/make.def
    
    3. use the following line to compile
       make benchmark CLASS=class NPROCS=number_of_procs [VERSION=VEC]
    
       [benchmark] is one of [bt-mz, sp-mz, lu-mz]
       [class] is one of [S, W, A through F]
       [number_of_procs] is the number of MPI processes requested
       The "VERSION=VEC" option can be used to compile the "vectorizable"
       version of the selected benchmark.
    
       However, successful vectorization highly depends on the compiler 
       used.  Some changes to compiler directives for vectorization in 
       the current codes (see *_vec.f files) may be required.
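Putting the three steps together, a typical build session might look like the following sketch (the compiler and flags set in config/make.def must be adapted to the local system; bt-mz, class A, and 4 processes are just an example):

```
cp config/make.def.template config/make.def
# edit config/make.def: set the MPI Fortran compiler and flags for your system
make bt-mz CLASS=A NPROCS=4
# resulting executable: bin/bt-mz.A.4
```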
    
    
    Execution
    ----------
    
    The executable is named <benchmark-name>.<class>.<nprocs> and is 
    placed in the bin directory (or in the directory BINDIR specified 
    in the make.def).  The method for running the hybrid MPI+OmpSs
    program depends on your local system.  Typically, you should first 
    specify the number of threads and define any needed runtime 
    parameters (see below for details), then use 'mpirun' (or similar 
    procedure available on your system) to run the program.
    
    Here is an example of running bt-mz, class A with 4 MPI processes
    and 2 OmpSs threads per process (in csh):
    
       setenv OMP_NUM_THREADS 2
       mpirun -np 4 bin/bt-mz.A.4
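The same run in a Bourne-style shell (sh/bash):

```
export OMP_NUM_THREADS=2
mpirun -np 4 bin/bt-mz.A.4
```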
    
    
    Runtime Environment Variables and Setups
    -----------------------------------------
    
    1. Specify the number of threads
    
There are two ways to specify the number of threads:
   - using a load data file specific to each benchmark
      (loadbt-mz.data, loadsp-mz.data, or loadlu-mz.data)
   - using environment variables.
    
    1.1 load data file
    
    A load data file defines the number of threads used for each process
    and has the following format:
       <proc#>  <number_of_threads>  <proc_group>
       <proc#>  <number_of_threads>  <proc_group>
       ...
    
    <proc#> starts from 0, up to <number_of_procs>-1.  The <proc_group>
    column can be used to specify a processor group to which a process
    belongs.  This field is useful for running NPB-MZ across several
    SMP boxes, where each individual SMP box could be load-balanced
    with OmpSs threads up to the number of CPUs available in the box.
    The file will be read by the master process from the current working 
    directory.
    
The load data file takes precedence over the environment variables
described in Section 1.2.  In particular, load balancing with thread
movement is not applied if all values of <proc_group> are less than one.
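A load data file in the format above can be produced with a short shell loop.  This is only a sketch, assuming 4 MPI processes with 2 threads each, all placed in processor group 0:

```shell
# Write loadbt-mz.data with one line per MPI process:
# <proc#>  <number_of_threads>  <proc_group>
for p in 0 1 2 3; do
  printf '%d  %d  %d\n' "$p" 2 0
done > loadbt-mz.data
cat loadbt-mz.data
# prints:
# 0  2  0
# 1  2  0
# 2  2  0
# 3  2  0
```

The master process reads this file from the current working directory at startup.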
    
    1.2 environment variables
    
    OMP_NUM_THREADS
      - the number of threads per process.  If this variable is not specified,
        one thread is assumed for each process.
    
        The total number of threads is number_of_procs * num_threads.
    
    NPB_MZ_BLOAD
      - flag for load balancing with threads.  By default, load balancing
        with threads is on (with a value of 1, ON or TRUE).
        For SP-MZ and LU-MZ, the default scheme is block distribution.
        To use the bin-packing scheme for these two benchmarks, select
        a value of 2.
    
    NPB_MAX_THREADS
      - the maximum number of threads that can be used on each process for 
        load balancing.  This value is checked only if NPB_MZ_BLOAD is enabled.
        By default (with a value of 0), there is no limitation on the number
        of threads.  If NPB_MAX_THREADS is specified, the value must be either
        0 (for no limit) or no less than OMP_NUM_THREADS.
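As an illustration of how these variables combine, here is a hypothetical sh-syntax setup (8 MPI ranks and the NPROCS helper variable are assumptions for the arithmetic, not part of the benchmark's interface):

```shell
export OMP_NUM_THREADS=4   # 4 threads per MPI process
export NPB_MZ_BLOAD=2      # bin-packing load balancing (SP-MZ/LU-MZ only)
export NPB_MAX_THREADS=0   # 0 = no per-process thread limit
NPROCS=8                   # illustrative number of MPI processes
echo "total threads: $(( NPROCS * OMP_NUM_THREADS ))"
# prints: total threads: 32
```

The program itself is then launched with mpirun (or the local equivalent) as shown in the Execution section.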
    
    2. Other environment variables
    
    NPB_VERBOSE
      - flag to control verbose output.  By default (a value of 0), a summary
        is printed.  With a value of 1 (or larger), more detailed information 
        on zones is printed.
    
    3. Use the built-in profile timers
    
    An input parameter file may be used to change several parameters at 
    runtime, one of which is the timer flag that controls the collection of 
    some timing statistics.  In order to use this feature, one needs to create 
    an "input<benchmark>.data" file (<benchmark>=bt-mz, sp-mz or lu-mz) in 
    the working directory from the sample input files included in the source 
    directory.  To collect timing statistics, switch on (with a value of 1)
    the timer flag in the input file.
    
    
    Known Problems
    ---------------
      - On SGI Origins using the Message Passing Toolkit (MPT) versions from 
        1.4.0.3 to 1.7.0.0, the hybrid MPI+OmpSs version crashes due to the
        use of the THREADPRIVATE directive in the codes.  As a workaround,
        you can either use another MPT module (such as mpt.1.4.0.0) or set
        the following environment variable:
    
        "setenv MPI_STATIC_NO_MAP" (csh) or 
        "export MPI_STATIC_NO_MAP" (sh)
    
      - On SGI Altix, if undefined symbols (for getenv) are encountered when
        compiling with the Intel 7.x FORTRAN compiler, add "-Vaxlib" to
        F_LIB in config/make.def.
    
        With some versions of Intel compilers, the following runtime variable
        may be required to avoid a stacksize overflow:
    
        "setenv KMP_MONITOR_STACKSIZE 200k" (csh) or
        "export KMP_MONITOR_STACKSIZE=200k" (sh)