NPB3.3-MZ (Multi-Zone) MPI+OmpSs Versions
-----------------------------------------

For details, see
   http://www.nas.nasa.gov/Research/Reports/Techreports/2003/nas-03-010-abstract.html

NPB Development Team
   npb@nas.nasa.gov


Compilation
------------

1. copy one of the 'make.def' examples in config/NAS.samples or the
   file config/make.def.template to config/make.def

2. modify/specify the compiler and compilation flags in config/make.def

3. use the following line to compile

       make benchmark CLASS=class NPROCS=number_of_procs [VERSION=VEC]

   [benchmark] is one of [bt-mz, sp-mz, lu-mz]
   [class] is one of [S, W, A through F]
   [number_of_procs] is the number of MPI processes requested

   The "VERSION=VEC" option can be used to compile the "vectorizable"
   version of the selected benchmark. However, successful vectorization
   depends heavily on the compiler used. Some changes to the compiler
   directives for vectorization in the current codes (see *_vec.f files)
   may be required.


Execution
----------

The executable is named <benchmark-name>.<class>.<nprocs> and is placed
in the bin directory (or in the directory BINDIR specified in the
make.def).

The method for running the hybrid MPI+OmpSs program depends on your
local system. Typically, you should first specify the number of threads
and define any needed runtime parameters (see below for details), then
use 'mpirun' (or a similar procedure available on your system) to run
the program.

Here is an example of running bt-mz, class A with 4 MPI processes and
2 OmpSs threads per process (in csh):

       setenv OMP_NUM_THREADS 2
       mpirun -np 4 bin/bt-mz.A.4


Runtime Environment Variables and Setups
-----------------------------------------

1. Specify the number of threads

   There are two ways to specify the number of threads:
      - using a load data file for the selected benchmark:
        loadbt-mz.data, loadsp-mz.data, or loadlu-mz.data
      - using environment variables.
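
The execution steps described above can be sketched as a small Python
helper that builds the executable name and the launch command, and sets
OMP_NUM_THREADS for the child process. This is only an illustration: the
function names (executable_path, launch_command, launch_env) are
hypothetical and not part of NPB, and the 'mpirun -np' form is taken
from the example above and may differ on your system.

```python
import os

# Benchmark and class names as listed in the Compilation section.
BENCHMARKS = ("bt-mz", "sp-mz", "lu-mz")
CLASSES = ("S", "W", "A", "B", "C", "D", "E", "F")


def executable_path(benchmark, cls, nprocs, bindir="bin"):
    """Return <bindir>/<benchmark-name>.<class>.<nprocs>."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    if cls not in CLASSES:
        raise ValueError(f"unknown class: {cls}")
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return f"{bindir}/{benchmark}.{cls}.{nprocs}"


def launch_command(benchmark, cls, nprocs, bindir="bin"):
    """Build the mpirun argument list (assumes an 'mpirun -np N' launcher)."""
    return ["mpirun", "-np", str(nprocs),
            executable_path(benchmark, cls, nprocs, bindir)]


def launch_env(num_threads, base_env=None):
    """Copy the environment and set OMP_NUM_THREADS for the run."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


if __name__ == "__main__":
    cmd = launch_command("bt-mz", "A", 4)
    env = launch_env(2)
    print(" ".join(cmd))
    # To actually start the run, one could pass both to the standard library:
    #   import subprocess; subprocess.run(cmd, env=env, check=True)
```

The command and the environment are built separately so that they can be
inspected or logged before anything is launched.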

   1.1 load data file

   A load data file defines the number of threads used for each process
   and has the following format:

       <proc#>  <number_of_threads>  <proc_group>
       <proc#>  <number_of_threads>  <proc_group>
       ...

   <proc#> starts from 0, up to <number_of_procs>-1. The <proc_group>
   column can be used to specify a processor group to which a process
   belongs. This field is useful for running NPB-MZ across several SMP
   boxes, where each individual SMP box could be load-balanced with
   OmpSs threads up to the number of CPUs available in the box.

   The file is read by the master process from the current working
   directory. The load data file takes precedence over the environment
   variables described in Section 1.2. In particular, load balancing
   with thread movement is not applied if all values of <proc_group>
   are less than one.

   1.2 environment variables

   OMP_NUM_THREADS
      - the number of threads per process. If this variable is not
        specified, one thread is assumed for each process.

        The total number of threads is number_of_procs * num_threads.

   NPB_MZ_BLOAD
      - flag for load balancing with threads. By default, load
        balancing with threads is on (with a value of 1, ON or TRUE).

        For SP-MZ and LU-MZ, the default scheme is block distribution.
        To use the bin-packing scheme for these two benchmarks, select
        a value of 2.

   NPB_MAX_THREADS
      - the maximum number of threads that can be used on each process
        for load balancing. This value is checked only if NPB_MZ_BLOAD
        is enabled. By default (with a value of 0), there is no limit
        on the number of threads.

        If NPB_MAX_THREADS is specified, the value must be either 0
        (for no limit) or no less than OMP_NUM_THREADS.

2. Other environment variables

   NPB_VERBOSE
      - flag to control verbose output. By default (a value of 0), a
        summary is printed. With a value of 1 (or larger), more
        detailed information on zones is printed.

3. Use the built-in profile timers

   An input parameter file may be used to change several parameters at
   runtime, one of which is the timer flag that controls the collection
   of some timing statistics. To use this feature, create an
   "input<benchmark>.data" file (<benchmark>=bt-mz, sp-mz or lu-mz) in
   the working directory from the sample input files included in the
   source directory. To collect timing statistics, switch on the timer
   flag in the input file (set it to 1).


Known Problems
---------------

  - On SGI Origins using the Message Passing Toolkit (MPT) versions
    from 1.4.0.3 to 1.7.0.0, the hybrid MPI+OmpSs version crashes due
    to the use of the THREADPRIVATE directive in the codes. As a
    workaround, you can either use another MPT module (such as
    mpt.1.4.0.0) or set the following environment variable:

       "setenv MPI_STATIC_NO_MAP" (csh) or
       "export MPI_STATIC_NO_MAP" (sh)

  - On SGI Altix, if undefined symbols (for getenv) are encountered
    when compiling with the Intel 7.x FORTRAN compiler, add "-Vaxlib"
    to F_LIB in config/make.def.

    With some versions of Intel compilers, the following runtime
    variable may be required to avoid a stacksize overflow:

       "setenv KMP_MONITOR_STACKSIZE 200k" (csh) or
       "export KMP_MONITOR_STACKSIZE=200k" (sh)
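
The load data file format from Section 1.1 can be generated with a
short script. The sketch below is a minimal illustration, not part of
NPB: the function names (load_file_name, format_load_data) are
hypothetical, and separating the three fields with single spaces is an
assumption based on the format shown in Section 1.1.

```python
def load_file_name(benchmark):
    """Return the load data file name, e.g. 'loadbt-mz.data' for bt-mz."""
    return f"load{benchmark}.data"


def format_load_data(threads_per_proc, proc_groups):
    """Return the file text: one '<proc#> <number_of_threads> <proc_group>'
    line per MPI process, with <proc#> running from 0 to nprocs-1."""
    if len(threads_per_proc) != len(proc_groups):
        raise ValueError("need one thread count and one group per process")
    lines = []
    for proc, (nthreads, group) in enumerate(zip(threads_per_proc, proc_groups)):
        if nthreads < 1:
            raise ValueError(f"process {proc}: thread count must be at least 1")
        # Field separator (a single space) is an assumption, see lead-in.
        lines.append(f"{proc} {nthreads} {group}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: 4 processes on two SMP boxes (groups 1 and 2).
    text = format_load_data([2, 2, 4, 4], [1, 1, 2, 2])
    # The file is read from the current working directory of the run.
    with open(load_file_name("bt-mz"), "w") as handle:
        handle.write(text)
```

Note that, as described in Section 1.1, thread movement for load
balancing is not applied if every <proc_group> value is less than one,
so the group values chosen here are all at least 1.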