Hands-On: Performance Counters

This Notebook is part of the exercises for the SC19 Tutorial »Application Porting and Optimization on GPU-accelerated POWER Architectures«. It is meant to be run on a POWER9 machine; in the tutorial, this is Ascent, the POWER9 training cluster of Oak Ridge National Laboratory.

This Notebook can be run interactively on Ascent. If that is not possible for you, use it as a description of the tasks and execute them on Ascent via shell access. During data evaluation, the Notebook lists the corresponding commands to run in case you cannot run the Notebook interactively on Ascent.

Task 1: Measuring Cycles and Instructions

Throughout this exercise, the core loop of the Jacobi algorithm is instrumented and analyzed. The part in question is:

for (int iy = iy_start; iy < iy_end; iy++)
{
    for( int ix = ix_start; ix < ix_end; ix++ )
    {
        Anew[iy*nx+ix] = -0.25 * (rhs[iy*nx+ix] - (A[ iy   *nx+ix+1] + A[ iy   *nx+ix-1]
                                                +  A[(iy-1)*nx+ix  ] + A[(iy+1)*nx+ix  ]));
        error = fmaxr( error, fabsr(Anew[iy*nx+ix]-A[iy*nx+ix]));
    }
}

The code is instrumented using PAPI. The API routine PAPI_add_named_event() is used to add named PMU events outside of the relaxation iteration. After that, calls to PAPI_start() and PAPI_stop() can be used to count how often a PMU event is incremented.
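The start/stop pattern can be illustrated without PAPI. The following is a minimal Python sketch, not the tutorial's C code: it ports the loop above to plain Python and uses `time.perf_counter_ns()` as a stand-in for a PMU event. The `(total)`, `(min)` and `(max)` columns in the program output further down suggest that counters are read once per iteration; that reading, the function names `jacobi_sweep` and `measure`, the grid initialization, the loop bounds, and the grid swap are all assumptions made for illustration.

```python
import time


def jacobi_sweep(A, Anew, rhs, nx, iy_start, iy_end, ix_start, ix_end):
    """One relaxation sweep over the given index range; returns the max error."""
    error = 0.0
    for iy in range(iy_start, iy_end):
        for ix in range(ix_start, ix_end):
            i = iy * nx + ix
            # Same four-point stencil as the C loop: right, left, up, down neighbors.
            Anew[i] = -0.25 * (rhs[i] - (A[i + 1] + A[i - 1] + A[i - nx] + A[i + nx]))
            error = max(error, abs(Anew[i] - A[i]))
    return error


def measure(iterations, ny, nx):
    """Run `iterations` sweeps and aggregate a per-sweep counter.

    time.perf_counter_ns() is only a stand-in for a hardware counter;
    the tutorial code reads PMU events through PAPI instead.
    """
    A = [0.0] * (ny * nx)      # assumed initial guess
    Anew = [0.0] * (ny * nx)
    rhs = [1.0] * (ny * nx)    # assumed right-hand side
    total, lowest, highest = 0, None, None
    for _ in range(iterations):
        start = time.perf_counter_ns()            # plays the role of PAPI_start()
        jacobi_sweep(A, Anew, rhs, nx, 1, ny - 1, 1, nx - 1)
        delta = time.perf_counter_ns() - start    # plays the role of PAPI_stop()
        total += delta
        lowest = delta if lowest is None else min(lowest, delta)
        highest = delta if highest is None else max(highest, delta)
        A, Anew = Anew, A                         # assumed: swap grids between sweeps
    return {"total": total, "min": lowest, "max": highest}


stats = measure(iterations=10, ny=16, nx=16)
print(stats)
```

The measured region contains only the sweep itself, mirroring the idea of registering events once outside the relaxation iteration and starting and stopping the counters around the loop of interest.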

For the first task, we will measure quantities often used to characterize an application: cycles and instructions.

TASK: Please measure counters for completed instructions and run cycles. See the TODOs in the file poisson2d.ins_cyc.c. You can edit the file either in Jupyter, by clicking on the link to the file or selecting it in the file drawer on the left, or with a dedicated editor on the system (vim is available). The names of the counters to add are PM_INST_CMPL and PM_RUN_CYC.

After changing the source code, compile it with make task1 or by executing the following cell (we need to change directories first, though).
(Using the Makefile we have hidden quite a few intricacies from you in order to focus on the relevant content at hand. Don't worry too much about it right now – we'll un-hide it gradually during the course of the tutorial.)

In [1]:
!pwd
/autofs/nccsopen-svm1_home/aherten/OpenPOWER-SC19/Prototyping/2-Performance_Counters/Handson/Solutions
In [1]:
%cd Tasks/
# Use `%cd Solutions` to look at the solutions for each task
/autofs/nccsopen-svm1_home/aherten/OpenPOWER-SC19/2-PAPI/Compiling/Solutions
In [2]:
!make task1
gcc -DUSE_DOUBLE -Ofast -std=c99 -lm -lpapi  poisson2d.ins_cyc.c -o poisson2d.ins_cyc.bin

Before we launch our measurement campaign, we should make sure that the program is measuring correctly. Let's invoke it, for instance, with these arguments: ./poisson2d.ins_cyc.bin 100 64 32 (see the next cell). The 100 specifies the number of iterations to perform; 64 and 32 are the sizes of the grid in y and x direction, respectively.

In [1]:
!./poisson2d.ins_cyc.bin 100 64 32
# alternatively call !make run_task1
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
100,64,32,0.0011,3324225,33235,33960,1859440,18357,25033

Alright! That should return a comma-separated list of measurements.
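To sanity-check such a line, it can help to derive a single number from it, for example instructions per cycle (IPC). The sketch below parses the output shown above with Python's standard `csv` module; the helper names `parse_measurements` and `instructions_per_cycle` are hypothetical and not part of the tutorial code. Header lines are allowed to repeat, since each program invocation prints its own header.

```python
import csv
import io

# Output copied from the test run above (header line plus one data row).
SAMPLE_OUTPUT = """\
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
100,64,32,0.0011,3324225,33235,33960,1859440,18357,25033
"""


def parse_measurements(text):
    """Parse program output into a list of dicts, one per data row.

    Every line starting with 'iter' is treated as a header that
    (re)defines the column names and is not returned as data.
    """
    rows = []
    header = None
    for fields in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in fields]  # some column names carry a leading space
        if not fields or fields == [""]:
            continue                          # skip blank lines
        if fields[0] == "iter":
            header = fields
            continue
        if header is None:
            raise ValueError("data row encountered before any header line")
        rows.append(dict(zip(header, (float(f) for f in fields))))
    return rows


def instructions_per_cycle(row):
    """IPC computed from the two '(total)' columns of one row."""
    return row["PM_INST_CMPL (total)"] / row["PM_RUN_CYC (total)"]


for row in parse_measurements(SAMPLE_OUTPUT):
    print(f"nx={int(row['nx'])}: IPC = {instructions_per_cycle(row):.2f}")
```

For the run above this gives an IPC of roughly 1.79 (3324225 instructions over 1859440 cycles).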

For the following runs, we are going to use Ascent's compute backend nodes, which are not shared among users and have six GPUs available (each!). We use the available batch scheduler, IBM Spectrum LSF, for this. For convenience, a call to the batch submission system is stored in the environment variable $SC19_SUBMIT_CMD. You are welcome to adapt it once you get more familiar with the system.

For now, we want to launch our first benchmark run and measure cycles and instructions for different data sizes, as a function of nx. The Makefile holds a target for this; call it with make bench_task1:

In [2]:
!make bench_task1
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.ins_cyc.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.ins_cyc.bin.csv
Job <24059> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,4,0.0012,572978,2861,3639,261330,1235,4684
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,8,0.0014,1082978,5411,6189,601962,2914,5099
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,12,0.0014,1442978,7211,7989,811603,3992,5761
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,16,0.0014,1802978,9011,9789,1017305,4988,7017
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,20,0.0015,2162978,10811,11589,1221559,6002,7999
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,24,0.0016,2522978,12611,13389,1435167,7037,9259
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,28,0.0016,2882978,14411,15189,1633061,8054,9789
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,32,0.0017,3242978,16211,16989,1842895,9092,10889
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,36,0.0018,3602978,18011,18789,2042894,10108,12457
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,40,0.0019,3962978,19811,20589,2261332,11191,14233
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,44,0.0020,4322978,21611,22389,2458267,12112,14375
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,48,0.0020,4682978,23411,24189,2658621,13164,15613
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,52,0.0020,5042978,25211,25989,2866175,14190,16864
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,56,0.0021,5402978,27011,27789,3080357,15237,21565
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,60,0.0022,5762978,28811,29589,3283103,16278,18799
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,64,0.0022,6122978,30611,31389,3587582,17820,19681
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,68,0.0025,6482978,32411,33189,3893368,19284,20847
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,72,0.0025,6842978,34211,34989,4289441,21278,22715
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,76,0.0024,7202978,36011,36789,4208700,20936,22677
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,80,0.0025,7562978,37811,38589,4409613,21897,23855
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,84,0.0026,7922978,39611,40389,4611755,22921,24910
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,88,0.0026,8282978,41411,42189,4821904,23974,26087
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,92,0.0028,8642978,43211,43989,5104722,25036,38488
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,96,0.0028,9002978,45011,45789,5238952,26060,27927
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,100,0.0028,9362978,46811,47589,5441545,27049,29275
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,104,0.0030,9722978,48611,49389,5920763,28136,72679
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,108,0.0030,10082978,50411,51189,5853554,29106,31403
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,112,0.0030,10442978,52211,52989,6053498,30123,32279
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,116,0.0031,10802978,54011,54789,6296056,31338,33377
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,120,0.0033,11162978,55811,56589,6468115,32146,33869
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,124,0.0032,11522978,57611,58389,6675248,33233,35075
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,128,0.0033,11882978,59411,60189,6894325,34338,36207
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,132,0.0034,12242978,61211,61989,7093543,35299,37463
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,136,0.0034,12602978,63011,63789,7312105,36353,48105
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,140,0.0035,12962978,64811,65589,7503757,37375,39247
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,144,0.0036,13322978,66611,67389,7692611,38277,40419
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,148,0.0037,13682978,68411,69189,7968094,39656,42113
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,152,0.0037,14042978,70211,70989,8122466,40468,42706
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,156,0.0038,14402978,72011,72789,8328043,41484,45104
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,160,0.0040,14762978,73811,74589,8547674,42493,54216
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,164,0.0039,15122978,75611,76389,8738805,43542,45427
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,168,0.0040,15482978,77411,78189,8948025,44560,46819
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,172,0.0040,15842978,79211,79989,9186567,45735,47659
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,176,0.0041,16202978,81011,81789,9391949,46573,70131
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,180,0.0042,16562978,82811,83589,9549568,47559,54271
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,184,0.0042,16922978,84611,85389,9766306,48609,58645
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,188,0.0043,17282978,86411,87189,9974165,49613,56721
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,192,0.0044,17642978,88211,88989,10187263,50734,52953
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,196,0.0044,18002978,90011,90789,10386920,51763,53773
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,200,0.0045,18362978,91811,92589,10593326,52744,54962
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,204,0.0045,18722978,93611,94389,10791966,53796,55775
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,208,0.0046,19082978,95411,96189,10993938,54691,56692
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,212,0.0047,19442978,97211,97989,11183564,55716,57663
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,216,0.0047,19802978,99011,99789,11413409,56842,65317
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,220,0.0049,20162978,100811,101589,11747337,57952,85917
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,224,0.0049,20522978,102611,103389,11967444,58993,147575
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,228,0.0050,20882978,104411,105189,12176974,59986,107137
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,232,0.0051,21242978,106211,106989,12243039,61011,62843
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,236,0.0051,21602978,108011,108789,12454738,61985,74677
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,240,0.0051,21962978,109811,110589,12632612,62912,64911
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,244,0.0052,22322978,111611,112389,12844679,63954,74316
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,248,0.0053,22682978,113411,114189,13049050,65048,67067
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,252,0.0054,23042978,115211,115989,13274577,66113,68093
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,256,0.0054,23402978,117011,117789,13479975,67191,69232
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,260,0.0055,23762978,118811,119589,13702476,68321,70257
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,264,0.0055,24122978,120611,121389,13885554,69178,71473
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,268,0.0056,24482978,122411,123189,14091173,70236,72538
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,272,0.0057,24842978,124211,124989,14277355,71142,73153
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,276,0.0057,25202978,126011,126789,14477479,72149,74585
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,280,0.0058,25562978,127811,128589,14807542,73365,106386
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,284,0.0059,25922978,129611,130389,14919273,74349,83988
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,288,0.0060,26282978,131411,132189,15262342,75369,108903
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,292,0.0061,26642978,133211,133989,15457489,76550,112579
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,296,0.0061,27002978,135011,135789,15587890,77470,113796
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,300,0.0063,27362978,136811,137589,15736737,78474,80976
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,304,0.0062,27722978,138611,139389,15931699,79424,85309
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,308,0.0064,28082978,140411,141189,16127895,80426,82181
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,312,0.0063,28442978,142211,142989,16353667,81487,91316
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,316,0.0064,28802978,144011,144789,16544730,82526,84583
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,320,0.0064,29162978,145811,146589,16778054,83692,85621
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,324,0.0065,29522978,147611,148389,16975790,84670,86933
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,328,0.0066,29882978,149411,150189,17193806,85651,95908
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,332,0.0067,30242978,151211,151989,17391042,86658,92746
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,336,0.0067,30602978,153011,153789,17579650,87566,101073
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,340,0.0068,30962978,154811,155589,17823659,88601,131503
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,344,0.0069,31322978,156611,157389,18045749,89720,131352
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,348,0.0069,31682978,158411,159189,18233228,90790,129666
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,352,0.0070,32042978,160211,160989,18429938,91908,93827
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,356,0.0071,32402978,162011,162789,18723870,92891,169000
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,360,0.0071,32762978,163811,164589,18839189,93872,104313
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,364,0.0072,33122978,165611,166389,19052230,94828,108456
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,368,0.0072,33482978,167411,168189,19224348,95828,106832
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,372,0.0073,33842978,169211,169989,19409746,96825,98825
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,376,0.0074,34202978,171011,171789,19635914,97934,100015
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,380,0.0075,34562978,172811,173589,19901265,99194,108856
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,384,0.0075,34922978,174611,175389,20087150,100132,113306
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,388,0.0076,35282978,176411,177189,20289560,101187,111225
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,392,0.0076,35642978,178211,178989,20478069,102158,104431
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,396,0.0077,36002978,180011,180789,20703541,103136,118462
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,400,0.0078,36362978,181811,182589,20889687,104097,116051
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,404,0.0078,36722978,183611,184389,21103371,105019,150497
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,408,0.0079,37082978,185411,186189,21343392,106235,146574
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,412,0.0080,37442978,187211,187989,21499750,107213,116228
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,416,0.0081,37802978,189011,189789,21769516,108354,153304
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,420,0.0082,38162978,190811,191589,22016040,109333,166344
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,424,0.0082,38522978,192611,193389,22124948,110298,112586
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,428,0.0083,38882978,194411,195189,22375892,111391,164691
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,432,0.0083,39242978,196211,196989,22605417,112244,161120
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,436,0.0084,39602978,198011,198789,22698406,113231,115888
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,440,0.0084,39962978,199811,200589,22946025,114347,124840
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,444,0.0085,40322978,201611,202389,23138571,115404,122324
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,448,0.0086,40682978,203411,204189,23382319,116666,118990
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,452,0.0086,41042978,205211,205989,23582320,117634,123005
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,456,0.0087,41402978,207011,207789,23777586,118606,121054
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,460,0.0088,41762978,208811,209589,24021078,119638,157473
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,464,0.0089,42122978,210611,211389,24177273,120536,137152
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,468,0.0089,42482978,212411,213189,24354431,121510,124378
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,472,0.0090,42842978,214211,214989,24680874,122798,163001
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,476,0.0092,43202978,216011,216789,24806941,123695,126112
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,480,0.0091,43562978,217811,218589,25036974,124855,131240
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,484,0.0092,43922978,219611,220389,25277560,125834,159926
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,488,0.0093,44282978,221411,222189,25492002,126931,169890
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,492,0.0094,44642978,223211,223989,25799993,127811,292316
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,496,0.0094,45002978,225011,225789,25879076,128748,186367
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,500,0.0094,45362978,226811,227589,26021482,129705,143377
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,504,0.0095,45722978,228611,229389,26309697,130875,185497
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,508,0.0096,46082978,230411,231189,26445482,131853,134810
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,512,0.0097,46442978,232211,232989,26722882,133313,135480
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,516,0.0097,46802978,234011,234789,26902984,134116,143429
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,520,0.0098,47162978,235811,236589,27143327,135173,182663
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,524,0.0101,47522978,237611,238389,27899728,139067,143412
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,528,0.0099,47882978,239411,240189,27539695,137281,153792
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,532,0.0100,48242978,241211,241989,27665652,137957,156345
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,536,0.0102,48602978,243011,243789,27888664,139123,142069
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,540,0.0102,48962978,244811,245589,28116288,140162,167093
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,544,0.0102,49322978,246611,247389,28395864,141365,191687
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,548,0.0105,49682978,248411,249189,28539300,142352,144923
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,552,0.0104,50042978,250211,250989,28772000,143499,153080
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,556,0.0104,50402978,252011,252789,28943938,144344,160802
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,560,0.0105,50762978,253811,254589,29192011,145318,205574
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,564,0.0106,51122978,255611,256389,29371768,146296,173660
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,568,0.0107,51482978,257411,258189,29607085,147402,185216
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,572,0.0109,51842978,259211,259989,29760468,148529,150992
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,576,0.0108,52202978,261011,261789,30001693,149671,152448
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,580,0.0109,52562978,262811,263589,30194219,150474,161954
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,584,0.0110,52922978,264611,265389,30465237,151575,196784
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,588,0.0112,53282978,266411,267189,30866027,152658,345805
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,592,0.0112,53642978,268211,268989,30806266,153631,162459
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,596,0.0112,54002978,270011,270789,31013348,154624,161083
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,600,0.0113,54362978,271811,272589,31227644,155782,158034
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,604,0.0115,54722978,273611,274389,31534633,156837,219588
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,608,0.0114,55082978,275411,276189,31675474,157869,168332
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,612,0.0115,55442978,277211,277989,31953436,158989,218652
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,616,0.0116,55802978,279011,279789,32108644,160138,180416
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,620,0.0116,56162978,280811,281589,32277424,160849,182393
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,624,0.0118,56522978,282611,283389,32423394,161797,164245
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,628,0.0117,56882978,284411,285189,32609412,162678,167394
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,632,0.0118,57242978,286211,286989,32869379,163975,168634
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,636,0.0119,57602978,288011,288789,33151217,165037,223167
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,640,0.0119,57962978,289811,290589,33341299,166215,181218
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,644,0.0121,58322978,291611,292389,33649260,167751,199967
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,648,0.0121,58682978,293411,294189,33719599,168221,178799
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,652,0.0122,59042978,295211,295989,34067206,169536,235514
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,656,0.0122,59402978,297011,297789,34164102,170144,235618
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,660,0.0123,59762978,298811,299589,34456636,171594,235316
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,664,0.0124,60122978,300611,301389,34541178,172177,211827
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,668,0.0124,60482978,302411,303189,34905159,173832,222673
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,672,0.0126,60842978,304211,304989,34988298,174422,188003
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,676,0.0126,61202978,306011,306789,35263092,175911,185984
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,680,0.0127,61562978,307811,308589,35503073,176323,305860
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,684,0.0128,61922978,309611,310389,35672483,178036,180851
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,688,0.0128,62282978,311411,312189,35790039,178289,217803
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,692,0.0128,62642978,313211,313989,36045752,179866,188983
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,696,0.0130,63002978,315011,315789,36175144,180438,195986
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,700,0.0131,63362978,316811,317589,36529049,182248,184897
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,704,0.0130,63722978,318611,319389,36611747,182765,185703
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,708,0.0130,64082978,320411,321189,36811496,183626,191140
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,712,0.0131,64442978,322211,322989,37060383,184588,255521
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,716,0.0132,64802978,324011,324789,37267356,185684,240236
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,720,0.0132,65162978,325811,326589,37393434,186562,204926
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,724,0.0133,65522978,327611,328389,37611724,187635,203956
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,728,0.0135,65882978,329411,330189,37844476,188685,217329
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,732,0.0136,66242978,331211,331989,38097715,189879,238003
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,736,0.0136,66602978,333011,333789,38249665,190960,193797
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,740,0.0137,66962978,334811,335589,38496135,191882,202980
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,744,0.0136,67322978,336611,337389,38643004,192776,211409
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,748,0.0138,67682978,338411,339189,38834497,193752,204307
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,752,0.0139,68042978,340211,340989,39026422,194674,207102
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,756,0.0139,68402978,342011,342789,39292510,195755,242534
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,760,0.0140,68762978,343811,344589,39445808,196904,199749
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,764,0.0140,69122978,345611,346389,39707448,198140,208159
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,768,0.0141,69482978,347411,348189,39961335,199314,213386
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,772,0.0142,69842978,349211,349989,40195551,200268,262442
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,776,0.0143,70202978,351011,351789,40369481,201262,243178
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,780,0.0143,70562978,352811,353589,40454251,201889,204769
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,784,0.0143,70922978,354611,355389,40804167,203132,292206
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,788,0.0144,71282978,356411,357189,40880258,203888,220805
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,792,0.0145,71642978,358211,358989,41141375,205195,222680
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,796,0.0145,72002978,360011,360789,41346667,205890,276619
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,800,0.0146,72362978,361811,362589,41586665,207290,248916
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,804,0.0147,72722978,363611,364389,41696398,208106,211465
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,808,0.0148,73082978,365411,366189,41978951,209272,255137
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,812,0.0148,73442978,367211,367989,42187366,209918,283393
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,816,0.0149,73802978,369011,369789,42482639,211214,322437
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,820,0.0149,74162978,370811,371589,42512865,212010,227823
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,824,0.0151,74522978,372611,373389,42861251,213412,278868
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,828,0.0151,74882978,374411,375189,42979335,214191,262439
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,832,0.0152,75242978,376211,376989,43402619,215543,296991
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,836,0.0152,75602978,378011,378789,43382253,216450,232179
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,840,0.0154,75962978,379811,380589,43665001,217538,261020
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,844,0.0154,76322978,381611,382389,43762162,218196,232967
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,848,0.0156,76682978,383411,384189,44077885,219619,233562
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,852,0.0155,77042978,385211,385989,44269902,220266,357562
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,856,0.0156,77402978,387011,387789,44458368,221658,275183
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,860,0.0156,77762978,388811,389589,44599845,222530,244104
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,864,0.0158,78122978,390611,391389,44856987,223898,229495
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,868,0.0157,78482978,392411,393189,45070339,224667,268426
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,872,0.0158,78842978,394211,394989,45243346,225686,238504
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,876,0.0160,79202978,396011,396789,45425044,226467,285843
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,880,0.0160,79562978,397811,398589,45637897,227585,255503
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,884,0.0163,79922978,399611,400389,45922301,228540,294854
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,888,0.0161,80282978,401411,402189,46210377,229936,317062
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,892,0.0161,80642978,403211,403989,46224897,230736,244030
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,896,0.0163,81002978,405011,405789,46706945,232252,393574
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,900,0.0163,81362978,406811,407589,46846573,233803,243774
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,904,0.0165,81722978,408611,409389,47211102,235424,247115
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,908,0.0165,82082978,410411,411189,47420647,236067,308146
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,912,0.0167,82442978,412211,412989,47664515,237299,252663
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,916,0.0166,82802978,414011,414789,47825500,238210,307878
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,920,0.0168,83162978,415811,416589,48024315,239591,249230
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,924,0.0168,83522978,417611,418389,48204506,240348,286103
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,928,0.0168,83882978,419411,420189,48474452,241766,272232
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,932,0.0169,84242978,421211,421989,48643328,242408,310910
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,936,0.0170,84602978,423011,423789,49041567,243670,350571
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,940,0.0171,84962978,424811,425589,49009612,244295,313509
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,944,0.0171,85322978,426611,427389,49257311,245620,259650
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,948,0.0172,85682978,428411,429189,49415667,246533,254714
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,952,0.0172,86042978,430211,430989,49711139,247671,319628
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,956,0.0174,86402978,432011,432789,49856592,248552,271876
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,960,0.0174,86762978,433811,434589,50136102,249978,265617
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,964,0.0176,87122978,435611,436389,50925446,253713,295499
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,968,0.0178,87482978,437411,438189,51035835,253858,318894
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,972,0.0177,87842978,439211,439989,51188317,255334,306288
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,976,0.0178,88202978,441011,441789,51436023,256205,289239
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,980,0.0179,88562978,442811,443589,51703656,257814,300077
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,984,0.0179,88922978,444611,445389,51801305,257947,349721
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,988,0.0181,89282978,446411,447189,52056854,259676,262216
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,992,0.0182,89642978,448211,448989,52237864,260535,269494
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,996,0.0183,90002978,450011,450789,52526126,262024,274178
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1000,0.0182,90362978,451811,452589,52578843,262284,265526
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1004,0.0183,90722978,453611,454389,52896370,263840,273834
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1008,0.0183,91082978,455411,456189,53074476,264385,308471
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1012,0.0184,91442978,457211,457989,53382079,266422,284446
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1016,0.0186,91802978,459011,459789,53434221,266486,275700
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1020,0.0186,92162978,460811,461589,53712164,268036,277528
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,1024,0.0187,92522978,462611,463389,53754294,268076,276795
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.ins_cyc.bin.csv .

Once the run is completed, let's study the data!

This is best done in the interactive version of the Jupyter Notebook. If that version is unavailable to you, call the Makefile target make graph_task1 instead (either with X forwarding, or by downloading the resulting PDF).

In [1]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import common
%matplotlib inline
sns.set()
plt.rcParams['figure.figsize'] = [14, 6]

Execute the following cell if you want to switch to color-blind-safer colors:

In [ ]:
sns.set_palette("colorblind")
In [2]:
plt.rcParams['figure.figsize'] = [14, 6]
df = pd.read_csv("poisson2d.ins_cyc.bin.csv", skiprows=range(2, 50000, 2))  # Read in the CSV file from the bench run; parse with Pandas
df["Grid Points"] = df["nx"] * df["ny"]  # Add a new column of the number of grid points (the product of nx and ny)
df.head()  # Display the head of the Pandas dataframe
Out[2]:
iter ny nx Runtime PM_INST_CMPL (total) PM_INST_CMPL (min) PM_INST_CMPL (max) PM_RUN_CYC (total) PM_RUN_CYC (min) PM_RUN_CYC (max) Grid Points
0 200 32 4 0.0012 572978 2861 3639 261330 1235 4684 128
1 200 32 8 0.0014 1082978 5411 6189 601962 2914 5099 256
2 200 32 12 0.0014 1442978 7211 7989 811603 3992 5761 384
3 200 32 16 0.0014 1802978 9011 9789 1017305 4988 7017 512
4 200 32 20 0.0015 2162978 10811 11589 1221559 6002 7999 640
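
The benchmark script prints the CSV header before every data row, which is why read_csv is called with skiprows=range(2, 50000, 2): counted from zero, file lines 2, 4, 6, ... are the repeated headers. The following standard-library sketch mimics that filtering on three rows copied from the output above. It only illustrates the idea and is not how Pandas implements skiprows; the names raw, kept and records are made up for this example.

```python
import csv
import io

# Header and three data rows as printed by the benchmark run above.
header = ("iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),"
          "PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)")
rows = [
    "200,32,4,0.0012,572978,2861,3639,261330,1235,4684",
    "200,32,8,0.0014,1082978,5411,6189,601962,2914,5099",
    "200,32,12,0.0014,1442978,7211,7989,811603,3992,5761",
]

# Rebuild the file layout: header, row, header, row, ...
raw = "\n".join(line for row in rows for line in (header, row))

# Drop file lines 2, 4, 6, ... (0-indexed), i.e. every header except the first.
skip = set(range(2, 50000, 2))
kept = [line for i, line in enumerate(raw.splitlines()) if i not in skip]

records = list(csv.DictReader(io.StringIO("\n".join(kept))))
grid_points = [int(r["nx"]) * int(r["ny"]) for r in records]
print(grid_points)  # matches the "Grid Points" column shown above
```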

Let's have a look at the counters we've just measured and see how they scale with an increasing number of grid points.

In the following, we always use the minimum value of each counter (indicated by »(min)«), as this should give us an estimate of the best result achievable on the architecture.
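
To see why the minimum is the more robust choice, it can help to compare the spread between »(min)« and »(max)« for a few rows. The sketch below uses three rows copied from the benchmark output above (nx = 588, 592, 1000); the dictionary layout and variable names are assumptions made for this illustration.

```python
# (min, max) pairs copied from the benchmark output for three grid widths.
samples = {
    588:  {"PM_INST_CMPL": (266411, 267189), "PM_RUN_CYC": (152658, 345805)},
    592:  {"PM_INST_CMPL": (268211, 268989), "PM_RUN_CYC": (153631, 162459)},
    1000: {"PM_INST_CMPL": (451811, 452589), "PM_RUN_CYC": (262284, 265526)},
}

# Relative spread (max / min - 1) per counter and grid width.
spread = {
    nx: {name: hi / lo - 1 for name, (lo, hi) in counters.items()}
    for nx, counters in samples.items()
}

for nx, values in spread.items():
    print("nx={:5d}: instructions {:6.2%}, cycles {:7.2%}".format(
        nx, values["PM_INST_CMPL"], values["PM_RUN_CYC"]))
```

In these rows the instruction count barely moves between iterations, while the maximum cycle count can be far above the minimum. That is consistent with occasional disturbances inflating single iterations, which the minimum filters out.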

In [3]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df.set_index("Grid Points")["PM_RUN_CYC (min)"].plot(ax=ax1, legend=True);
df.set_index("Grid Points")["PM_INST_CMPL (min)"].plot(ax=ax2, legend=True);

Although the run cycles show some slight variations at larger numbers of grid points, both counters appear to scale linearly with the number of grid points (as one would naively expect). Let's test that by fitting a linear function!

The details of the fitting have been extracted into a dedicated function, print_and_return_fit(), in the common.py helper file. If you're interested, go have a look at it.

In [4]:
def linear_function(x, a, b):
    return a*x+b
In [25]:
fit_parameters, fit_covariance = common.print_and_return_fit(
    ["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"], 
    df.set_index("Grid Points"), 
    linear_function,
    format_uncertainty=".4f"
)
Counter   PM_RUN_CYC (min) is proportional to the grid points (nx*ny) by a factor of  8.1021 (± 0.0057)
Counter PM_INST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 14.0630 (± 0.0003)
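
The internals of print_and_return_fit() are not shown here. As a rough, self-contained sketch of what a linear fit does, the following ordinary least-squares routine (the helper name fit_line is hypothetical, and this is not the code from common.py) fits a*x+b to four rows copied from the benchmark output (nx = 8, 600, 800, 1000 with ny = 32). Because it uses only four points, its results need not match the full fit exactly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Grid points (nx * ny) and the »(min)« counter values from the output above.
grid_points = [8 * 32, 600 * 32, 800 * 32, 1000 * 32]
inst_min = [5411, 271811, 361811, 451811]
cyc_min = [2914, 155782, 207290, 262284]

a_inst, b_inst = fit_line(grid_points, inst_min)
a_cyc, b_cyc = fit_line(grid_points, cyc_min)

print("PM_INST_CMPL (min): a = {:.4f}, b = {:.1f}".format(a_inst, b_inst))
print("PM_RUN_CYC (min):   a = {:.4f}, b = {:.1f}".format(a_cyc, b_cyc))
```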

Let's overlay our fits to the graphs from before.

In [6]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"]):
    df.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    ax.plot(
        df["Grid Points"], 
        linear_function(df["Grid Points"], *fit_parameters[pmu_counter]), 
        linestyle="--", 
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();

Please execute the next cell to summarize the first task.

In [38]:
print("The algorithm under investigation runs about {:.0f} cycles and executes about {:.0f} instructions per grid point".format(
    *[fit_parameters[pmu_counter][0] for pmu_counter in ["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"]]
))
The algorithm under investigation runs about 8 cycles and executes about 14 instructions per grid point
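
As a side calculation that is not part of the original exercise, the two fitted slopes can be divided to estimate how many instructions complete per cycle. The values below are copied from the fit output above; the variable names are chosen for this sketch only.

```python
# Slopes of the linear fits, copied from the output above.
cycles_per_grid_point = 8.1021
instructions_per_grid_point = 14.0630

# Instructions completed per run cycle, per grid point of the stencil loop.
instructions_per_cycle = instructions_per_grid_point / cycles_per_grid_point
print("About {:.2f} instructions per cycle".format(instructions_per_cycle))
```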

Bonus:

The linear fits also calculate a y-intercept (»b«). How do you interpret this value?

The y-intercept, that is, b of the linear fit, is the inherent overhead of the program execution. Even if our program did not compute the stencil operation for a single grid point, it would still complete this many (~1800) instructions and run for this many (~680) cycles. Interestingly, it is also the unparallelizable overhead of this (toy) example.

We will revisit this graph in a little while.
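
To get a feeling for how much this fixed overhead matters, the sketch below evaluates the fitted model for instructions at the second-smallest and the largest grid of the benchmark (nx = 8 and nx = 1024 with ny = 32). It uses the rounded values from the text above (b of about 1800, slope of about 14.063); the function name is hypothetical.

```python
SLOPE = 14.063      # instructions per grid point, from the fit above
OVERHEAD = 1800.0   # approximate y-intercept b, as quoted in the text

def overhead_fraction(grid_points):
    """Share of the modelled instruction count that is fixed overhead."""
    total = SLOPE * grid_points + OVERHEAD
    return OVERHEAD / total

small = overhead_fraction(8 * 32)      # 256 grid points
large = overhead_fraction(1024 * 32)   # 32768 grid points
print("Overhead share: {:.1%} (256 points) vs. {:.2%} (32768 points)".format(small, large))
```

Under this model the overhead is a sizeable part of the work for tiny grids and becomes negligible for large ones.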

Back to top

Task 2: Measuring Loads and Stores

Looking at the source code, how many loads and stores from / to memory do you expect? Have a look at the loop which we instrumented.
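
One way to arrive at a naive, source-level estimate is to count the array references in the loop body. The sketch below does this with a regular expression on the two statements of the instrumented loop. It is only a textual count under the assumption that every array reference on a right-hand side is a load and every one on a left-hand side is a store; a compiler may keep values in registers or vectorize, so the hardware counters can differ.

```python
import re
from collections import Counter

# The two statements of the instrumented inner loop, copied from the source.
loop_body = """
Anew[iy*nx+ix] = -0.25 * (rhs[iy*nx+ix] - (A[ iy   *nx+ix+1] + A[ iy   *nx+ix-1]
                                        +  A[(iy-1)*nx+ix  ] + A[(iy+1)*nx+ix  ]));
error = fmaxr( error, fabsr(Anew[iy*nx+ix]-A[iy*nx+ix]));
"""

array_ref = re.compile(r"\b(Anew|rhs|A)\[")
loads, stores = Counter(), Counter()

for statement in (s.strip() for s in loop_body.split(";")):
    if not statement:
        continue
    lhs, rhs_expr = statement.split("=", 1)
    stores.update(array_ref.findall(lhs))       # written array elements
    loads.update(array_ref.findall(rhs_expr))   # read array elements

print("Textual loads per grid point: ", sum(loads.values()), dict(loads))
print("Textual stores per grid point:", sum(stores.values()), dict(stores))
```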

Let's compare your estimate to what the system actually does!

Task A

Please measure counters for loads and stores. See the TODOs in poisson2d.ld_st.c. This time, implement PM_LD_CMPL and PM_ST_CMPL.

Compile with make task2, test your program with a single run with make run_task2, and then finally submit a benchmarking run to the batch system with make bench_task2. The following cell will take care of all this.

Back to top

In [3]:
!make bench_task2
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.ld_st.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.ld_st.bin.csv
Job <24416> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,4,0.0012,119819,598,817,32902,164,266
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,8,0.0013,161819,808,1027,56902,284,386
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,12,0.0014,221819,1108,1327,71902,359,461
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,16,0.0015,281819,1408,1627,86902,434,536
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,20,0.0015,341819,1708,1927,101902,509,611
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,24,0.0016,401819,2008,2227,116902,584,686
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,28,0.0016,461819,2308,2527,131902,659,761
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,32,0.0018,521819,2608,2827,146902,734,836
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,36,0.0018,581819,2908,3127,161902,809,911
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,40,0.0018,641819,3208,3427,176902,884,986
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,44,0.0019,701819,3508,3727,191902,959,1061
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,48,0.0020,761819,3808,4027,206902,1034,1136
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,52,0.0020,821819,4108,4327,221902,1109,1211
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,56,0.0021,881819,4408,4627,236902,1184,1286
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,60,0.0022,941819,4708,4927,251902,1259,1361
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,64,0.0023,1001819,5008,5227,266902,1334,1436
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,68,0.0023,1061819,5308,5527,281902,1409,1511
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,72,0.0025,1121819,5608,5827,296902,1484,1586
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,76,0.0028,1181819,5908,6127,311902,1559,1661
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,80,0.0025,1241819,6208,6427,326902,1634,1736
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,84,0.0026,1301819,6508,6727,341902,1709,1811
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,88,0.0026,1361819,6808,7027,356902,1784,1886
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,92,0.0027,1421819,7108,7327,371902,1859,1961
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,96,0.0028,1481819,7408,7627,386902,1934,2036
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,100,0.0029,1541819,7708,7927,401902,2009,2111
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,104,0.0029,1601819,8008,8227,416902,2084,2186
200,32,108,0.0031,1661819,8308,8527,431902,2159,2261
200,32,112,0.0030,1721819,8608,8827,446902,2234,2336
200,32,116,0.0031,1781819,8908,9127,461902,2309,2411
200,32,120,0.0032,1841819,9208,9427,476902,2384,2486
200,32,124,0.0033,1901819,9508,9727,491902,2459,2561
200,32,128,0.0033,1961819,9808,10027,506902,2534,2636
200,32,132,0.0034,2021819,10108,10327,521902,2609,2711
200,32,136,0.0035,2081819,10408,10627,536902,2684,2786
200,32,140,0.0036,2141819,10708,10927,551902,2759,2861
200,32,144,0.0036,2201819,11008,11227,566902,2834,2936
200,32,148,0.0036,2261819,11308,11527,581902,2909,3011
200,32,152,0.0037,2321819,11608,11827,596902,2984,3086
200,32,156,0.0038,2381819,11908,12127,611902,3059,3161
200,32,160,0.0040,2441819,12208,12427,626902,3134,3236
200,32,164,0.0039,2501819,12508,12727,641902,3209,3311
200,32,168,0.0040,2561819,12808,13027,656902,3284,3386
200,32,172,0.0040,2621819,13108,13327,671902,3359,3461
200,32,176,0.0041,2681819,13408,13627,686902,3434,3536
200,32,180,0.0041,2741819,13708,13927,701902,3509,3611
200,32,184,0.0042,2801819,14008,14227,716902,3584,3686
200,32,188,0.0044,2861819,14308,14527,731902,3659,3761
200,32,192,0.0044,2921819,14608,14827,746902,3734,3836
200,32,196,0.0045,2981819,14908,15127,761902,3809,3911
200,32,200,0.0045,3041819,15208,15427,776902,3884,3986
200,32,204,0.0045,3101819,15508,15727,791902,3959,4061
200,32,208,0.0046,3161819,15808,16027,806902,4034,4136
200,32,212,0.0047,3221819,16108,16327,821902,4109,4211
200,32,216,0.0047,3281819,16408,16627,836902,4184,4286
200,32,220,0.0048,3341819,16708,16927,851902,4259,4361
200,32,224,0.0049,3401819,17008,17227,866902,4334,4436
200,32,228,0.0050,3461819,17308,17527,881902,4409,4511
200,32,232,0.0050,3521819,17608,17827,896902,4484,4586
200,32,236,0.0051,3581819,17908,18127,911902,4559,4661
200,32,240,0.0051,3641819,18208,18427,926902,4634,4736
200,32,244,0.0052,3701819,18508,18727,941902,4709,4811
200,32,248,0.0053,3761819,18808,19027,956902,4784,4886
200,32,252,0.0053,3821819,19108,19327,971902,4859,4961
200,32,256,0.0054,3881819,19408,19627,986902,4934,5036
200,32,260,0.0055,3941819,19708,19927,1001902,5009,5111
200,32,264,0.0055,4001819,20008,20227,1016902,5084,5186
200,32,268,0.0056,4061819,20308,20527,1031902,5159,5261
200,32,272,0.0057,4121819,20608,20827,1046902,5234,5336
200,32,276,0.0057,4181819,20908,21127,1061902,5309,5411
200,32,280,0.0058,4241819,21208,21427,1076902,5384,5486
200,32,284,0.0059,4301819,21508,21727,1091902,5459,5561
200,32,288,0.0059,4361819,21808,22027,1106902,5534,5636
200,32,292,0.0060,4421819,22108,22327,1121902,5609,5711
200,32,296,0.0061,4481819,22408,22627,1136902,5684,5786
200,32,300,0.0061,4541819,22708,22927,1151902,5759,5861
200,32,304,0.0062,4601819,23008,23227,1166902,5834,5936
200,32,308,0.0063,4661819,23308,23527,1181902,5909,6011
200,32,312,0.0064,4721819,23608,23827,1196902,5984,6086
200,32,316,0.0066,4781819,23908,24127,1211902,6059,6161
200,32,320,0.0065,4841819,24208,24427,1226902,6134,6236
200,32,324,0.0065,4901819,24508,24727,1241902,6209,6311
200,32,328,0.0069,4961819,24808,25027,1256902,6284,6386
200,32,332,0.0066,5021819,25108,25327,1271902,6359,6461
200,32,336,0.0067,5081819,25408,25627,1286902,6434,6536
200,32,340,0.0068,5141819,25708,25927,1301902,6509,6611
200,32,344,0.0069,5201819,26008,26227,1316902,6584,6686
200,32,348,0.0069,5261819,26308,26527,1331902,6659,6761
200,32,352,0.0070,5321819,26608,26827,1346902,6734,6836
200,32,356,0.0070,5381819,26908,27127,1361902,6809,6911
200,32,360,0.0071,5441819,27208,27427,1376902,6884,6986
200,32,364,0.0072,5501819,27508,27727,1391902,6959,7061
200,32,368,0.0072,5561819,27808,28027,1406902,7034,7136
200,32,372,0.0073,5621819,28108,28327,1421902,7109,7211
200,32,376,0.0074,5681819,28408,28627,1436902,7184,7286
200,32,380,0.0074,5741819,28708,28927,1451902,7259,7361
200,32,384,0.0075,5801819,29008,29227,1466902,7334,7436
200,32,388,0.0076,5861819,29308,29527,1481902,7409,7511
200,32,392,0.0076,5921819,29608,29827,1496902,7484,7586
200,32,396,0.0077,5981819,29908,30127,1511902,7559,7661
200,32,400,0.0078,6041819,30208,30427,1526902,7634,7736
200,32,404,0.0079,6101819,30508,30727,1541902,7709,7811
200,32,408,0.0079,6161819,30808,31027,1556902,7784,7886
200,32,412,0.0080,6221819,31108,31327,1571902,7859,7961
200,32,416,0.0081,6281819,31408,31627,1586902,7934,8036
200,32,420,0.0081,6341819,31708,31927,1601902,8009,8111
200,32,424,0.0082,6401819,32008,32227,1616902,8084,8186
200,32,428,0.0082,6461819,32308,32527,1631902,8159,8261
200,32,432,0.0085,6521819,32608,32827,1646902,8234,8336
200,32,436,0.0084,6581819,32908,33127,1661902,8309,8411
200,32,440,0.0084,6641819,33208,33427,1676902,8384,8486
200,32,444,0.0085,6701819,33508,33727,1691902,8459,8561
200,32,448,0.0087,6761819,33808,34027,1706902,8534,8636
200,32,452,0.0087,6821819,34108,34327,1721902,8609,8711
200,32,456,0.0087,6881819,34408,34627,1736902,8684,8786
200,32,460,0.0088,6941819,34708,34927,1751902,8759,8861
200,32,464,0.0088,7001819,35008,35227,1766902,8834,8936
200,32,468,0.0089,7061819,35308,35527,1781902,8909,9011
200,32,472,0.0090,7121819,35608,35827,1796902,8984,9086
200,32,476,0.0091,7181819,35908,36127,1811902,9059,9161
200,32,480,0.0091,7241819,36208,36427,1826902,9134,9236
200,32,484,0.0092,7301819,36508,36727,1841902,9209,9311
200,32,488,0.0093,7361819,36808,37027,1856902,9284,9386
200,32,492,0.0094,7421819,37108,37327,1871902,9359,9461
200,32,496,0.0095,7481819,37408,37627,1886902,9434,9536
200,32,500,0.0094,7541819,37708,37927,1901902,9509,9611
200,32,504,0.0095,7601819,38008,38227,1916902,9584,9686
200,32,508,0.0096,7661819,38308,38527,1931902,9659,9761
200,32,512,0.0097,7721819,38608,38827,1946902,9734,9836
200,32,516,0.0098,7781819,38908,39127,1961902,9809,9911
200,32,520,0.0098,7841819,39208,39427,1976902,9884,9986
200,32,524,0.0099,7901819,39508,39727,1991902,9959,10061
200,32,528,0.0099,7961819,39808,40027,2006902,10034,10136
200,32,532,0.0100,8021819,40108,40327,2021902,10109,10211
200,32,536,0.0101,8081819,40408,40627,2036902,10184,10286
200,32,540,0.0101,8141819,40708,40927,2051902,10259,10361
200,32,544,0.0103,8201819,41008,41227,2066902,10334,10436
200,32,548,0.0103,8261819,41308,41527,2081902,10409,10511
200,32,552,0.0104,8321819,41608,41827,2096902,10484,10586
200,32,556,0.0106,8381819,41908,42127,2111902,10559,10661
200,32,560,0.0106,8441819,42208,42427,2126902,10634,10736
200,32,564,0.0106,8501819,42508,42727,2141902,10709,10811
200,32,568,0.0107,8561819,42808,43027,2156902,10784,10886
200,32,572,0.0108,8621819,43108,43327,2171902,10859,10961
200,32,576,0.0109,8681819,43408,43627,2186902,10934,11036
200,32,580,0.0110,8741819,43708,43927,2201902,11009,11111
200,32,584,0.0110,8801819,44008,44227,2216902,11084,11186
200,32,588,0.0110,8861819,44308,44527,2231902,11159,11261
200,32,592,0.0111,8921819,44608,44827,2246902,11234,11336
200,32,596,0.0113,8981819,44908,45127,2261902,11309,11411
200,32,600,0.0113,9041819,45208,45427,2276902,11384,11486
200,32,604,0.0114,9101819,45508,45727,2291902,11459,11561
200,32,608,0.0115,9161819,45808,46027,2306902,11534,11636
200,32,612,0.0115,9221819,46108,46327,2321902,11609,11711
200,32,616,0.0115,9281819,46408,46627,2336902,11684,11786
200,32,620,0.0116,9341819,46708,46927,2351902,11759,11861
200,32,624,0.0117,9401819,47008,47227,2366902,11834,11936
200,32,628,0.0117,9461819,47308,47527,2381902,11909,12011
200,32,632,0.0118,9521819,47608,47827,2396902,11984,12086
200,32,636,0.0119,9581819,47908,48127,2411902,12059,12161
200,32,640,0.0119,9641819,48208,48427,2426902,12134,12236
200,32,644,0.0121,9701819,48508,48727,2441902,12209,12311
200,32,648,0.0121,9761819,48808,49027,2456902,12284,12386
200,32,652,0.0121,9821819,49108,49327,2471902,12359,12461
200,32,656,0.0122,9881819,49408,49627,2486902,12434,12536
200,32,660,0.0123,9941819,49708,49927,2501902,12509,12611
200,32,664,0.0123,10001819,50008,50227,2516902,12584,12686
200,32,668,0.0124,10061819,50308,50527,2531902,12659,12761
200,32,672,0.0124,10121819,50608,50827,2546902,12734,12836
200,32,676,0.0126,10181819,50908,51127,2561902,12809,12911
200,32,680,0.0126,10241819,51208,51427,2576902,12884,12986
200,32,684,0.0127,10301819,51508,51727,2591902,12959,13061
200,32,688,0.0128,10361819,51808,52027,2606902,13034,13136
200,32,692,0.0128,10421819,52108,52327,2621902,13109,13211
200,32,696,0.0129,10481819,52408,52627,2636902,13184,13286
200,32,700,0.0131,10541819,52708,52927,2651902,13259,13361
200,32,704,0.0131,10601819,53008,53227,2666902,13334,13436
200,32,708,0.0130,10661819,53308,53527,2681902,13409,13511
200,32,712,0.0131,10721819,53608,53827,2696902,13484,13586
200,32,716,0.0132,10781819,53908,54127,2711902,13559,13661
200,32,720,0.0132,10841819,54208,54427,2726902,13634,13736
200,32,724,0.0134,10901819,54508,54727,2741902,13709,13811
200,32,728,0.0134,10961819,54808,55027,2756902,13784,13886
200,32,732,0.0134,11021819,55108,55327,2771902,13859,13961
200,32,736,0.0135,11081819,55408,55627,2786902,13934,14036
200,32,740,0.0137,11141819,55708,55927,2801902,14009,14111
200,32,744,0.0138,11201819,56008,56227,2816902,14084,14186
200,32,748,0.0137,11261819,56308,56527,2831902,14159,14261
200,32,752,0.0138,11321819,56608,56827,2846902,14234,14336
200,32,756,0.0139,11381819,56908,57127,2861902,14309,14411
200,32,760,0.0140,11441819,57208,57427,2876902,14384,14486
200,32,764,0.0140,11501819,57508,57727,2891902,14459,14561
200,32,768,0.0141,11561819,57808,58027,2906902,14534,14636
200,32,772,0.0141,11621819,58108,58327,2921902,14609,14711
200,32,776,0.0142,11681819,58408,58627,2936902,14684,14786
200,32,780,0.0143,11741819,58708,58927,2951902,14759,14861
200,32,784,0.0144,11801819,59008,59227,2966902,14834,14936
200,32,788,0.0144,11861819,59308,59527,2981902,14909,15011
200,32,792,0.0145,11921819,59608,59827,2996902,14984,15086
200,32,796,0.0145,11981819,59908,60127,3011902,15059,15161
200,32,800,0.0147,12041819,60208,60427,3026902,15134,15236
200,32,804,0.0147,12101819,60508,60727,3041902,15209,15311
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,808,0.0148,12161819,60808,61027,3056902,15284,15386
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,812,0.0148,12221819,61108,61327,3071902,15359,15461
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,816,0.0150,12281819,61408,61627,3086902,15434,15536
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,820,0.0149,12341819,61708,61927,3101902,15509,15611
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,824,0.0150,12401819,62008,62227,3116902,15584,15686
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,828,0.0151,12461819,62308,62527,3131902,15659,15761
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,832,0.0152,12521819,62608,62827,3146902,15734,15836
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,836,0.0152,12581819,62908,63127,3161902,15809,15911
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,840,0.0153,12641819,63208,63427,3176902,15884,15986
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,844,0.0153,12701819,63508,63727,3191902,15959,16061
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,848,0.0154,12761819,63808,64027,3206902,16034,16136
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,852,0.0155,12821819,64108,64327,3221902,16109,16211
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,856,0.0156,12881819,64408,64627,3236902,16184,16286
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,860,0.0156,12941819,64708,64927,3251902,16259,16361
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,864,0.0157,13001819,65008,65227,3266902,16334,16436
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,868,0.0158,13061819,65308,65527,3281902,16409,16511
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,872,0.0159,13121819,65608,65827,3296902,16484,16586
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,876,0.0159,13181819,65908,66127,3311902,16559,16661
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,880,0.0160,13241819,66208,66427,3326902,16634,16736
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,884,0.0160,13301819,66508,66727,3341902,16709,16811
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,888,0.0161,13361819,66808,67027,3356902,16784,16886
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,892,0.0162,13421819,67108,67327,3371902,16859,16961
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,896,0.0163,13481819,67408,67627,3386902,16934,17036
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,900,0.0164,13541819,67708,67927,3401902,17009,17111
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,904,0.0165,13601819,68008,68227,3416902,17084,17186
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,908,0.0165,13661819,68308,68527,3431902,17159,17261
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,912,0.0166,13721819,68608,68827,3446902,17234,17336
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,916,0.0166,13781819,68908,69127,3461902,17309,17411
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,920,0.0167,13841819,69208,69427,3476902,17384,17486
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,924,0.0168,13901819,69508,69727,3491902,17459,17561
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,928,0.0169,13961819,69808,70027,3506902,17534,17636
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,932,0.0175,14021819,70108,70327,3521902,17609,17711
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,936,0.0170,14081819,70408,70627,3536902,17684,17786
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,940,0.0171,14141819,70708,70927,3551902,17759,17861
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,944,0.0171,14201819,71008,71227,3566902,17834,17936
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,948,0.0172,14261819,71308,71527,3581902,17909,18011
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,952,0.0172,14321819,71608,71827,3596902,17984,18086
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,956,0.0173,14381819,71908,72127,3611902,18059,18161
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,960,0.0174,14441819,72208,72427,3626902,18134,18236
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,964,0.0176,14501819,72508,72727,3641902,18209,18311
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,968,0.0178,14561819,72808,73027,3656902,18284,18386
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,972,0.0177,14621819,73108,73327,3671902,18359,18461
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,976,0.0178,14681819,73408,73627,3686902,18434,18536
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,980,0.0179,14741819,73708,73927,3701902,18509,18611
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,984,0.0179,14801819,74008,74227,3716902,18584,18686
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,988,0.0180,14861819,74308,74527,3731902,18659,18761
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,992,0.0181,14921819,74608,74827,3746902,18734,18836
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,996,0.0182,14981819,74908,75127,3761902,18809,18911
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1000,0.0182,15041819,75208,75427,3776902,18884,18986
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1004,0.0183,15101819,75508,75727,3791902,18959,19061
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1008,0.0183,15161819,75808,76027,3806902,19034,19136
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1012,0.0184,15221819,76108,76327,3821902,19109,19211
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1016,0.0185,15281819,76408,76627,3836902,19184,19286
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1020,0.0185,15341819,76708,76927,3851902,19259,19361
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)
200,32,1024,0.0186,15401819,77008,77227,3866902,19334,19436
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.ld_st.bin.csv .

Once the run has finished, let's plot the results in the following cells (non-interactive: make graph_task2a).

In [8]:
df_ldst = pd.read_csv("poisson2d.ld_st.bin.csv", skiprows=range(2, 50000, 2))
df_ldst["Grid Points"] = df_ldst["nx"] * df_ldst["ny"] 
df_ldst.head()
Out[8]:
   iter  ny  nx  Runtime  PM_LD_CMPL (total)  PM_LD_CMPL (min)  PM_LD_CMPL (max)  PM_ST_CMPL (total)  PM_ST_CMPL (min)  PM_ST_CMPL (max)  Grid Points
0   200  32   4   0.0012              119819               598               817               32902               164               266          128
1   200  32   8   0.0013              161819               808              1027               56902               284               386          256
2   200  32  12   0.0014              221819              1108              1327               71902               359               461          384
3   200  32  16   0.0015              281819              1408              1627               86902               434               536          512
4   200  32  20   0.0015              341819              1708              1927              101902               509               611          640
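
The benchmark script prints the CSV header before every data row, which is why the read_csv call above passes skiprows=range(2, 50000, 2). The following minimal sketch shows the effect on three rows copied from the output above. The in-memory string raw and the name df_demo are illustrative stand-ins for the real CSV file and data frame.

```python
import io

import pandas as pd

# Header line exactly as printed by the benchmark (repeated before every row).
header = ("iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),"
          "PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)")
rows = [
    "200,32,4,0.0012,119819,598,817,32902,164,266",
    "200,32,8,0.0013,161819,808,1027,56902,284,386",
    "200,32,12,0.0014,221819,1108,1327,71902,359,461",
]
# Rebuild the interleaved layout: header, data, header, data, ...
raw = "\n".join(f"{header}\n{row}" for row in rows) + "\n"

# Line 0 is kept as the header; lines 2, 4, 6, ... are the repeated headers.
df_demo = pd.read_csv(io.StringIO(raw), skiprows=range(2, 50000, 2))
df_demo["Grid Points"] = df_demo["nx"] * df_demo["ny"]
print(df_demo[["nx", "ny", "Grid Points", "PM_LD_CMPL (min)"]])
```

Line numbers in skiprows that lie beyond the end of the file are ignored, so the generous upper bound of 50000 does no harm.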
In [9]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df_ldst.set_index("Grid Points")["PM_LD_CMPL (min)"].plot(ax=ax1, legend=True);
df_ldst.set_index("Grid Points")["PM_ST_CMPL (min)"].plot(ax=ax2, legend=True);

This behaviour also looks linear at first glance. We can again fit a first-order polynomial (and re-use our previously defined function curve_fit)!

In [29]:
_fit, _cov = common.print_and_return_fit(
    ["PM_LD_CMPL (min)", "PM_ST_CMPL (min)"], 
    df_ldst.set_index("Grid Points"), 
    linear_function,
    format_value=".4f"
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter PM_LD_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 2.3437 (± 0.000037)
Counter PM_ST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 0.5860 (± 0.000019)
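
As a rough cross-check of these factors, the sketch below fits a straight line to a handful of (nx, counter) pairs copied from the benchmark output above. It uses numpy.polyfit rather than the notebook's common.print_and_return_fit helper, so it is an independent illustration and not the notebook's own fitting code. The variable names are chosen for this sketch only.

```python
import numpy as np

ny = 32  # fixed in all benchmark runs shown above
# A few rows copied from the benchmark output (nx, counter minimum per iteration).
nx = np.array([8, 12, 16, 20, 1000, 1024])
ld_min = np.array([808, 1108, 1408, 1708, 75208, 77008])  # PM_LD_CMPL (min)
st_min = np.array([284, 359, 434, 509, 18884, 19334])     # PM_ST_CMPL (min)

grid_points = nx * ny

# First-order polynomial fit: polyfit returns [slope, offset] for degree 1.
ld_slope, ld_offset = np.polyfit(grid_points, ld_min, 1)
st_slope, st_offset = np.polyfit(grid_points, st_min, 1)

print(f"PM_LD_CMPL (min): {ld_slope:.4f} per grid point, offset {ld_offset:.1f}")
print(f"PM_ST_CMPL (min): {st_slope:.4f} per grid point, offset {st_offset:.1f}")
```

The slopes should come out close to the factors reported above (roughly 2.34 loads and 0.586 stores per grid point).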

Let's overlay this in one common plot:

In [28]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_LD_CMPL (min)", "PM_ST_CMPL (min)"]):
    df_ldst.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    ax.plot(
        df_ldst["Grid Points"], 
        linear_function(df_ldst["Grid Points"], *fit_parameters[pmu_counter]), 
        linestyle="--", 
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();

Did you expect more?

The reason is simple: Among the load and store instructions counted by PM_LD_CMPL and PM_ST_CMPL are vector instructions which can load and store multiple (in this case: two) values at a time. To see how many bytes are loaded and stored, we need to measure counters for vectorized loads and stores as well.
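
To make "more" concrete: a naive, scalar reading of the loop body shown at the start of this exercise suggests six loads (rhs plus five elements of A) and one store (Anew) per updated grid point. The sketch below compares this hand count with the fitted factors from above. The hand count is an assumption of this sketch: it ignores any reuse of already loaded values that the compiler may arrange, and it treats boundary points like inner points. Only the fitted factors come from the measurement.

```python
# Hand count for one update of Anew[c], with c = iy*nx+ix (assumption of this sketch).
naive_loads_per_point = 6   # rhs[c], A[c+1], A[c-1], A[c-nx], A[c+nx], A[c]
naive_stores_per_point = 1  # Anew[c]

# Fitted factors reported above (counter increments per grid point).
measured_loads_per_point = 2.3437
measured_stores_per_point = 0.5860

load_ratio = naive_loads_per_point / measured_loads_per_point
store_ratio = naive_stores_per_point / measured_stores_per_point

print(f"loads:  naive / measured = {load_ratio:.2f}")
print(f"stores: naive / measured = {store_ratio:.2f}")
```

Both ratios are well above one, which is consistent with a single counted instruction moving more than one value.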

TASK B

Please measure counters for vectorized loads and vectorized stores. See the TODOs in poisson2d.vld.c and poisson2d.vst.c (note: these vector counters cannot be measured together and need separate files and runs). Can you find out the names of the counters yourself, using papi_native_avail | grep VECTOR_?

Compile, test, and bench-run your program again.


In [9]:
!papi_native_avail | grep VECTOR_
| PM_VECTOR_FLOP_CMPL                                                          |
| PM_VECTOR_LD_CMPL                                                            |
| PM_VECTOR_ST_CMPL                                                            |

make bench_task3 will submit benchmark runs of both vectorized counters to the batch system (as two consecutive runs, one per file).

In [1]:
!make bench_task3
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vld.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vld.bin.csv
Job <24641> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,4,0.0010,0,0,0
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,8,0.0011,114000,570,570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,12,0.0012,174000,870,870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,16,0.0012,234000,1170,1170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,20,0.0013,294000,1470,1470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,24,0.0014,354000,1770,1770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,28,0.0014,414000,2070,2070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,32,0.0015,474000,2370,2370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,36,0.0016,534000,2670,2670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,40,0.0016,594000,2970,2970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,44,0.0017,654000,3270,3270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,48,0.0018,714000,3570,3570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,52,0.0018,774000,3870,3870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,56,0.0019,834000,4170,4170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,60,0.0020,894000,4470,4470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,64,0.0021,954000,4770,4770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,68,0.0022,1014000,5070,5070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,72,0.0022,1074000,5370,5370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,76,0.0022,1134000,5670,5670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,80,0.0023,1194000,5970,5970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,84,0.0024,1254000,6270,6270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,88,0.0024,1314000,6570,6570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,92,0.0025,1374000,6870,6870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,96,0.0027,1434000,7170,7170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,100,0.0026,1494000,7470,7470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,104,0.0029,1554000,7770,7770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,108,0.0027,1614000,8070,8070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,112,0.0028,1674000,8370,8370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,116,0.0029,1734000,8670,8670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,120,0.0029,1794000,8970,8970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,124,0.0030,1854000,9270,9270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,128,0.0032,1914000,9570,9570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,132,0.0031,1974000,9870,9870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,136,0.0032,2034000,10170,10170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,140,0.0033,2094000,10470,10470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,144,0.0033,2154000,10770,10770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,148,0.0034,2214000,11070,11070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,152,0.0036,2274000,11370,11370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,156,0.0035,2334000,11670,11670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,160,0.0036,2394000,11970,11970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,164,0.0037,2454000,12270,12270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,168,0.0037,2514000,12570,12570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,172,0.0038,2574000,12870,12870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,176,0.0039,2634000,13170,13170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,180,0.0039,2694000,13470,13470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,184,0.0040,2754000,13770,13770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,188,0.0041,2814000,14070,14070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,192,0.0041,2874000,14370,14370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,196,0.0042,2934000,14670,14670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,200,0.0042,2994000,14970,14970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,204,0.0043,3054000,15270,15270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,208,0.0045,3114000,15570,15570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,212,0.0045,3174000,15870,15870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,216,0.0045,3234000,16170,16170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,220,0.0046,3294000,16470,16470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,224,0.0048,3354000,16770,16770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,228,0.0047,3414000,17070,17070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,232,0.0048,3474000,17370,17370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,236,0.0048,3534000,17670,17670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,240,0.0049,3594000,17970,17970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,244,0.0050,3654000,18270,18270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,248,0.0052,3714000,18570,18570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,252,0.0051,3774000,18870,18870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,256,0.0052,3834000,19170,19170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,260,0.0052,3894000,19470,19470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,264,0.0053,3954000,19770,19770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,268,0.0054,4014000,20070,20070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,272,0.0054,4074000,20370,20370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,276,0.0055,4134000,20670,20670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,280,0.0056,4194000,20970,20970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,284,0.0056,4254000,21270,21270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,288,0.0057,4314000,21570,21570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,292,0.0058,4374000,21870,21870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,296,0.0058,4434000,22170,22170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,300,0.0059,4494000,22470,22470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,304,0.0059,4554000,22770,22770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,308,0.0060,4614000,23070,23070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,312,0.0061,4674000,23370,23370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,316,0.0062,4734000,23670,23670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,320,0.0062,4794000,23970,23970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,324,0.0063,4854000,24270,24270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,328,0.0063,4914000,24570,24570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,332,0.0064,4974000,24870,24870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,336,0.0065,5034000,25170,25170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,340,0.0065,5094000,25470,25470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,344,0.0066,5154000,25770,25770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,348,0.0069,5214000,26070,26070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,352,0.0068,5274000,26370,26370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,356,0.0070,5334000,26670,26670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,360,0.0069,5394000,26970,26970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,364,0.0070,5454000,27270,27270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,368,0.0070,5514000,27570,27570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,372,0.0071,5574000,27870,27870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,376,0.0073,5634000,28170,28170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,380,0.0073,5694000,28470,28470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,384,0.0073,5754000,28770,28770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,388,0.0074,5814000,29070,29070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,392,0.0074,5874000,29370,29370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,396,0.0076,5934000,29670,29670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,400,0.0075,5994000,29970,29970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,404,0.0076,6054000,30270,30270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,408,0.0077,6114000,30570,30570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,412,0.0078,6174000,30870,30870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,416,0.0079,6234000,31170,31170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,420,0.0079,6294000,31470,31470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,424,0.0079,6354000,31770,31770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,428,0.0080,6414000,32070,32070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,432,0.0080,6474000,32370,32370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,436,0.0081,6534000,32670,32670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,440,0.0082,6594000,32970,32970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,444,0.0083,6654000,33270,33270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,448,0.0084,6714000,33570,33570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,452,0.0084,6774000,33870,33870
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,456,0.0084,6834000,34170,34170
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,460,0.0085,6894000,34470,34470
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,464,0.0086,6954000,34770,34770
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,468,0.0087,7014000,35070,35070
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,472,0.0088,7074000,35370,35370
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,476,0.0088,7134000,35670,35670
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,480,0.0089,7194000,35970,35970
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,484,0.0090,7254000,36270,36270
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,488,0.0091,7314000,36570,36570
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max)
200,32,492,0.0091,7374000,36870,36870
200,32,496,0.0091,7434000,37170,37170
200,32,500,0.0094,7494000,37470,37470
200,32,504,0.0093,7554000,37770,37770
200,32,508,0.0095,7614000,38070,38070
200,32,512,0.0096,7674000,38370,38370
200,32,516,0.0095,7734000,38670,38670
200,32,520,0.0095,7794000,38970,38970
200,32,524,0.0097,7854000,39270,39270
200,32,528,0.0097,7914000,39570,39570
200,32,532,0.0098,7974000,39870,39870
200,32,536,0.0098,8034000,40170,40170
200,32,540,0.0099,8094000,40470,40470
200,32,544,0.0100,8154000,40770,40770
200,32,548,0.0101,8214000,41070,41070
200,32,552,0.0101,8274000,41370,41370
200,32,556,0.0104,8334000,41670,41670
200,32,560,0.0103,8394000,41970,41970
200,32,564,0.0103,8454000,42270,42270
200,32,568,0.0106,8514000,42570,42570
200,32,572,0.0105,8574000,42870,42870
200,32,576,0.0106,8634000,43170,43170
200,32,580,0.0108,8694000,43470,43470
200,32,584,0.0109,8754000,43770,43770
200,32,588,0.0108,8814000,44070,44070
200,32,592,0.0109,8874000,44370,44370
200,32,596,0.0109,8934000,44670,44670
200,32,600,0.0110,8994000,44970,44970
200,32,604,0.0111,9054000,45270,45270
200,32,608,0.0112,9114000,45570,45570
200,32,612,0.0112,9174000,45870,45870
200,32,616,0.0114,9234000,46170,46170
200,32,620,0.0113,9294000,46470,46470
200,32,624,0.0114,9354000,46770,46770
200,32,628,0.0117,9414000,47070,47070
200,32,632,0.0116,9474000,47370,47370
200,32,636,0.0116,9534000,47670,47670
200,32,640,0.0117,9594000,47970,47970
200,32,644,0.0119,9654000,48270,48270
200,32,648,0.0118,9714000,48570,48570
200,32,652,0.0119,9774000,48870,48870
200,32,656,0.0119,9834000,49170,49170
200,32,660,0.0121,9894000,49470,49470
200,32,664,0.0122,9954000,49770,49770
200,32,668,0.0123,10014000,50070,50070
200,32,672,0.0122,10074000,50370,50370
200,32,676,0.0123,10134000,50670,50670
200,32,680,0.0123,10194000,50970,50970
200,32,684,0.0125,10254000,51270,51270
200,32,688,0.0125,10314000,51570,51570
200,32,692,0.0127,10374000,51870,51870
200,32,696,0.0126,10434000,52170,52170
200,32,700,0.0127,10494000,52470,52470
200,32,704,0.0128,10554000,52770,52770
200,32,708,0.0129,10614000,53070,53070
200,32,712,0.0128,10674000,53370,53370
200,32,716,0.0131,10734000,53670,53670
200,32,720,0.0130,10794000,53970,53970
200,32,724,0.0130,10854000,54270,54270
200,32,728,0.0132,10914000,54570,54570
200,32,732,0.0133,10974000,54870,54870
200,32,736,0.0135,11034000,55170,55170
200,32,740,0.0135,11094000,55470,55470
200,32,744,0.0135,11154000,55770,55770
200,32,748,0.0134,11214000,56070,56070
200,32,752,0.0135,11274000,56370,56370
200,32,756,0.0136,11334000,56670,56670
200,32,760,0.0137,11394000,56970,56970
200,32,764,0.0137,11454000,57270,57270
200,32,768,0.0138,11514000,57570,57570
200,32,772,0.0139,11574000,57870,57870
200,32,776,0.0141,11634000,58170,58170
200,32,780,0.0140,11694000,58470,58470
200,32,784,0.0142,11754000,58770,58770
200,32,788,0.0141,11814000,59070,59070
200,32,792,0.0142,11874000,59370,59370
200,32,796,0.0143,11934000,59670,59670
200,32,800,0.0143,11994000,59970,59970
200,32,804,0.0145,12054000,60270,60270
200,32,808,0.0145,12114000,60570,60570
200,32,812,0.0145,12174000,60870,60870
200,32,816,0.0148,12234000,61170,61170
200,32,820,0.0148,12294000,61470,61470
200,32,824,0.0148,12354000,61770,61770
200,32,828,0.0148,12414000,62070,62070
200,32,832,0.0149,12474000,62370,62370
200,32,836,0.0150,12534000,62670,62670
200,32,840,0.0150,12594000,62970,62970
200,32,844,0.0151,12654000,63270,63270
200,32,848,0.0153,12714000,63570,63570
200,32,852,0.0153,12774000,63870,63870
200,32,856,0.0153,12834000,64170,64170
200,32,860,0.0154,12894000,64470,64470
200,32,864,0.0154,12954000,64770,64770
200,32,868,0.0155,13014000,65070,65070
200,32,872,0.0157,13074000,65370,65370
200,32,876,0.0156,13134000,65670,65670
200,32,880,0.0157,13194000,65970,65970
200,32,884,0.0157,13254000,66270,66270
200,32,888,0.0158,13314000,66570,66570
200,32,892,0.0159,13374000,66870,66870
200,32,896,0.0160,13434000,67170,67170
200,32,900,0.0160,13494000,67470,67470
200,32,904,0.0162,13554000,67770,67770
200,32,908,0.0162,13614000,68070,68070
200,32,912,0.0163,13674000,68370,68370
200,32,916,0.0163,13734000,68670,68670
200,32,920,0.0164,13794000,68970,68970
200,32,924,0.0165,13854000,69270,69270
200,32,928,0.0166,13914000,69570,69570
200,32,932,0.0166,13974000,69870,69870
200,32,936,0.0167,14034000,70170,70170
200,32,940,0.0167,14094000,70470,70470
200,32,944,0.0168,14154000,70770,70770
200,32,948,0.0170,14214000,71070,71070
200,32,952,0.0171,14274000,71370,71370
200,32,956,0.0171,14334000,71670,71670
200,32,960,0.0171,14394000,71970,71970
200,32,964,0.0175,14454000,72270,72270
200,32,968,0.0176,14514000,72570,72570
200,32,972,0.0176,14574000,72870,72870
200,32,976,0.0175,14634000,73170,73170
200,32,980,0.0178,14694000,73470,73470
200,32,984,0.0180,14754000,73770,73770
200,32,988,0.0178,14814000,74070,74070
200,32,992,0.0179,14874000,74370,74370
200,32,996,0.0181,14934000,74670,74670
200,32,1000,0.0180,14994000,74970,74970
200,32,1004,0.0182,15054000,75270,75270
200,32,1008,0.0181,15114000,75570,75570
200,32,1012,0.0183,15174000,75870,75870
200,32,1016,0.0183,15234000,76170,76170
200,32,1020,0.0186,15294000,76470,76470
200,32,1024,0.0182,15354000,76770,76770
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vld.bin.csv .
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vst.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vst.bin.csv
Job <24642> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,4,0.0010,200,1,1
200,32,8,0.0011,18200,91,91
200,32,12,0.0012,30200,151,151
200,32,16,0.0012,42200,211,211
200,32,20,0.0013,54200,271,271
200,32,24,0.0013,66200,331,331
200,32,28,0.0014,78200,391,391
200,32,32,0.0015,90200,451,451
200,32,36,0.0015,102200,511,511
200,32,40,0.0016,114200,571,571
200,32,44,0.0017,126200,631,631
200,32,48,0.0017,138200,691,691
200,32,52,0.0018,150200,751,751
200,32,56,0.0019,162200,811,811
200,32,60,0.0020,174200,871,871
200,32,64,0.0020,186200,931,931
200,32,68,0.0022,198200,991,991
200,32,72,0.0023,210200,1051,1051
200,32,76,0.0022,222200,1111,1111
200,32,80,0.0023,234200,1171,1171
200,32,84,0.0024,246200,1231,1231
200,32,88,0.0024,258200,1291,1291
200,32,92,0.0025,270200,1351,1351
200,32,96,0.0025,282200,1411,1411
200,32,100,0.0026,294200,1471,1471
200,32,104,0.0027,306200,1531,1531
200,32,108,0.0028,318200,1591,1591
200,32,112,0.0028,330200,1651,1651
200,32,116,0.0029,342200,1711,1711
200,32,120,0.0030,354200,1771,1771
200,32,124,0.0030,366200,1831,1831
200,32,128,0.0031,378200,1891,1891
200,32,132,0.0032,390200,1951,1951
200,32,136,0.0032,402200,2011,2011
200,32,140,0.0033,414200,2071,2071
200,32,144,0.0033,426200,2131,2131
200,32,148,0.0035,438200,2191,2191
200,32,152,0.0035,450200,2251,2251
200,32,156,0.0035,462200,2311,2311
200,32,160,0.0036,474200,2371,2371
200,32,164,0.0038,486200,2431,2431
200,32,168,0.0037,498200,2491,2491
200,32,172,0.0038,510200,2551,2551
200,32,176,0.0038,522200,2611,2611
200,32,180,0.0039,534200,2671,2671
200,32,184,0.0040,546200,2731,2731
200,32,188,0.0040,558200,2791,2791
200,32,192,0.0041,570200,2851,2851
200,32,196,0.0042,582200,2911,2911
200,32,200,0.0044,594200,2971,2971
200,32,204,0.0043,606200,3031,3031
200,32,208,0.0044,618200,3091,3091
200,32,212,0.0044,630200,3151,3151
200,32,216,0.0045,642200,3211,3211
200,32,220,0.0046,654200,3271,3271
200,32,224,0.0046,666200,3331,3331
200,32,228,0.0047,678200,3391,3391
200,32,232,0.0048,690200,3451,3451
200,32,236,0.0048,702200,3511,3511
200,32,240,0.0049,714200,3571,3571
200,32,244,0.0050,726200,3631,3631
200,32,248,0.0050,738200,3691,3691
200,32,252,0.0051,750200,3751,3751
200,32,256,0.0052,762200,3811,3811
200,32,260,0.0052,774200,3871,3871
200,32,264,0.0053,786200,3931,3931
200,32,268,0.0054,798200,3991,3991
200,32,272,0.0054,810200,4051,4051
200,32,276,0.0055,822200,4111,4111
200,32,280,0.0055,834200,4171,4171
200,32,284,0.0056,846200,4231,4231
200,32,288,0.0057,858200,4291,4291
200,32,292,0.0057,870200,4351,4351
200,32,296,0.0058,882200,4411,4411
200,32,300,0.0059,894200,4471,4471
200,32,304,0.0059,906200,4531,4531
200,32,308,0.0060,918200,4591,4591
200,32,312,0.0061,930200,4651,4651
200,32,316,0.0061,942200,4711,4711
200,32,320,0.0062,954200,4771,4771
200,32,324,0.0063,966200,4831,4831
200,32,328,0.0063,978200,4891,4891
200,32,332,0.0064,990200,4951,4951
200,32,336,0.0065,1002200,5011,5011
200,32,340,0.0066,1014200,5071,5071
200,32,344,0.0066,1026200,5131,5131
200,32,348,0.0067,1038200,5191,5191
200,32,352,0.0069,1050200,5251,5251
200,32,356,0.0068,1062200,5311,5311
200,32,360,0.0068,1074200,5371,5371
200,32,364,0.0069,1086200,5431,5431
200,32,368,0.0070,1098200,5491,5491
200,32,372,0.0071,1110200,5551,5551
200,32,376,0.0071,1122200,5611,5611
200,32,380,0.0072,1134200,5671,5671
200,32,384,0.0073,1146200,5731,5731
200,32,388,0.0073,1158200,5791,5791
200,32,392,0.0074,1170200,5851,5851
200,32,396,0.0075,1182200,5911,5911
200,32,400,0.0075,1194200,5971,5971
200,32,404,0.0076,1206200,6031,6031
200,32,408,0.0077,1218200,6091,6091
200,32,412,0.0077,1230200,6151,6151
200,32,416,0.0080,1242200,6211,6211
200,32,420,0.0078,1254200,6271,6271
200,32,424,0.0079,1266200,6331,6331
200,32,428,0.0080,1278200,6391,6391
200,32,432,0.0081,1290200,6451,6451
200,32,436,0.0082,1302200,6511,6511
200,32,440,0.0082,1314200,6571,6571
200,32,444,0.0083,1326200,6631,6631
200,32,448,0.0083,1338200,6691,6691
200,32,452,0.0084,1350200,6751,6751
200,32,456,0.0085,1362200,6811,6811
200,32,460,0.0085,1374200,6871,6871
200,32,464,0.0087,1386200,6931,6931
200,32,468,0.0086,1398200,6991,6991
200,32,472,0.0087,1410200,7051,7051
200,32,476,0.0088,1422200,7111,7111
200,32,480,0.0090,1434200,7171,7171
200,32,484,0.0089,1446200,7231,7231
200,32,488,0.0090,1458200,7291,7291
200,32,492,0.0092,1470200,7351,7351
200,32,496,0.0092,1482200,7411,7411
200,32,500,0.0092,1494200,7471,7471
200,32,504,0.0093,1506200,7531,7531
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,508,0.0094,1518200,7591,7591
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,512,0.0095,1530200,7651,7651
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,516,0.0096,1542200,7711,7711
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,520,0.0096,1554200,7771,7771
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,524,0.0096,1566200,7831,7831
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,528,0.0097,1578200,7891,7891
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,532,0.0097,1590200,7951,7951
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,536,0.0098,1602200,8011,8011
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,540,0.0100,1614200,8071,8071
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,544,0.0099,1626200,8131,8131
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,548,0.0100,1638200,8191,8191
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,552,0.0101,1650200,8251,8251
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,556,0.0102,1662200,8311,8311
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,560,0.0102,1674200,8371,8371
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,564,0.0105,1686200,8431,8431
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,568,0.0104,1698200,8491,8491
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,572,0.0105,1710200,8551,8551
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,576,0.0105,1722200,8611,8611
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,580,0.0108,1734200,8671,8671
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,584,0.0108,1746200,8731,8731
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,588,0.0109,1758200,8791,8791
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,592,0.0109,1770200,8851,8851
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,596,0.0109,1782200,8911,8911
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,600,0.0111,1794200,8971,8971
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,604,0.0111,1806200,9031,9031
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,608,0.0112,1818200,9091,9091
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,612,0.0112,1830200,9151,9151
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,616,0.0114,1842200,9211,9211
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,620,0.0113,1854200,9271,9271
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,624,0.0114,1866200,9331,9331
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,628,0.0114,1878200,9391,9391
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,632,0.0116,1890200,9451,9451
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,636,0.0116,1902200,9511,9511
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,640,0.0117,1914200,9571,9571
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,644,0.0118,1926200,9631,9631
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,648,0.0118,1938200,9691,9691
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,652,0.0121,1950200,9751,9751
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,656,0.0121,1962200,9811,9811
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,660,0.0121,1974200,9871,9871
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,664,0.0121,1986200,9931,9931
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,668,0.0122,1998200,9991,9991
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,672,0.0122,2010200,10051,10051
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,676,0.0124,2022200,10111,10111
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,680,0.0123,2034200,10171,10171
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,684,0.0124,2046200,10231,10231
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,688,0.0126,2058200,10291,10291
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,692,0.0127,2070200,10351,10351
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,696,0.0126,2082200,10411,10411
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,700,0.0128,2094200,10471,10471
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,704,0.0127,2106200,10531,10531
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,708,0.0128,2118200,10591,10591
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,712,0.0129,2130200,10651,10651
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,716,0.0130,2142200,10711,10711
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,720,0.0130,2154200,10771,10771
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,724,0.0131,2166200,10831,10831
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,728,0.0131,2178200,10891,10891
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,732,0.0132,2190200,10951,10951
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,736,0.0134,2202200,11011,11011
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,740,0.0134,2214200,11071,11071
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,744,0.0134,2226200,11131,11131
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,748,0.0135,2238200,11191,11191
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,752,0.0136,2250200,11251,11251
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,756,0.0136,2262200,11311,11311
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,760,0.0137,2274200,11371,11371
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,764,0.0138,2286200,11431,11431
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,768,0.0138,2298200,11491,11491
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,772,0.0139,2310200,11551,11551
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,776,0.0139,2322200,11611,11611
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,780,0.0140,2334200,11671,11671
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,784,0.0141,2346200,11731,11731
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,788,0.0142,2358200,11791,11791
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,792,0.0142,2370200,11851,11851
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,796,0.0144,2382200,11911,11911
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,800,0.0144,2394200,11971,11971
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,804,0.0144,2406200,12031,12031
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,808,0.0146,2418200,12091,12091
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,812,0.0146,2430200,12151,12151
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,816,0.0146,2442200,12211,12211
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,820,0.0147,2454200,12271,12271
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,824,0.0148,2466200,12331,12331
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,828,0.0149,2478200,12391,12391
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,832,0.0149,2490200,12451,12451
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,836,0.0150,2502200,12511,12511
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,840,0.0151,2514200,12571,12571
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,844,0.0152,2526200,12631,12631
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,848,0.0151,2538200,12691,12691
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,852,0.0152,2550200,12751,12751
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,856,0.0153,2562200,12811,12811
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,860,0.0154,2574200,12871,12871
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,864,0.0155,2586200,12931,12931
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,868,0.0155,2598200,12991,12991
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,872,0.0156,2610200,13051,13051
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,876,0.0156,2622200,13111,13111
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,880,0.0157,2634200,13171,13171
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,884,0.0158,2646200,13231,13231
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,888,0.0159,2658200,13291,13291
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,892,0.0159,2670200,13351,13351
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,896,0.0160,2682200,13411,13411
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,900,0.0160,2694200,13471,13471
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,904,0.0162,2706200,13531,13531
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,908,0.0162,2718200,13591,13591
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,912,0.0163,2730200,13651,13651
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,916,0.0163,2742200,13711,13711
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,920,0.0164,2754200,13771,13771
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,924,0.0165,2766200,13831,13831
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,928,0.0166,2778200,13891,13891
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,932,0.0168,2790200,13951,13951
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,936,0.0167,2802200,14011,14011
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,940,0.0169,2814200,14071,14071
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,944,0.0169,2826200,14131,14131
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,948,0.0169,2838200,14191,14191
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,952,0.0170,2850200,14251,14251
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,956,0.0170,2862200,14311,14311
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,960,0.0171,2874200,14371,14371
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,964,0.0175,2886200,14431,14431
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,968,0.0175,2898200,14491,14491
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,972,0.0176,2910200,14551,14551
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,976,0.0176,2922200,14611,14611
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,980,0.0178,2934200,14671,14671
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,984,0.0178,2946200,14731,14731
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,988,0.0179,2958200,14791,14791
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,992,0.0178,2970200,14851,14851
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,996,0.0181,2982200,14911,14911
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1000,0.0180,2994200,14971,14971
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1004,0.0181,3006200,15031,15031
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1008,0.0182,3018200,15091,15091
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1012,0.0183,3030200,15151,15151
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1016,0.0183,3042200,15211,15211
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1020,0.0184,3054200,15271,15271
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1024,0.0182,3066200,15331,15331
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vst.bin.csv .

Let's plot the data again as soon as the run finishes! Non-interactive users, call graph_task2b.

Because we couldn't measure the two vector counters at the same time, we have two CSV files to read in now. We combine them into one common dataframe df_vldvst in the following.

In [31]:
df_vld = pd.read_csv("poisson2d.vld.bin.csv", skiprows=range(2, 50000, 2))
df_vst = pd.read_csv("poisson2d.vst.bin.csv", skiprows=range(2, 50000, 2))
df_vldvst = pd.concat([df_vld.set_index("nx"), df_vst.set_index("nx")[['PM_VECTOR_ST_CMPL (total)', 'PM_VECTOR_ST_CMPL (min)', ' PM_VECTOR_ST_CMPL (max)']]], axis=1).reset_index()
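
A note on the skiprows=range(2, 50000, 2) argument used above: the bench output repeats the header line before every data row, so every second line after the first header has to be dropped. The following standard-library sketch shows the same idea on two rows copied from the log above. read_bench_csv is a hypothetical helper for illustration only; it is not part of the tutorial's common module.

```python
import csv
import io

# Excerpt of poisson2d.vst.bin.csv as printed by the bench run above:
# the header line is repeated before every data row.
RAW = """\
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1020,0.0184,3054200,15271,15271
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1024,0.0182,3066200,15331,15331
"""


def read_bench_csv(text):
    """Parse bench output, keeping the first header and dropping its repeats."""
    lines = text.splitlines()
    header = lines[0]
    data_lines = [line for line in lines[1:] if line and line != header]
    reader = csv.DictReader(io.StringIO("\n".join([header] + data_lines)))
    # Strip stray whitespace from column names (the "(max)" column starts with a space).
    return [{key.strip(): value for key, value in row.items()} for row in reader]


rows = read_bench_csv(RAW)
for row in rows:
    print(row["nx"], row["PM_VECTOR_ST_CMPL (min)"], row["PM_VECTOR_ST_CMPL (max)"])
```

The leading space in the last column name is also why the pandas cell above selects ' PM_VECTOR_ST_CMPL (max)' with a space inside the quotes.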
In [32]:
df_vldvst["Grid Points"] = df_vldvst["nx"] * df_vldvst["ny"] 
df_vldvst.head()
Out[32]:
   nx  iter  ny  Runtime  PM_VECTOR_LD_CMPL (total)  PM_VECTOR_LD_CMPL (min)  PM_VECTOR_LD_CMPL (max)  PM_VECTOR_ST_CMPL (total)  PM_VECTOR_ST_CMPL (min)  PM_VECTOR_ST_CMPL (max)  Grid Points
0   4   200  32   0.0010                          0                        0                        0                        200                        1                        1          128
1   8   200  32   0.0011                     114000                      570                      570                      18200                       91                       91          256
2  12   200  32   0.0012                     174000                      870                      870                      30200                      151                      151          384
3  16   200  32   0.0012                     234000                     1170                     1170                      42200                      211                      211          512
4  20   200  32   0.0013                     294000                     1470                     1470                      54200                      271                      271          640
In [33]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df_vldvst.set_index("Grid Points")["PM_VECTOR_LD_CMPL (min)"].plot(ax=ax1, legend=True);
df_vldvst.set_index("Grid Points")["PM_VECTOR_ST_CMPL (min)"].plot(ax=ax2, legend=True);

Here, too, there seems to be a linear correlation. Let's fit a linear function and plot the result directly.

In [34]:
_fit, _cov = common.print_and_return_fit(
    ["PM_VECTOR_LD_CMPL (min)", "PM_VECTOR_ST_CMPL (min)"], 
    df_vldvst.set_index("Grid Points"), 
    linear_function,
    format_value=".4f",
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter PM_VECTOR_LD_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 2.3439 (± 0.000111)
Counter PM_VECTOR_ST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 0.4688 (± 0.000012)
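
To see where such factors come from, the fit can be reproduced by hand on a few rows. The internals of common.print_and_return_fit are not shown here, so the sketch below is only a stand-in that assumes an ordinary least-squares fit of a * x + b. fit_line is a hypothetical helper; the data are rows 1 to 4 of df_vldvst.head() above (row 0, nx = 4, is left out because it shows no vector loads).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Rows 1 to 4 of df_vldvst.head(): grid points and per-iteration minimum counts.
grid_points = [256, 384, 512, 640]
vector_loads = [570, 870, 1170, 1470]
vector_stores = [91, 151, 211, 271]

ld_slope, ld_offset = fit_line(grid_points, vector_loads)
st_slope, st_offset = fit_line(grid_points, vector_stores)
print("PM_VECTOR_LD_CMPL (min): slope", ld_slope, "offset", ld_offset)
print("PM_VECTOR_ST_CMPL (min): slope", st_slope, "offset", st_offset)
```

On these four rows the slopes come out close to the factors reported for the full data set above.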
In [35]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_VECTOR_LD_CMPL (min)", "PM_VECTOR_ST_CMPL (min)"]):
    df_vldvst.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    ax.plot(
        df_vldvst["Grid Points"], 
        linear_function(df_vldvst["Grid Points"], *fit_parameters[pmu_counter]), 
        linestyle="--", 
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();

Let's try to make sense of those numbers.

Vector loads and vector stores operate on two 8 Byte values at a time. When we measured loads and stores with PM_LD_CMPL and PM_ST_CMPL in part A of this task, we measured the total number of loads and stores, that is, both the vector and the scalar versions of the instructions. To convert the load and store instructions into bytes loaded and stored, we need to separate the two kinds. The difference between total instructions and vector instructions yields the scalar instructions. We multiply the scalar instructions by 8 Byte (one double-precision value) and the vector instructions by 16 Byte (two double-precision values). That yields the loaded or stored data (or, more precisely, the instruction-equivalent data).

To formalize it, see the following equations, written for loads ($ld$) as an example, with $b$ denoting data loaded in bytes and $n$ denoting the number of instructions.

\begin{align}
b_\text{ld} &= b_\text{ld}^\text{scalar} + b_\text{ld}^\text{vector} \\
b_\text{ld}^\text{scalar} &= n_\text{ld}^\text{scalar} * 8\,\text{Byte} \\
b_\text{ld}^\text{vector} &= n_\text{ld}^\text{vector} * 16\,\text{Byte} \\
n_\text{ld}^\text{scalar} &= n_\text{ld}^\text{total} - n_\text{ld}^\text{vector} \\
\Rightarrow b_\text{ld} &= n_\text{ld}^\text{scalar} * 8\,\text{Byte} + n_\text{ld}^\text{vector} * 16\,\text{Byte} \\
&= (n_\text{ld}^\text{scalar} + 2 n_\text{ld}^\text{vector}) * 8\,\text{Byte} \\
&= (n_\text{ld}^\text{total} - n_\text{ld}^\text{vector} + 2 n_\text{ld}^\text{vector}) * 8\,\text{Byte} \\
&= (n_\text{ld}^\text{total} + n_\text{ld}^\text{vector}) * 8\,\text{Byte}
\end{align}
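
As a quick plausibility check of this derivation, the explicit split into scalar and vector instructions can be compared with the shortened formula. This is a minimal sketch; bytes_moved is a hypothetical helper and the counter values are made up purely to illustrate the arithmetic.

```python
DOUBLE_BYTES = 8   # one double-precision value
VECTOR_BYTES = 16  # one vector instruction moves two doubles


def bytes_moved(n_total, n_vector):
    """Instruction-equivalent bytes for n_total loads (or stores), n_vector of them vector."""
    n_scalar = n_total - n_vector
    return n_scalar * DOUBLE_BYTES + n_vector * VECTOR_BYTES


# Hypothetical counter values, chosen only to illustrate the arithmetic.
n_total, n_vector = 1000, 600

explicit = bytes_moved(n_total, n_vector)        # scalar and vector parts summed
shortcut = (n_total + n_vector) * DOUBLE_BYTES   # last line of the derivation
print(explicit, shortcut)
```

Both expressions give the same number of bytes, which is why only the sum of the total and the vector counter is needed.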

We are going to plot the loaded and stored bytes computed this way in the next cell. In case you look at this Notebook non-interactively, call graph_task2b-2.

In [37]:
df_byte = pd.DataFrame()
df_byte["Loads"]  = (df_vldvst.set_index("Grid Points")["PM_VECTOR_LD_CMPL (min)"] + df_ldst.set_index("Grid Points")["PM_LD_CMPL (min)"])*8
df_byte["Stores"] = (df_vldvst.set_index("Grid Points")["PM_VECTOR_ST_CMPL (min)"] + df_ldst.set_index("Grid Points")["PM_ST_CMPL (min)"])*8
ax = df_byte.plot()
ax.set_ylabel("Bytes");

Let's quantify the difference by, again, fitting a linear function to the data.

In [38]:
_fit, _cov = common.print_and_return_fit(
    ["Loads", "Stores"], 
    df_byte, 
    linear_function
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter  Loads is proportional to the grid points (nx*ny) by a factor of 37.5010 (± 0.000592)
Counter Stores is proportional to the grid points (nx*ny) by a factor of  8.4379 (± 0.000247)

Analogously to the proportionality factors before, these values tell us how much data is loaded and stored per grid point: about 37.5 Byte loaded and about 8.4 Byte stored.

Not really a TASK C: We can combine this information with the cycles measured in Task 1 to derive a bandwidth, that is, the number of bytes exchanged per cycle.

In [50]:
df_bandwidth = pd.DataFrame()
df_bandwidth["Bandwidth / Byte/Cycle"] = (df_byte["Loads"] + df_byte["Stores"]) / df.set_index("Grid Points")["PM_RUN_CYC (min)"]

Let's display it as a function of grid points and compare it to the available L1 cache bandwidth in a second (sub-)plot. Non-interactive users, call make graph_task2c.

In [51]:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax in [ax1, ax2]:
    df_bandwidth["Bandwidth / Byte/Cycle"].plot(ax=ax, legend=True, label="Jacobi Bandwidth")
    ax.set_ylabel("Byte/Cycle")
ax2.axhline(2*16, color=sns.color_palette()[1], label="L1 Bandwidth");
ax2.legend();

As you can see, we are quite a bit away from the available L1 cache bandwidth. Can you think of reasons why?
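
One way to put a number on this gap is to divide the measured bandwidth by the L1 value used for the horizontal line in the plot (2*16 Byte/Cycle). The following is a minimal sketch; the function names are made up for illustration and the counter values are hypothetical placeholders, not measurements from this run.

```python
L1_BYTES_PER_CYCLE = 2 * 16  # value used for the "L1 Bandwidth" line in the plot


def bandwidth_per_cycle(bytes_loaded, bytes_stored, cycles):
    """Bytes exchanged (loaded plus stored) per run cycle."""
    return (bytes_loaded + bytes_stored) / cycles


def fraction_of_l1(bandwidth):
    """Share of the L1 reference bandwidth that is actually used."""
    return bandwidth / L1_BYTES_PER_CYCLE


# Hypothetical values for one grid size (placeholders, not measured data).
bytes_loaded, bytes_stored, cycles = 3000, 1000, 1000

bw = bandwidth_per_cycle(bytes_loaded, bytes_stored, cycles)
share = fraction_of_l1(bw)
print(f"{bw:.2f} Byte/Cycle, {share:.1%} of the L1 reference")
```

With the real data, bytes_loaded and bytes_stored would come from df_byte and cycles from the PM_RUN_CYC (min) column.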

Task E1: Measuring FlOps

If you still have time, feel free to work on the following extended task.

TASK: Please measure counters for vectorized floating point operations and scalar floating point operations. Again, the two counters cannot be measured during the same run, so please see the TODOs in poisson2d.sflops.c and poisson2d.vflops.c. By now you should be able to find out the names of the counters by yourself (Hint: they include the words »scalar« and »vector«…).

As usual, compile, test, and bench-run your program.

In [4]:
!make bench_task4
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.sflop.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.sflop.bin.csv
Job <24645> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,4,0.0010,96000,480,480
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,8,0.0011,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,12,0.0012,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,16,0.0012,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,20,0.0013,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,24,0.0013,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,28,0.0014,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,32,0.0015,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,36,0.0015,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,40,0.0016,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,44,0.0017,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,48,0.0017,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,52,0.0018,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,56,0.0022,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,60,0.0019,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,64,0.0021,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,68,0.0022,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,72,0.0021,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,76,0.0022,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,80,0.0023,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,84,0.0025,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,88,0.0024,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,92,0.0025,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,96,0.0025,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,100,0.0026,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,104,0.0027,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,108,0.0027,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,112,0.0028,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,116,0.0028,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,120,0.0031,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,124,0.0030,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,128,0.0030,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,132,0.0031,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,136,0.0032,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,140,0.0032,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,144,0.0033,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,148,0.0034,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,152,0.0035,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,156,0.0035,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,160,0.0036,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,164,0.0036,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,168,0.0037,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,172,0.0038,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,176,0.0038,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,180,0.0039,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,184,0.0040,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,188,0.0040,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,192,0.0041,0,0,0
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,196,0.0042,0,0,0
200,32,200,0.0042,0,0,0
200,32,204,0.0043,0,0,0
200,32,208,0.0043,0,0,0
200,32,212,0.0044,0,0,0
200,32,216,0.0045,0,0,0
200,32,220,0.0045,0,0,0
200,32,224,0.0046,0,0,0
200,32,228,0.0047,0,0,0
200,32,232,0.0047,0,0,0
200,32,236,0.0048,0,0,0
200,32,240,0.0049,0,0,0
200,32,244,0.0049,0,0,0
200,32,248,0.0051,0,0,0
200,32,252,0.0051,0,0,0
200,32,256,0.0053,0,0,0
200,32,260,0.0052,0,0,0
200,32,264,0.0053,0,0,0
200,32,268,0.0054,0,0,0
200,32,272,0.0054,0,0,0
200,32,276,0.0054,0,0,0
200,32,280,0.0055,0,0,0
200,32,284,0.0056,0,0,0
200,32,288,0.0056,0,0,0
200,32,292,0.0057,0,0,0
200,32,296,0.0058,0,0,0
200,32,300,0.0058,0,0,0
200,32,304,0.0059,0,0,0
200,32,308,0.0060,0,0,0
200,32,312,0.0060,0,0,0
200,32,316,0.0062,0,0,0
200,32,320,0.0062,0,0,0
200,32,324,0.0062,0,0,0
200,32,328,0.0063,0,0,0
200,32,332,0.0064,0,0,0
200,32,336,0.0065,0,0,0
200,32,340,0.0065,0,0,0
200,32,344,0.0066,0,0,0
200,32,348,0.0066,0,0,0
200,32,352,0.0067,0,0,0
200,32,356,0.0068,0,0,0
200,32,360,0.0069,0,0,0
200,32,364,0.0069,0,0,0
200,32,368,0.0070,0,0,0
200,32,372,0.0072,0,0,0
200,32,376,0.0071,0,0,0
200,32,380,0.0071,0,0,0
200,32,384,0.0072,0,0,0
200,32,388,0.0073,0,0,0
200,32,392,0.0074,0,0,0
200,32,396,0.0076,0,0,0
200,32,400,0.0075,0,0,0
200,32,404,0.0076,0,0,0
200,32,408,0.0076,0,0,0
200,32,412,0.0077,0,0,0
200,32,416,0.0078,0,0,0
200,32,420,0.0078,0,0,0
200,32,424,0.0079,0,0,0
200,32,428,0.0079,0,0,0
200,32,432,0.0080,0,0,0
200,32,436,0.0081,0,0,0
200,32,440,0.0082,0,0,0
200,32,444,0.0082,0,0,0
200,32,448,0.0084,0,0,0
200,32,452,0.0083,0,0,0
200,32,456,0.0084,0,0,0
200,32,460,0.0085,0,0,0
200,32,464,0.0085,0,0,0
200,32,468,0.0086,0,0,0
200,32,472,0.0087,0,0,0
200,32,476,0.0089,0,0,0
200,32,480,0.0088,0,0,0
200,32,484,0.0089,0,0,0
200,32,488,0.0089,0,0,0
200,32,492,0.0090,0,0,0
200,32,496,0.0091,0,0,0
200,32,500,0.0092,0,0,0
200,32,504,0.0092,0,0,0
200,32,508,0.0093,0,0,0
200,32,512,0.0094,0,0,0
200,32,516,0.0094,0,0,0
200,32,520,0.0095,0,0,0
200,32,524,0.0096,0,0,0
200,32,528,0.0096,0,0,0
200,32,532,0.0098,0,0,0
200,32,536,0.0097,0,0,0
200,32,540,0.0098,0,0,0
200,32,544,0.0099,0,0,0
200,32,548,0.0100,0,0,0
200,32,552,0.0101,0,0,0
200,32,556,0.0101,0,0,0
200,32,560,0.0102,0,0,0
200,32,564,0.0103,0,0,0
200,32,568,0.0104,0,0,0
200,32,572,0.0105,0,0,0
200,32,576,0.0105,0,0,0
200,32,580,0.0106,0,0,0
200,32,584,0.0107,0,0,0
200,32,588,0.0107,0,0,0
200,32,592,0.0108,0,0,0
200,32,596,0.0109,0,0,0
200,32,600,0.0110,0,0,0
200,32,604,0.0111,0,0,0
200,32,608,0.0111,0,0,0
200,32,612,0.0112,0,0,0
200,32,616,0.0112,0,0,0
200,32,620,0.0113,0,0,0
200,32,624,0.0114,0,0,0
200,32,628,0.0115,0,0,0
200,32,632,0.0115,0,0,0
200,32,636,0.0115,0,0,0
200,32,640,0.0116,0,0,0
200,32,644,0.0118,0,0,0
200,32,648,0.0117,0,0,0
200,32,652,0.0119,0,0,0
200,32,656,0.0119,0,0,0
200,32,660,0.0121,0,0,0
200,32,664,0.0120,0,0,0
200,32,668,0.0122,0,0,0
200,32,672,0.0121,0,0,0
200,32,676,0.0124,0,0,0
200,32,680,0.0123,0,0,0
200,32,684,0.0125,0,0,0
200,32,688,0.0124,0,0,0
200,32,692,0.0125,0,0,0
200,32,696,0.0126,0,0,0
200,32,700,0.0127,0,0,0
200,32,704,0.0126,0,0,0
200,32,708,0.0127,0,0,0
200,32,712,0.0129,0,0,0
200,32,716,0.0128,0,0,0
200,32,720,0.0129,0,0,0
200,32,724,0.0132,0,0,0
200,32,728,0.0131,0,0,0
200,32,732,0.0131,0,0,0
200,32,736,0.0133,0,0,0
200,32,740,0.0133,0,0,0
200,32,744,0.0133,0,0,0
200,32,748,0.0134,0,0,0
200,32,752,0.0136,0,0,0
200,32,756,0.0136,0,0,0
200,32,760,0.0136,0,0,0
200,32,764,0.0136,0,0,0
200,32,768,0.0138,0,0,0
200,32,772,0.0138,0,0,0
200,32,776,0.0139,0,0,0
200,32,780,0.0139,0,0,0
200,32,784,0.0140,0,0,0
200,32,788,0.0140,0,0,0
200,32,792,0.0141,0,0,0
200,32,796,0.0142,0,0,0
200,32,800,0.0143,0,0,0
200,32,804,0.0143,0,0,0
200,32,808,0.0144,0,0,0
200,32,812,0.0144,0,0,0
200,32,816,0.0145,0,0,0
200,32,820,0.0146,0,0,0
200,32,824,0.0148,0,0,0
200,32,828,0.0147,0,0,0
200,32,832,0.0148,0,0,0
200,32,836,0.0149,0,0,0
200,32,840,0.0150,0,0,0
200,32,844,0.0150,0,0,0
200,32,848,0.0150,0,0,0
200,32,852,0.0151,0,0,0
200,32,856,0.0152,0,0,0
200,32,860,0.0152,0,0,0
200,32,864,0.0153,0,0,0
200,32,868,0.0154,0,0,0
200,32,872,0.0156,0,0,0
200,32,876,0.0156,0,0,0
200,32,880,0.0156,0,0,0
200,32,884,0.0157,0,0,0
200,32,888,0.0157,0,0,0
200,32,892,0.0158,0,0,0
200,32,896,0.0159,0,0,0
200,32,900,0.0159,0,0,0
200,32,904,0.0161,0,0,0
200,32,908,0.0162,0,0,0
200,32,912,0.0164,0,0,0
200,32,916,0.0163,0,0,0
200,32,920,0.0164,0,0,0
200,32,924,0.0165,0,0,0
200,32,928,0.0166,0,0,0
200,32,932,0.0166,0,0,0
200,32,936,0.0167,0,0,0
200,32,940,0.0167,0,0,0
200,32,944,0.0168,0,0,0
200,32,948,0.0169,0,0,0
200,32,952,0.0172,0,0,0
200,32,956,0.0171,0,0,0
200,32,960,0.0172,0,0,0
200,32,964,0.0175,0,0,0
200,32,968,0.0175,0,0,0
200,32,972,0.0176,0,0,0
200,32,976,0.0177,0,0,0
200,32,980,0.0178,0,0,0
200,32,984,0.0178,0,0,0
200,32,988,0.0179,0,0,0
200,32,992,0.0179,0,0,0
200,32,996,0.0182,0,0,0
200,32,1000,0.0181,0,0,0
200,32,1004,0.0182,0,0,0
200,32,1008,0.0182,0,0,0
200,32,1012,0.0184,0,0,0
200,32,1016,0.0184,0,0,0
200,32,1020,0.0186,0,0,0
200,32,1024,0.0182,0,0,0
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.sflop.bin.csv .
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vflop.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vflop.bin.csv
Job <24646> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,4,0.0010,0,0,0
200,32,8,0.0011,150000,750,750
200,32,12,0.0012,246000,1230,1230
200,32,16,0.0012,342000,1710,1710
200,32,20,0.0013,438000,2190,2190
200,32,24,0.0013,534000,2670,2670
200,32,28,0.0014,630000,3150,3150
200,32,32,0.0015,726000,3630,3630
200,32,36,0.0016,822000,4110,4110
200,32,40,0.0016,918000,4590,4590
200,32,44,0.0017,1014000,5070,5070
200,32,48,0.0017,1110000,5550,5550
200,32,52,0.0018,1206000,6030,6030
200,32,56,0.0019,1302000,6510,6510
200,32,60,0.0019,1398000,6990,6990
200,32,64,0.0020,1494000,7470,7470
200,32,68,0.0022,1590000,7950,7950
200,32,72,0.0021,1686000,8430,8430
200,32,76,0.0022,1782000,8910,8910
200,32,80,0.0023,1878000,9390,9390
200,32,84,0.0025,1974000,9870,9870
200,32,88,0.0024,2070000,10350,10350
200,32,92,0.0026,2166000,10830,10830
200,32,96,0.0025,2262000,11310,11310
200,32,100,0.0026,2358000,11790,11790
200,32,104,0.0027,2454000,12270,12270
200,32,108,0.0027,2550000,12750,12750
200,32,112,0.0029,2646000,13230,13230
200,32,116,0.0029,2742000,13710,13710
200,32,120,0.0029,2838000,14190,14190
200,32,124,0.0030,2934000,14670,14670
200,32,128,0.0031,3030000,15150,15150
200,32,132,0.0031,3126000,15630,15630
200,32,136,0.0032,3222000,16110,16110
200,32,140,0.0032,3318000,16590,16590
200,32,144,0.0033,3414000,17070,17070
200,32,148,0.0036,3510000,17550,17550
200,32,152,0.0035,3606000,18030,18030
200,32,156,0.0035,3702000,18510,18510
200,32,160,0.0036,3798000,18990,18990
200,32,164,0.0036,3894000,19470,19470
200,32,168,0.0037,3990000,19950,19950
200,32,172,0.0038,4086000,20430,20430
200,32,176,0.0038,4182000,20910,20910
200,32,180,0.0039,4278000,21390,21390
200,32,184,0.0040,4374000,21870,21870
200,32,188,0.0041,4470000,22350,22350
200,32,192,0.0041,4566000,22830,22830
200,32,196,0.0042,4662000,23310,23310
200,32,200,0.0042,4758000,23790,23790
200,32,204,0.0043,4854000,24270,24270
200,32,208,0.0044,4950000,24750,24750
200,32,212,0.0044,5046000,25230,25230
200,32,216,0.0045,5142000,25710,25710
200,32,220,0.0046,5238000,26190,26190
200,32,224,0.0046,5334000,26670,26670
200,32,228,0.0048,5430000,27150,27150
200,32,232,0.0049,5526000,27630,27630
200,32,236,0.0048,5622000,28110,28110
200,32,240,0.0049,5718000,28590,28590
200,32,244,0.0049,5814000,29070,29070
200,32,248,0.0050,5910000,29550,29550
200,32,252,0.0051,6006000,30030,30030
200,32,256,0.0051,6102000,30510,30510
200,32,260,0.0052,6198000,30990,30990
200,32,264,0.0053,6294000,31470,31470
200,32,268,0.0054,6390000,31950,31950
200,32,272,0.0054,6486000,32430,32430
200,32,276,0.0054,6582000,32910,32910
200,32,280,0.0055,6678000,33390,33390
200,32,284,0.0056,6774000,33870,33870
200,32,288,0.0057,6870000,34350,34350
200,32,292,0.0057,6966000,34830,34830
200,32,296,0.0058,7062000,35310,35310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,300,0.0059,7158000,35790,35790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,304,0.0059,7254000,36270,36270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,308,0.0060,7350000,36750,36750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,312,0.0062,7446000,37230,37230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,316,0.0061,7542000,37710,37710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,320,0.0062,7638000,38190,38190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,324,0.0062,7734000,38670,38670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,328,0.0063,7830000,39150,39150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,332,0.0064,7926000,39630,39630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,336,0.0065,8022000,40110,40110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,340,0.0065,8118000,40590,40590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,344,0.0066,8214000,41070,41070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,348,0.0066,8310000,41550,41550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,352,0.0067,8406000,42030,42030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,356,0.0068,8502000,42510,42510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,360,0.0068,8598000,42990,42990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,364,0.0069,8694000,43470,43470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,368,0.0070,8790000,43950,43950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,372,0.0070,8886000,44430,44430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,376,0.0071,8982000,44910,44910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,380,0.0072,9078000,45390,45390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,384,0.0072,9174000,45870,45870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,388,0.0073,9270000,46350,46350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,392,0.0074,9366000,46830,46830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,396,0.0074,9462000,47310,47310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,400,0.0075,9558000,47790,47790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,404,0.0075,9654000,48270,48270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,408,0.0076,9750000,48750,48750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,412,0.0077,9846000,49230,49230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,416,0.0079,9942000,49710,49710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,420,0.0078,10038000,50190,50190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,424,0.0080,10134000,50670,50670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,428,0.0080,10230000,51150,51150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,432,0.0080,10326000,51630,51630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,436,0.0083,10422000,52110,52110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,440,0.0082,10518000,52590,52590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,444,0.0083,10614000,53070,53070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,448,0.0083,10710000,53550,53550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,452,0.0083,10806000,54030,54030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,456,0.0084,10902000,54510,54510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,460,0.0085,10998000,54990,54990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,464,0.0085,11094000,55470,55470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,468,0.0086,11190000,55950,55950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,472,0.0087,11286000,56430,56430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,476,0.0087,11382000,56910,56910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,480,0.0088,11478000,57390,57390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,484,0.0089,11574000,57870,57870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,488,0.0089,11670000,58350,58350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,492,0.0091,11766000,58830,58830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,496,0.0091,11862000,59310,59310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,500,0.0091,11958000,59790,59790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,504,0.0092,12054000,60270,60270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,508,0.0093,12150000,60750,60750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,512,0.0094,12246000,61230,61230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,516,0.0096,12342000,61710,61710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,520,0.0096,12438000,62190,62190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,524,0.0095,12534000,62670,62670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,528,0.0098,12630000,63150,63150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,532,0.0097,12726000,63630,63630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,536,0.0097,12822000,64110,64110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,540,0.0098,12918000,64590,64590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,544,0.0100,13014000,65070,65070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,548,0.0102,13110000,65550,65550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,552,0.0102,13206000,66030,66030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,556,0.0101,13302000,66510,66510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,560,0.0103,13398000,66990,66990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,564,0.0103,13494000,67470,67470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,568,0.0104,13590000,67950,67950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,572,0.0105,13686000,68430,68430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,576,0.0105,13782000,68910,68910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,580,0.0107,13878000,69390,69390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,584,0.0108,13974000,69870,69870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,588,0.0107,14070000,70350,70350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,592,0.0108,14166000,70830,70830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,596,0.0109,14262000,71310,71310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,600,0.0110,14358000,71790,71790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,604,0.0110,14454000,72270,72270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,608,0.0111,14550000,72750,72750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,612,0.0114,14646000,73230,73230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,616,0.0112,14742000,73710,73710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,620,0.0113,14838000,74190,74190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,624,0.0114,14934000,74670,74670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,628,0.0116,15030000,75150,75150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,632,0.0115,15126000,75630,75630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,636,0.0117,15222000,76110,76110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,640,0.0116,15318000,76590,76590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,644,0.0118,15414000,77070,77070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,648,0.0117,15510000,77550,77550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,652,0.0119,15606000,78030,78030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,656,0.0119,15702000,78510,78510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,660,0.0120,15798000,78990,78990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,664,0.0120,15894000,79470,79470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,668,0.0121,15990000,79950,79950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,672,0.0121,16086000,80430,80430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,676,0.0123,16182000,80910,80910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,680,0.0122,16278000,81390,81390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,684,0.0125,16374000,81870,81870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,688,0.0124,16470000,82350,82350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,692,0.0126,16566000,82830,82830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,696,0.0125,16662000,83310,83310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,700,0.0127,16758000,83790,83790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,704,0.0128,16854000,84270,84270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,708,0.0128,16950000,84750,84750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,712,0.0128,17046000,85230,85230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,716,0.0128,17142000,85710,85710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,720,0.0129,17238000,86190,86190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,724,0.0130,17334000,86670,86670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,728,0.0130,17430000,87150,87150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,732,0.0132,17526000,87630,87630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,736,0.0132,17622000,88110,88110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,740,0.0133,17718000,88590,88590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,744,0.0133,17814000,89070,89070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,748,0.0134,17910000,89550,89550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,752,0.0134,18006000,90030,90030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,756,0.0136,18102000,90510,90510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,760,0.0136,18198000,90990,90990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,764,0.0136,18294000,91470,91470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,768,0.0137,18390000,91950,91950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,772,0.0139,18486000,92430,92430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,776,0.0139,18582000,92910,92910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,780,0.0139,18678000,93390,93390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,784,0.0140,18774000,93870,93870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,788,0.0140,18870000,94350,94350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,792,0.0142,18966000,94830,94830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,796,0.0142,19062000,95310,95310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,800,0.0144,19158000,95790,95790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,804,0.0143,19254000,96270,96270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,808,0.0144,19350000,96750,96750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,812,0.0145,19446000,97230,97230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,816,0.0145,19542000,97710,97710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,820,0.0146,19638000,98190,98190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,824,0.0147,19734000,98670,98670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,828,0.0147,19830000,99150,99150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,832,0.0148,19926000,99630,99630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,836,0.0151,20022000,100110,100110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,840,0.0150,20118000,100590,100590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,844,0.0150,20214000,101070,101070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,848,0.0151,20310000,101550,101550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,852,0.0152,20406000,102030,102030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,856,0.0152,20502000,102510,102510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,860,0.0152,20598000,102990,102990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,864,0.0153,20694000,103470,103470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,868,0.0154,20790000,103950,103950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,872,0.0155,20886000,104430,104430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,876,0.0155,20982000,104910,104910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,880,0.0157,21078000,105390,105390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,884,0.0157,21174000,105870,105870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,888,0.0158,21270000,106350,106350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,892,0.0158,21366000,106830,106830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,896,0.0159,21462000,107310,107310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,900,0.0161,21558000,107790,107790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,904,0.0162,21654000,108270,108270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,908,0.0161,21750000,108750,108750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,912,0.0163,21846000,109230,109230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,916,0.0164,21942000,109710,109710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,920,0.0165,22038000,110190,110190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,924,0.0164,22134000,110670,110670
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,928,0.0166,22230000,111150,111150
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,932,0.0166,22326000,111630,111630
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,936,0.0167,22422000,112110,112110
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,940,0.0168,22518000,112590,112590
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,944,0.0168,22614000,113070,113070
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,948,0.0169,22710000,113550,113550
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,952,0.0170,22806000,114030,114030
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,956,0.0170,22902000,114510,114510
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,960,0.0171,22998000,114990,114990
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,964,0.0176,23094000,115470,115470
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,968,0.0176,23190000,115950,115950
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,972,0.0177,23286000,116430,116430
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,976,0.0177,23382000,116910,116910
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,980,0.0178,23478000,117390,117390
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,984,0.0178,23574000,117870,117870
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,988,0.0179,23670000,118350,118350
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,992,0.0180,23766000,118830,118830
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,996,0.0181,23862000,119310,119310
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1000,0.0182,23958000,119790,119790
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1004,0.0182,24054000,120270,120270
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1008,0.0182,24150000,120750,120750
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1012,0.0184,24246000,121230,121230
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1016,0.0185,24342000,121710,121710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1020,0.0184,24438000,122190,122190
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,1024,0.0182,24534000,122670,122670
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vflop.bin.csv .
In [39]:
df_sflop = pd.read_csv("poisson2d.sflop.bin.csv", skiprows=range(2, 50000, 2))
df_vflop = pd.read_csv("poisson2d.vflop.bin.csv", skiprows=range(2, 50000, 2))
df_flop = pd.concat([df_sflop.set_index("nx"), df_vflop.set_index("nx")[['PM_VECTOR_FLOP_CMPL (total)', 'PM_VECTOR_FLOP_CMPL (min)', ' PM_VECTOR_FLOP_CMPL (max)']]], axis=1).reset_index()
df_flop.head()
Out[39]:
nx iter ny Runtime PM_SCALAR_FLOP_CMPL (total) PM_SCALAR_FLOP_CMPL (min) PM_SCALAR_FLOP_CMPL (max) PM_VECTOR_FLOP_CMPL (total) PM_VECTOR_FLOP_CMPL (min) PM_VECTOR_FLOP_CMPL (max)
0 4 200 32 0.0010 96000 480 480 0 0 0
1 8 200 32 0.0011 0 0 0 150000 750 750
2 12 200 32 0.0012 0 0 0 246000 1230 1230
3 16 200 32 0.0012 0 0 0 342000 1710 1710
4 20 200 32 0.0013 0 0 0 438000 2190 2190

Again, the name of the vector counter is a bit misleading: it counts floating point instructions, not floating point operations. To get actual floating point operations, each value needs to be multiplied by the vector width (2). We can plot the values afterwards (non-interactive: make graph_task4).

In [40]:
df_flop["Grid Points"] = df_flop["nx"] * df_flop["ny"]
df_flop["Vector FlOps (min)"] = df_flop["PM_VECTOR_FLOP_CMPL (min)"] * 2
df_flop["Scalar FlOps (min)"] = df_flop["PM_SCALAR_FLOP_CMPL (min)"]
In [41]:
df_flop.set_index("Grid Points")[["Scalar FlOps (min)", "Vector FlOps (min)"]].plot();
In [43]:
_fit, _cov = common.print_and_return_fit(
    ["Scalar FlOps (min)", "Vector FlOps (min)"], 
    df_flop.set_index("Grid Points"), 
    linear_function
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter Scalar FlOps (min) is proportional to the grid points (nx*ny) by a factor of -0.0003 (± 0.0002)
Counter Vector FlOps (min) is proportional to the grid points (nx*ny) by a factor of  7.5004 (± 0.0002)
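
As a rough cross-check of the vector factor, the slope can be reproduced by hand from a few rows of the benchmark output above. The sketch below is not the fit used in this Notebook: it swaps in np.polyfit for common.print_and_return_fit and linear_function, and the names samples, ny, and vector_width are chosen here for illustration only.

```python
import numpy as np

# (nx, PM_VECTOR_FLOP_CMPL (min)) pairs copied from the benchmark output above
samples = [(180, 21390), (400, 47790), (600, 71790), (800, 95790), (1024, 122670)]
ny = 32            # grid height used in all runs above
vector_width = 2   # FlOps per vector instruction, as stated in the text

grid_points = np.array([nx * ny for nx, _ in samples], dtype=float)
vector_flops = np.array([count * vector_width for _, count in samples], dtype=float)

# First-order polynomial fit: vector_flops ~ slope * grid_points + intercept
slope, intercept = np.polyfit(grid_points, vector_flops, deg=1)
print(f"slope = {slope:.4f} FlOps per grid point, intercept = {intercept:.1f}")
```

The slope should come out close to the factor of about 7.5 reported above. Small differences are expected, since this sketch uses only five rows and allows a free intercept.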

Interesting! We seem to be using the vector registers of our system very well. Basically all operations are vector operations!

With that measured, we can determine the Arithmetic Intensity, the ratio of floating point operations to bytes transferred:

\begin{align} \text{AI}^\text{emp} = I_\text{flop} / I_\text{mem} \text{,} \end{align}

with $I_\text{flop}$ denoting the number of floating point operations and $I_\text{mem}$ the number of bytes loaded and stored. This is the empirically determined Arithmetic Intensity.
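
To make the formula concrete, here is a minimal sketch of the computation with pandas. All numbers are placeholders made up for illustration, not measurements, and the names i_flop, i_load, and i_store are assumptions that only mirror the symbols of the formula.

```python
import pandas as pd

# Placeholder values (NOT measured), indexed by the number of grid points
grid_points = pd.Index([1024, 2048, 4096], name="Grid Points")
i_flop = pd.Series([100.0, 200.0, 400.0], index=grid_points)    # FlOps
i_load = pd.Series([300.0, 600.0, 1200.0], index=grid_points)   # bytes loaded
i_store = pd.Series([100.0, 200.0, 400.0], index=grid_points)   # bytes stored

# AI = I_flop / I_mem, where I_mem is loads plus stores; the unit is FlOp/Byte
ai = i_flop / (i_load + i_store)
ai.name = "Arithmetic Intensity (FlOp/Byte)"
print(ai)
```

Because pandas aligns Series on their index before dividing, the FlOp and byte data need to share the same index (here: grid points) for the division to pair up the right rows.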

In the non-interactive version of the Notebook, please plot the graph by calling make graph_task4-2 in the terminal.

In [56]:
I_flop_scalar = df_flop.set_index("Grid Points")["Scalar FlOps (min)"]
I_flop_vector = df_flop.set_index("Grid Points")["Vector FlOps (min)"]
I_mem_load    = df_byte["Loads"]
I_mem_store   = df_byte["Stores"]
In [57]:
df_ai = pd.DataFrame()
df_ai["Arithmetic Intensity"] = (I_flop_scalar + I_flop_vector) / (I_mem_load + I_mem_store)
ax = df_ai.plot();
ax.set_ylabel("FlOp/Byte");  # AI is FlOps divided by bytes

Thinking back to the first lecture of the tutorial, what Arithmetic Intensity did you expect?

Task E2: Measuring a Larger Range

If you still have time, you might venture into your own benchmarking adventure.

Maybe you noticed already, for instance in Task 2 C: At the very right of the graph, toward very large numbers of grid points, the behaviour changes. Something is happening there!

TASK: Revisit the counters measured above for a larger range of nx. So far, we have only studied nx up to about 1000. New effects appear above that value, though some of them only well above it ($nx > 15000$).

You're on your own here. Edit the bench.sh script to change the range and the stepping increments.
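
One possible way to choose the new range is sketched below. It does not show or modify bench.sh itself (its contents are not reproduced here). The helper nx_values is hypothetical, and the upper bound of 20000 and the 12 sample points are arbitrary assumptions. The values are rounded to multiples of 4 only to match the step size of the runs above.

```python
def nx_values(start, stop, points, multiple=4):
    """Return roughly geometrically spaced nx values from start to stop.

    Each value is rounded to the nearest multiple of `multiple`;
    duplicates created by the rounding are dropped.
    """
    ratio = (stop / start) ** (1 / (points - 1))
    rounded = {
        max(multiple, round(start * ratio**i / multiple) * multiple)
        for i in range(points)
    }
    return sorted(rounded)


candidates = nx_values(1000, 20000, 12)
# Space-separated list, e.g. for pasting into a shell loop
print(" ".join(str(nx) for nx in candidates))
```

Geometric spacing keeps the number of runs small while still covering the region well above nx = 15000. A fixed linear step works as well if the runtime budget allows it.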

Good luck!

Back to top