This Notebook is part of the exercises for the SC19 Tutorial »Application Porting and Optimization on GPU-accelerated POWER Architectures«. It is to be run on a POWER9 machine; in the tutorial: on Ascent, the POWER9 training cluster of Oak Ridge National Lab.
This Notebook can be run interactively on Ascent. If this capability is unavailable to you, use the Notebook as a description of how to execute the tasks on Ascent via shell access. For the data evaluation, the Notebook lists the corresponding commands to execute in case you cannot run it interactively on Ascent.
Throughout this exercise, the core loop of the Jacobi algorithm is instrumented and analyzed. The part in question is
for (int iy = iy_start; iy < iy_end; iy++)
{
    for (int ix = ix_start; ix < ix_end; ix++)
    {
        Anew[iy*nx+ix] = -0.25 * (rhs[iy*nx+ix] - (A[ iy   *nx+ix+1] + A[ iy   *nx+ix-1]
                                                +  A[(iy-1)*nx+ix  ] + A[(iy+1)*nx+ix  ]));
        error = fmaxr(error, fabsr(Anew[iy*nx+ix] - A[iy*nx+ix]));
    }
}
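To make the data flow of this loop concrete, here is a minimal Python sketch of a single sweep; it is not the tutorial's C implementation. It assumes that the loop bounds cover only the interior points (iy_start = ix_start = 1, iy_end = ny-1, ix_end = nx-1), that boundary values are carried over unchanged, and that fmaxr/fabsr behave like max/abs for the chosen floating-point type. The function name jacobi_sweep and the tiny 4x5 grid are hypothetical and chosen for illustration only.

```python
def jacobi_sweep(A, rhs, ny, nx):
    """One relaxation sweep over the interior of a row-major ny-by-nx grid.

    Assumption: the loop bounds of the C code cover interior points only.
    """
    Anew = list(A)  # assumption: boundary values are carried over unchanged
    error = 0.0
    for iy in range(1, ny - 1):
        for ix in range(1, nx - 1):
            i = iy * nx + ix
            # Same four-point stencil as in the C loop: right, left, upper, lower neighbor
            Anew[i] = -0.25 * (rhs[i] - (A[i + 1] + A[i - 1] + A[i - nx] + A[i + nx]))
            # Track the largest change between the old and the new value
            error = max(error, abs(Anew[i] - A[i]))
    return Anew, error


# Hypothetical toy input: zero initial guess, constant right-hand side
ny, nx = 4, 5
A = [0.0] * (ny * nx)
rhs = [1.0] * (ny * nx)
Anew, error = jacobi_sweep(A, rhs, ny, nx)
print(error)
```

Each inner iteration reads five values and writes one, so the instruction count of the C loop can be expected to grow roughly linearly with the number of grid points.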
The code is instrumented using PAPI. The API routine PAPI_add_named_event() is used to add named PMU events outside of the relaxation iteration. After that, calls to PAPI_start() and PAPI_stop() can be used to count how often a PMU event is incremented.
For the first task, we will measure quantities often used to characterize an application: cycles and instructions.
TASK: Please measure the counters for completed instructions and run cycles. See the TODOs in the file poisson2d.ins_cyc.c. You can either edit the file with Jupyter, by clicking on the link to the file or selecting it in the file drawer on the left, or use a dedicated editor on the system (vim is available). The names of the counters to be implemented are PM_INST_CMPL and PM_RUN_CYC.
After changing the source code, compile it with make task1 or by executing the following cell (we need to change directories first, though).
(Using the Makefile, we have hidden quite a few intricacies from you in order to focus on the relevant content at hand. Don't worry too much about it right now; we'll un-hide it gradually during the course of the tutorial.)
!pwd
/autofs/nccsopen-svm1_home/aherten/OpenPOWER-SC19/Prototyping/2-Performance_Counters/Handson/Solutions
%cd Tasks/
# Use `%cd Solutions` to look at the solutions for each task
/autofs/nccsopen-svm1_home/aherten/OpenPOWER-SC19/2-PAPI/Compiling/Solutions
!make task1
gcc -DUSE_DOUBLE -Ofast -std=c99 -lm -lpapi poisson2d.ins_cyc.c -o poisson2d.ins_cyc.bin
Before we launch our measurement campaign, we should make sure that the program is measuring correctly. Let's invoke it, for instance, with these arguments: ./poisson2d.ins_cyc.bin 100 64 32 (see the next cell). The 100 specifies the number of iterations to perform; 64 and 32 are the sizes of the grid in the y and x direction, respectively.
!./poisson2d.ins_cyc.bin 100 64 32
# alternatively call !make run_task1
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
100,64,32,0.0011,3324225,33235,33960,1859440,18357,25033
Alright! That should return a comma-separated list of measurements.
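As a quick sanity check, the two counters in such a line can be combined into derived quantities. The following is a small sketch that uses Python's csv module on the output shown above; the derived names ipc and inst_per_iter are our own and not part of the program's output.

```python
import csv
import io

# Output of the test run shown above: one header line, one data line
output = """iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
100,64,32,0.0011,3324225,33235,33960,1859440,18357,25033
"""

# skipinitialspace removes the blank after some commas in the header
reader = csv.DictReader(io.StringIO(output), skipinitialspace=True)
row = next(reader)

iterations = int(row["iter"])
instructions = int(row["PM_INST_CMPL (total)"])
cycles = int(row["PM_RUN_CYC (total)"])

ipc = instructions / cycles                # completed instructions per run cycle
inst_per_iter = instructions / iterations  # mean instructions per iteration

print(f"IPC: {ipc:.2f}")
print(f"Instructions per iteration: {inst_per_iter:.2f}")
```

For the run above, this gives an IPC of about 1.79 and 33242.25 instructions per iteration. The latter lies between the reported minimum (33235) and maximum (33960), which is consistent with the (total) column being accumulated over the 100 iterations.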
For the following runs, we are going to use Ascent's compute backend nodes, which are not shared among users and also have six GPUs available (each!). We use the available batch scheduler, IBM Spectrum LSF, for this. For convenience, a call to the batch submission system is stored in the environment variable $SC19_SUBMIT_CMD. You are welcome to adapt it once you get more familiar with the system.
For now, we want to perform our first benchmarking run and measure cycles and instructions for different data sizes, as a function of nx. The Makefile holds a target for this; call it with make bench_task1:
!make bench_task1
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.ins_cyc.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.ins_cyc.bin.csv
Job <24059> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max)
200,32,4,0.0012,572978,2861,3639,261330,1235,4684
200,32,8,0.0014,1082978,5411,6189,601962,2914,5099
200,32,12,0.0014,1442978,7211,7989,811603,3992,5761
200,32,16,0.0014,1802978,9011,9789,1017305,4988,7017
200,32,20,0.0015,2162978,10811,11589,1221559,6002,7999
200,32,24,0.0016,2522978,12611,13389,1435167,7037,9259
200,32,28,0.0016,2882978,14411,15189,1633061,8054,9789
200,32,32,0.0017,3242978,16211,16989,1842895,9092,10889
200,32,36,0.0018,3602978,18011,18789,2042894,10108,12457
200,32,40,0.0019,3962978,19811,20589,2261332,11191,14233
200,32,44,0.0020,4322978,21611,22389,2458267,12112,14375
200,32,48,0.0020,4682978,23411,24189,2658621,13164,15613
200,32,52,0.0020,5042978,25211,25989,2866175,14190,16864
200,32,56,0.0021,5402978,27011,27789,3080357,15237,21565
200,32,60,0.0022,5762978,28811,29589,3283103,16278,18799
200,32,64,0.0022,6122978,30611,31389,3587582,17820,19681
200,32,68,0.0025,6482978,32411,33189,3893368,19284,20847
200,32,72,0.0025,6842978,34211,34989,4289441,21278,22715
200,32,76,0.0024,7202978,36011,36789,4208700,20936,22677
200,32,80,0.0025,7562978,37811,38589,4409613,21897,23855
200,32,84,0.0026,7922978,39611,40389,4611755,22921,24910
200,32,88,0.0026,8282978,41411,42189,4821904,23974,26087
200,32,92,0.0028,8642978,43211,43989,5104722,25036,38488
200,32,96,0.0028,9002978,45011,45789,5238952,26060,27927
200,32,100,0.0028,9362978,46811,47589,5441545,27049,29275
200,32,104,0.0030,9722978,48611,49389,5920763,28136,72679
200,32,108,0.0030,10082978,50411,51189,5853554,29106,31403
200,32,112,0.0030,10442978,52211,52989,6053498,30123,32279
200,32,116,0.0031,10802978,54011,54789,6296056,31338,33377
200,32,120,0.0033,11162978,55811,56589,6468115,32146,33869
200,32,124,0.0032,11522978,57611,58389,6675248,33233,35075
200,32,128,0.0033,11882978,59411,60189,6894325,34338,36207
200,32,132,0.0034,12242978,61211,61989,7093543,35299,37463
200,32,136,0.0034,12602978,63011,63789,7312105,36353,48105
200,32,140,0.0035,12962978,64811,65589,7503757,37375,39247
200,32,144,0.0036,13322978,66611,67389,7692611,38277,40419
200,32,148,0.0037,13682978,68411,69189,7968094,39656,42113
200,32,152,0.0037,14042978,70211,70989,8122466,40468,42706
200,32,156,0.0038,14402978,72011,72789,8328043,41484,45104
200,32,160,0.0040,14762978,73811,74589,8547674,42493,54216
200,32,164,0.0039,15122978,75611,76389,8738805,43542,45427
200,32,168,0.0040,15482978,77411,78189,8948025,44560,46819
200,32,172,0.0040,15842978,79211,79989,9186567,45735,47659
200,32,176,0.0041,16202978,81011,81789,9391949,46573,70131
200,32,180,0.0042,16562978,82811,83589,9549568,47559,54271
200,32,184,0.0042,16922978,84611,85389,9766306,48609,58645
200,32,188,0.0043,17282978,86411,87189,9974165,49613,56721
200,32,192,0.0044,17642978,88211,88989,10187263,50734,52953
200,32,196,0.0044,18002978,90011,90789,10386920,51763,53773
200,32,200,0.0045,18362978,91811,92589,10593326,52744,54962
200,32,204,0.0045,18722978,93611,94389,10791966,53796,55775
200,32,208,0.0046,19082978,95411,96189,10993938,54691,56692
200,32,212,0.0047,19442978,97211,97989,11183564,55716,57663
200,32,216,0.0047,19802978,99011,99789,11413409,56842,65317
200,32,220,0.0049,20162978,100811,101589,11747337,57952,85917
200,32,224,0.0049,20522978,102611,103389,11967444,58993,147575
200,32,228,0.0050,20882978,104411,105189,12176974,59986,107137
200,32,232,0.0051,21242978,106211,106989,12243039,61011,62843
200,32,236,0.0051,21602978,108011,108789,12454738,61985,74677
200,32,240,0.0051,21962978,109811,110589,12632612,62912,64911
200,32,244,0.0052,22322978,111611,112389,12844679,63954,74316
200,32,248,0.0053,22682978,113411,114189,13049050,65048,67067
200,32,252,0.0054,23042978,115211,115989,13274577,66113,68093
200,32,256,0.0054,23402978,117011,117789,13479975,67191,69232
200,32,260,0.0055,23762978,118811,119589,13702476,68321,70257
200,32,264,0.0055,24122978,120611,121389,13885554,69178,71473
200,32,268,0.0056,24482978,122411,123189,14091173,70236,72538
200,32,272,0.0057,24842978,124211,124989,14277355,71142,73153
200,32,276,0.0057,25202978,126011,126789,14477479,72149,74585
200,32,280,0.0058,25562978,127811,128589,14807542,73365,106386
200,32,284,0.0059,25922978,129611,130389,14919273,74349,83988
200,32,288,0.0060,26282978,131411,132189,15262342,75369,108903
200,32,292,0.0061,26642978,133211,133989,15457489,76550,112579
200,32,296,0.0061,27002978,135011,135789,15587890,77470,113796
200,32,300,0.0063,27362978,136811,137589,15736737,78474,80976
200,32,304,0.0062,27722978,138611,139389,15931699,79424,85309
200,32,308,0.0064,28082978,140411,141189,16127895,80426,82181
200,32,312,0.0063,28442978,142211,142989,16353667,81487,91316
200,32,316,0.0064,28802978,144011,144789,16544730,82526,84583
200,32,320,0.0064,29162978,145811,146589,16778054,83692,85621
200,32,324,0.0065,29522978,147611,148389,16975790,84670,86933
200,32,328,0.0066,29882978,149411,150189,17193806,85651,95908
200,32,332,0.0067,30242978,151211,151989,17391042,86658,92746
200,32,336,0.0067,30602978,153011,153789,17579650,87566,101073
200,32,340,0.0068,30962978,154811,155589,17823659,88601,131503
200,32,344,0.0069,31322978,156611,157389,18045749,89720,131352
200,32,348,0.0069,31682978,158411,159189,18233228,90790,129666
200,32,352,0.0070,32042978,160211,160989,18429938,91908,93827
200,32,356,0.0071,32402978,162011,162789,18723870,92891,169000
200,32,360,0.0071,32762978,163811,164589,18839189,93872,104313
200,32,364,0.0072,33122978,165611,166389,19052230,94828,108456
200,32,368,0.0072,33482978,167411,168189,19224348,95828,106832
200,32,372,0.0073,33842978,169211,169989,19409746,96825,98825
200,32,376,0.0074,34202978,171011,171789,19635914,97934,100015
200,32,380,0.0075,34562978,172811,173589,19901265,99194,108856
200,32,384,0.0075,34922978,174611,175389,20087150,100132,113306
200,32,388,0.0076,35282978,176411,177189,20289560,101187,111225
200,32,392,0.0076,35642978,178211,178989,20478069,102158,104431
200,32,396,0.0077,36002978,180011,180789,20703541,103136,118462
200,32,400,0.0078,36362978,181811,182589,20889687,104097,116051
200,32,404,0.0078,36722978,183611,184389,21103371,105019,150497
200,32,408,0.0079,37082978,185411,186189,21343392,106235,146574
200,32,412,0.0080,37442978,187211,187989,21499750,107213,116228
200,32,416,0.0081,37802978,189011,189789,21769516,108354,153304
200,32,420,0.0082,38162978,190811,191589,22016040,109333,166344
200,32,424,0.0082,38522978,192611,193389,22124948,110298,112586
200,32,428,0.0083,38882978,194411,195189,22375892,111391,164691
200,32,432,0.0083,39242978,196211,196989,22605417,112244,161120
200,32,436,0.0084,39602978,198011,198789,22698406,113231,115888
200,32,440,0.0084,39962978,199811,200589,22946025,114347,124840
200,32,444,0.0085,40322978,201611,202389,23138571,115404,122324
200,32,448,0.0086,40682978,203411,204189,23382319,116666,118990
200,32,452,0.0086,41042978,205211,205989,23582320,117634,123005
200,32,456,0.0087,41402978,207011,207789,23777586,118606,121054
200,32,460,0.0088,41762978,208811,209589,24021078,119638,157473
200,32,464,0.0089,42122978,210611,211389,24177273,120536,137152
200,32,468,0.0089,42482978,212411,213189,24354431,121510,124378
200,32,472,0.0090,42842978,214211,214989,24680874,122798,163001
200,32,476,0.0092,43202978,216011,216789,24806941,123695,126112
200,32,480,0.0091,43562978,217811,218589,25036974,124855,131240
200,32,484,0.0092,43922978,219611,220389,25277560,125834,159926
200,32,488,0.0093,44282978,221411,222189,25492002,126931,169890
200,32,492,0.0094,44642978,223211,223989,25799993,127811,292316
200,32,496,0.0094,45002978,225011,225789,25879076,128748,186367
200,32,500,0.0094,45362978,226811,227589,26021482,129705,143377
200,32,504,0.0095,45722978,228611,229389,26309697,130875,185497
200,32,508,0.0096,46082978,230411,231189,26445482,131853,134810
200,32,512,0.0097,46442978,232211,232989,26722882,133313,135480
200,32,516,0.0097,46802978,234011,234789,26902984,134116,143429
200,32,520,0.0098,47162978,235811,236589,27143327,135173,182663
200,32,524,0.0101,47522978,237611,238389,27899728,139067,143412
200,32,528,0.0099,47882978,239411,240189,27539695,137281,153792
200,32,532,0.0100,48242978,241211,241989,27665652,137957,156345
200,32,536,0.0102,48602978,243011,243789,27888664,139123,142069
200,32,540,0.0102,48962978,244811,245589,28116288,140162,167093
200,32,544,0.0102,49322978,246611,247389,28395864,141365,191687
200,32,548,0.0105,49682978,248411,249189,28539300,142352,144923
200,32,552,0.0104,50042978,250211,250989,28772000,143499,153080
200,32,556,0.0104,50402978,252011,252789,28943938,144344,160802
200,32,560,0.0105,50762978,253811,254589,29192011,145318,205574
200,32,564,0.0106,51122978,255611,256389,29371768,146296,173660
200,32,568,0.0107,51482978,257411,258189,29607085,147402,185216
200,32,572,0.0109,51842978,259211,259989,29760468,148529,150992
200,32,576,0.0108,52202978,261011,261789,30001693,149671,152448
200,32,580,0.0109,52562978,262811,263589,30194219,150474,161954
200,32,584,0.0110,52922978,264611,265389,30465237,151575,196784
200,32,588,0.0112,53282978,266411,267189,30866027,152658,345805
200,32,592,0.0112,53642978,268211,268989,30806266,153631,162459
200,32,596,0.0112,54002978,270011,270789,31013348,154624,161083
200,32,600,0.0113,54362978,271811,272589,31227644,155782,158034
200,32,604,0.0115,54722978,273611,274389,31534633,156837,219588
200,32,608,0.0114,55082978,275411,276189,31675474,157869,168332
200,32,612,0.0115,55442978,277211,277989,31953436,158989,218652
200,32,616,0.0116,55802978,279011,279789,32108644,160138,180416
200,32,620,0.0116,56162978,280811,281589,32277424,160849,182393
200,32,624,0.0118,56522978,282611,283389,32423394,161797,164245
200,32,628,0.0117,56882978,284411,285189,32609412,162678,167394
200,32,632,0.0118,57242978,286211,286989,32869379,163975,168634
200,32,636,0.0119,57602978,288011,288789,33151217,165037,223167
200,32,640,0.0119,57962978,289811,290589,33341299,166215,181218
200,32,644,0.0121,58322978,291611,292389,33649260,167751,199967
200,32,648,0.0121,58682978,293411,294189,33719599,168221,178799
200,32,652,0.0122,59042978,295211,295989,34067206,169536,235514
200,32,656,0.0122,59402978,297011,297789,34164102,170144,235618
200,32,660,0.0123,59762978,298811,299589,34456636,171594,235316
200,32,664,0.0124,60122978,300611,301389,34541178,172177,211827
200,32,668,0.0124,60482978,302411,303189,34905159,173832,222673
200,32,672,0.0126,60842978,304211,304989,34988298,174422,188003
200,32,676,0.0126,61202978,306011,306789,35263092,175911,185984
200,32,680,0.0127,61562978,307811,308589,35503073,176323,305860
200,32,684,0.0128,61922978,309611,310389,35672483,178036,180851 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,688,0.0128,62282978,311411,312189,35790039,178289,217803 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,692,0.0128,62642978,313211,313989,36045752,179866,188983 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,696,0.0130,63002978,315011,315789,36175144,180438,195986 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,700,0.0131,63362978,316811,317589,36529049,182248,184897 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,704,0.0130,63722978,318611,319389,36611747,182765,185703 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,708,0.0130,64082978,320411,321189,36811496,183626,191140 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,712,0.0131,64442978,322211,322989,37060383,184588,255521 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,716,0.0132,64802978,324011,324789,37267356,185684,240236 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,720,0.0132,65162978,325811,326589,37393434,186562,204926 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,724,0.0133,65522978,327611,328389,37611724,187635,203956 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,728,0.0135,65882978,329411,330189,37844476,188685,217329 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,732,0.0136,66242978,331211,331989,38097715,189879,238003 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,736,0.0136,66602978,333011,333789,38249665,190960,193797 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,740,0.0137,66962978,334811,335589,38496135,191882,202980 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,744,0.0136,67322978,336611,337389,38643004,192776,211409 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,748,0.0138,67682978,338411,339189,38834497,193752,204307 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,752,0.0139,68042978,340211,340989,39026422,194674,207102 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,756,0.0139,68402978,342011,342789,39292510,195755,242534 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,760,0.0140,68762978,343811,344589,39445808,196904,199749 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,764,0.0140,69122978,345611,346389,39707448,198140,208159 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,768,0.0141,69482978,347411,348189,39961335,199314,213386 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,772,0.0142,69842978,349211,349989,40195551,200268,262442 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,776,0.0143,70202978,351011,351789,40369481,201262,243178 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,780,0.0143,70562978,352811,353589,40454251,201889,204769 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,784,0.0143,70922978,354611,355389,40804167,203132,292206 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,788,0.0144,71282978,356411,357189,40880258,203888,220805 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,792,0.0145,71642978,358211,358989,41141375,205195,222680 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,796,0.0145,72002978,360011,360789,41346667,205890,276619 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,800,0.0146,72362978,361811,362589,41586665,207290,248916 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,804,0.0147,72722978,363611,364389,41696398,208106,211465 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,808,0.0148,73082978,365411,366189,41978951,209272,255137 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,812,0.0148,73442978,367211,367989,42187366,209918,283393 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,816,0.0149,73802978,369011,369789,42482639,211214,322437 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,820,0.0149,74162978,370811,371589,42512865,212010,227823 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,824,0.0151,74522978,372611,373389,42861251,213412,278868 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,828,0.0151,74882978,374411,375189,42979335,214191,262439 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,832,0.0152,75242978,376211,376989,43402619,215543,296991 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,836,0.0152,75602978,378011,378789,43382253,216450,232179 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,840,0.0154,75962978,379811,380589,43665001,217538,261020 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,844,0.0154,76322978,381611,382389,43762162,218196,232967 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,848,0.0156,76682978,383411,384189,44077885,219619,233562 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,852,0.0155,77042978,385211,385989,44269902,220266,357562 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,856,0.0156,77402978,387011,387789,44458368,221658,275183 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,860,0.0156,77762978,388811,389589,44599845,222530,244104 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,864,0.0158,78122978,390611,391389,44856987,223898,229495 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,868,0.0157,78482978,392411,393189,45070339,224667,268426 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,872,0.0158,78842978,394211,394989,45243346,225686,238504 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,876,0.0160,79202978,396011,396789,45425044,226467,285843 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,880,0.0160,79562978,397811,398589,45637897,227585,255503 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,884,0.0163,79922978,399611,400389,45922301,228540,294854 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,888,0.0161,80282978,401411,402189,46210377,229936,317062 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,892,0.0161,80642978,403211,403989,46224897,230736,244030 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,896,0.0163,81002978,405011,405789,46706945,232252,393574 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,900,0.0163,81362978,406811,407589,46846573,233803,243774 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,904,0.0165,81722978,408611,409389,47211102,235424,247115 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,908,0.0165,82082978,410411,411189,47420647,236067,308146 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,912,0.0167,82442978,412211,412989,47664515,237299,252663 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,916,0.0166,82802978,414011,414789,47825500,238210,307878 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,920,0.0168,83162978,415811,416589,48024315,239591,249230 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,924,0.0168,83522978,417611,418389,48204506,240348,286103 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,928,0.0168,83882978,419411,420189,48474452,241766,272232 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,932,0.0169,84242978,421211,421989,48643328,242408,310910 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,936,0.0170,84602978,423011,423789,49041567,243670,350571 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,940,0.0171,84962978,424811,425589,49009612,244295,313509 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,944,0.0171,85322978,426611,427389,49257311,245620,259650 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,948,0.0172,85682978,428411,429189,49415667,246533,254714 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,952,0.0172,86042978,430211,430989,49711139,247671,319628 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,956,0.0174,86402978,432011,432789,49856592,248552,271876 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,960,0.0174,86762978,433811,434589,50136102,249978,265617 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,964,0.0176,87122978,435611,436389,50925446,253713,295499 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,968,0.0178,87482978,437411,438189,51035835,253858,318894 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,972,0.0177,87842978,439211,439989,51188317,255334,306288 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,976,0.0178,88202978,441011,441789,51436023,256205,289239 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,980,0.0179,88562978,442811,443589,51703656,257814,300077 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,984,0.0179,88922978,444611,445389,51801305,257947,349721 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,988,0.0181,89282978,446411,447189,52056854,259676,262216 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,992,0.0182,89642978,448211,448989,52237864,260535,269494 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,996,0.0183,90002978,450011,450789,52526126,262024,274178 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1000,0.0182,90362978,451811,452589,52578843,262284,265526 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 
200,32,1004,0.0183,90722978,453611,454389,52896370,263840,273834 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1008,0.0183,91082978,455411,456189,53074476,264385,308471 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1012,0.0184,91442978,457211,457989,53382079,266422,284446 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1016,0.0186,91802978,459011,459789,53434221,266486,275700 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1020,0.0186,92162978,460811,461589,53712164,268036,277528 iter,ny,nx,Runtime,PM_INST_CMPL (total),PM_INST_CMPL (min), PM_INST_CMPL (max),PM_RUN_CYC (total),PM_RUN_CYC (min), PM_RUN_CYC (max) 200,32,1024,0.0187,92522978,462611,463389,53754294,268076,276795 mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.ins_cyc.bin.csv .
Once the run is completed, let's study the data!
This is best done in the interactive version of the Jupyter Notebook. If that version is unavailable to you, call the Makefile target make graph_task1 (either with X forwarding, or download the resulting PDF).
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import common
%matplotlib inline
sns.set()
plt.rcParams['figure.figsize'] = [14, 6]
Execute the following cell if you want to switch to color-blind-safer colors:
sns.set_palette("colorblind")
plt.rcParams['figure.figsize'] = [14, 6]
df = pd.read_csv("poisson2d.ins_cyc.bin.csv", skiprows=range(2, 50000, 2)) # Read in the CSV file from the bench run; parse with Pandas (skiprows drops the header line that is repeated before every data row)
df["Grid Points"] = df["nx"] * df["ny"] # Add a new column of the number of grid points (the product of nx and ny)
df.head() # Display the head of the Pandas dataframe
|  | iter | ny | nx | Runtime | PM_INST_CMPL (total) | PM_INST_CMPL (min) | PM_INST_CMPL (max) | PM_RUN_CYC (total) | PM_RUN_CYC (min) | PM_RUN_CYC (max) | Grid Points |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 32 | 4 | 0.0012 | 572978 | 2861 | 3639 | 261330 | 1235 | 4684 | 128 |
| 1 | 200 | 32 | 8 | 0.0014 | 1082978 | 5411 | 6189 | 601962 | 2914 | 5099 | 256 |
| 2 | 200 | 32 | 12 | 0.0014 | 1442978 | 7211 | 7989 | 811603 | 3992 | 5761 | 384 |
| 3 | 200 | 32 | 16 | 0.0014 | 1802978 | 9011 | 9789 | 1017305 | 4988 | 7017 | 512 |
| 4 | 200 | 32 | 20 | 0.0015 | 2162978 | 10811 | 11589 | 1221559 | 6002 | 7999 | 640 |
Let's have a look at the counters we've just measured and see how they scale with an increasing number of grid points.
In the following, we are always using the minimal value of the counter (indicated by »(min)«) as this should give us an estimate of the best achievable result of the architecture.
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df.set_index("Grid Points")["PM_RUN_CYC (min)"].plot(ax=ax1, legend=True);
df.set_index("Grid Points")["PM_INST_CMPL (min)"].plot(ax=ax2, legend=True);
Although some slight variations can be seen for run cycles for many grid points, the correlation looks quite linear (as one would naively expect). Let's test that by fitting a linear function!
The details of the fitting have been extracted into a dedicated function, print_and_return_fit(), of the common.py helper file. If you're interested, go have a look at it.
def linear_function(x, a, b):
    return a*x+b
fit_parameters, fit_covariance = common.print_and_return_fit(
["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"],
df.set_index("Grid Points"),
linear_function,
format_uncertainty=".4f"
)
Counter PM_RUN_CYC (min) is proportional to the grid points (nx*ny) by a factor of 8.1021 (± 0.0057)
Counter PM_INST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 14.0630 (± 0.0003)
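The slope reported for PM_INST_CMPL (min) can be cross-checked by hand. The following is a minimal sketch, not the implementation in common.py: it applies a plain closed-form least-squares fit to a few (nx, counter) pairs copied from the benchmark output above. The helper name least_squares_line is hypothetical and introduced only for this illustration.

```python
# Cross-check of the PM_INST_CMPL (min) fit using closed-form least squares.
# The (nx, counter) pairs are copied from the benchmark output (ny = 32).
samples = [
    (1000, 451811),
    (1004, 453611),
    (1008, 455411),
    (1012, 457211),
    (1016, 459011),
    (1020, 460811),
    (1024, 462611),
]
ny = 32


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper for illustration; the notebook itself uses
    common.print_and_return_fit(), whose internals may differ.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


grid_points = [nx * ny for nx, _ in samples]
counts = [count for _, count in samples]
slope, intercept = least_squares_line(grid_points, counts)
print("slope = {:.4f}, intercept = {:.1f}".format(slope, intercept))
```

On this small subset the slope should land very close to the factor reported by the full fit; small differences are expected because the full fit uses every row of the CSV file.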
Let's overlay our fits to the graphs from before.
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"]):
    df.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    ax.plot(
        df["Grid Points"],
        linear_function(df["Grid Points"], *fit_parameters[pmu_counter]),
        linestyle="--",
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();
Please execute the next cell to summarize the first task.
print("The algorithm under investigation runs about {:.0f} cycles and executes about {:.0f} instructions per grid point".format(
*[fit_parameters[pmu_counter][0] for pmu_counter in ["PM_RUN_CYC (min)", "PM_INST_CMPL (min)"]]
))
The algorithm under investigation runs about 8 cycles and executes about 14 instructions per grid point
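The two slopes can be combined into a single figure of merit. As a small sketch (this step is not part of the original exercise), the ratio of instructions per grid point to cycles per grid point gives an estimate of the instructions completed per cycle (IPC); the variable names below are chosen for illustration only, and the numbers are the fit results printed above.

```python
# Slopes reported by the linear fits (per grid point).
cycles_per_point = 8.1021         # PM_RUN_CYC (min)
instructions_per_point = 14.0630  # PM_INST_CMPL (min)

# Instructions per cycle and its inverse, cycles per instruction.
ipc = instructions_per_point / cycles_per_point
cpi = cycles_per_point / instructions_per_point
print("IPC ~ {:.2f}, CPI ~ {:.2f}".format(ipc, cpi))
```

Because both slopes exclude the constant overhead b, this ratio describes the stencil loop itself rather than the whole program run.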
Bonus:
The linear fits also calculate a y-intercept (»b«). How do you interpret this value?
The y-axis intercept, that is, b of the linear fit, is the inherent overhead of the program execution. Even if our program did not compute a stencil operation for any grid point, it would still complete this many (~1800) instructions and run this many (~680) cycles. Interestingly, it is also the unparallelizable overhead of this (toy) example.
We will revisit the graph in a little while.
Looking at the source code, how many loads and stores from / to memory do you expect? Have a look at the loop which we instrumented.
Let's compare your estimate to what the system actually does!
Please measure counters for loads and stores. See the TODOs in poisson2d.ld_st.c. This time, implement PM_LD_CMPL and PM_ST_CMPL.
Compile with make task2, test your program with a single run with make run_task2, and then finally submit a benchmarking run to the batch system with make bench_task2. The following cell will take care of all this.
!make bench_task2
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.ld_st.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.ld_st.bin.csv Job <24416> is submitted to default queue <batch>. <<Waiting for dispatch ...>> <<Starting on login1>> iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,4,0.0012,119819,598,817,32902,164,266 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,8,0.0013,161819,808,1027,56902,284,386 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,12,0.0014,221819,1108,1327,71902,359,461 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,16,0.0015,281819,1408,1627,86902,434,536 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,20,0.0015,341819,1708,1927,101902,509,611 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,24,0.0016,401819,2008,2227,116902,584,686 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,28,0.0016,461819,2308,2527,131902,659,761 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,32,0.0018,521819,2608,2827,146902,734,836 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,36,0.0018,581819,2908,3127,161902,809,911 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 
200,32,40,0.0018,641819,3208,3427,176902,884,986 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,44,0.0019,701819,3508,3727,191902,959,1061 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,48,0.0020,761819,3808,4027,206902,1034,1136 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,52,0.0020,821819,4108,4327,221902,1109,1211 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,56,0.0021,881819,4408,4627,236902,1184,1286 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,60,0.0022,941819,4708,4927,251902,1259,1361 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,64,0.0023,1001819,5008,5227,266902,1334,1436 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,68,0.0023,1061819,5308,5527,281902,1409,1511 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,72,0.0025,1121819,5608,5827,296902,1484,1586 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,76,0.0028,1181819,5908,6127,311902,1559,1661 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,80,0.0025,1241819,6208,6427,326902,1634,1736 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 
200,32,84,0.0026,1301819,6508,6727,341902,1709,1811
200,32,88,0.0026,1361819,6808,7027,356902,1784,1886
200,32,92,0.0027,1421819,7108,7327,371902,1859,1961
200,32,96,0.0028,1481819,7408,7627,386902,1934,2036
200,32,100,0.0029,1541819,7708,7927,401902,2009,2111
200,32,104,0.0029,1601819,8008,8227,416902,2084,2186
200,32,108,0.0031,1661819,8308,8527,431902,2159,2261
200,32,112,0.0030,1721819,8608,8827,446902,2234,2336
200,32,116,0.0031,1781819,8908,9127,461902,2309,2411
200,32,120,0.0032,1841819,9208,9427,476902,2384,2486
200,32,124,0.0033,1901819,9508,9727,491902,2459,2561
200,32,128,0.0033,1961819,9808,10027,506902,2534,2636
200,32,132,0.0034,2021819,10108,10327,521902,2609,2711
200,32,136,0.0035,2081819,10408,10627,536902,2684,2786
200,32,140,0.0036,2141819,10708,10927,551902,2759,2861
200,32,144,0.0036,2201819,11008,11227,566902,2834,2936
200,32,148,0.0036,2261819,11308,11527,581902,2909,3011
200,32,152,0.0037,2321819,11608,11827,596902,2984,3086
200,32,156,0.0038,2381819,11908,12127,611902,3059,3161
200,32,160,0.0040,2441819,12208,12427,626902,3134,3236
200,32,164,0.0039,2501819,12508,12727,641902,3209,3311
200,32,168,0.0040,2561819,12808,13027,656902,3284,3386
200,32,172,0.0040,2621819,13108,13327,671902,3359,3461
200,32,176,0.0041,2681819,13408,13627,686902,3434,3536
200,32,180,0.0041,2741819,13708,13927,701902,3509,3611
200,32,184,0.0042,2801819,14008,14227,716902,3584,3686
200,32,188,0.0044,2861819,14308,14527,731902,3659,3761
200,32,192,0.0044,2921819,14608,14827,746902,3734,3836
200,32,196,0.0045,2981819,14908,15127,761902,3809,3911
200,32,200,0.0045,3041819,15208,15427,776902,3884,3986
200,32,204,0.0045,3101819,15508,15727,791902,3959,4061
200,32,208,0.0046,3161819,15808,16027,806902,4034,4136
200,32,212,0.0047,3221819,16108,16327,821902,4109,4211
200,32,216,0.0047,3281819,16408,16627,836902,4184,4286
200,32,220,0.0048,3341819,16708,16927,851902,4259,4361
200,32,224,0.0049,3401819,17008,17227,866902,4334,4436
200,32,228,0.0050,3461819,17308,17527,881902,4409,4511
200,32,232,0.0050,3521819,17608,17827,896902,4484,4586
200,32,236,0.0051,3581819,17908,18127,911902,4559,4661
200,32,240,0.0051,3641819,18208,18427,926902,4634,4736
200,32,244,0.0052,3701819,18508,18727,941902,4709,4811
200,32,248,0.0053,3761819,18808,19027,956902,4784,4886
200,32,252,0.0053,3821819,19108,19327,971902,4859,4961
200,32,256,0.0054,3881819,19408,19627,986902,4934,5036
200,32,260,0.0055,3941819,19708,19927,1001902,5009,5111
200,32,264,0.0055,4001819,20008,20227,1016902,5084,5186
200,32,268,0.0056,4061819,20308,20527,1031902,5159,5261
200,32,272,0.0057,4121819,20608,20827,1046902,5234,5336
200,32,276,0.0057,4181819,20908,21127,1061902,5309,5411
200,32,280,0.0058,4241819,21208,21427,1076902,5384,5486
200,32,284,0.0059,4301819,21508,21727,1091902,5459,5561
200,32,288,0.0059,4361819,21808,22027,1106902,5534,5636
200,32,292,0.0060,4421819,22108,22327,1121902,5609,5711
200,32,296,0.0061,4481819,22408,22627,1136902,5684,5786
200,32,300,0.0061,4541819,22708,22927,1151902,5759,5861
200,32,304,0.0062,4601819,23008,23227,1166902,5834,5936
200,32,308,0.0063,4661819,23308,23527,1181902,5909,6011
200,32,312,0.0064,4721819,23608,23827,1196902,5984,6086
200,32,316,0.0066,4781819,23908,24127,1211902,6059,6161
200,32,320,0.0065,4841819,24208,24427,1226902,6134,6236
200,32,324,0.0065,4901819,24508,24727,1241902,6209,6311
200,32,328,0.0069,4961819,24808,25027,1256902,6284,6386
200,32,332,0.0066,5021819,25108,25327,1271902,6359,6461
200,32,336,0.0067,5081819,25408,25627,1286902,6434,6536
200,32,340,0.0068,5141819,25708,25927,1301902,6509,6611
200,32,344,0.0069,5201819,26008,26227,1316902,6584,6686
200,32,348,0.0069,5261819,26308,26527,1331902,6659,6761
200,32,352,0.0070,5321819,26608,26827,1346902,6734,6836
200,32,356,0.0070,5381819,26908,27127,1361902,6809,6911
200,32,360,0.0071,5441819,27208,27427,1376902,6884,6986
200,32,364,0.0072,5501819,27508,27727,1391902,6959,7061
200,32,368,0.0072,5561819,27808,28027,1406902,7034,7136
200,32,372,0.0073,5621819,28108,28327,1421902,7109,7211
200,32,376,0.0074,5681819,28408,28627,1436902,7184,7286
200,32,380,0.0074,5741819,28708,28927,1451902,7259,7361
200,32,384,0.0075,5801819,29008,29227,1466902,7334,7436
200,32,388,0.0076,5861819,29308,29527,1481902,7409,7511
200,32,392,0.0076,5921819,29608,29827,1496902,7484,7586
200,32,396,0.0077,5981819,29908,30127,1511902,7559,7661
200,32,400,0.0078,6041819,30208,30427,1526902,7634,7736
200,32,404,0.0079,6101819,30508,30727,1541902,7709,7811
200,32,408,0.0079,6161819,30808,31027,1556902,7784,7886
200,32,412,0.0080,6221819,31108,31327,1571902,7859,7961
200,32,416,0.0081,6281819,31408,31627,1586902,7934,8036
200,32,420,0.0081,6341819,31708,31927,1601902,8009,8111
200,32,424,0.0082,6401819,32008,32227,1616902,8084,8186
200,32,428,0.0082,6461819,32308,32527,1631902,8159,8261
200,32,432,0.0085,6521819,32608,32827,1646902,8234,8336
200,32,436,0.0084,6581819,32908,33127,1661902,8309,8411
200,32,440,0.0084,6641819,33208,33427,1676902,8384,8486
200,32,444,0.0085,6701819,33508,33727,1691902,8459,8561
200,32,448,0.0087,6761819,33808,34027,1706902,8534,8636
200,32,452,0.0087,6821819,34108,34327,1721902,8609,8711
200,32,456,0.0087,6881819,34408,34627,1736902,8684,8786
200,32,460,0.0088,6941819,34708,34927,1751902,8759,8861
200,32,464,0.0088,7001819,35008,35227,1766902,8834,8936
200,32,468,0.0089,7061819,35308,35527,1781902,8909,9011
200,32,472,0.0090,7121819,35608,35827,1796902,8984,9086
200,32,476,0.0091,7181819,35908,36127,1811902,9059,9161
200,32,480,0.0091,7241819,36208,36427,1826902,9134,9236
200,32,484,0.0092,7301819,36508,36727,1841902,9209,9311
200,32,488,0.0093,7361819,36808,37027,1856902,9284,9386
200,32,492,0.0094,7421819,37108,37327,1871902,9359,9461
200,32,496,0.0095,7481819,37408,37627,1886902,9434,9536
200,32,500,0.0094,7541819,37708,37927,1901902,9509,9611
200,32,504,0.0095,7601819,38008,38227,1916902,9584,9686
200,32,508,0.0096,7661819,38308,38527,1931902,9659,9761
200,32,512,0.0097,7721819,38608,38827,1946902,9734,9836
200,32,516,0.0098,7781819,38908,39127,1961902,9809,9911
200,32,520,0.0098,7841819,39208,39427,1976902,9884,9986
200,32,524,0.0099,7901819,39508,39727,1991902,9959,10061
200,32,528,0.0099,7961819,39808,40027,2006902,10034,10136
200,32,532,0.0100,8021819,40108,40327,2021902,10109,10211
200,32,536,0.0101,8081819,40408,40627,2036902,10184,10286
200,32,540,0.0101,8141819,40708,40927,2051902,10259,10361
200,32,544,0.0103,8201819,41008,41227,2066902,10334,10436
200,32,548,0.0103,8261819,41308,41527,2081902,10409,10511
200,32,552,0.0104,8321819,41608,41827,2096902,10484,10586
200,32,556,0.0106,8381819,41908,42127,2111902,10559,10661
200,32,560,0.0106,8441819,42208,42427,2126902,10634,10736
200,32,564,0.0106,8501819,42508,42727,2141902,10709,10811
200,32,568,0.0107,8561819,42808,43027,2156902,10784,10886
200,32,572,0.0108,8621819,43108,43327,2171902,10859,10961
200,32,576,0.0109,8681819,43408,43627,2186902,10934,11036
200,32,580,0.0110,8741819,43708,43927,2201902,11009,11111
200,32,584,0.0110,8801819,44008,44227,2216902,11084,11186
200,32,588,0.0110,8861819,44308,44527,2231902,11159,11261
200,32,592,0.0111,8921819,44608,44827,2246902,11234,11336
200,32,596,0.0113,8981819,44908,45127,2261902,11309,11411
200,32,600,0.0113,9041819,45208,45427,2276902,11384,11486
200,32,604,0.0114,9101819,45508,45727,2291902,11459,11561
200,32,608,0.0115,9161819,45808,46027,2306902,11534,11636
200,32,612,0.0115,9221819,46108,46327,2321902,11609,11711
200,32,616,0.0115,9281819,46408,46627,2336902,11684,11786
200,32,620,0.0116,9341819,46708,46927,2351902,11759,11861
200,32,624,0.0117,9401819,47008,47227,2366902,11834,11936
200,32,628,0.0117,9461819,47308,47527,2381902,11909,12011
200,32,632,0.0118,9521819,47608,47827,2396902,11984,12086
200,32,636,0.0119,9581819,47908,48127,2411902,12059,12161
200,32,640,0.0119,9641819,48208,48427,2426902,12134,12236
200,32,644,0.0121,9701819,48508,48727,2441902,12209,12311
200,32,648,0.0121,9761819,48808,49027,2456902,12284,12386
200,32,652,0.0121,9821819,49108,49327,2471902,12359,12461
200,32,656,0.0122,9881819,49408,49627,2486902,12434,12536
200,32,660,0.0123,9941819,49708,49927,2501902,12509,12611
200,32,664,0.0123,10001819,50008,50227,2516902,12584,12686
200,32,668,0.0124,10061819,50308,50527,2531902,12659,12761
200,32,672,0.0124,10121819,50608,50827,2546902,12734,12836
200,32,676,0.0126,10181819,50908,51127,2561902,12809,12911
200,32,680,0.0126,10241819,51208,51427,2576902,12884,12986
200,32,684,0.0127,10301819,51508,51727,2591902,12959,13061
200,32,688,0.0128,10361819,51808,52027,2606902,13034,13136
200,32,692,0.0128,10421819,52108,52327,2621902,13109,13211
200,32,696,0.0129,10481819,52408,52627,2636902,13184,13286
200,32,700,0.0131,10541819,52708,52927,2651902,13259,13361
200,32,704,0.0131,10601819,53008,53227,2666902,13334,13436
200,32,708,0.0130,10661819,53308,53527,2681902,13409,13511
200,32,712,0.0131,10721819,53608,53827,2696902,13484,13586
200,32,716,0.0132,10781819,53908,54127,2711902,13559,13661
200,32,720,0.0132,10841819,54208,54427,2726902,13634,13736
200,32,724,0.0134,10901819,54508,54727,2741902,13709,13811
200,32,728,0.0134,10961819,54808,55027,2756902,13784,13886
200,32,732,0.0134,11021819,55108,55327,2771902,13859,13961
200,32,736,0.0135,11081819,55408,55627,2786902,13934,14036
200,32,740,0.0137,11141819,55708,55927,2801902,14009,14111
200,32,744,0.0138,11201819,56008,56227,2816902,14084,14186
200,32,748,0.0137,11261819,56308,56527,2831902,14159,14261
200,32,752,0.0138,11321819,56608,56827,2846902,14234,14336
200,32,756,0.0139,11381819,56908,57127,2861902,14309,14411
200,32,760,0.0140,11441819,57208,57427,2876902,14384,14486
200,32,764,0.0140,11501819,57508,57727,2891902,14459,14561
200,32,768,0.0141,11561819,57808,58027,2906902,14534,14636
200,32,772,0.0141,11621819,58108,58327,2921902,14609,14711
200,32,776,0.0142,11681819,58408,58627,2936902,14684,14786
200,32,780,0.0143,11741819,58708,58927,2951902,14759,14861
200,32,784,0.0144,11801819,59008,59227,2966902,14834,14936
200,32,788,0.0144,11861819,59308,59527,2981902,14909,15011
200,32,792,0.0145,11921819,59608,59827,2996902,14984,15086
200,32,796,0.0145,11981819,59908,60127,3011902,15059,15161
200,32,800,0.0147,12041819,60208,60427,3026902,15134,15236
200,32,804,0.0147,12101819,60508,60727,3041902,15209,15311
200,32,808,0.0148,12161819,60808,61027,3056902,15284,15386
200,32,812,0.0148,12221819,61108,61327,3071902,15359,15461
200,32,816,0.0150,12281819,61408,61627,3086902,15434,15536
200,32,820,0.0149,12341819,61708,61927,3101902,15509,15611
200,32,824,0.0150,12401819,62008,62227,3116902,15584,15686
200,32,828,0.0151,12461819,62308,62527,3131902,15659,15761
200,32,832,0.0152,12521819,62608,62827,3146902,15734,15836
200,32,836,0.0152,12581819,62908,63127,3161902,15809,15911
200,32,840,0.0153,12641819,63208,63427,3176902,15884,15986
200,32,844,0.0153,12701819,63508,63727,3191902,15959,16061
200,32,848,0.0154,12761819,63808,64027,3206902,16034,16136
200,32,852,0.0155,12821819,64108,64327,3221902,16109,16211
200,32,856,0.0156,12881819,64408,64627,3236902,16184,16286
200,32,860,0.0156,12941819,64708,64927,3251902,16259,16361
200,32,864,0.0157,13001819,65008,65227,3266902,16334,16436
200,32,868,0.0158,13061819,65308,65527,3281902,16409,16511
200,32,872,0.0159,13121819,65608,65827,3296902,16484,16586
200,32,876,0.0159,13181819,65908,66127,3311902,16559,16661
200,32,880,0.0160,13241819,66208,66427,3326902,16634,16736
200,32,884,0.0160,13301819,66508,66727,3341902,16709,16811
200,32,888,0.0161,13361819,66808,67027,3356902,16784,16886
200,32,892,0.0162,13421819,67108,67327,3371902,16859,16961
200,32,896,0.0163,13481819,67408,67627,3386902,16934,17036
200,32,900,0.0164,13541819,67708,67927,3401902,17009,17111
200,32,904,0.0165,13601819,68008,68227,3416902,17084,17186
200,32,908,0.0165,13661819,68308,68527,3431902,17159,17261
200,32,912,0.0166,13721819,68608,68827,3446902,17234,17336
200,32,916,0.0166,13781819,68908,69127,3461902,17309,17411
200,32,920,0.0167,13841819,69208,69427,3476902,17384,17486
200,32,924,0.0168,13901819,69508,69727,3491902,17459,17561
200,32,928,0.0169,13961819,69808,70027,3506902,17534,17636
200,32,932,0.0175,14021819,70108,70327,3521902,17609,17711
200,32,936,0.0170,14081819,70408,70627,3536902,17684,17786
200,32,940,0.0171,14141819,70708,70927,3551902,17759,17861
200,32,944,0.0171,14201819,71008,71227,3566902,17834,17936
200,32,948,0.0172,14261819,71308,71527,3581902,17909,18011
200,32,952,0.0172,14321819,71608,71827,3596902,17984,18086
200,32,956,0.0173,14381819,71908,72127,3611902,18059,18161
200,32,960,0.0174,14441819,72208,72427,3626902,18134,18236
200,32,964,0.0176,14501819,72508,72727,3641902,18209,18311
200,32,968,0.0178,14561819,72808,73027,3656902,18284,18386
200,32,972,0.0177,14621819,73108,73327,3671902,18359,18461
200,32,976,0.0178,14681819,73408,73627,3686902,18434,18536
200,32,980,0.0179,14741819,73708,73927,3701902,18509,18611
200,32,984,0.0179,14801819,74008,74227,3716902,18584,18686
200,32,988,0.0180,14861819,74308,74527,3731902,18659,18761
iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL
(max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,992,0.0181,14921819,74608,74827,3746902,18734,18836 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,996,0.0182,14981819,74908,75127,3761902,18809,18911 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1000,0.0182,15041819,75208,75427,3776902,18884,18986 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1004,0.0183,15101819,75508,75727,3791902,18959,19061 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1008,0.0183,15161819,75808,76027,3806902,19034,19136 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1012,0.0184,15221819,76108,76327,3821902,19109,19211 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1016,0.0185,15281819,76408,76627,3836902,19184,19286 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1020,0.0185,15341819,76708,76927,3851902,19259,19361 iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max) 200,32,1024,0.0186,15401819,77008,77227,3866902,19334,19436 mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.ld_st.bin.csv .
Once the run has finished, let's plot the results in the following cells (non-interactive: make graph_task2a).
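# The CSV repeats its header before every data row; skip the repeats (file lines 2, 4, 6, ...).
df_ldst = pd.read_csv("poisson2d.ld_st.bin.csv", skiprows=range(2, 50000, 2))
# Total number of grid points of each run
df_ldst["Grid Points"] = df_ldst["nx"] * df_ldst["ny"]
df_ldst.head()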
| | iter | ny | nx | Runtime | PM_LD_CMPL (total) | PM_LD_CMPL (min) | PM_LD_CMPL (max) | PM_ST_CMPL (total) | PM_ST_CMPL (min) | PM_ST_CMPL (max) | Grid Points |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 200 | 32 | 4 | 0.0012 | 119819 | 598 | 817 | 32902 | 164 | 266 | 128 |
1 | 200 | 32 | 8 | 0.0013 | 161819 | 808 | 1027 | 56902 | 284 | 386 | 256 |
2 | 200 | 32 | 12 | 0.0014 | 221819 | 1108 | 1327 | 71902 | 359 | 461 | 384 |
3 | 200 | 32 | 16 | 0.0015 | 281819 | 1408 | 1627 | 86902 | 434 | 536 | 512 |
4 | 200 | 32 | 20 | 0.0015 | 341819 | 1708 | 1927 | 101902 | 509 | 611 | 640 |
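The benchmark prints the CSV header before every data row, which is why the read above skips every second line. As a minimal sketch of an alternative that does not depend on line positions, the repeated header lines can be dropped by content instead. The helper name `read_repeated_header_csv` is hypothetical, only the standard library is used, and the sample rows are rebuilt from the first two rows of the table above.

```python
import csv
import io

# Sample in the shape printed by the benchmark: the header precedes every data row.
# The numbers are the first two rows of the table above.
HEADER = ("iter,ny,nx,Runtime,PM_LD_CMPL (total),PM_LD_CMPL (min), PM_LD_CMPL (max),"
          "PM_ST_CMPL (total),PM_ST_CMPL (min), PM_ST_CMPL (max)")
SAMPLE = "\n".join([
    HEADER,
    "200,32,4,0.0012,119819,598,817,32902,164,266",
    HEADER,
    "200,32,8,0.0013,161819,808,1027,56902,284,386",
])


def read_repeated_header_csv(text):
    """Parse CSV text whose header line is repeated; return a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    # Keep the first header and drop every later line that is identical to it.
    data_lines = [line for line in lines[1:] if line != header]
    reader = csv.reader(
        io.StringIO("\n".join([header] + data_lines)),
        skipinitialspace=True,  # some column names carry a leading blank
    )
    names = next(reader)
    return [dict(zip(names, row)) for row in reader]


rows = read_repeated_header_csv(SAMPLE)
for row in rows:
    # Same derived column as in the Notebook cell
    row["Grid Points"] = int(row["nx"]) * int(row["ny"])
    print(row["nx"], row["Grid Points"], row["PM_LD_CMPL (min)"])
```

Filtering by content keeps working if a run is interrupted and a header or data line is missing, whereas a fixed `skiprows` range assumes strict alternation.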
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df_ldst.set_index("Grid Points")["PM_LD_CMPL (min)"].plot(ax=ax1, legend=True);
df_ldst.set_index("Grid Points")["PM_ST_CMPL (min)"].plot(ax=ax2, legend=True);
This behaviour also looks linear at first glance. We can again fit a first-order polynomial (and re-use our previously defined function curve_fit)!
_fit, _cov = common.print_and_return_fit(
    ["PM_LD_CMPL (min)", "PM_ST_CMPL (min)"],
    df_ldst.set_index("Grid Points"),
    linear_function,
    format_value=".4f"
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter PM_LD_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 2.3437 (± 0.000037)
Counter PM_ST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 0.5860 (± 0.000019)
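To make the fit less of a black box, here is a minimal least-squares sketch in plain Python. It is not the implementation behind common.print_and_return_fit, which is not shown here; the helper `fit_line` is a hypothetical name. The inputs are rows 1 to 4 of the df_ldst.head() table above (row 0, the smallest grid, is left out because it does not lie on the same line).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Rows 1 to 4 of the table above: grid points and per-iteration minimum counts
grid_points = [256, 384, 512, 640]
loads_min = [808, 1108, 1408, 1708]    # PM_LD_CMPL (min)
stores_min = [284, 359, 434, 509]      # PM_ST_CMPL (min)

ld_slope, ld_intercept = fit_line(grid_points, loads_min)
st_slope, st_intercept = fit_line(grid_points, stores_min)

print("loads : {:.4f} * x + {:.2f}".format(ld_slope, ld_intercept))
print("stores: {:.4f} * x + {:.2f}".format(st_slope, st_intercept))
```

On these four points the slopes come out as 2.34375 loads and 0.5859375 stores per grid point, consistent with the factors 2.3437 and 0.5860 reported for the full data set.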
Let's overlay this in one common plot:
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_LD_CMPL (min)", "PM_ST_CMPL (min)"]):
    df_ldst.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    # Overlay the fitted line, evaluated on the same data frame that is plotted
    ax.plot(
        df_ldst["Grid Points"],
        linear_function(df_ldst["Grid Points"], *fit_parameters[pmu_counter]),
        linestyle="--",
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();
Did you expect more?
The reason is simple: Among the load and store instructions counted by PM_LD_CMPL and PM_ST_CMPL are vector instructions, which can load and store multiple (in this case: two) values at a time. To see how many bytes are loaded and stored, we need to measure counters for vectorized loads and stores as well.
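The bookkeeping this implies can be sketched as follows. This is an illustration under stated assumptions, not part of the exercise code: it assumes every value is an 8-byte double, takes the factor of two values per vector instruction from the paragraph above, and assumes the total count includes the vector instructions. The function name `bytes_moved` is hypothetical. The example numbers for the 32 x 8 grid come from this Notebook's benchmark outputs (808 completed loads, 570 of them vector loads, measured in separate runs).

```python
BYTES_PER_VALUE = 8      # assumption: values are 8-byte doubles
VALUES_PER_VECTOR = 2    # from the text: a vector instruction moves two values


def bytes_moved(all_instructions, vector_instructions):
    """Estimate bytes moved from a total count and the vector share of it.

    all_instructions is assumed to include the vector instructions.
    """
    scalar_instructions = all_instructions - vector_instructions
    values = scalar_instructions + vector_instructions * VALUES_PER_VECTOR
    return values * BYTES_PER_VALUE


# Per-iteration minimum counts for ny=32, nx=8
loads_total = 808        # PM_LD_CMPL (min)
loads_vector = 570       # PM_VECTOR_LD_CMPL (min)

loaded = bytes_moved(loads_total, loads_vector)
print("bytes loaded per iteration:", loaded)
print("bytes loaded per grid point:", loaded / (32 * 8))
```

Counting instructions alone would suggest 808 * 8 = 6464 bytes; accounting for the vector loads raises the estimate to 11024 bytes per iteration.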
Please measure counters for vectorized loads and vectorized stores. See the TODOs in poisson2d.vld.c and poisson2d.vst.c. (Note: These vector counters cannot be measured together and need separate files and runs.) Can you find out the names of the counters yourself, using papi_native_avail | grep VECTOR_?
Compile, test, and bench-run your program again.
!papi_native_avail | grep VECTOR_
| PM_VECTOR_FLOP_CMPL |
| PM_VECTOR_LD_CMPL |
| PM_VECTOR_ST_CMPL |
make bench_task3 will submit benchmark runs of both vectorized counters to the batch system (as two consecutive runs of the individual files).
!make bench_task3
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vld.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vld.bin.csv Job <24641> is submitted to default queue <batch>. <<Waiting for dispatch ...>> <<Starting on login1>> iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,4,0.0010,0,0,0 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,8,0.0011,114000,570,570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,12,0.0012,174000,870,870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,16,0.0012,234000,1170,1170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,20,0.0013,294000,1470,1470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,24,0.0014,354000,1770,1770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,28,0.0014,414000,2070,2070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,32,0.0015,474000,2370,2370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,36,0.0016,534000,2670,2670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,40,0.0016,594000,2970,2970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,44,0.0017,654000,3270,3270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,48,0.0018,714000,3570,3570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,52,0.0018,774000,3870,3870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 
200,32,56,0.0019,834000,4170,4170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,60,0.0020,894000,4470,4470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,64,0.0021,954000,4770,4770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,68,0.0022,1014000,5070,5070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,72,0.0022,1074000,5370,5370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,76,0.0022,1134000,5670,5670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,80,0.0023,1194000,5970,5970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,84,0.0024,1254000,6270,6270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,88,0.0024,1314000,6570,6570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,92,0.0025,1374000,6870,6870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,96,0.0027,1434000,7170,7170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,100,0.0026,1494000,7470,7470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,104,0.0029,1554000,7770,7770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,108,0.0027,1614000,8070,8070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,112,0.0028,1674000,8370,8370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,116,0.0029,1734000,8670,8670 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,120,0.0029,1794000,8970,8970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,124,0.0030,1854000,9270,9270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,128,0.0032,1914000,9570,9570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,132,0.0031,1974000,9870,9870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,136,0.0032,2034000,10170,10170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,140,0.0033,2094000,10470,10470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,144,0.0033,2154000,10770,10770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,148,0.0034,2214000,11070,11070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,152,0.0036,2274000,11370,11370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,156,0.0035,2334000,11670,11670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,160,0.0036,2394000,11970,11970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,164,0.0037,2454000,12270,12270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,168,0.0037,2514000,12570,12570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,172,0.0038,2574000,12870,12870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,176,0.0039,2634000,13170,13170 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,180,0.0039,2694000,13470,13470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,184,0.0040,2754000,13770,13770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,188,0.0041,2814000,14070,14070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,192,0.0041,2874000,14370,14370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,196,0.0042,2934000,14670,14670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,200,0.0042,2994000,14970,14970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,204,0.0043,3054000,15270,15270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,208,0.0045,3114000,15570,15570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,212,0.0045,3174000,15870,15870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,216,0.0045,3234000,16170,16170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,220,0.0046,3294000,16470,16470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,224,0.0048,3354000,16770,16770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,228,0.0047,3414000,17070,17070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,232,0.0048,3474000,17370,17370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,236,0.0048,3534000,17670,17670 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,240,0.0049,3594000,17970,17970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,244,0.0050,3654000,18270,18270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,248,0.0052,3714000,18570,18570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,252,0.0051,3774000,18870,18870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,256,0.0052,3834000,19170,19170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,260,0.0052,3894000,19470,19470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,264,0.0053,3954000,19770,19770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,268,0.0054,4014000,20070,20070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,272,0.0054,4074000,20370,20370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,276,0.0055,4134000,20670,20670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,280,0.0056,4194000,20970,20970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,284,0.0056,4254000,21270,21270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,288,0.0057,4314000,21570,21570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,292,0.0058,4374000,21870,21870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,296,0.0058,4434000,22170,22170 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,300,0.0059,4494000,22470,22470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,304,0.0059,4554000,22770,22770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,308,0.0060,4614000,23070,23070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,312,0.0061,4674000,23370,23370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,316,0.0062,4734000,23670,23670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,320,0.0062,4794000,23970,23970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,324,0.0063,4854000,24270,24270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,328,0.0063,4914000,24570,24570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,332,0.0064,4974000,24870,24870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,336,0.0065,5034000,25170,25170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,340,0.0065,5094000,25470,25470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,344,0.0066,5154000,25770,25770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,348,0.0069,5214000,26070,26070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,352,0.0068,5274000,26370,26370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,356,0.0070,5334000,26670,26670 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,360,0.0069,5394000,26970,26970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,364,0.0070,5454000,27270,27270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,368,0.0070,5514000,27570,27570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,372,0.0071,5574000,27870,27870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,376,0.0073,5634000,28170,28170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,380,0.0073,5694000,28470,28470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,384,0.0073,5754000,28770,28770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,388,0.0074,5814000,29070,29070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,392,0.0074,5874000,29370,29370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,396,0.0076,5934000,29670,29670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,400,0.0075,5994000,29970,29970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,404,0.0076,6054000,30270,30270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,408,0.0077,6114000,30570,30570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,412,0.0078,6174000,30870,30870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,416,0.0079,6234000,31170,31170 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,420,0.0079,6294000,31470,31470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,424,0.0079,6354000,31770,31770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,428,0.0080,6414000,32070,32070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,432,0.0080,6474000,32370,32370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,436,0.0081,6534000,32670,32670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,440,0.0082,6594000,32970,32970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,444,0.0083,6654000,33270,33270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,448,0.0084,6714000,33570,33570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,452,0.0084,6774000,33870,33870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,456,0.0084,6834000,34170,34170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,460,0.0085,6894000,34470,34470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,464,0.0086,6954000,34770,34770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,468,0.0087,7014000,35070,35070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,472,0.0088,7074000,35370,35370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,476,0.0088,7134000,35670,35670 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,480,0.0089,7194000,35970,35970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,484,0.0090,7254000,36270,36270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,488,0.0091,7314000,36570,36570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,492,0.0091,7374000,36870,36870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,496,0.0091,7434000,37170,37170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,500,0.0094,7494000,37470,37470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,504,0.0093,7554000,37770,37770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,508,0.0095,7614000,38070,38070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,512,0.0096,7674000,38370,38370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,516,0.0095,7734000,38670,38670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,520,0.0095,7794000,38970,38970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,524,0.0097,7854000,39270,39270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,528,0.0097,7914000,39570,39570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,532,0.0098,7974000,39870,39870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,536,0.0098,8034000,40170,40170 
iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,540,0.0099,8094000,40470,40470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,544,0.0100,8154000,40770,40770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,548,0.0101,8214000,41070,41070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,552,0.0101,8274000,41370,41370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,556,0.0104,8334000,41670,41670 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,560,0.0103,8394000,41970,41970 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,564,0.0103,8454000,42270,42270 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,568,0.0106,8514000,42570,42570 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,572,0.0105,8574000,42870,42870 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,576,0.0106,8634000,43170,43170 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,580,0.0108,8694000,43470,43470 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,584,0.0109,8754000,43770,43770 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,588,0.0108,8814000,44070,44070 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,592,0.0109,8874000,44370,44370 iter,ny,nx,Runtime,PM_VECTOR_LD_CMPL (total),PM_VECTOR_LD_CMPL (min), PM_VECTOR_LD_CMPL (max) 200,32,596,0.0109,8934000,44670,44670 
200,32,600,0.0110,8994000,44970,44970
200,32,604,0.0111,9054000,45270,45270
200,32,608,0.0112,9114000,45570,45570
200,32,612,0.0112,9174000,45870,45870
200,32,616,0.0114,9234000,46170,46170
200,32,620,0.0113,9294000,46470,46470
200,32,624,0.0114,9354000,46770,46770
200,32,628,0.0117,9414000,47070,47070
200,32,632,0.0116,9474000,47370,47370
200,32,636,0.0116,9534000,47670,47670
200,32,640,0.0117,9594000,47970,47970
200,32,644,0.0119,9654000,48270,48270
200,32,648,0.0118,9714000,48570,48570
200,32,652,0.0119,9774000,48870,48870
200,32,656,0.0119,9834000,49170,49170
200,32,660,0.0121,9894000,49470,49470
200,32,664,0.0122,9954000,49770,49770
200,32,668,0.0123,10014000,50070,50070
200,32,672,0.0122,10074000,50370,50370
200,32,676,0.0123,10134000,50670,50670
200,32,680,0.0123,10194000,50970,50970
200,32,684,0.0125,10254000,51270,51270
200,32,688,0.0125,10314000,51570,51570
200,32,692,0.0127,10374000,51870,51870
200,32,696,0.0126,10434000,52170,52170
200,32,700,0.0127,10494000,52470,52470
200,32,704,0.0128,10554000,52770,52770
200,32,708,0.0129,10614000,53070,53070
200,32,712,0.0128,10674000,53370,53370
200,32,716,0.0131,10734000,53670,53670
200,32,720,0.0130,10794000,53970,53970
200,32,724,0.0130,10854000,54270,54270
200,32,728,0.0132,10914000,54570,54570
200,32,732,0.0133,10974000,54870,54870
200,32,736,0.0135,11034000,55170,55170
200,32,740,0.0135,11094000,55470,55470
200,32,744,0.0135,11154000,55770,55770
200,32,748,0.0134,11214000,56070,56070
200,32,752,0.0135,11274000,56370,56370
200,32,756,0.0136,11334000,56670,56670
200,32,760,0.0137,11394000,56970,56970
200,32,764,0.0137,11454000,57270,57270
200,32,768,0.0138,11514000,57570,57570
200,32,772,0.0139,11574000,57870,57870
200,32,776,0.0141,11634000,58170,58170
200,32,780,0.0140,11694000,58470,58470
200,32,784,0.0142,11754000,58770,58770
200,32,788,0.0141,11814000,59070,59070
200,32,792,0.0142,11874000,59370,59370
200,32,796,0.0143,11934000,59670,59670
200,32,800,0.0143,11994000,59970,59970
200,32,804,0.0145,12054000,60270,60270
200,32,808,0.0145,12114000,60570,60570
200,32,812,0.0145,12174000,60870,60870
200,32,816,0.0148,12234000,61170,61170
200,32,820,0.0148,12294000,61470,61470
200,32,824,0.0148,12354000,61770,61770
200,32,828,0.0148,12414000,62070,62070
200,32,832,0.0149,12474000,62370,62370
200,32,836,0.0150,12534000,62670,62670
200,32,840,0.0150,12594000,62970,62970
200,32,844,0.0151,12654000,63270,63270
200,32,848,0.0153,12714000,63570,63570
200,32,852,0.0153,12774000,63870,63870
200,32,856,0.0153,12834000,64170,64170
200,32,860,0.0154,12894000,64470,64470
200,32,864,0.0154,12954000,64770,64770
200,32,868,0.0155,13014000,65070,65070
200,32,872,0.0157,13074000,65370,65370
200,32,876,0.0156,13134000,65670,65670
200,32,880,0.0157,13194000,65970,65970
200,32,884,0.0157,13254000,66270,66270
200,32,888,0.0158,13314000,66570,66570
200,32,892,0.0159,13374000,66870,66870
200,32,896,0.0160,13434000,67170,67170
200,32,900,0.0160,13494000,67470,67470
200,32,904,0.0162,13554000,67770,67770
200,32,908,0.0162,13614000,68070,68070
200,32,912,0.0163,13674000,68370,68370
200,32,916,0.0163,13734000,68670,68670
200,32,920,0.0164,13794000,68970,68970
200,32,924,0.0165,13854000,69270,69270
200,32,928,0.0166,13914000,69570,69570
200,32,932,0.0166,13974000,69870,69870
200,32,936,0.0167,14034000,70170,70170
200,32,940,0.0167,14094000,70470,70470
200,32,944,0.0168,14154000,70770,70770
200,32,948,0.0170,14214000,71070,71070
200,32,952,0.0171,14274000,71370,71370
200,32,956,0.0171,14334000,71670,71670
200,32,960,0.0171,14394000,71970,71970
200,32,964,0.0175,14454000,72270,72270
200,32,968,0.0176,14514000,72570,72570
200,32,972,0.0176,14574000,72870,72870
200,32,976,0.0175,14634000,73170,73170
200,32,980,0.0178,14694000,73470,73470
200,32,984,0.0180,14754000,73770,73770
200,32,988,0.0178,14814000,74070,74070
200,32,992,0.0179,14874000,74370,74370
200,32,996,0.0181,14934000,74670,74670
200,32,1000,0.0180,14994000,74970,74970
200,32,1004,0.0182,15054000,75270,75270
200,32,1008,0.0181,15114000,75570,75570
200,32,1012,0.0183,15174000,75870,75870
200,32,1016,0.0183,15234000,76170,76170
200,32,1020,0.0186,15294000,76470,76470
200,32,1024,0.0182,15354000,76770,76770
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vld.bin.csv .
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vst.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vst.bin.csv
Job <24642> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,4,0.0010,200,1,1
200,32,8,0.0011,18200,91,91
200,32,12,0.0012,30200,151,151
200,32,16,0.0012,42200,211,211
200,32,20,0.0013,54200,271,271
200,32,24,0.0013,66200,331,331
200,32,28,0.0014,78200,391,391
200,32,32,0.0015,90200,451,451
200,32,36,0.0015,102200,511,511
200,32,40,0.0016,114200,571,571
200,32,44,0.0017,126200,631,631
200,32,48,0.0017,138200,691,691
200,32,52,0.0018,150200,751,751
200,32,56,0.0019,162200,811,811
200,32,60,0.0020,174200,871,871
200,32,64,0.0020,186200,931,931
200,32,68,0.0022,198200,991,991
200,32,72,0.0023,210200,1051,1051
200,32,76,0.0022,222200,1111,1111
200,32,80,0.0023,234200,1171,1171
200,32,84,0.0024,246200,1231,1231
200,32,88,0.0024,258200,1291,1291
200,32,92,0.0025,270200,1351,1351
200,32,96,0.0025,282200,1411,1411
200,32,100,0.0026,294200,1471,1471
200,32,104,0.0027,306200,1531,1531
200,32,108,0.0028,318200,1591,1591
200,32,112,0.0028,330200,1651,1651
200,32,116,0.0029,342200,1711,1711
200,32,120,0.0030,354200,1771,1771
200,32,124,0.0030,366200,1831,1831
200,32,128,0.0031,378200,1891,1891
200,32,132,0.0032,390200,1951,1951
200,32,136,0.0032,402200,2011,2011
200,32,140,0.0033,414200,2071,2071
200,32,144,0.0033,426200,2131,2131
200,32,148,0.0035,438200,2191,2191
200,32,152,0.0035,450200,2251,2251
200,32,156,0.0035,462200,2311,2311
200,32,160,0.0036,474200,2371,2371
200,32,164,0.0038,486200,2431,2431
200,32,168,0.0037,498200,2491,2491
200,32,172,0.0038,510200,2551,2551
200,32,176,0.0038,522200,2611,2611
200,32,180,0.0039,534200,2671,2671
200,32,184,0.0040,546200,2731,2731
200,32,188,0.0040,558200,2791,2791
200,32,192,0.0041,570200,2851,2851
200,32,196,0.0042,582200,2911,2911
200,32,200,0.0044,594200,2971,2971
200,32,204,0.0043,606200,3031,3031
200,32,208,0.0044,618200,3091,3091
200,32,212,0.0044,630200,3151,3151
200,32,216,0.0045,642200,3211,3211
200,32,220,0.0046,654200,3271,3271
200,32,224,0.0046,666200,3331,3331
200,32,228,0.0047,678200,3391,3391
200,32,232,0.0048,690200,3451,3451
200,32,236,0.0048,702200,3511,3511
200,32,240,0.0049,714200,3571,3571
200,32,244,0.0050,726200,3631,3631
200,32,248,0.0050,738200,3691,3691
200,32,252,0.0051,750200,3751,3751
200,32,256,0.0052,762200,3811,3811
200,32,260,0.0052,774200,3871,3871
200,32,264,0.0053,786200,3931,3931
200,32,268,0.0054,798200,3991,3991
200,32,272,0.0054,810200,4051,4051
200,32,276,0.0055,822200,4111,4111
200,32,280,0.0055,834200,4171,4171
200,32,284,0.0056,846200,4231,4231
200,32,288,0.0057,858200,4291,4291
200,32,292,0.0057,870200,4351,4351
200,32,296,0.0058,882200,4411,4411
200,32,300,0.0059,894200,4471,4471
200,32,304,0.0059,906200,4531,4531
200,32,308,0.0060,918200,4591,4591
200,32,312,0.0061,930200,4651,4651
200,32,316,0.0061,942200,4711,4711
200,32,320,0.0062,954200,4771,4771
200,32,324,0.0063,966200,4831,4831
200,32,328,0.0063,978200,4891,4891
200,32,332,0.0064,990200,4951,4951
200,32,336,0.0065,1002200,5011,5011
200,32,340,0.0066,1014200,5071,5071
200,32,344,0.0066,1026200,5131,5131
200,32,348,0.0067,1038200,5191,5191
200,32,352,0.0069,1050200,5251,5251
200,32,356,0.0068,1062200,5311,5311
200,32,360,0.0068,1074200,5371,5371
200,32,364,0.0069,1086200,5431,5431
200,32,368,0.0070,1098200,5491,5491
200,32,372,0.0071,1110200,5551,5551
200,32,376,0.0071,1122200,5611,5611
200,32,380,0.0072,1134200,5671,5671
200,32,384,0.0073,1146200,5731,5731
200,32,388,0.0073,1158200,5791,5791
200,32,392,0.0074,1170200,5851,5851
200,32,396,0.0075,1182200,5911,5911
200,32,400,0.0075,1194200,5971,5971
200,32,404,0.0076,1206200,6031,6031
200,32,408,0.0077,1218200,6091,6091
200,32,412,0.0077,1230200,6151,6151
200,32,416,0.0080,1242200,6211,6211
200,32,420,0.0078,1254200,6271,6271
200,32,424,0.0079,1266200,6331,6331
200,32,428,0.0080,1278200,6391,6391
200,32,432,0.0081,1290200,6451,6451
200,32,436,0.0082,1302200,6511,6511
200,32,440,0.0082,1314200,6571,6571
200,32,444,0.0083,1326200,6631,6631
200,32,448,0.0083,1338200,6691,6691
200,32,452,0.0084,1350200,6751,6751
200,32,456,0.0085,1362200,6811,6811
200,32,460,0.0085,1374200,6871,6871
200,32,464,0.0087,1386200,6931,6931
200,32,468,0.0086,1398200,6991,6991
200,32,472,0.0087,1410200,7051,7051
200,32,476,0.0088,1422200,7111,7111
200,32,480,0.0090,1434200,7171,7171
200,32,484,0.0089,1446200,7231,7231
200,32,488,0.0090,1458200,7291,7291
200,32,492,0.0092,1470200,7351,7351
200,32,496,0.0092,1482200,7411,7411
200,32,500,0.0092,1494200,7471,7471
200,32,504,0.0093,1506200,7531,7531
200,32,508,0.0094,1518200,7591,7591
200,32,512,0.0095,1530200,7651,7651
200,32,516,0.0096,1542200,7711,7711
200,32,520,0.0096,1554200,7771,7771
200,32,524,0.0096,1566200,7831,7831
200,32,528,0.0097,1578200,7891,7891
200,32,532,0.0097,1590200,7951,7951
200,32,536,0.0098,1602200,8011,8011
200,32,540,0.0100,1614200,8071,8071
200,32,544,0.0099,1626200,8131,8131
200,32,548,0.0100,1638200,8191,8191
200,32,552,0.0101,1650200,8251,8251
200,32,556,0.0102,1662200,8311,8311
200,32,560,0.0102,1674200,8371,8371
200,32,564,0.0105,1686200,8431,8431
200,32,568,0.0104,1698200,8491,8491
200,32,572,0.0105,1710200,8551,8551
200,32,576,0.0105,1722200,8611,8611
200,32,580,0.0108,1734200,8671,8671
200,32,584,0.0108,1746200,8731,8731
200,32,588,0.0109,1758200,8791,8791
200,32,592,0.0109,1770200,8851,8851
200,32,596,0.0109,1782200,8911,8911
200,32,600,0.0111,1794200,8971,8971
200,32,604,0.0111,1806200,9031,9031
200,32,608,0.0112,1818200,9091,9091
200,32,612,0.0112,1830200,9151,9151
200,32,616,0.0114,1842200,9211,9211
200,32,620,0.0113,1854200,9271,9271
200,32,624,0.0114,1866200,9331,9331
200,32,628,0.0114,1878200,9391,9391
200,32,632,0.0116,1890200,9451,9451
200,32,636,0.0116,1902200,9511,9511
200,32,640,0.0117,1914200,9571,9571
200,32,644,0.0118,1926200,9631,9631
200,32,648,0.0118,1938200,9691,9691
200,32,652,0.0121,1950200,9751,9751
200,32,656,0.0121,1962200,9811,9811
200,32,660,0.0121,1974200,9871,9871
200,32,664,0.0121,1986200,9931,9931
200,32,668,0.0122,1998200,9991,9991
200,32,672,0.0122,2010200,10051,10051
200,32,676,0.0124,2022200,10111,10111
200,32,680,0.0123,2034200,10171,10171
200,32,684,0.0124,2046200,10231,10231
200,32,688,0.0126,2058200,10291,10291
200,32,692,0.0127,2070200,10351,10351
200,32,696,0.0126,2082200,10411,10411
200,32,700,0.0128,2094200,10471,10471
200,32,704,0.0127,2106200,10531,10531
200,32,708,0.0128,2118200,10591,10591
200,32,712,0.0129,2130200,10651,10651
200,32,716,0.0130,2142200,10711,10711
200,32,720,0.0130,2154200,10771,10771
200,32,724,0.0131,2166200,10831,10831
200,32,728,0.0131,2178200,10891,10891
200,32,732,0.0132,2190200,10951,10951
200,32,736,0.0134,2202200,11011,11011
200,32,740,0.0134,2214200,11071,11071
200,32,744,0.0134,2226200,11131,11131
200,32,748,0.0135,2238200,11191,11191
200,32,752,0.0136,2250200,11251,11251
200,32,756,0.0136,2262200,11311,11311
200,32,760,0.0137,2274200,11371,11371
200,32,764,0.0138,2286200,11431,11431
200,32,768,0.0138,2298200,11491,11491
200,32,772,0.0139,2310200,11551,11551
200,32,776,0.0139,2322200,11611,11611
200,32,780,0.0140,2334200,11671,11671
200,32,784,0.0141,2346200,11731,11731
200,32,788,0.0142,2358200,11791,11791
200,32,792,0.0142,2370200,11851,11851
200,32,796,0.0144,2382200,11911,11911
200,32,800,0.0144,2394200,11971,11971
200,32,804,0.0144,2406200,12031,12031
200,32,808,0.0146,2418200,12091,12091
200,32,812,0.0146,2430200,12151,12151
200,32,816,0.0146,2442200,12211,12211
200,32,820,0.0147,2454200,12271,12271
200,32,824,0.0148,2466200,12331,12331
200,32,828,0.0149,2478200,12391,12391
200,32,832,0.0149,2490200,12451,12451
200,32,836,0.0150,2502200,12511,12511
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL 
(max) 200,32,840,0.0151,2514200,12571,12571 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,844,0.0152,2526200,12631,12631 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,848,0.0151,2538200,12691,12691 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,852,0.0152,2550200,12751,12751 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,856,0.0153,2562200,12811,12811 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,860,0.0154,2574200,12871,12871 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,864,0.0155,2586200,12931,12931 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,868,0.0155,2598200,12991,12991 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,872,0.0156,2610200,13051,13051 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,876,0.0156,2622200,13111,13111 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,880,0.0157,2634200,13171,13171 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,884,0.0158,2646200,13231,13231 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,888,0.0159,2658200,13291,13291 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,892,0.0159,2670200,13351,13351 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,896,0.0160,2682200,13411,13411 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 
200,32,900,0.0160,2694200,13471,13471 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,904,0.0162,2706200,13531,13531 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,908,0.0162,2718200,13591,13591 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,912,0.0163,2730200,13651,13651 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,916,0.0163,2742200,13711,13711 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,920,0.0164,2754200,13771,13771 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,924,0.0165,2766200,13831,13831 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,928,0.0166,2778200,13891,13891 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,932,0.0168,2790200,13951,13951 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,936,0.0167,2802200,14011,14011 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,940,0.0169,2814200,14071,14071 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,944,0.0169,2826200,14131,14131 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,948,0.0169,2838200,14191,14191 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,952,0.0170,2850200,14251,14251 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,956,0.0170,2862200,14311,14311 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 
200,32,960,0.0171,2874200,14371,14371 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,964,0.0175,2886200,14431,14431 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,968,0.0175,2898200,14491,14491 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,972,0.0176,2910200,14551,14551 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,976,0.0176,2922200,14611,14611 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,980,0.0178,2934200,14671,14671 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,984,0.0178,2946200,14731,14731 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,988,0.0179,2958200,14791,14791 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,992,0.0178,2970200,14851,14851 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,996,0.0181,2982200,14911,14911 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,1000,0.0180,2994200,14971,14971 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,1004,0.0181,3006200,15031,15031 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,1008,0.0182,3018200,15091,15091 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,1012,0.0183,3030200,15151,15151 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 200,32,1016,0.0183,3042200,15211,15211 iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max) 
200,32,1020,0.0184,3054200,15271,15271
iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)
200,32,1024,0.0182,3066200,15331,15331
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vst.bin.csv .
Let's plot it again as soon as the run finishes! Non-interactively, call graph_task2b.
Because we couldn't measure the two vector counters at the same time, we now have two CSV files to read in. In the following, we combine them into one common dataframe, df_vldvst.
df_vld = pd.read_csv("poisson2d.vld.bin.csv", skiprows=range(2, 50000, 2))
df_vst = pd.read_csv("poisson2d.vst.bin.csv", skiprows=range(2, 50000, 2))
df_vldvst = pd.concat([df_vld.set_index("nx"), df_vst.set_index("nx")[['PM_VECTOR_ST_CMPL (total)', 'PM_VECTOR_ST_CMPL (min)', ' PM_VECTOR_ST_CMPL (max)']]], axis=1).reset_index()
df_vldvst["Grid Points"] = df_vldvst["nx"] * df_vldvst["ny"]
df_vldvst.head()
| | nx | iter | ny | Runtime | PM_VECTOR_LD_CMPL (total) | PM_VECTOR_LD_CMPL (min) | PM_VECTOR_LD_CMPL (max) | PM_VECTOR_ST_CMPL (total) | PM_VECTOR_ST_CMPL (min) | PM_VECTOR_ST_CMPL (max) | Grid Points |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 200 | 32 | 0.0010 | 0 | 0 | 0 | 200 | 1 | 1 | 128 |
| 1 | 8 | 200 | 32 | 0.0011 | 114000 | 570 | 570 | 18200 | 91 | 91 | 256 |
| 2 | 12 | 200 | 32 | 0.0012 | 174000 | 870 | 870 | 30200 | 151 | 151 | 384 |
| 3 | 16 | 200 | 32 | 0.0012 | 234000 | 1170 | 1170 | 42200 | 211 | 211 | 512 |
| 4 | 20 | 200 | 32 | 0.0013 | 294000 | 1470 | 1470 | 54200 | 271 | 271 | 640 |
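The benchmark output repeats the CSV header before every data row, which is why the read-in above passes skiprows=range(2, 50000, 2). The following pandas-free sketch mimics that selection with the standard library only, to show which lines survive; it is an illustration of the idea, not the code path pandas takes internally. The three data rows are copied from the end of the benchmark output above. Note that the last column name keeps its leading space, which is why the cell above selects ' PM_VECTOR_ST_CMPL (max)'.

```python
import csv

# Excerpt in the shape of the benchmark output: a header line before every data row.
header = "iter,ny,nx,Runtime,PM_VECTOR_ST_CMPL (total),PM_VECTOR_ST_CMPL (min), PM_VECTOR_ST_CMPL (max)"
lines = [
    header, "200,32,1016,0.0183,3042200,15211,15211",
    header, "200,32,1020,0.0184,3054200,15271,15271",
    header, "200,32,1024,0.0182,3066200,15331,15331",
]

# Same line selection as skiprows=range(2, 50000, 2): drop the 0-based lines 2, 4, 6, ...
# Line 0 (the first header) and all odd lines (the data rows) are kept.
skip = set(range(2, 50000, 2))
kept = [line for i, line in enumerate(lines) if i not in skip]

rows = list(csv.DictReader(kept))
print([row["nx"] for row in rows])
print(list(rows[0].keys())[-1])  # note the leading space in the column name
```

If the leading space is a nuisance, pandas' read_csv offers a skipinitialspace option; the notebook does not use it, so the column names above are kept as they are.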
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
df_vldvst.set_index("Grid Points")["PM_VECTOR_LD_CMPL (min)"].plot(ax=ax1, legend=True);
df_vldvst.set_index("Grid Points")["PM_VECTOR_ST_CMPL (min)"].plot(ax=ax2, legend=True);
Here, too, there seems to be a linear correlation. Let's do our fitting and plot directly.
_fit, _cov = common.print_and_return_fit(
    ["PM_VECTOR_LD_CMPL (min)", "PM_VECTOR_ST_CMPL (min)"],
    df_vldvst.set_index("Grid Points"),
    linear_function,
    format_value=".4f",
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter PM_VECTOR_LD_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 2.3439 (± 0.000111)
Counter PM_VECTOR_ST_CMPL (min) is proportional to the grid points (nx*ny) by a factor of 0.4688 (± 0.000012)
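The helper common.print_and_return_fit belongs to the tutorial material and its implementation is not shown here. As a rough plausibility check of the printed factors, the sketch below fits an ordinary least-squares line through the four vectorized rows of the table above (nx = 8 to 20), in plain Python. The function least_squares_line is a hypothetical stand-in, not the tutorial's helper, and four points are of course far fewer than the full data set.

```python
# Grid points and counter minima for nx = 8 ... 20, copied from the table above.
# The nx = 4 row is left out because it shows no vector loads at all.
grid_points = [256, 384, 512, 640]
vector_loads = [570, 870, 1170, 1470]
vector_stores = [91, 151, 211, 271]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ld_slope, ld_offset = least_squares_line(grid_points, vector_loads)
st_slope, st_offset = least_squares_line(grid_points, vector_stores)
print(f"vector loads per grid point:  {ld_slope:.3f} (offset {ld_offset:.1f})")
print(f"vector stores per grid point: {st_slope:.3f} (offset {st_offset:.1f})")
```

The slopes come out at roughly 2.34 and 0.47, in line with the factors printed by the fit over the full range of grid sizes.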
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax, pmu_counter in zip([ax1, ax2], ["PM_VECTOR_LD_CMPL (min)", "PM_VECTOR_ST_CMPL (min)"]):
    df_vldvst.set_index("Grid Points")[pmu_counter].plot(ax=ax, legend=True);
    # Evaluate the fit on the same x values that are plotted
    ax.plot(
        df_vldvst["Grid Points"],
        linear_function(df_vldvst["Grid Points"], *fit_parameters[pmu_counter]),
        linestyle="--",
        label="Fit: {:.2f} * x + {:.2f}".format(*fit_parameters[pmu_counter])
    )
    ax.legend();
Let's try to make sense of those numbers.
Vector loads and vector stores operate on two 8 Byte values at a time. When we measured loads and stores with PM_LD_CMPL and PM_ST_CMPL in part A of this task, we measured the total number of loads and stores; that is, both the vector and the scalar versions of the instructions. In order to convert the load and store instructions into bytes loaded and stored, we need to separate them. The difference between total instructions and vector instructions yields the scalar instructions. We multiply the scalar instructions by 8 Byte (double precision) and the vector instructions by 16 Byte (two double-precision loads or stores). That yields the loaded or stored data (or, more precisely, the instruction-equivalent data).
To formalize it, see the following equations, using loads ($ld$) as an example, with $b$ denoting the data loaded in bytes and $n$ denoting the number of instructions.
\begin{align}
b_\text{ld} &= b_\text{ld}^\text{scalar} + b_\text{ld}^\text{vector} \\
b_\text{ld}^\text{scalar} &= n_\text{ld}^\text{scalar} \cdot 8\,\text{Byte} \\
b_\text{ld}^\text{vector} &= n_\text{ld}^\text{vector} \cdot 16\,\text{Byte} \\
n_\text{ld}^\text{scalar} &= n_\text{ld}^\text{total} - n_\text{ld}^\text{vector} \\
\Rightarrow b_\text{ld} &= n_\text{ld}^\text{scalar} \cdot 8\,\text{Byte} + n_\text{ld}^\text{vector} \cdot 16\,\text{Byte} \\
&= (n_\text{ld}^\text{scalar} + 2 n_\text{ld}^\text{vector}) \cdot 8\,\text{Byte} \\
&= (n_\text{ld}^\text{total} - n_\text{ld}^\text{vector} + 2 n_\text{ld}^\text{vector}) \cdot 8\,\text{Byte} \\
&= (n_\text{ld}^\text{total} + n_\text{ld}^\text{vector}) \cdot 8\,\text{Byte}
\end{align}
We are going to plot this in the next cell. In case you look at this Notebook non-interactively, call graph_task2b-2.
df_byte = pd.DataFrame()
df_byte["Loads"] = (df_vldvst.set_index("Grid Points")["PM_VECTOR_LD_CMPL (min)"] + df_ldst.set_index("Grid Points")["PM_LD_CMPL (min)"])*8
df_byte["Stores"] = (df_vldvst.set_index("Grid Points")["PM_VECTOR_ST_CMPL (min)"] + df_ldst.set_index("Grid Points")["PM_ST_CMPL (min)"])*8
ax = df_byte.plot()
ax.set_ylabel("Bytes");
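The cell above uses the short form of the conversion, (total + vector) * 8 Byte. As a quick sanity check, the sketch below evaluates both the long form (scalar and vector instructions counted separately) and the short form for one pair of counts. The function name bytes_moved and the counts are made up for illustration; they are not measurements from this exercise.

```python
BYTES_PER_DOUBLE = 8          # one double-precision value
BYTES_PER_VECTOR_ACCESS = 16  # two doubles per vector load/store, as stated in the text


def bytes_moved(n_total, n_vector):
    """Instruction-equivalent bytes, counting scalar and vector instructions separately."""
    n_scalar = n_total - n_vector
    return n_scalar * BYTES_PER_DOUBLE + n_vector * BYTES_PER_VECTOR_ACCESS


# Made-up counts, for illustration only: 1000 loads in total, 400 of them vector loads.
n_total, n_vector = 1000, 400
long_form = bytes_moved(n_total, n_vector)
short_form = (n_total + n_vector) * BYTES_PER_DOUBLE
print(long_form, short_form)  # both forms give the same number of bytes
```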
Let's quantify the difference by, again, fitting a linear function to the data.
_fit, _cov = common.print_and_return_fit(
    ["Loads", "Stores"],
    df_byte,
    linear_function
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter Loads is proportional to the grid points (nx*ny) by a factor of 37.5010 (± 0.000592)
Counter Stores is proportional to the grid points (nx*ny) by a factor of 8.4379 (± 0.000247)
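Analogously to the earlier proportionality factors, these factors give the amount of data moved per grid point: about 37.5 Byte are loaded and about 8.4 Byte are stored. Dividing the bytes moved by the run cycles gives the achieved bandwidth in Byte per cycle.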
df_bandwidth = pd.DataFrame()
df_bandwidth["Bandwidth / Byte/Cycle"] = (df_byte["Loads"] + df_byte["Stores"]) / df.set_index("Grid Points")["PM_RUN_CYC (min)"]
Let's display it as a function of grid points and, in a second (sub-)plot, compare it to the available L1 cache bandwidth. Non-interactive users, call make graph_task2c.
fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
for ax in [ax1, ax2]:
    df_bandwidth["Bandwidth / Byte/Cycle"].plot(ax=ax, legend=True, label="Jacobi Bandwidth")
    ax.set_ylabel("Byte/Cycle")
# Reference line in the lower plot: 2 * 16 Byte/Cycle of L1 bandwidth
ax2.axhline(2*16, color=sns.color_palette()[1], label="L1 Bandwidth");
ax2.legend();
As you can see, we are still well below the available L1 cache bandwidth. Can you think of reasons why?
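To put a number on that gap for your own measurements, the bandwidth can also be computed per grid point. The sketch below takes the bytes per grid point from the fit output above and the 2 * 16 Byte/Cycle reference line from the plot. The cycle count per grid point is a hypothetical placeholder, not a measured value; replace it with the factor you obtained for PM_RUN_CYC.

```python
L1_BYTES_PER_CYCLE = 2 * 16  # reference line drawn in the plot above


def bandwidth_bytes_per_cycle(bytes_loaded, bytes_stored, cycles):
    """Achieved bandwidth: all bytes moved divided by the cycles spent."""
    return (bytes_loaded + bytes_stored) / cycles


# Bytes per grid point, taken from the fit output above.
loads_per_point = 37.5010
stores_per_point = 8.4379

# HYPOTHETICAL cycles per grid point, for illustration only.
# Replace with the factor you measured for PM_RUN_CYC.
cycles_per_point = 4.0

bandwidth = bandwidth_bytes_per_cycle(loads_per_point, stores_per_point, cycles_per_point)
fraction_of_l1 = bandwidth / L1_BYTES_PER_CYCLE
print(f"{bandwidth:.2f} Byte/Cycle, {fraction_of_l1:.0%} of the L1 reference")
```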
If you still have time, feel free to work on the following extended task.
TASK: Please measure counters for vectorized floating point operations and scalar floating point operations. These two counters cannot be measured during the same run either, so please see the TODOs in poisson2d.sflops.c and poisson2d.vflops.c. By now you should be able to find out the names of the counters by yourself (Hint: they include the words »scalar« and »vector«…).
As usual, compile, test, and bench-run your program.
!make bench_task4
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.sflop.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.sflop.bin.csv Job <24645> is submitted to default queue <batch>. <<Waiting for dispatch ...>> <<Starting on login1>> iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,4,0.0010,96000,480,480 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,8,0.0011,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,12,0.0012,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,16,0.0012,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,20,0.0013,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,24,0.0013,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,28,0.0014,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,32,0.0015,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,36,0.0015,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,40,0.0016,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,44,0.0017,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,48,0.0017,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,52,0.0018,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,56,0.0022,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,60,0.0019,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,64,0.0021,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,68,0.0022,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,72,0.0021,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,76,0.0022,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,80,0.0023,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,84,0.0025,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,88,0.0024,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,92,0.0025,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,96,0.0025,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,100,0.0026,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,104,0.0027,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,108,0.0027,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,112,0.0028,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,116,0.0028,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,120,0.0031,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,124,0.0030,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,128,0.0030,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,132,0.0031,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,136,0.0032,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,140,0.0032,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,144,0.0033,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,148,0.0034,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,152,0.0035,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,156,0.0035,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,160,0.0036,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,164,0.0036,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,168,0.0037,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,172,0.0038,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,176,0.0038,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,180,0.0039,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,184,0.0040,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,188,0.0040,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,192,0.0041,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,196,0.0042,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,200,0.0042,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,204,0.0043,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,208,0.0043,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,212,0.0044,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,216,0.0045,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,220,0.0045,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,224,0.0046,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,228,0.0047,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,232,0.0047,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,236,0.0048,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,240,0.0049,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,244,0.0049,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,248,0.0051,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,252,0.0051,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,256,0.0053,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,260,0.0052,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,264,0.0053,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,268,0.0054,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,272,0.0054,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,276,0.0054,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,280,0.0055,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,284,0.0056,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,288,0.0056,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,292,0.0057,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,296,0.0058,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,300,0.0058,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,304,0.0059,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,308,0.0060,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,312,0.0060,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,316,0.0062,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,320,0.0062,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,324,0.0062,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,328,0.0063,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,332,0.0064,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,336,0.0065,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,340,0.0065,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,344,0.0066,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,348,0.0066,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,352,0.0067,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,356,0.0068,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,360,0.0069,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,364,0.0069,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,368,0.0070,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,372,0.0072,0,0,0 iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max) 200,32,376,0.0071,0,0,0 
iter,ny,nx,Runtime,PM_SCALAR_FLOP_CMPL (total),PM_SCALAR_FLOP_CMPL (min), PM_SCALAR_FLOP_CMPL (max)
200,32,380,0.0071,0,0,0
200,32,384,0.0072,0,0,0
200,32,388,0.0073,0,0,0
200,32,392,0.0074,0,0,0
200,32,396,0.0076,0,0,0
200,32,400,0.0075,0,0,0
200,32,404,0.0076,0,0,0
200,32,408,0.0076,0,0,0
200,32,412,0.0077,0,0,0
200,32,416,0.0078,0,0,0
200,32,420,0.0078,0,0,0
200,32,424,0.0079,0,0,0
200,32,428,0.0079,0,0,0
200,32,432,0.0080,0,0,0
200,32,436,0.0081,0,0,0
200,32,440,0.0082,0,0,0
200,32,444,0.0082,0,0,0
200,32,448,0.0084,0,0,0
200,32,452,0.0083,0,0,0
200,32,456,0.0084,0,0,0
200,32,460,0.0085,0,0,0
200,32,464,0.0085,0,0,0
200,32,468,0.0086,0,0,0
200,32,472,0.0087,0,0,0
200,32,476,0.0089,0,0,0
200,32,480,0.0088,0,0,0
200,32,484,0.0089,0,0,0
200,32,488,0.0089,0,0,0
200,32,492,0.0090,0,0,0
200,32,496,0.0091,0,0,0
200,32,500,0.0092,0,0,0
200,32,504,0.0092,0,0,0
200,32,508,0.0093,0,0,0
200,32,512,0.0094,0,0,0
200,32,516,0.0094,0,0,0
200,32,520,0.0095,0,0,0
200,32,524,0.0096,0,0,0
200,32,528,0.0096,0,0,0
200,32,532,0.0098,0,0,0
200,32,536,0.0097,0,0,0
200,32,540,0.0098,0,0,0
200,32,544,0.0099,0,0,0
200,32,548,0.0100,0,0,0
200,32,552,0.0101,0,0,0
200,32,556,0.0101,0,0,0
200,32,560,0.0102,0,0,0
200,32,564,0.0103,0,0,0
200,32,568,0.0104,0,0,0
200,32,572,0.0105,0,0,0
200,32,576,0.0105,0,0,0
200,32,580,0.0106,0,0,0
200,32,584,0.0107,0,0,0
200,32,588,0.0107,0,0,0
200,32,592,0.0108,0,0,0
200,32,596,0.0109,0,0,0
200,32,600,0.0110,0,0,0
200,32,604,0.0111,0,0,0
200,32,608,0.0111,0,0,0
200,32,612,0.0112,0,0,0
200,32,616,0.0112,0,0,0
200,32,620,0.0113,0,0,0
200,32,624,0.0114,0,0,0
200,32,628,0.0115,0,0,0
200,32,632,0.0115,0,0,0
200,32,636,0.0115,0,0,0
200,32,640,0.0116,0,0,0
200,32,644,0.0118,0,0,0
200,32,648,0.0117,0,0,0
200,32,652,0.0119,0,0,0
200,32,656,0.0119,0,0,0
200,32,660,0.0121,0,0,0
200,32,664,0.0120,0,0,0
200,32,668,0.0122,0,0,0
200,32,672,0.0121,0,0,0
200,32,676,0.0124,0,0,0
200,32,680,0.0123,0,0,0
200,32,684,0.0125,0,0,0
200,32,688,0.0124,0,0,0
200,32,692,0.0125,0,0,0
200,32,696,0.0126,0,0,0
200,32,700,0.0127,0,0,0
200,32,704,0.0126,0,0,0
200,32,708,0.0127,0,0,0
200,32,712,0.0129,0,0,0
200,32,716,0.0128,0,0,0
200,32,720,0.0129,0,0,0
200,32,724,0.0132,0,0,0
200,32,728,0.0131,0,0,0
200,32,732,0.0131,0,0,0
200,32,736,0.0133,0,0,0
200,32,740,0.0133,0,0,0
200,32,744,0.0133,0,0,0
200,32,748,0.0134,0,0,0
200,32,752,0.0136,0,0,0
200,32,756,0.0136,0,0,0
200,32,760,0.0136,0,0,0
200,32,764,0.0136,0,0,0
200,32,768,0.0138,0,0,0
200,32,772,0.0138,0,0,0
200,32,776,0.0139,0,0,0
200,32,780,0.0139,0,0,0
200,32,784,0.0140,0,0,0
200,32,788,0.0140,0,0,0
200,32,792,0.0141,0,0,0
200,32,796,0.0142,0,0,0
200,32,800,0.0143,0,0,0
200,32,804,0.0143,0,0,0
200,32,808,0.0144,0,0,0
200,32,812,0.0144,0,0,0
200,32,816,0.0145,0,0,0
200,32,820,0.0146,0,0,0
200,32,824,0.0148,0,0,0
200,32,828,0.0147,0,0,0
200,32,832,0.0148,0,0,0
200,32,836,0.0149,0,0,0
200,32,840,0.0150,0,0,0
200,32,844,0.0150,0,0,0
200,32,848,0.0150,0,0,0
200,32,852,0.0151,0,0,0
200,32,856,0.0152,0,0,0
200,32,860,0.0152,0,0,0
200,32,864,0.0153,0,0,0
200,32,868,0.0154,0,0,0
200,32,872,0.0156,0,0,0
200,32,876,0.0156,0,0,0
200,32,880,0.0156,0,0,0
200,32,884,0.0157,0,0,0
200,32,888,0.0157,0,0,0
200,32,892,0.0158,0,0,0
200,32,896,0.0159,0,0,0
200,32,900,0.0159,0,0,0
200,32,904,0.0161,0,0,0
200,32,908,0.0162,0,0,0
200,32,912,0.0164,0,0,0
200,32,916,0.0163,0,0,0
200,32,920,0.0164,0,0,0
200,32,924,0.0165,0,0,0
200,32,928,0.0166,0,0,0
200,32,932,0.0166,0,0,0
200,32,936,0.0167,0,0,0
200,32,940,0.0167,0,0,0
200,32,944,0.0168,0,0,0
200,32,948,0.0169,0,0,0
200,32,952,0.0172,0,0,0
200,32,956,0.0171,0,0,0
200,32,960,0.0172,0,0,0
200,32,964,0.0175,0,0,0
200,32,968,0.0175,0,0,0
200,32,972,0.0176,0,0,0
200,32,976,0.0177,0,0,0
200,32,980,0.0178,0,0,0
200,32,984,0.0178,0,0,0
200,32,988,0.0179,0,0,0
200,32,992,0.0179,0,0,0
200,32,996,0.0182,0,0,0
200,32,1000,0.0181,0,0,0
200,32,1004,0.0182,0,0,0
200,32,1008,0.0182,0,0,0
200,32,1012,0.0184,0,0,0
200,32,1016,0.0184,0,0,0
200,32,1020,0.0186,0,0,0
200,32,1024,0.0182,0,0,0
mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.sflop.bin.csv .
bsub -W 60 -nnodes 1 -Is -P TRN003 jsrun -n 1 -c 1 -g ALL_GPUS ./bench.sh poisson2d.vflop.bin /gpfs/wolf/trn003/scratch/aherten//poisson2d.vflop.bin.csv
Job <24646> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on login1>>
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)
200,32,4,0.0010,0,0,0
200,32,8,0.0011,150000,750,750
200,32,12,0.0012,246000,1230,1230
200,32,16,0.0012,342000,1710,1710
200,32,20,0.0013,438000,2190,2190
200,32,24,0.0013,534000,2670,2670
200,32,28,0.0014,630000,3150,3150
200,32,32,0.0015,726000,3630,3630
200,32,36,0.0016,822000,4110,4110
200,32,40,0.0016,918000,4590,4590
200,32,44,0.0017,1014000,5070,5070
200,32,48,0.0017,1110000,5550,5550
200,32,52,0.0018,1206000,6030,6030
200,32,56,0.0019,1302000,6510,6510
200,32,60,0.0019,1398000,6990,6990
200,32,64,0.0020,1494000,7470,7470
200,32,68,0.0022,1590000,7950,7950
200,32,72,0.0021,1686000,8430,8430
200,32,76,0.0022,1782000,8910,8910
200,32,80,0.0023,1878000,9390,9390
200,32,84,0.0025,1974000,9870,9870
200,32,88,0.0024,2070000,10350,10350
200,32,92,0.0026,2166000,10830,10830
200,32,96,0.0025,2262000,11310,11310
200,32,100,0.0026,2358000,11790,11790
200,32,104,0.0027,2454000,12270,12270
200,32,108,0.0027,2550000,12750,12750
200,32,112,0.0029,2646000,13230,13230
200,32,116,0.0029,2742000,13710,13710
200,32,120,0.0029,2838000,14190,14190
200,32,124,0.0030,2934000,14670,14670
200,32,128,0.0031,3030000,15150,15150
200,32,132,0.0031,3126000,15630,15630
200,32,136,0.0032,3222000,16110,16110
200,32,140,0.0032,3318000,16590,16590
200,32,144,0.0033,3414000,17070,17070
200,32,148,0.0036,3510000,17550,17550
200,32,152,0.0035,3606000,18030,18030
200,32,156,0.0035,3702000,18510,18510
200,32,160,0.0036,3798000,18990,18990
200,32,164,0.0036,3894000,19470,19470
200,32,168,0.0037,3990000,19950,19950
200,32,172,0.0038,4086000,20430,20430
200,32,176,0.0038,4182000,20910,20910
200,32,180,0.0039,4278000,21390,21390
200,32,184,0.0040,4374000,21870,21870
200,32,188,0.0041,4470000,22350,22350
200,32,192,0.0041,4566000,22830,22830
200,32,196,0.0042,4662000,23310,23310
200,32,200,0.0042,4758000,23790,23790
200,32,204,0.0043,4854000,24270,24270
200,32,208,0.0044,4950000,24750,24750
200,32,212,0.0044,5046000,25230,25230
200,32,216,0.0045,5142000,25710,25710
200,32,220,0.0046,5238000,26190,26190
200,32,224,0.0046,5334000,26670,26670
200,32,228,0.0048,5430000,27150,27150
200,32,232,0.0049,5526000,27630,27630
200,32,236,0.0048,5622000,28110,28110
200,32,240,0.0049,5718000,28590,28590
200,32,244,0.0049,5814000,29070,29070
200,32,248,0.0050,5910000,29550,29550
200,32,252,0.0051,6006000,30030,30030
200,32,256,0.0051,6102000,30510,30510
200,32,260,0.0052,6198000,30990,30990
200,32,264,0.0053,6294000,31470,31470
200,32,268,0.0054,6390000,31950,31950
200,32,272,0.0054,6486000,32430,32430
200,32,276,0.0054,6582000,32910,32910
200,32,280,0.0055,6678000,33390,33390
200,32,284,0.0056,6774000,33870,33870
200,32,288,0.0057,6870000,34350,34350
200,32,292,0.0057,6966000,34830,34830
200,32,296,0.0058,7062000,35310,35310
200,32,300,0.0059,7158000,35790,35790
200,32,304,0.0059,7254000,36270,36270
200,32,308,0.0060,7350000,36750,36750
200,32,312,0.0062,7446000,37230,37230
200,32,316,0.0061,7542000,37710,37710
200,32,320,0.0062,7638000,38190,38190
200,32,324,0.0062,7734000,38670,38670
200,32,328,0.0063,7830000,39150,39150
200,32,332,0.0064,7926000,39630,39630
200,32,336,0.0065,8022000,40110,40110
200,32,340,0.0065,8118000,40590,40590
200,32,344,0.0066,8214000,41070,41070
200,32,348,0.0066,8310000,41550,41550
200,32,352,0.0067,8406000,42030,42030
200,32,356,0.0068,8502000,42510,42510
200,32,360,0.0068,8598000,42990,42990
200,32,364,0.0069,8694000,43470,43470
200,32,368,0.0070,8790000,43950,43950
200,32,372,0.0070,8886000,44430,44430
200,32,376,0.0071,8982000,44910,44910
200,32,380,0.0072,9078000,45390,45390
200,32,384,0.0072,9174000,45870,45870
200,32,388,0.0073,9270000,46350,46350
200,32,392,0.0074,9366000,46830,46830
200,32,396,0.0074,9462000,47310,47310
200,32,400,0.0075,9558000,47790,47790
200,32,404,0.0075,9654000,48270,48270
200,32,408,0.0076,9750000,48750,48750
200,32,412,0.0077,9846000,49230,49230
200,32,416,0.0079,9942000,49710,49710
200,32,420,0.0078,10038000,50190,50190
200,32,424,0.0080,10134000,50670,50670
200,32,428,0.0080,10230000,51150,51150
200,32,432,0.0080,10326000,51630,51630
200,32,436,0.0083,10422000,52110,52110
200,32,440,0.0082,10518000,52590,52590
200,32,444,0.0083,10614000,53070,53070
200,32,448,0.0083,10710000,53550,53550
200,32,452,0.0083,10806000,54030,54030
200,32,456,0.0084,10902000,54510,54510
200,32,460,0.0085,10998000,54990,54990
200,32,464,0.0085,11094000,55470,55470
200,32,468,0.0086,11190000,55950,55950
200,32,472,0.0087,11286000,56430,56430
200,32,476,0.0087,11382000,56910,56910
200,32,480,0.0088,11478000,57390,57390
200,32,484,0.0089,11574000,57870,57870
200,32,488,0.0089,11670000,58350,58350
200,32,492,0.0091,11766000,58830,58830
200,32,496,0.0091,11862000,59310,59310
200,32,500,0.0091,11958000,59790,59790
200,32,504,0.0092,12054000,60270,60270
200,32,508,0.0093,12150000,60750,60750
200,32,512,0.0094,12246000,61230,61230
200,32,516,0.0096,12342000,61710,61710
200,32,520,0.0096,12438000,62190,62190
200,32,524,0.0095,12534000,62670,62670
200,32,528,0.0098,12630000,63150,63150
200,32,532,0.0097,12726000,63630,63630
200,32,536,0.0097,12822000,64110,64110
200,32,540,0.0098,12918000,64590,64590
200,32,544,0.0100,13014000,65070,65070
200,32,548,0.0102,13110000,65550,65550
200,32,552,0.0102,13206000,66030,66030
200,32,556,0.0101,13302000,66510,66510
200,32,560,0.0103,13398000,66990,66990
200,32,564,0.0103,13494000,67470,67470
200,32,568,0.0104,13590000,67950,67950
200,32,572,0.0105,13686000,68430,68430
200,32,576,0.0105,13782000,68910,68910
200,32,580,0.0107,13878000,69390,69390
200,32,584,0.0108,13974000,69870,69870
200,32,588,0.0107,14070000,70350,70350
200,32,592,0.0108,14166000,70830,70830
200,32,596,0.0109,14262000,71310,71310
200,32,600,0.0110,14358000,71790,71790
200,32,604,0.0110,14454000,72270,72270
200,32,608,0.0111,14550000,72750,72750
200,32,612,0.0114,14646000,73230,73230
200,32,616,0.0112,14742000,73710,73710
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,620,0.0113,14838000,74190,74190 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,624,0.0114,14934000,74670,74670 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,628,0.0116,15030000,75150,75150 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,632,0.0115,15126000,75630,75630 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,636,0.0117,15222000,76110,76110 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,640,0.0116,15318000,76590,76590 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,644,0.0118,15414000,77070,77070 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,648,0.0117,15510000,77550,77550 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,652,0.0119,15606000,78030,78030 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,656,0.0119,15702000,78510,78510 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,660,0.0120,15798000,78990,78990 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,664,0.0120,15894000,79470,79470 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,668,0.0121,15990000,79950,79950 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,672,0.0121,16086000,80430,80430 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL 
(total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,676,0.0123,16182000,80910,80910 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,680,0.0122,16278000,81390,81390 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,684,0.0125,16374000,81870,81870 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,688,0.0124,16470000,82350,82350 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,692,0.0126,16566000,82830,82830 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,696,0.0125,16662000,83310,83310 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,700,0.0127,16758000,83790,83790 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,704,0.0128,16854000,84270,84270 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,708,0.0128,16950000,84750,84750 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,712,0.0128,17046000,85230,85230 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,716,0.0128,17142000,85710,85710 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,720,0.0129,17238000,86190,86190 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,724,0.0130,17334000,86670,86670 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,728,0.0130,17430000,87150,87150 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), 
PM_VECTOR_FLOP_CMPL (max) 200,32,732,0.0132,17526000,87630,87630 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,736,0.0132,17622000,88110,88110 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,740,0.0133,17718000,88590,88590 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,744,0.0133,17814000,89070,89070 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,748,0.0134,17910000,89550,89550 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,752,0.0134,18006000,90030,90030 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,756,0.0136,18102000,90510,90510 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,760,0.0136,18198000,90990,90990 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,764,0.0136,18294000,91470,91470 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,768,0.0137,18390000,91950,91950 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,772,0.0139,18486000,92430,92430 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,776,0.0139,18582000,92910,92910 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,780,0.0139,18678000,93390,93390 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,784,0.0140,18774000,93870,93870 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 
200,32,788,0.0140,18870000,94350,94350 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,792,0.0142,18966000,94830,94830 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,796,0.0142,19062000,95310,95310 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,800,0.0144,19158000,95790,95790 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,804,0.0143,19254000,96270,96270 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,808,0.0144,19350000,96750,96750 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,812,0.0145,19446000,97230,97230 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,816,0.0145,19542000,97710,97710 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,820,0.0146,19638000,98190,98190 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,824,0.0147,19734000,98670,98670 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,828,0.0147,19830000,99150,99150 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,832,0.0148,19926000,99630,99630 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,836,0.0151,20022000,100110,100110 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,840,0.0150,20118000,100590,100590 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,844,0.0150,20214000,101070,101070 
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,848,0.0151,20310000,101550,101550 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,852,0.0152,20406000,102030,102030 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,856,0.0152,20502000,102510,102510 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,860,0.0152,20598000,102990,102990 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,864,0.0153,20694000,103470,103470 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,868,0.0154,20790000,103950,103950 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,872,0.0155,20886000,104430,104430 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,876,0.0155,20982000,104910,104910 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,880,0.0157,21078000,105390,105390 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,884,0.0157,21174000,105870,105870 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,888,0.0158,21270000,106350,106350 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,892,0.0158,21366000,106830,106830 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,896,0.0159,21462000,107310,107310 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,900,0.0161,21558000,107790,107790 
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,904,0.0162,21654000,108270,108270 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,908,0.0161,21750000,108750,108750 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,912,0.0163,21846000,109230,109230 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,916,0.0164,21942000,109710,109710 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,920,0.0165,22038000,110190,110190 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,924,0.0164,22134000,110670,110670 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,928,0.0166,22230000,111150,111150 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,932,0.0166,22326000,111630,111630 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,936,0.0167,22422000,112110,112110 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,940,0.0168,22518000,112590,112590 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,944,0.0168,22614000,113070,113070 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,948,0.0169,22710000,113550,113550 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,952,0.0170,22806000,114030,114030 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,956,0.0170,22902000,114510,114510 
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,960,0.0171,22998000,114990,114990 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,964,0.0176,23094000,115470,115470 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,968,0.0176,23190000,115950,115950 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,972,0.0177,23286000,116430,116430 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,976,0.0177,23382000,116910,116910 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,980,0.0178,23478000,117390,117390 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,984,0.0178,23574000,117870,117870 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,988,0.0179,23670000,118350,118350 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,992,0.0180,23766000,118830,118830 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,996,0.0181,23862000,119310,119310 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1000,0.0182,23958000,119790,119790 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1004,0.0182,24054000,120270,120270 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1008,0.0182,24150000,120750,120750 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1012,0.0184,24246000,121230,121230 
iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1016,0.0185,24342000,121710,121710 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1020,0.0184,24438000,122190,122190 iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max) 200,32,1024,0.0182,24534000,122670,122670 mv /gpfs/wolf/trn003/scratch/aherten//poisson2d.vflop.bin.csv .
df_sflop = pd.read_csv("poisson2d.sflop.bin.csv", skiprows=range(2, 50000, 2))
df_vflop = pd.read_csv("poisson2d.vflop.bin.csv", skiprows=range(2, 50000, 2))
df_flop = pd.concat([df_sflop.set_index("nx"), df_vflop.set_index("nx")[['PM_VECTOR_FLOP_CMPL (total)', 'PM_VECTOR_FLOP_CMPL (min)', ' PM_VECTOR_FLOP_CMPL (max)']]], axis=1).reset_index()
df_flop.head()
 | nx | iter | ny | Runtime | PM_SCALAR_FLOP_CMPL (total) | PM_SCALAR_FLOP_CMPL (min) | PM_SCALAR_FLOP_CMPL (max) | PM_VECTOR_FLOP_CMPL (total) | PM_VECTOR_FLOP_CMPL (min) | PM_VECTOR_FLOP_CMPL (max) |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | 200 | 32 | 0.0010 | 96000 | 480 | 480 | 0 | 0 | 0 |
1 | 8 | 200 | 32 | 0.0011 | 0 | 0 | 0 | 150000 | 750 | 750 |
2 | 12 | 200 | 32 | 0.0012 | 0 | 0 | 0 | 246000 | 1230 | 1230 |
3 | 16 | 200 | 32 | 0.0012 | 0 | 0 | 0 | 342000 | 1710 | 1710 |
4 | 20 | 200 | 32 | 0.0013 | 0 | 0 | 0 | 438000 | 2190 | 2190 |
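The skiprows=range(2, 50000, 2) argument in the cells above is needed because the benchmark output repeats the CSV header before every data row. A minimal sketch of the effect, using three rows copied from the output above and an in-memory buffer instead of the file (the names HEADER, ROWS and df_demo are only for illustration):

```python
import io

import pandas as pd

# Header exactly as the benchmark prints it; note the space before the last name.
HEADER = ("iter,ny,nx,Runtime,PM_VECTOR_FLOP_CMPL (total),"
          "PM_VECTOR_FLOP_CMPL (min), PM_VECTOR_FLOP_CMPL (max)")
# Three data rows copied from the benchmark output above.
ROWS = [
    "200,32,392,0.0074,9366000,46830,46830",
    "200,32,396,0.0074,9462000,47310,47310",
    "200,32,400,0.0075,9558000,47790,47790",
]

# Rebuild the file layout: header, data, header, data, ...
raw = "".join(f"{HEADER}\n{row}\n" for row in ROWS)

# Line 0 is kept as the header; lines 2, 4, 6, ... are the repeated headers
# and get skipped, so only the data rows remain.
df_demo = pd.read_csv(io.StringIO(raw), skiprows=range(2, 50000, 2))
print(df_demo[["nx", "PM_VECTOR_FLOP_CMPL (min)"]])
```

The leading space in the last header field survives parsing, which is why the column has to be selected as ' PM_VECTOR_FLOP_CMPL (max)' in the cell above.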
Again, the name of the vector counter is a bit misleading: it measures floating-point instructions, not floating-point operations. To get actual floating-point operations, each value needs to be multiplied by the vector width (2). We can plot the values afterwards (non-interactive: make graph_task4).
df_flop["Grid Points"] = df_flop["nx"] * df_flop["ny"]
df_flop["Vector FlOps (min)"] = df_flop["PM_VECTOR_FLOP_CMPL (min)"] * 2
df_flop["Scalar FlOps (min)"] = df_flop["PM_SCALAR_FLOP_CMPL (min)"]
df_flop.set_index("Grid Points")[["Scalar FlOps (min)", "Vector FlOps (min)"]].plot();
_fit, _cov = common.print_and_return_fit(
["Scalar FlOps (min)", "Vector FlOps (min)"],
df_flop.set_index("Grid Points"),
linear_function
)
fit_parameters = {**fit_parameters, **_fit}
fit_covariance = {**fit_covariance, **_cov}
Counter Scalar FlOps (min) is proportional to the grid points (nx*ny) by a factor of -0.0003 (± 0.0002)
Counter Vector FlOps (min) is proportional to the grid points (nx*ny) by a factor of 7.5004 (± 0.0002)
Interesting! We seem to be using the vector registers of our system very well. Basically all operations are vector operations!
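As a rough cross-check of the fitted factor for the vector counter, the following sketch fits a straight line through three of the measurements printed above. It uses numpy.polyfit as a stand-in; how common.print_and_return_fit and linear_function work internally is not shown here, so treating the fit as a line with an intercept is an assumption.

```python
import numpy as np

ny = 32  # constant in all benchmark runs shown above

# Three (nx, PM_VECTOR_FLOP_CMPL (min)) pairs copied from the benchmark output.
nx = np.array([392.0, 512.0, 1024.0])
vector_instr_min = np.array([46830.0, 61230.0, 122670.0])

grid_points = nx * ny
vector_flops = vector_instr_min * 2  # vector width 2: instructions -> operations

# Degree-1 polynomial fit: vector_flops ~ slope * grid_points + intercept
slope, intercept = np.polyfit(grid_points, vector_flops, 1)
print(f"FlOps per grid point: {slope:.4f}, offset: {intercept:.1f}")
```

For these three points the slope comes out at about 7.5 floating-point operations per grid point, in line with the factor reported above.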
With that measured, we can determine the Arithmetic Intensity, the ratio of floating-point operations to bytes transferred:
\begin{align} \text{AI}^\text{emp} = I_\text{flop} / I_\text{mem} \text{,} \end{align}
with $I$ denoting the respective amount. This is the empirically determined Arithmetic Intensity.
In the non-interactive version of the Notebook, please plot the graph by calling make graph_task4-2 in the terminal.
I_flop_scalar = df_flop.set_index("Grid Points")["Scalar FlOps (min)"]
I_flop_vector = df_flop.set_index("Grid Points")["Vector FlOps (min)"]
I_mem_load = df_byte["Loads"]
I_mem_store = df_byte["Stores"]
df_ai = pd.DataFrame()
df_ai["Arithmetic Intensity"] = (I_flop_scalar + I_flop_vector) / (I_mem_load + I_mem_store)
ax = df_ai.plot();
ax.set_ylabel("FlOp/Byte");  # AI = I_flop / I_mem, i.e. operations per byte
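One detail worth keeping in mind for the division above: pandas aligns Series by index label, so the FlOp and byte Series must share the same index (here, the number of grid points). A minimal sketch with made-up numbers (all values and names below are hypothetical and only illustrate the mechanics):

```python
import pandas as pd

# Hypothetical per-iteration counts, indexed by the number of grid points.
grid_points = pd.Index([128, 256, 512], name="Grid Points")
flops = pd.Series([960.0, 1920.0, 3840.0], index=grid_points)
bytes_moved = pd.Series([3072.0, 6144.0, 12288.0], index=grid_points)

# Matching indices: element-wise ratio in FlOp/Byte.
ai = flops / bytes_moved
print(ai)

# Mismatched indices: labels present on only one side yield NaN.
bytes_other = pd.Series([3072.0, 6144.0, 24576.0],
                        index=pd.Index([128, 256, 1024], name="Grid Points"))
ai_misaligned = flops / bytes_other
print(ai_misaligned)
```

If the resulting plot is unexpectedly empty or has gaps, comparing the two indices is a sensible first check.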
Thinking back to the first lecture of the tutorial, what Arithmetic Intensity did you expect?
If you still have time, you might venture into your own benchmarking adventure.
Maybe you noticed already, for instance in Task 2 C: at the far right of the graph, for very large numbers of grid points, the behaviour changes. Something is happening there!
TASK: Revisit the counters measured above for a larger range of nx. So far, we have only studied nx up to about 1000. New effects appear above that value, though some of them only well above it ($nx > 15000$).
You're on your own here. Edit the bench.sh script to change the range and the step size.
Good luck!