Kernel-benchmark
Benchmark of kernels in ChASE and others
Modules loaded
module load GCC
module load CMake/3.14.0
module load ParaStationMPI/5.2.2-1
module load Boost/1.69.0-Python-2.7.16
module load imkl
Compile
mkdir build
cd build
cmake ..
make -j
HEEVD function
Execution
Multi-threaded version
./HEEVD/heevd.exe --s=${SIZE} --omp:threads=${NUM_THREAD}
where ${SIZE} specifies the matrix size and ${NUM_THREAD} specifies the number of threads to be used.
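A scaling study usually repeats this command over several matrix sizes and thread counts. The following is a minimal sketch of such a sweep. The size list, the thread list, and the DRY_RUN switch are assumptions made for illustration and are not part of the benchmark; by default the script only prints the command lines.

```shell
#!/bin/sh
# Sketch: sweep the multi-threaded HEEVD benchmark over sizes and thread counts.
# SIZES, THREADS_LIST and DRY_RUN are hypothetical names chosen for this example.
SIZES="1000 2000"
THREADS_LIST="1 2 4"
DRY_RUN="${DRY_RUN:-1}"   # 1: print the commands only, 0: execute them

heevd_cmd() {
    # $1 = matrix size, $2 = number of threads
    printf './HEEVD/heevd.exe --s=%s --omp:threads=%s\n' "$1" "$2"
}

sweep() {
    for size in $SIZES; do
        for threads in $THREADS_LIST; do
            cmd=$(heevd_cmd "$size" "$threads")
            if [ "$DRY_RUN" = "1" ]; then
                echo "$cmd"
            else
                $cmd          # run from the build directory
            fi
        done
    done
}

sweep
```

Setting DRY_RUN=0 in the environment executes each command instead of printing it.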
MPI version
The MPI version runs a multi-threaded HEEVD independently on each process.
mpirun -np ${PROCS} ./HEEVD/heevd_mpi.exe --s=${SIZE} --omp:threads=${NUM_THREAD}
Improved MPI version
In this version, HEEVD is executed by only one process on each node, and the result is then broadcast to the other processes on the same node.
mpirun -np ${PROCS} ./HEEVD/heevd_shm_mpi.exe --s=${SIZE} --omp:threads=${NUM_THREAD}
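To get a feel for what the node-local broadcast saves, one can count the HEEVD solves per launch in the two MPI variants. The sketch below is only arithmetic and does not call the benchmark. PPN (processes per node) is a hypothetical variable, and block-wise placement of processes on nodes is assumed.

```shell
#!/bin/sh
# Sketch: count HEEVD solves per launch under each MPI scheme.
# PPN is a hypothetical name; block-wise process placement is assumed.

heevd_calls_plain() {
    # plain MPI version: every process solves; $1 = PROCS
    echo "$1"
}

heevd_calls_shm() {
    # improved version: one solve per node; $1 = PROCS, $2 = PPN
    # ceiling division gives the number of nodes in use
    echo $(( ($1 + $2 - 1) / $2 ))
}

PROCS=16
PPN=4
echo "plain MPI:  $(heevd_calls_plain "$PROCS") HEEVD solves"
echo "node-local: $(heevd_calls_shm "$PROCS" "$PPN") HEEVD solves, each followed by a broadcast"
```

With 16 processes placed 4 per node, the plain version performs 16 solves and the improved version 4.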
Update QR
Parameters
This is a simulator of the QR factorization inside ChASE. The simulation takes the following parameters:
- ${M}: Number of rows of the matrix to be factored, which should equal the matrix size in ChASE
- ${NEV}: Number of eigenvectors to be computed
- ${NEX}: Number of extra vectors used to compute the eigenpairs
- ${LOCKED}: Number of eigenpairs locked in each iteration step. This parameter would be dynamic in ChASE, but it is kept static here for simplicity.
- ${THREADS}: Number of threads to be used
Execution
ChASE-like update QR
./updateQR/chaseQR.exe --m=$M --nev=${NEV} --nex=${NEX} --locked=${LOCKED} --omp:threads=${THREADS}
Newly proposed update QR
./updateQR/updateQR.exe --m=$M --nev=${NEV} --nex=${NEX} --locked=${LOCKED} --omp:threads=${THREADS}
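Since both executables take the same flags, a comparison can build the two command lines from a single parameter set. The following is a minimal sketch. The parameter values are illustrative assumptions only, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: build both update-QR command lines from one parameter set.
# The values below are illustrative only, not recommended settings.
M=10000
NEV=100
NEX=40
LOCKED=10
THREADS=4

qr_cmd() {
    # $1 = executable name (chaseQR.exe or updateQR.exe)
    printf './updateQR/%s --m=%s --nev=%s --nex=%s --locked=%s --omp:threads=%s\n' \
        "$1" "$M" "$NEV" "$NEX" "$LOCKED" "$THREADS"
}

for exe in chaseQR.exe updateQR.exe; do
    qr_cmd "$exe"    # prints the command line for this variant
done
```

Piping the output to sh from the build directory runs both variants back to back with identical parameters.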