From e9acd0235b73958e64c140ed7e8c55e2f35abc15 Mon Sep 17 00:00:00 2001
From: ebert1 <ja.ebert@fz-juelich.de>
Date: Thu, 29 Apr 2021 18:14:30 +0200
Subject: [PATCH] Add content for several sections

- Data Preparation
- Code Adjustments and Runscripts
- Log Analysis

Also adjust heading capitalization and mention which code was used.
---
 README.md | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 20ed70c..276e668 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,38 @@
 MLPerf v0.7 on Juwels Booster
 ============
-# General explanation
+# General Explanation
 * Singularity instead of Docker
 * VM offload of container prep
+* Used NVIDIA submission code
 * Added new runscripts, left code intact otherwise.
 
-# Data Preparation (very short)
+# Data Preparation
+Used NVIDIA and MLPerf guides.
 
 # Container Preparation
 Put general container prep code in this repo (e.g. pytorch_fm51).
 Point to alterations in Dockerfiles in training v0.7 Repo.
 
-# Code Adjustments and runscripts
-Just give pointer here
+# Code Adjustments and Runscripts
+We used the `run_and_time.sh` scripts as entry points for our
+experiments.
+In general, all calls to scripts related to "binding" were removed.
+DGX-specific variables were removed or adjusted, as well as paths that
+were out of place for our system.
 
-# Log analysis
-Commands to analyse logs.
+# Log Analysis
+
+Commands for obtaining runtime, executed in results log directory:
+- Single node: `awk '/run_start/ { start=substr($5, 0, length($5) - 1); } /run_stop/ {print ($5 - start) / 1000 / 60}' result_*.txt`
+- Multi node: `awk '/run_start/ { start=substr($6, 0, length($6) - 1); } /run_stop/ {print ($6 - start) / 1000 / 60}' result_*.txt`
+
+Commands for obtaining samples/second per benchmark, executed in
+results log directory:
+| Benchmark | Command |
+|-
+| Bert (single node) | `grep training_sequences_per_second * | awk '{ print $5 }' | cut -d , -f 1 | sort -n | tail -n 1` |
+| Bert (multi node) | `grep training_sequences_per_second * | awk '{ print $6 }' | cut -d , -f 1 | sort -n | tail -n 1` |
+| GNMT | `grep 'Performance: ' * | awk '{ print $6 }' | sort -n | tail -n 1` |
+| Transformer | `grep '| epoch [0-9]\+ |' * | awk '{ print $15 }' | sort -n | tail -n 1` |
 
 # Copy results here.
-- 
GitLab
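
The single-node and multi-node runtime commands in the patch differ only in which whitespace-separated field holds the timestamp (`$5` versus `$6`). As an illustration of the same calculation, here is a minimal Python sketch that does not depend on the field position. It assumes, since the patch does not show a log line, that each relevant line contains the marker `:::MLLOG` followed by a JSON object with `key` and `time_ms` fields, optionally preceded by a rank prefix such as `0: `. The function name `runtime_minutes` and the sample lines are hypothetical.

```python
import json
import re

# Assumed log line shape (not shown in the patch): an optional rank prefix,
# the ":::MLLOG" marker, then a JSON object with "key" and "time_ms" fields.
MLLOG_RE = re.compile(r":::MLLOG\s+(\{.*\})")


def runtime_minutes(lines):
    """Return the minutes between the most recent run_start and the first
    run_stop that follows it, or None if no such pair is found."""
    start_ms = None
    for line in lines:
        match = MLLOG_RE.search(line)
        if match is None:
            continue  # not an MLLOG line
        try:
            record = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed payload, skip it
        key = record.get("key")
        if key == "run_start":
            start_ms = record["time_ms"]
        elif key == "run_stop" and start_ms is not None:
            # Milliseconds -> seconds -> minutes, as in the awk commands.
            return (record["time_ms"] - start_ms) / 1000 / 60
    return None


# Hypothetical sample lines in the assumed multi-node format.
sample_log = [
    '0: :::MLLOG {"namespace": "", "time_ms": 1000000, "event_type": "INTERVAL_START", "key": "run_start", "value": null}',
    "0: some unrelated training output",
    '0: :::MLLOG {"namespace": "", "time_ms": 1120000, "event_type": "INTERVAL_END", "key": "run_stop", "value": null}',
]
print(runtime_minutes(sample_log))
```

Parsing the JSON payload rather than counting fields means one function would cover both the single-node and multi-node layouts, provided the assumed line format holds for the logs in question.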