diff --git a/README.md b/README.md
index 20ed70c5fbe58b439c0354d349e4ed2be5726466..276e66835f0b78396ad554bfe4be0838063fd8a4 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,82 @@
 MLPerf v0.7 on Juwels Booster
 ============
 
-# General explanation
+# General Explanation
 * Singularity instead of Docker
 * VM offload of container prep
+* Used NVIDIA submission code
 * Added new runscripts, left code intact otherwise.
 
-# Data Preparation (very short)
+# Data Preparation
+Data preparation followed the NVIDIA and MLPerf reference guides.
 
 # Container Preparation
 Put general container prep code in this repo (e.g. pytorch_fm51).
 Point to alterations in Dockerfiles in training v0.7 Repo.
 
-# Code Adjustments and runscripts
-Just give pointer here
+# Code Adjustments and Runscripts
+We used the `run_and_time.sh` scripts as entry points for our
+experiments; see the launch sketch in the appendix below.
+All calls to scripts related to "binding" were removed.
+DGX-specific variables and paths that did not apply to our system
+were removed or adjusted.
 
-# Log analysis
-Commands to analyse logs.
+# Log Analysis
+
+Commands for obtaining the runtime in minutes, executed in the
+results log directory (an annotated version is given in the
+appendix below):
+- Single node: `awk '/run_start/ { start=substr($5, 1, length($5) - 1); } /run_stop/ {print ($5 - start) / 1000 / 60}' result_*.txt`
+- Multi node: `awk '/run_start/ { start=substr($6, 1, length($6) - 1); } /run_stop/ {print ($6 - start) / 1000 / 60}' result_*.txt`
+
+Commands for obtaining the highest samples/second value per
+benchmark, executed in the results log directory:
+
+| Benchmark | Command |
+| --- | --- |
+| BERT (single node) | `grep training_sequences_per_second * \| awk '{ print $5 }' \| cut -d , -f 1 \| sort -n \| tail -n 1` |
+| BERT (multi node) | `grep training_sequences_per_second * \| awk '{ print $6 }' \| cut -d , -f 1 \| sort -n \| tail -n 1` |
+| GNMT | `grep 'Performance: ' * \| awk '{ print $6 }' \| sort -n \| tail -n 1` |
+| Transformer | `grep '\| epoch [0-9]\+ \|' * \| awk '{ print $15 }' \| sort -n \| tail -n 1` |
 # Copy results here.
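+
+# Appendix: Example Sketches
+The snippets below are illustrative sketches with placeholder names
+and tags, not the exact commands and scripts we used.
+
+For container preparation, one way to convert an NVIDIA Docker image
+into a Singularity image (registry path and tag are placeholders):
+
+```sh
+# Pull a Docker image from the NGC registry and convert it into a
+# Singularity image file in one step.
+singularity build pytorch_mlperf.sif docker://nvcr.io/nvidia/pytorch:20.06-py3
+```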
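+
+For the runscripts, a hypothetical Slurm batch script that launches
+`run_and_time.sh` inside the container; node counts, GPU counts, and
+the image name are assumptions, not our actual configuration:
+
+```sh
+#!/bin/bash
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=4   # one task per GPU
+#SBATCH --gres=gpu:4
+#SBATCH --time=02:00:00
+
+# --nv makes the host NVIDIA driver and GPUs visible in the container.
+srun singularity exec --nv pytorch_mlperf.sif ./run_and_time.sh
+```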
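+
+The runtime one-liner from the Log Analysis section, spelled out
+with comments (single-node field numbering; multi-node logs use
+field 6 instead of field 5):
+
+```sh
+# run_start/run_stop log lines carry a millisecond timestamp in field 5.
+# The trailing comma on the stored start value is stripped explicitly;
+# awk's numeric coercion drops the one on the stop value.
+awk '/run_start/ { start = substr($5, 1, length($5) - 1) }
+     /run_stop/  { print ($5 - start) / 1000 / 60 }' result_*.txt
+# Output: runtime in minutes.
+```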