Commit e9acd023 authored by Jan Ebert
Add content for several sections

- Data Preparation
- Code Adjustments and Runscripts
- Log Analysis

Also adjust heading capitalization and mention which code was used.
parent f959fc77
MLPerf v0.7 on Juwels Booster
============
# General Explanation
* Singularity instead of Docker
* VM offload of container prep
* Used NVIDIA submission code
* Added new runscripts, left code intact otherwise.
# Data Preparation
Used NVIDIA and MLPerf guides.
# Container Preparation
Put general container prep code in this repo (e.g. pytorch_fm51).
Point to alterations in Dockerfiles in training v0.7 Repo.
# Code Adjustments and Runscripts
We used the `run_and_time.sh` scripts as entry points for our
experiments.
In general, all calls to scripts related to "binding" were removed.
DGX-specific variables were removed or adjusted, as well as paths that
were out of place for our system.
# Log Analysis
Commands to analyse logs.
Commands for obtaining the runtime in minutes, executed in the results log directory:
- Single node: `awk '/run_start/ { start=substr($5, 1, length($5) - 1); } /run_stop/ {print ($5 - start) / 1000 / 60}' result_*.txt`
- Multi node: `awk '/run_start/ { start=substr($6, 1, length($6) - 1); } /run_stop/ {print ($6 - start) / 1000 / 60}' result_*.txt`
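The runtime commands above differ only in the field index that holds the millisecond timestamp. A minimal sketch of the same calculation as a reusable function follows. The name `runtime_minutes` and the synthetic log lines are assumptions for illustration and are not taken from the actual runscripts or result logs.

```shell
# Hypothetical helper, not part of the original runscripts.
# $1: log file
# $2: index of the whitespace-separated field that holds the timestamp in
#     milliseconds (5 for single-node logs, 6 for multi-node logs)
runtime_minutes() {
    LC_ALL=C awk -v col="$2" '
        # Adding 0 converts e.g. "1600000000000," to a number, ignoring the comma.
        /run_start/ { start = $col + 0 }
        /run_stop/  { printf "%.2f\n", ($col - start) / 1000 / 60 }
    ' "$1"
}

# Synthetic two-line log; the line layout is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
:::MLLOG {"namespace": "", "time_ms": 1600000000000, "event_type": "INTERVAL_START", "key": "run_start"}
:::MLLOG {"namespace": "", "time_ms": 1600000600000, "event_type": "INTERVAL_END", "key": "run_stop"}
EOF
runtime_minutes "$log" 5   # prints 10.00
rm -f "$log"
```

Passing the field index as a parameter keeps one function usable for both single-node and multi-node logs, where an extra leading field shifts the timestamp by one position.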
Commands for obtaining samples/second per benchmark, executed in the
results log directory:
- BERT (single node): `grep training_sequences_per_second * | awk '{ print $5 }' | cut -d , -f 1 | sort -n | tail -n 1`
- BERT (multi node): `grep training_sequences_per_second * | awk '{ print $6 }' | cut -d , -f 1 | sort -n | tail -n 1`
- GNMT: `grep 'Performance: ' * | awk '{ print $6 }' | sort -n | tail -n 1`
- Transformer: `grep '| epoch [0-9]\+ |' * | awk '{ print $15 }' | sort -n | tail -n 1`
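The throughput commands above share one pattern: select matching lines, pick one field, strip anything after a comma, and keep the largest value. A minimal sketch of that pattern as a function follows. The name `max_field` and the synthetic log lines are assumptions for illustration. The real log layouts are not reproduced here.

```shell
# Hypothetical helper, not part of the original runscripts.
# $1: grep pattern (basic regular expression)
# $2: index of the whitespace-separated field that holds the value
# remaining arguments: log files to search
# Note: the variables are global because POSIX sh has no `local`.
max_field() {
    pattern=$1
    col=$2
    shift 2
    grep -- "$pattern" "$@" \
        | awk -v col="$col" '{ print $col }' \
        | cut -d , -f 1 \
        | LC_ALL=C sort -n \
        | tail -n 1
}

# Synthetic logs; the line layout is an assumption.
dir=$(mktemp -d)
echo 'metrics step 10 training_sequences_per_second 1200.5, loss 2.1' > "$dir/a.txt"
echo 'metrics step 20 training_sequences_per_second 1350.0, loss 1.9' > "$dir/b.txt"
max_field training_sequences_per_second 5 "$dir/a.txt" "$dir/b.txt"   # prints 1350.0
rm -rf "$dir"
```

When several files are searched, `grep` prefixes each line with `filename:` without a space, so the field indices stay the same as long as the file names contain no whitespace.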
# Copy results here.