From a1a617c2e95541aadff109bbcf7b972d7e68d953 Mon Sep 17 00:00:00 2001
From: ebert1 <ja.ebert@fz-juelich.de>
Date: Thu, 21 Jul 2022 12:15:51 +0200
Subject: [PATCH] Update README for most recent submission script

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 0f997e4..e5728e9 100644
--- a/README.md
+++ b/README.md
@@ -32,13 +32,13 @@ breaks. Please just try again in that case.
 
 ### Starting Training
 
-In `./run_scripts/tr1-13B-round1_juwels_pipe.sbatch`, adjust the
+In `./run_scripts/tr11-176B-ml_juwels_pipe.sbatch`, adjust the
 `#SBATCH` variables on top as desired (most interesting is the number
 of `--nodes`) and execute:
 
 ```shell
 cd run_scripts
-sbatch tr1-13B-round1_juwels_pipe.sbatch
+sbatch tr11-176B-ml_juwels_pipe.sbatch
 ```
 
 Please always run the scripts from the `run_scripts` directory. We
@@ -49,11 +49,11 @@ also need to change the `GPUS_PER_NODE` variable accordingly, as we do
 not yet bother with parsing the `SLURM_GRES` value.
 
 The script we currently work with,
-`./run_scripts/tr1-13B-round1_juwels_pipe.sbatch`, is the oldest
+`./run_scripts/tr11-176B-ml_juwels_pipe.sbatch`, is the most recent
 training sbatch script from the [BigScience documentation
-repository](https://github.com/bigscience-workshop/bigscience). This
-matches the current data structure we use for testing; a newer version
-that assumes later PyTorch versions has different data structure
+repository](https://github.com/bigscience-workshop/bigscience). We
+patched this to match the current data structure we use for testing;
+the original version from BigScience has different data structure
 requirements due to the many different corpora BigScience is training
 on.
 
@@ -94,4 +94,4 @@ Variables that need to be set by you:
 
 You can do even more runs with the saved checkpoints by editing
 `DATA_OUTPUT_PATH` in `StartLongRun.bash` and running the script again.
-Checkpointing happens every `SAVE_INTERVAL` iterations, which is a variable set in `run_scripts/tr1-13B-round1_juwels_pipe.sbatch`. Checkpointing does not happen automatically at the end of the job runtime. (If this is a feature you request, let us know)
\ No newline at end of file
+Checkpointing happens every `SAVE_INTERVAL` iterations, a variable set in `./run_scripts/tr11-176B-ml_juwels_pipe.sbatch`. Checkpointing does not happen automatically at the end of the job runtime. (If you would like this feature, let us know.)
-- 
GitLab