To conduct distributed training across many nodes (each node having up to 4 GPUs), adapt the provided `sbatch` scripts.

For JUWELS Booster, for instance, adapt `juwelsbooster.sh` with the desired number of nodes. For an example run, you can use `juwelsbooster.sh run_cub.sh`, adapting `run_cub.sh` accordingly.
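As a hedged sketch, assuming `juwelsbooster.sh` is submitted through `sbatch` with the run script as its argument, submission might look like the following. The `submit` wrapper is hypothetical: it falls back to a dry-run echo so the snippet is safe to try off-cluster.

```sh
# Hypothetical submission wrapper: uses sbatch when available,
# otherwise echoes the command as a dry run (safe off-cluster).
submit() {
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$@"
    else
        echo "would submit: $*"
    fi
}

# Submit the Booster wrapper with the example run script.
submit juwelsbooster.sh run_cub.sh
```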
Further up-to-date `sbatch` scripts are in `/p/scratch/ccstdl/ebert1/dalle` (`hvd_dalle.sbatch`, `hvd_vae.sbatch`). You can copy these into your local DALLE-pytorch clone:

```sh
cp /p/scratch/ccstdl/ebert1/dalle/*.sbatch ~/$USER/DALLE-pytorch
```
### Queue Training - DeepSpeed
There are up-to-date `sbatch` scripts in `/p/scratch/ccstdl/ebert1/dalle` (`run_dalle.sbatch`, `run_vae.sbatch`). You can copy these into your local DALLE-pytorch clone:

```sh
cp /p/scratch/ccstdl/ebert1/dalle/*.sbatch ~/$USER/DALLE-pytorch
```
An example of a run on two nodes across 8xV100 GPUs using DeepSpeed (with ZeRO optimization disabled):
```sh
#!/usr/bin/env bash

#SBATCH --nodes 2
#SBATCH --tasks-per-node 4
#SBATCH --gres gpu:4
#SBATCH -A cstdl
#SBATCH --partition develbooster

DATASET_PATH=YOUR_DATA_PATH/flickr30k_images/flickr30k_images
VAE_PATH=vae.pt

module purge
module use "$OTHERSTAGES"
module load Stages/Devel-2020
module load GCC/9.3.0 OpenMPI DeepSpeed

# source env/bin/activate

# ...define vars
export WANDB_MODE=disabled

srun --cpu-bind=none \
    python -u train_dalle.py --image_text_folder "$DATASET_PATH" --deepspeed --fp16
```
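For context, "zero-optimization disabled" corresponds to ZeRO stage 0 in DeepSpeed terms. DALLE-pytorch assembles its DeepSpeed configuration internally, so the standalone JSON config below (written via a heredoc) is only an illustrative sketch of the relevant knob; the file name and contents are assumptions, not the file the script above uses:

```sh
# Illustrative only: a minimal DeepSpeed config with ZeRO disabled (stage 0).
# File name and contents are assumptions, not taken from the repo.
cat > ds_config_example.json <<'EOF'
{
    "fp16": { "enabled": true },
    "zero_optimization": { "stage": 0 }
}
EOF
cat ds_config_example.json
```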
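The `#SBATCH` directives above determine the distributed world size: 2 nodes with 4 tasks per node gives 8 ranks, one per GPU. A small sketch of that arithmetic, using the standard Slurm environment variable names (hard-coded here, since outside a job allocation Slurm does not export them):

```sh
# Mirror the allocation from the #SBATCH directives (hard-coded for
# illustration; inside a job, Slurm exports these variables automatically).
SLURM_JOB_NUM_NODES=2
SLURM_NTASKS_PER_NODE=4

# One rank per GPU: total ranks = nodes * tasks-per-node.
WORLD_SIZE=$((SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE))
echo "world size: $WORLD_SIZE"
```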