If everything runs correctly, adjust the paths in the `sbatch` scripts to match your own directory layout.

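A wrong path usually only shows up after the job has waited in the queue, so it can help to check the edited paths before any training starts. The following is a minimal bash sketch and is not part of the original `sbatch` scripts. The function name `require_dirs` is hypothetical:

```sh
# Hypothetical helper (not from the original scripts): verify that every
# argument is an existing directory, report each missing one on stderr,
# and return non-zero if any are missing.
require_dirs() {
    local missing=0
    local dir
    for dir in "$@"; do
        if [[ ! -d "$dir" ]]; then
            echo "Missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, assuming `HOME_PATH` and `LOGS_PATH` both point to directories, `require_dirs "$HOME_PATH" "$LOGS_PATH" || exit 1` near the top of a job script would end the job immediately if either one is missing.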
## (Data Parallel) Training

### Horovod

Example: Run on a 4xV100 dev instance (2-hour time limit) on JUWELS.

Note that `--flops_profiler` stops training after 200 steps.

Fill in `HOME_PATH`, `CHECKPOINT_NAME`, and `LOGS_PATH` first, then change the remaining parameters as needed.

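Leaving `HOME_PATH`, `CHECKPOINT_NAME`, or `LOGS_PATH` empty wastes an allocation, so a fail-fast check near the top of the job script may be worthwhile. This is a hypothetical bash sketch (the function `check_required_vars` is not part of the original script). It only tests that the three variables are non-empty, not that their values are valid:

```sh
# Hypothetical guard (not from the original job script): fail early if any
# required placeholder is still empty or unset.
check_required_vars() {
    local missing=0
    local name
    for name in HOME_PATH CHECKPOINT_NAME LOGS_PATH; do
        # ${!name:-} expands to the value of the variable named by $name,
        # or to an empty string if that variable is unset.
        if [[ -z "${!name:-}" ]]; then
            echo "Please set $name before submitting the job." >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_required_vars || exit 1` right after the variables are assigned makes the job stop with a readable message instead of failing later inside the training script.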
```sh
#!/usr/bin/env bash