srun -A cstdl --cpu-bind=none \
|
|
|
|
|
```
|
|
|
|
|
|
## Training with DeepSpeed
|
|
|
An example job:
|
|
|
### DeepSpeed
|
|
|
|
|
|
Run 200 steps across 4xV100 GPUs using DeepSpeed (ZeRO optimization disabled):
|
|
|
```sh
|
|
|
|
|
|
#!/usr/bin/env bash
|
deepspeed train_dalle.py \
|
|
|
|
|
```
|
|
|
|
|
|
#### Configure DeepSpeed Zero Optimization
|
|
|
> Stages 0, 1, 2, and 3 refer to disabled, optimizer state partitioning, optimizer+gradient state partitioning, and optimizer+gradient+parameter partitioning, respectively.
|
|
|
|
|
|
To change the DeepSpeed stage, find the Python dict named `deepspeed_config` in `train_dalle.py` and modify it as follows:
|
|
|
- Stage 1: Optimizer State Partitioning
|
|
|
```python
|
|
|
deepspeed_config = {
|
|
|
"zero_optimization": {
|
|
|
"stage": 1,
|
|
|
}
|
|
|
}
|
|
|
```
|
|
|
|
|
|
- Stage 2 (with `cpu_offload`, aka ZeRO-Offload): Optimizer + Gradient State Partitioning
|
|
|
Offload the optimizer (e.g. Adam) state to the CPU and partition it across GPUs/nodes:
|
|
|
```python
|
|
|
deepspeed_config = {
|
|
|
"zero_optimization": {
|
|
|
"stage": 2,
|
|
|
"cpu_offload": True,
|
|
|
"contiguous_gradients": True,
|
|
|
"overlap_comm": True
|
|
|
}
|
|
|
}
|
|
|
```
|
|
|
|
|
|
- Stage 3: Optimizer + Gradient + Parameter partitioning
|
|
|
```python
|
|
|
deepspeed_config = {
|
|
|
"zero_optimization": {
|
|
|
"stage": 3,
|
|
|
"offload_param": {
|
|
|
"device": "cpu",
|
|
|
"pin_memory": True,
|
|
|
},
|
|
|
"offload_optimizer": {
|
|
|
"device": "cpu",
|
|
|
"pin_memory": True,
|
|
|
},
|
|
|
}
|
|
|
}
|
|
|
```
|
|
|
|
|
|
- DeepSpeed ZeRO-Infinity (requires an NVMe drive): Optimizer + Gradient + Parameter + Checkpoint partitioning
|
|
|
Takes advantage of the fast read/write speeds of NVMe drives to offload the optimizer and parameter state to NVMe storage and partition it across GPUs/nodes:
|
|
|
```python
|
|
|
deepspeed_config = {
|
|
|
"zero_optimization": {
|
|
|
"stage": 3,
|
|
|
"offload_param": {
|
|
|
"device": "nvme",
|
|
|
"nvme_path": "./local_nvme",
|
|
|
"pin_memory": True,
|
|
|
},
|
|
|
"offload_optimizer": {
|
|
|
"device": "nvme",
|
|
|
"nvme_path": "./local_nvme",
|
|
|
"pin_memory": True,
|
|
|
},
|
|
|
},
|
|
|
}
|
|
|
```
|
|
|
|
|
|
- For many more tunable configuration options, see https://www.deepspeed.ai/docs/config-json
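The stage variants above differ only in their `zero_optimization` block. As a sketch, a small helper (hypothetical — not part of `train_dalle.py`) can assemble the dict for a chosen stage and offload target:

```python
# Hypothetical helper: build the `deepspeed_config` variants shown above.
# The offload layouts mirror the README examples; anything beyond them
# (e.g. other devices) is an assumption.

def make_zero_config(stage, offload_device=None, nvme_path=None):
    """Return a deepspeed_config dict for ZeRO stage 0-3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    zero = {"stage": stage}
    if stage == 2 and offload_device == "cpu":
        # Stage 2 with CPU offload (ZeRO-Offload)
        zero.update({
            "cpu_offload": True,
            "contiguous_gradients": True,
            "overlap_comm": True,
        })
    if stage == 3 and offload_device in ("cpu", "nvme"):
        target = {"device": offload_device, "pin_memory": True}
        if offload_device == "nvme":
            # ZeRO-Infinity: spill to local NVMe storage
            target["nvme_path"] = nvme_path or "./local_nvme"
        zero["offload_param"] = dict(target)
        zero["offload_optimizer"] = dict(target)
    return {"zero_optimization": zero}

deepspeed_config = make_zero_config(3, offload_device="nvme")
```

This keeps a single source of truth for the `zero_optimization` section instead of hand-editing nested dicts per experiment.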
|
|
|
|
|
|
## Monitoring a Job
|
|
|
|
|
|
You can attach interactively to a running job via `srun --pty --jobid <job-id> bash`. Once attached you can, for example, inspect GPU usage with `nvidia-smi`.
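A minimal sketch of that workflow, assuming a Slurm cluster (the job id `123456` is a placeholder, and `srun`/`nvidia-smi` exist only on the cluster, so only the attach command string is built here):

```shell
# Placeholder job id; substitute the id reported by `squeue`.
JOBID=123456
# Build the attach command shown in the text above.
ATTACH_CMD="srun --pty --jobid ${JOBID} bash"
echo "${ATTACH_CMD}"
# Once attached, for example:
#   nvidia-smi          # one-shot snapshot of GPU utilization/memory
#   nvidia-smi -l 5     # built-in loop, refreshing every 5 seconds
```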
|
|