Commit a64941f6 authored by Jan Ebert

Avoid MPI terminology

parent 7274481b
@@ -467,9 +467,8 @@ For initialization, FSDP first defines a hierarchy of distinct, but
 possibly nested, submodules ("units") for the model. This process is
 also called "wrapping" in FSDP terminology and can be controlled using
 the `auto_wrap_policy` argument to `FullyShardedDataParallel`. The
-parameters in each unit are then split and distributed ("sharded", or
-scattered) to all GPUs. In the end, each GPU contains its own,
-distinct model shard.
+parameters in each unit are then split and distributed ("sharded") to
+all GPUs. In the end, each GPU contains its own, distinct model shard.
 
 Whenever we do a forward pass with the model, we sequentially pass
 through units in the following way: FSDP automatically collects the
@@ -511,7 +510,7 @@ shard. This also means that we have to execute saving and loading on
 every process, since the data is fully distinct.
 
 The example also contains an unused `save_model_singular` function
-that gathers the full model on the CPU and then saves it in a single
+that collects the full model on the CPU and then saves it in a single
 checkpoint file which can then be loaded in a single process. Keep in
 mind that this way of checkpointing is slower and limited by CPU
 memory.
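
The sharding and collecting that the changed text describes can be illustrated with a small sketch. It is a conceptual illustration only, not FSDP's implementation: plain Python lists stand in for the flattened parameters of one unit, and the names `shard`, `collect`, and `pad_value`, as well as the zero-padding scheme, are assumptions made for this example.

```python
# Conceptual sketch only: plain Python lists stand in for parameter tensors,
# and list indices stand in for GPU ranks. All names here are hypothetical.

def shard(flat_params, world_size, pad_value=0.0):
    """Split a flat parameter list into `world_size` equally sized shards."""
    shard_size = -(-len(flat_params) // world_size)  # ceiling division
    num_padding = shard_size * world_size - len(flat_params)
    padded = flat_params + [pad_value] * num_padding
    return [
        padded[rank * shard_size:(rank + 1) * shard_size]
        for rank in range(world_size)
    ]


def collect(shards, numel):
    """Concatenate all shards and drop the padding to recover the full list."""
    full = [value for rank_shard in shards for value in rank_shard]
    return full[:numel]


params = [0.1, 0.2, 0.3, 0.4, 0.5]
shards = shard(params, world_size=2)           # one distinct shard per rank
restored = collect(shards, numel=len(params))  # full parameters in one place
```

Each rank holds only its own entry of `shards`, which is why sharded checkpoints have to be written on every process. Collecting the full list in one place mirrors what a single-file checkpoint requires, and it is bounded by the memory of the process doing the collecting.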