Commit 1707daa1 authored by Jan Ebert

Use a simple auto wrapping policy

This way, the example actually shows a way to properly distribute a
module using FSDP.
parent c040e4d9
@@ -264,6 +264,11 @@ def main():
     model = fsdp.FullyShardedDataParallel(
         model,
         device_id=local_rank,
+        auto_wrap_policy=functools.partial(
+            fsdp.wrap.size_based_auto_wrap_policy,
+            # Wrap every 1B parameters.
+            min_num_params=int(1e9),
+        ),
     )
     loss_func = torch.nn.CrossEntropyLoss()
...
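For context, below is a minimal, self-contained sketch of how the example could look with this change applied. Only the auto_wrap_policy argument mirrors the diff above; the toy Sequential model, the NCCL backend, and the assumption that the script is launched with torchrun (which sets LOCAL_RANK) are illustrative stand-ins, not part of the commit.

import functools
import os

import torch
import torch.distributed as dist
import torch.distributed.fsdp as fsdp
import torch.distributed.fsdp.wrap  # makes fsdp.wrap accessible, as used in the diff


def main():
    # Assumes launch via torchrun, which sets LOCAL_RANK for each process.
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend='nccl')

    # Toy model standing in for the example's real network.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    )

    # Submodules are split into their own FSDP units once their
    # unwrapped parameter count reaches `min_num_params`; smaller
    # modules remain part of the enclosing unit.
    model = fsdp.FullyShardedDataParallel(
        model,
        device_id=local_rank,
        auto_wrap_policy=functools.partial(
            fsdp.wrap.size_based_auto_wrap_policy,
            # Wrap every 1B parameters.
            min_num_params=int(1e9),
        ),
    )

    loss_func = torch.nn.CrossEntropyLoss()
    # ... training loop continues as in the original example.


if __name__ == '__main__':
    main()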