Commit e72c8976 authored by Jan Ebert

Use square root LR scaling

Also adapt documentation regarding the default change.
parent 1707daa1
@@ -446,13 +446,12 @@ number of processes we use. That is because we only configure the
 batch size" is obtained by multiplying the local batch size by the
 number of processes. If we scale up the number of processes, we obtain
 a larger batch size; this, in turn, will change what learning
-rate we should use. A very simple heuristic is to just scale the base
-learning rate you would use for the local batch size proportional to
-the number of processes: for example, we just multiply the base
-learning rate times the number of processes. This is automatically
-done in the code so that it "just works" with a large number of
-processes, but ideally you would tune the learning rate manually for
-the global batch size you use.
+rate we should use. A simple heuristic is to multiply the base
+learning rate you would use for the local batch size by the square
+root of the number of processes. This can be done by supplying the
+`--scale-lr` argument so that it "just works" with an increasing
+number of processes, but ideally you would tune the learning rate
+manually for the global batch size you use.
 ## FSDP
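As a quick worked example (illustrative numbers, not taken from the repository): with a base learning rate of 3e-4 on 16 processes, the square-root rule gives 3e-4 × √16 = 1.2e-3, whereas the previous linear rule would have given 3e-4 × 16 = 4.8e-3.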
@@ -223,7 +223,7 @@ def main():
     lr = args.lr
     if args.scale_lr:
         # Scale learning rate according to number of processes.
-        lr *= torch.distributed.get_world_size()
+        lr *= torch.distributed.get_world_size()**0.5
     opt = torch.optim.AdamW(model.parameters(), lr=lr)
     # Maximum value of default dtype.
@@ -275,7 +275,7 @@ def main():
     lr = args.lr
     if args.scale_lr:
         # Scale learning rate according to number of processes.
-        lr *= torch.distributed.get_world_size()
+        lr *= torch.distributed.get_world_size()**0.5
     opt = torch.optim.AdamW(model.parameters(), lr=lr)
     # Maximum value of default dtype.
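For reference, here is a minimal, self-contained sketch of how the square-root rule composes with a `--scale-lr`-style flag, in the spirit of the hunks above. The placeholder model, the default learning rate, and the fallback to a world size of 1 outside a distributed launch are assumptions made for the sketch, not code taken from the repository's script.

```python
# Minimal sketch (not the repository's script): square-root learning rate
# scaling behind a `--scale-lr`-style flag.
import argparse

import torch


def scaled_lr(base_lr: float, world_size: int) -> float:
    # The global batch size grows linearly with the number of processes,
    # so the learning rate is multiplied by sqrt(world_size).
    return base_lr * world_size**0.5


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--scale-lr", action="store_true")
    args = parser.parse_args()

    # Fall back to a world size of 1 when torch.distributed is not
    # initialized, so the sketch also runs outside a distributed launch.
    world_size = (
        torch.distributed.get_world_size()
        if torch.distributed.is_initialized()
        else 1
    )

    lr = scaled_lr(args.lr, world_size) if args.scale_lr else args.lr

    # Placeholder model; a real training script would build its own model.
    model = torch.nn.Linear(8, 8)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    print(f"world_size={world_size}, lr={opt.param_groups[0]['lr']}")


if __name__ == "__main__":
    main()
```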