diff --git a/README.md b/README.md
index 74a9ccf6f7b1b5110989fce4a309badc32f0a6a7..a11c5ede7204afdf0b0d8ffa66b46a5d8f6bc45e 100644
--- a/README.md
+++ b/README.md
@@ -467,9 +467,8 @@
 For initialization, FSDP first defines a hierarchy of distinct, but
 possibly nested, submodules ("units") for the model. This process is
 also called "wrapping" in FSDP terminology and can be controlled using
 the `auto_wrap_policy` argument to `FullyShardedDataParallel`. The
-parameters in each unit are then split and distributed ("sharded", or
-scattered) to all GPUs. In the end, each GPU contains its own,
-distinct model shard.
+parameters in each unit are then split and distributed ("sharded") to
+all GPUs. In the end, each GPU contains its own, distinct model shard.
 Whenever we do a forward pass with the model, we sequentially pass
 through units in the following way: FSDP automatically collects the
@@ -511,7 +510,7 @@
 shard. This also means that we have to execute saving and loading on
 every process, since the data is fully distinct.
 
 The example also contains an unused `save_model_singular` function
-that gathers the full model on the CPU and then saves it in a single
+that collects the full model on the CPU and then saves it in a single
 checkpoint file which can then be loaded in a single process. Keep in
 mind that this way of checkpointing is slower and limited by CPU memory.