From fd280110ab8e2253879045efbf3db7fdee4057c6 Mon Sep 17 00:00:00 2001
From: Anton Alyakin
Date: Wed, 7 Aug 2024 02:21:30 -0400
Subject: [PATCH 1/4] fixed the init_module and deepspeed docs

---
 docs/source-fabric/advanced/model_init.rst | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/docs/source-fabric/advanced/model_init.rst b/docs/source-fabric/advanced/model_init.rst
index f5f76e8aa087b..61dbf00e28fd5 100644
--- a/docs/source-fabric/advanced/model_init.rst
+++ b/docs/source-fabric/advanced/model_init.rst
@@ -69,7 +69,7 @@ When training distributed models with :doc:`FSDP/TP ` or D
 
 .. code-block:: python
 
-    # Recommended for FSDP, TP and DeepSpeed
+    # Recommended for FSDP and TP
     with fabric.init_module(empty_init=True):
         model = GPT3()  # parameters are placed on the meta-device
 
@@ -79,6 +79,18 @@
     optimizer = torch.optim.Adam(model.parameters())
     optimizer = fabric.setup_optimizers(optimizer)
 
+With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessesary for the model to be sharded correctly instead of attempted to be put on the GPU in its entirety. Deepspeed, however, requires the models and optimizer to be set up jointly.
+
+.. code-block:: python
+
+    # Required with DeepSpeed Stage 3
+    with fabric.init_module(empty_init=True):
+        model = GPT3()
+
+    optimizer = torch.optim.Adam(model.parameters())
+    model, optimizer = fabric.setup(model, optimizer)
+
+
 .. note::
    Empty-init is experimental and the behavior may change in the future.
    For distributed models, it is required that all user-defined modules that manage parameters implement a ``reset_parameters()`` method (all PyTorch built-in modules have this too).

From fe53b0c0ae8faa55015f680afe3c3178fe19c0dc Mon Sep 17 00:00:00 2001
From: Anton Alyakin
Date: Wed, 7 Aug 2024 02:53:38 -0400
Subject: [PATCH 2/4] extra newline removal

---
 docs/source-fabric/advanced/model_init.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source-fabric/advanced/model_init.rst b/docs/source-fabric/advanced/model_init.rst
index 61dbf00e28fd5..ce0ffe5c92e8a 100644
--- a/docs/source-fabric/advanced/model_init.rst
+++ b/docs/source-fabric/advanced/model_init.rst
@@ -90,7 +90,6 @@ With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_m
     optimizer = torch.optim.Adam(model.parameters())
     model, optimizer = fabric.setup(model, optimizer)
 
-
 .. note::
    Empty-init is experimental and the behavior may change in the future.
    For distributed models, it is required that all user-defined modules that manage parameters implement a ``reset_parameters()`` method (all PyTorch built-in modules have this too).
From 30e414bf3c916e7093fd7b58d0c38f0dd2716a69 Mon Sep 17 00:00:00 2001
From: Anton Alyakin
Date: Wed, 7 Aug 2024 02:56:52 -0400
Subject: [PATCH 3/4] module_init sentence phrasing

---
 docs/source-fabric/advanced/model_init.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-fabric/advanced/model_init.rst b/docs/source-fabric/advanced/model_init.rst
index ce0ffe5c92e8a..3cc718eec1da6 100644
--- a/docs/source-fabric/advanced/model_init.rst
+++ b/docs/source-fabric/advanced/model_init.rst
@@ -79,7 +79,7 @@ When training distributed models with :doc:`FSDP/TP ` or D
     optimizer = torch.optim.Adam(model.parameters())
     optimizer = fabric.setup_optimizers(optimizer)
 
-With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessesary for the model to be sharded correctly instead of attempted to be put on the GPU in its entirety. Deepspeed, however, requires the models and optimizer to be set up jointly.
+With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessesary for the model to be sharded correctly instead of attempted to be put on the GPU in its entirety. Deepspeed requires the models and optimizer to be set up jointly.
 
 .. code-block:: python
 

From f0562025045eb8f2706237691fb64419a23896b4 Mon Sep 17 00:00:00 2001
From: Jirka B
Date: Thu, 3 Apr 2025 12:53:58 -0400
Subject: [PATCH 4/4] typo

---
 docs/source-fabric/advanced/model_init.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-fabric/advanced/model_init.rst b/docs/source-fabric/advanced/model_init.rst
index 3cc718eec1da6..f7e11f2dc4210 100644
--- a/docs/source-fabric/advanced/model_init.rst
+++ b/docs/source-fabric/advanced/model_init.rst
@@ -79,7 +79,7 @@ When training distributed models with :doc:`FSDP/TP ` or D
     optimizer = torch.optim.Adam(model.parameters())
     optimizer = fabric.setup_optimizers(optimizer)
 
-With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessesary for the model to be sharded correctly instead of attempted to be put on the GPU in its entirety. Deepspeed requires the models and optimizer to be set up jointly.
+With DeepSpeed Stage 3, the use of :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessary for the model to be sharded correctly instead of attempted to be put on the GPU in its entirety. Deepspeed requires the models and optimizer to be set up jointly.
 
 .. code-block:: python