
Commit 98005bb

Add Studio badge to tensor parallel docs (#19913)
1 parent 896c2a6 commit 98005bb

6 files changed, +29 -12 lines changed

docs/source-fabric/advanced/model_parallel/tp.rst

Lines changed: 8 additions & 3 deletions
@@ -6,7 +6,11 @@ Tensor parallelism is a technique for training large models by distributing laye
 However, for smaller models, the communication overhead may outweigh its benefits.
 This method is most effective for models with very large layers, significantly enhancing performance and memory efficiency.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -197,9 +201,10 @@ Later in the code, when you call ``fabric.setup(model)``, Fabric will apply the
 
     fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
 
-When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consuption roughly by half:
+.. note:: Tensor Parallelism in Lightning Fabric as well as PyTorch is experimental. The APIs may change in the future.
+
+When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consumption roughly by half:
 
 
 .. list-table::
docs/source-fabric/advanced/model_parallel/tp_fsdp.rst

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,11 @@ This hybrid approach balances the trade-offs of each method, optimizing memory u
 
 The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of `FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_ are a prerequisite for this tutorial.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -182,7 +186,7 @@ Finally, the tensor parallelism will apply to each group, splitting the sharded
 
     fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
+.. note:: 2D Parallelism in Lightning Fabric as well as PyTorch is experimental. The APIs may change in the future.
 
 Beyond this toy example, we recommend you study our `LLM 2D Parallel Example (Llama 3) <https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/fabric/tensor_parallel>`_.
 
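As an aside (again not part of the commit), the 2D layout that `tp_fsdp.rst` documents amounts to a two-dimensional device mesh: one axis for the FSDP/data-parallel groups, one for the tensor-parallel groups. A minimal sketch with plain PyTorch APIs and illustrative dimension names, assuming 4 GPUs:

```python
from torch.distributed.device_mesh import init_device_mesh

# 2 data-parallel groups x 2 tensor-parallel ranks = 4 GPUs total
mesh_2d = init_device_mesh("cuda", (2, 2), mesh_dim_names=("data_parallel", "tensor_parallel"))

dp_mesh = mesh_2d["data_parallel"]    # hand this sub-mesh to the FSDP/sharding side
tp_mesh = mesh_2d["tensor_parallel"]  # hand this sub-mesh to parallelize_module for TP
```

Each model replica is sharded across the `data_parallel` axis while its large layers are split across the `tensor_parallel` axis, which is the hybrid the guide (and the linked Llama 3 example) builds on.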

docs/source-pytorch/advanced/model_parallel/tp.rst

Lines changed: 7 additions & 3 deletions
@@ -6,7 +6,11 @@ Tensor parallelism is a technique for training large models by distributing laye
 However, for smaller models, the communication overhead may outweigh its benefits.
 This method is most effective for models with very large layers, significantly enhancing performance and memory efficiency.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -215,9 +219,9 @@ When ``trainer.fit(...)`` (or ``validate()``, ``test``, etc.) gets called, the T
 
    trainer.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
+.. note:: Tensor Parallelism in PyTorch Lightning as well as PyTorch is experimental. The APIs may change in the future.
 
-When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consuption roughly by half:
+When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consumption roughly by half:
 
 
 .. list-table::

docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,11 @@ This hybrid approach balances the trade-offs of each method, optimizing memory u
 
 The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of `FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_ are a prerequisite for this tutorial.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -190,7 +194,7 @@ Finally, the tensor parallelism will apply to each group, splitting the sharded
    trainer.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
 
-|
+.. note:: 2D Parallelism in PyTorch Lightning as well as PyTorch is experimental. The APIs may change in the future.
 
 Beyond this toy example, we recommend you study our `LLM 2D Parallel Example (Llama 3) <https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/pytorch/tensor_parallel>`_.
 

examples/fabric/tensor_parallel/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ## Tensor Parallel and 2D Parallel
 
-This example shows how to apply tensor-parallelism to your model (here Llama 2 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
+This example shows how to apply tensor-parallelism to your model (here Llama 3 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
 PyTorch 2.3+ and a machine with at least 4 GPUs and 24 GB memory each are required to run this example.
 
 ```bash
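The README's requirements (PyTorch 2.3+, at least 4 GPUs with about 24 GB each) can be sanity-checked before launching; a small illustrative snippet, not part of the example itself:

```python
import torch

# Parse "2.4.1+cu121" -> (2, 4) and compare against the minimum the README states
major, minor = (int(part) for part in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 3), "This example requires PyTorch 2.3 or newer"
assert torch.cuda.device_count() >= 4, "This example requires at least 4 GPUs"

for i in range(torch.cuda.device_count()):
    total_gb = torch.cuda.get_device_properties(i).total_memory / 1e9
    print(f"cuda:{i}: {total_gb:.0f} GB")  # each device should report roughly 24 GB or more
```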

examples/pytorch/tensor_parallel/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ## Tensor Parallel and 2D Parallel
 
-This example shows how to apply tensor-parallelism to your model (here Llama 2 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
+This example shows how to apply tensor-parallelism to your model (here Llama 3 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
 PyTorch 2.3+ and a machine with at least 4 GPUs and 24 GB memory each are required to run this example.
 
 ```bash
