
Commit 98005bb

Add Studio badge to tensor parallel docs (#19913)
1 parent 896c2a6 commit 98005bb

6 files changed, +29 -12 lines changed

docs/source-fabric/advanced/model_parallel/tp.rst

Lines changed: 8 additions & 3 deletions
@@ -6,7 +6,11 @@ Tensor parallelism is a technique for training large models by distributing laye
 However, for smaller models, the communication overhead may outweigh its benefits.
 This method is most effective for models with very large layers, significantly enhancing performance and memory efficiency.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -197,9 +201,10 @@ Later in the code, when you call ``fabric.setup(model)``, Fabric will apply the
 
     fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
 
-When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consuption roughly by half:
+.. note:: Tensor Parallelism in Lightning Fabric as well as PyTorch is experimental. The APIs may change in the future.
+
+When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consumption roughly by half:
 
 
 .. list-table::
docs/source-fabric/advanced/model_parallel/tp_fsdp.rst

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,11 @@ This hybrid approach balances the trade-offs of each method, optimizing memory u
 
 The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of `FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_ are a prerequisite for this tutorial.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -182,7 +186,7 @@ Finally, the tensor parallelism will apply to each group, splitting the sharded
 
     fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
+.. note:: 2D Parallelism in Lightning Fabric as well as PyTorch is experimental. The APIs may change in the future.
 
 Beyond this toy example, we recommend you study our `LLM 2D Parallel Example (Llama 3) <https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/fabric/tensor_parallel>`_.
 
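As an aside (again not part of the commit), the 2D layout that `tp_fsdp.rst` documents amounts to a two-dimensional device mesh: one axis for the FSDP/data-parallel groups, one for the tensor-parallel groups. A minimal sketch with plain PyTorch APIs and illustrative dimension names, assuming 4 GPUs:

```python
from torch.distributed.device_mesh import init_device_mesh

# 2 data-parallel groups x 2 tensor-parallel ranks = 4 GPUs total
mesh_2d = init_device_mesh("cuda", (2, 2), mesh_dim_names=("data_parallel", "tensor_parallel"))

dp_mesh = mesh_2d["data_parallel"]    # hand this sub-mesh to the FSDP/sharding side
tp_mesh = mesh_2d["tensor_parallel"]  # hand this sub-mesh to parallelize_module for TP
```

Each model replica is sharded across the `data_parallel` axis while its large layers are split across the `tensor_parallel` axis, which is the hybrid the guide (and the linked Llama 3 example) builds on.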

docs/source-pytorch/advanced/model_parallel/tp.rst

Lines changed: 7 additions & 3 deletions
@@ -6,7 +6,11 @@ Tensor parallelism is a technique for training large models by distributing laye
 However, for smaller models, the communication overhead may outweigh its benefits.
 This method is most effective for models with very large layers, significantly enhancing performance and memory efficiency.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -215,9 +219,9 @@ When ``trainer.fit(...)`` (or ``validate()``, ``test``, etc.) gets called, the T
 
    trainer.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
-|
+.. note:: Tensor Parallelism in PyTorch Lightning as well as PyTorch is experimental. The APIs may change in the future.
 
-When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consuption roughly by half:
+When measuring the peak memory consumption, we should see that doubling the number of GPUs reduces the memory consumption roughly by half:
 
 
 .. list-table::

docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,11 @@ This hybrid approach balances the trade-offs of each method, optimizing memory u
 
 The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of `FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_ are a prerequisite for this tutorial.
 
-.. note:: This is an experimental feature.
+.. raw:: html
+
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+      <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
+    </a>
 
 
 ----
@@ -190,7 +194,7 @@ Finally, the tensor parallelism will apply to each group, splitting the sharded
    trainer.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
 
 
-|
+.. note:: 2D Parallelism in PyTorch Lightning as well as PyTorch is experimental. The APIs may change in the future.
 
 Beyond this toy example, we recommend you study our `LLM 2D Parallel Example (Llama 3) <https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/pytorch/tensor_parallel>`_.
 

examples/fabric/tensor_parallel/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ## Tensor Parallel and 2D Parallel
 
-This example shows how to apply tensor-parallelism to your model (here Llama 2 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
+This example shows how to apply tensor-parallelism to your model (here Llama 3 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
 PyTorch 2.3+ and a machine with at least 4 GPUs and 24 GB memory each are required to run this example.
 
 ```bash
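The README's requirements (PyTorch 2.3+, at least 4 GPUs with about 24 GB each) can be sanity-checked before launching; a small illustrative snippet, not part of the example itself:

```python
import torch

# Parse "2.4.1+cu121" -> (2, 4) and compare against the minimum the README states
major, minor = (int(part) for part in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 3), "This example requires PyTorch 2.3 or newer"
assert torch.cuda.device_count() >= 4, "This example requires at least 4 GPUs"

for i in range(torch.cuda.device_count()):
    total_gb = torch.cuda.get_device_properties(i).total_memory / 1e9
    print(f"cuda:{i}: {total_gb:.0f} GB")  # each device should report roughly 24 GB or more
```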

examples/pytorch/tensor_parallel/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ## Tensor Parallel and 2D Parallel
 
-This example shows how to apply tensor-parallelism to your model (here Llama 2 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
+This example shows how to apply tensor-parallelism to your model (here Llama 3 7B) with the `ModelParallelStrategy`, and how it can be combined with FSDP (2D parallelism).
 PyTorch 2.3+ and a machine with at least 4 GPUs and 24 GB memory each are required to run this example.
 
 ```bash
