Commit c1c07e5

Author: Andrey Cheptsov (committed)
Move AMD cluster note to training section
1 parent 96e6162

1 file changed: 7 additions & 5 deletions

examples/accelerators/amd/README.md
```diff
@@ -98,11 +98,12 @@ Here are examples of a [service](https://dstack.ai/docs/services) that deploy
 
 To request multiple GPUs, specify the quantity after the GPU name, separated by a colon, e.g., `MI300X:4`.
 
-If you're using multiple AMD nodes, validate cluster networking with the
-[NCCL/RCCL tests](https://dstack.ai/examples/clusters/nccl-rccl-tests/) example.
-
 ## Fine-tuning
 
+If you're planning multi-node AMD training, validate cluster networking first
+with the [NCCL/RCCL tests](https://dstack.ai/examples/clusters/nccl-rccl-tests/)
+example.
+
 === "TRL"
 
     Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html)
```
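The first hunk documents requesting multiple GPUs with the `GPU-name:quantity` syntax (e.g., `MI300X:4`). As a rough sketch of where that line would be used — assuming the standard dstack task schema, with the name, image, and command as illustrative placeholders not taken from this commit — a minimal task configuration might look like:

```yaml
# Hypothetical sketch: a minimal dstack task requesting four MI300X GPUs.
# Name, image, and commands are placeholders, not from the diff.
type: task
name: amd-multi-gpu-task
image: rocm/pytorch:latest
commands:
  - rocm-smi  # list the visible AMD GPUs
resources:
  gpu: MI300X:4  # GPU name and quantity, separated by a colon
```

Such a file would then be applied with `dstack apply -f <configuration file>`, as shown in the second hunk's context.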
```diff
@@ -234,8 +235,9 @@ $ dstack apply -f <configuration file>
    [Axolotl](https://github.com/ROCm/rocm-blogs/tree/release/blogs/artificial-intelligence/axolotl),
    [TRL](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.html),
    and [ROCm Bitsandbytes](https://github.com/ROCm/bitsandbytes)
-2. Run [NCCL/RCCL tests](https://dstack.ai/examples/clusters/nccl-rccl-tests/)
-   to validate multi-node AMD cluster networking.
+2. For multi-node training, run
+   [NCCL/RCCL tests](https://dstack.ai/examples/clusters/nccl-rccl-tests/)
+   to validate AMD cluster networking.
 3. Check [dev environments](https://dstack.ai/docs/dev-environments),
    [tasks](https://dstack.ai/docs/tasks), and
    [services](https://dstack.ai/docs/services).
```
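The checklist item above recommends validating cluster networking with the NCCL/RCCL tests before multi-node training. A hedged sketch of the kind of multi-node task that validation would precede — assuming dstack's `nodes` field for multi-node tasks, with the name, node count, and training command as illustrative placeholders not taken from this commit — could be:

```yaml
# Hypothetical sketch: a two-node AMD training task. Run the linked
# NCCL/RCCL tests example first to validate inter-node networking.
# Name, node count, and commands are placeholders, not from the diff.
type: task
name: amd-multi-node-train
nodes: 2
commands:
  - python train.py  # placeholder training entrypoint
resources:
  gpu: MI300X:8
```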

0 commit comments