
Commit b4b7f0d

doc: Remove suggestion to build extensions in parallel

As the extensions share the build folder, building them in parallel can cause failures or wrong results due to extensions overwriting the files of other extensions.

Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>

1 parent b6346bf · commit b4b7f0d
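For context, the before/after build invocations from the diff below can be sketched as follows (the plain serial install line is an assumption about the remaining recommended form, not verbatim text from this commit):

```shell
# Removed (unsafe): parallel compilation of all ops. The extensions share a
# single build folder, so concurrent -j jobs can overwrite each other's files.
# DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"

# Remaining approach: build all ops serially at install time.
DS_BUILD_OPS=1 pip install deepspeed
```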

File tree

2 files changed (+3, -11 lines)


docs/_tutorials/advanced-install.md

Lines changed: 2 additions & 10 deletions
@@ -73,21 +73,13 @@ Available `DS_BUILD` options include:
 * `DS_BUILD_TRANSFORMER_INFERENCE` builds the transformer-inference op.
 * `DS_BUILD_STOCHASTIC_TRANSFORMER` builds the stochastic transformer op.
 
-To speed up the build-all process, you can parallelize the compilation process with:
-
-```bash
-DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"
-```
-
-This should complete the full build 2-3 times faster. You can adjust `-j` to specify how many cpu-cores are to be used during the build. In the example it is set to 8 cores.
-
 You can also build a binary wheel and install it on multiple machines that have the same type of GPUs and the same software environment (CUDA toolkit, PyTorch, Python, etc.)
 
 ```bash
-DS_BUILD_OPS=1 python -m build --wheel --no-isolation --config-setting="--build-option=build_ext" --config-setting="--build-option=-j8"
+DS_BUILD_OPS=1 python -m build --wheel --no-isolation
 ```
 
-This will create a pypi binary wheel under `dist`, e.g., ``dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl`` and then you can install it directly on multiple machines, in our example:
+This will create a PyPI binary wheel under `dist`, e.g., `dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl`, and then you can install it directly on multiple machines, in our example:
 
 ```bash
 pip install dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl
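After installing a prebuilt wheel on a target machine, it can be useful to confirm which ops were actually compiled. DeepSpeed ships a `ds_report` utility for this; the sketch below assumes it is on `PATH`, and the exact output varies by version:

```shell
# Print DeepSpeed's environment report, which lists each op and whether it
# is pre-compiled ("installed") or would be JIT-compiled at first use.
ds_report
```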

docs/_tutorials/ds4sci_evoformerattention.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ tags: training inference
 
 `DS4Sci_EvoformerAttention` is implemented based on [CUTLASS](https://github.com/NVIDIA/cutlass). You need to clone the CUTLASS repository and specify the path to it in the environment variable `CUTLASS_PATH`.
 CUTLASS setup detection can be ignored by setting ```CUTLASS_PATH="DS_IGNORE_CUTLASS_DETECTION"```, which is useful if you have a well setup compiler (e.g., compiling in a conda package with cutlass and the cuda compilers installed).
-CUTLASS location can be automatically inferred using pypi's [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) package by setting ```CUTLASS_PATH="DS_USE_CUTLASS_PYTHON_BINDINGS"```. Note that this is discouraged as ```nvidia-cutlass``` is not maintained anymore and outdated.
+CUTLASS location can be automatically inferred using PyPI's [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) package by setting ```CUTLASS_PATH="DS_USE_CUTLASS_PYTHON_BINDINGS"```. Note that this is discouraged, as ```nvidia-cutlass``` is no longer maintained and is outdated.
 
 You can always simply clone cutlass and setup ```CUTLASS_PATH```:
 ```shell
