
Commit b4b7f0d

doc: Remove suggestion to build extensions in parallel

As the extensions share the build folder, building them in parallel can cause failures or wrong results due to extensions overwriting the files of other extensions.

Signed-off-by: Alexander Grund <alexander.grund@tu-dresden.de>

1 parent b6346bf · commit b4b7f0d
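For context, the before/after build invocations from the diff below can be sketched as follows (the plain serial install line is an assumption about the remaining recommended form, not verbatim text from this commit):

```shell
# Removed (unsafe): parallel compilation of all ops. The extensions share a
# single build folder, so concurrent -j jobs can overwrite each other's files.
# DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"

# Remaining approach: build all ops serially at install time.
DS_BUILD_OPS=1 pip install deepspeed
```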

File tree

2 files changed (+3, -11 lines)


docs/_tutorials/advanced-install.md

Lines changed: 2 additions & 10 deletions
@@ -73,21 +73,13 @@ Available `DS_BUILD` options include:
 * `DS_BUILD_TRANSFORMER_INFERENCE` builds the transformer-inference op.
 * `DS_BUILD_STOCHASTIC_TRANSFORMER` builds the stochastic transformer op.
 
-To speed up the build-all process, you can parallelize the compilation process with:
-
-```bash
-DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"
-```
-
-This should complete the full build 2-3 times faster. You can adjust `-j` to specify how many cpu-cores are to be used during the build. In the example it is set to 8 cores.
-
 You can also build a binary wheel and install it on multiple machines that have the same type of GPUs and the same software environment (CUDA toolkit, PyTorch, Python, etc.)
 
 ```bash
-DS_BUILD_OPS=1 python -m build --wheel --no-isolation --config-setting="--build-option=build_ext" --config-setting="--build-option=-j8"
+DS_BUILD_OPS=1 python -m build --wheel --no-isolation
 ```
 
-This will create a pypi binary wheel under `dist`, e.g., ``dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl`` and then you can install it directly on multiple machines, in our example:
+This will create a PyPI binary wheel under `dist`, e.g., `dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl`, and then you can install it directly on multiple machines, in our example:
 
 ```bash
 pip install dist/deepspeed-0.3.13+8cd046f-cp38-cp38-linux_x86_64.whl
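After installing a prebuilt wheel on a target machine, it can be useful to confirm which ops were actually compiled. DeepSpeed ships a `ds_report` utility for this; the sketch below assumes it is on `PATH`, and the exact output varies by version:

```shell
# Print DeepSpeed's environment report, which lists each op and whether it
# is pre-compiled ("installed") or would be JIT-compiled at first use.
ds_report
```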

docs/_tutorials/ds4sci_evoformerattention.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ tags: training inference
 
 `DS4Sci_EvoformerAttention` is implemented based on [CUTLASS](https://github.com/NVIDIA/cutlass). You need to clone the CUTLASS repository and specify the path to it in the environment variable `CUTLASS_PATH`.
 CUTLASS setup detection can be ignored by setting ```CUTLASS_PATH="DS_IGNORE_CUTLASS_DETECTION"```, which is useful if you have a well setup compiler (e.g., compiling in a conda package with cutlass and the cuda compilers installed).
-CUTLASS location can be automatically inferred using pypi's [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) package by setting ```CUTLASS_PATH="DS_USE_CUTLASS_PYTHON_BINDINGS"```. Note that this is discouraged as ```nvidia-cutlass``` is not maintained anymore and outdated.
+CUTLASS location can be automatically inferred using PyPI's [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) package by setting ```CUTLASS_PATH="DS_USE_CUTLASS_PYTHON_BINDINGS"```. Note that this is discouraged, as ```nvidia-cutlass``` is no longer maintained and is outdated.
 
 You can always simply clone cutlass and setup ```CUTLASS_PATH```:
 ```shell
