
Commit af90eb9

add xpu support
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
update typos and bug fixes
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
xpu seeding PR1
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
add seeding for pytorch utilities
mp_fabric xpu forking
xpu multiprocess pytorch
add header for xpu
rename change to lightning.pytorch
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Teardown from lightning-xpu (from #PR- 3)
From #3
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
add torch.xpu.stream to ddp
update docs
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
update _LIGHTNING_XPU_AVAILABLE to _lightning_xpu_available
correct fabric imports.py: 1. remove xpu.py from _graveyard; 2. correct _lightning_xpu_available() usage
fix _try_import function not defined issue in fabric
add docs
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
1 parent 9d7bc82 commit af90eb9

File tree

28 files changed: +380 −59 lines changed

docs/source-fabric/fundamentals/launch.rst

Lines changed: 2 additions & 1 deletion
@@ -93,8 +93,9 @@ This is essentially the same as running ``python path/to/your/script.py``, but i
     itself and are expected to be parsed there.
 
 Options:
-  --accelerator [cpu|gpu|cuda|mps|tpu]
+  --accelerator [cpu|gpu|cuda|mps|tpu|xpu]
                                   The hardware accelerator to run on.
+                                  Install Lightning-XPU to enable ``xpu``.
   --strategy [ddp|dp|deepspeed]   Strategy for how to run across multiple
                                   devices.
   --devices TEXT                  Number of devices to run on (``int``), which
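The ``xpu`` choice above is only available once the optional Lightning-XPU package is installed. As an illustrative sketch that is not part of this diff, selecting the same accelerator from a Python script instead of the ``fabric run`` CLI might look like the following, assuming ``lightning-xpu`` is installed and registers the ``xpu`` key in the accelerator registry:

# Hypothetical illustration: picking the XPU accelerator programmatically.
# Assumes the optional ``lightning-xpu`` package is installed, which is what
# makes the "xpu" string resolvable (see the registry change later in this commit).
from lightning.fabric import Fabric

fabric = Fabric(accelerator="xpu", devices=1)  # "xpu" is looked up in the accelerator registry
fabric.launch()
# After launch, fabric.device is expected to point at an XPU device, e.g. torch.device("xpu", 0).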

docs/source-pytorch/common/index.rst

Lines changed: 8 additions & 0 deletions
@@ -17,6 +17,7 @@
     ../advanced/model_parallel
     Train on single or multiple GPUs <../accelerators/gpu>
     Train on single or multiple HPUs <../integrations/hpu/index>
+    Train on single or multiple XPUs <../integrations/xpu/index>
     Train on single or multiple IPUs <../accelerators/ipu>
     Train on single or multiple TPUs <../accelerators/tpu>
     Train on MPS <../accelerators/mps>
@@ -168,6 +169,13 @@ How-to Guides
    :col_css: col-md-4
    :height: 180
 
+.. displayitem::
+   :header: Train on single or multiple XPUs
+   :description: Train models faster with XPU accelerators
+   :button_link: ../integrations/xpu/index.html
+   :col_css: col-md-4
+   :height: 180
+
 .. displayitem::
    :header: Train on single or multiple IPUs
    :description: Train models faster with IPU accelerators

docs/source-pytorch/common_usecases.rst

Lines changed: 7 additions & 0 deletions
@@ -133,6 +133,13 @@ Customize and extend Lightning for things like custom hardware or distributed st
    :button_link: integrations/hpu/index.html
    :height: 100
 
+.. displayitem::
+   :header: Train on single or multiple XPUs
+   :description: Train models faster with XPUs.
+   :col_css: col-md-12
+   :button_link: integrations/xpu/index.html
+   :height: 100
+
 .. displayitem::
    :header: Train on single or multiple IPUs
    :description: Train models faster with IPUs.

docs/source-pytorch/conf.py

Lines changed: 6 additions & 0 deletions
@@ -96,6 +96,11 @@ def _load_py_module(name: str, location: str) -> ModuleType:
     target_dir="docs/source-pytorch/integrations/hpu",
     checkout="tags/1.0.0",
 )
+assist_local.AssistantCLI.pull_docs_files(
+    gh_user_repo="Lightning-AI/lightning-XPU",
+    target_dir="docs/source-pytorch/integrations/xpu",
+    checkout="tags/1.0.0",
+)
 
 if not _FAST_DOCS_DEV:
     fetch_external_assets(
@@ -324,6 +329,7 @@ def _load_py_module(name: str, location: str) -> ModuleType:
     "torchmetrics": ("https://torchmetrics.readthedocs.io/en/stable/", None),
     "graphcore": ("https://docs.graphcore.ai/en/latest/", None),
     "habana": ("https://lightning-ai.github.io/lightning-Habana/", None),
+    "intel-xpu": ("https://lightning-ai.github.io/lightning-XPU/", None),
 }
 
 # -- Options for todo extension ----------------------------------------------

docs/source-pytorch/extensions/accelerator.rst

Lines changed: 16 additions & 15 deletions
@@ -12,6 +12,7 @@ Currently there are accelerators for:
 - :doc:`TPU <../accelerators/tpu>`
 - :doc:`IPU <../accelerators/ipu>`
 - :doc:`HPU <../integrations/hpu/index>`
+- :doc:`XPU <../integrations/xpu/index>`
 - :doc:`MPS <../accelerators/mps>`
 
 The Accelerator is part of the Strategy which manages communication across multiple devices (distributed communication).
@@ -32,16 +33,16 @@ Create a Custom Accelerator
 .. warning:: This is an :ref:`experimental <versioning:Experimental API>` feature.
 
 Here is how you create a new Accelerator.
-Let's pretend we want to integrate the fictional XPU accelerator and we have access to its hardware through a library
-``xpulib``.
+Let's pretend we want to integrate the fictional YPU accelerator and we have access to its hardware through a library
+``ypulib``.
 
 .. code-block:: python
 
-    import xpulib
+    import ypulib
 
 
-    class XPUAccelerator(Accelerator):
-        """Support for a hypothetical XPU, optimized for large-scale machine learning."""
+    class YPUAccelerator(Accelerator):
+        """Support for a hypothetical YPU, optimized for large-scale machine learning."""
 
         @staticmethod
         def parse_devices(devices: Any) -> Any:
@@ -52,29 +53,29 @@ Let's pretend we want to integrate the fictional XPU accelerator and we have acc
         @staticmethod
         def get_parallel_devices(devices: Any) -> Any:
             # Here, convert the device indices to actual device objects
-            return [torch.device("xpu", idx) for idx in devices]
+            return [torch.device("ypu", idx) for idx in devices]
 
         @staticmethod
         def auto_device_count() -> int:
             # Return a value for auto-device selection when `Trainer(devices="auto")`
-            return xpulib.available_devices()
+            return ypulib.available_devices()
 
         @staticmethod
         def is_available() -> bool:
-            return xpulib.is_available()
+            return ypulib.is_available()
 
         def get_device_stats(self, device: Union[str, torch.device]) -> Dict[str, Any]:
             # Return optional device statistics for loggers
             return {}
 
 
-Finally, add the XPUAccelerator to the Trainer:
+Finally, add the YPUAccelerator to the Trainer:
 
 .. code-block:: python
 
     from lightning.pytorch import Trainer
 
-    accelerator = XPUAccelerator()
+    accelerator = YPUAccelerator()
     trainer = Trainer(accelerator=accelerator, devices=2)
 
@@ -90,28 +91,28 @@ If you wish to switch to a custom accelerator from the CLI without code changes,
 
 .. code-block:: python
 
-    class XPUAccelerator(Accelerator):
+    class YPUAccelerator(Accelerator):
         ...
 
         @classmethod
         def register_accelerators(cls, accelerator_registry):
             accelerator_registry.register(
-                "xpu",
+                "ypu",
                 cls,
-                description=f"XPU Accelerator - optimized for large-scale machine learning.",
+                description=f"YPU Accelerator - optimized for large-scale machine learning.",
             )
 
 Now, this is possible:
 
 .. code-block:: python
 
-    trainer = Trainer(accelerator="xpu")
+    trainer = Trainer(accelerator="ypu")
 
 Or if you are using the Lightning CLI, for example:
 
 .. code-block:: bash
 
-    python train.py fit --trainer.accelerator=xpu --trainer.devices=2
+    python train.py fit --trainer.accelerator=ypu --trainer.devices=2
 
 
 ----------
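The docs rename the fictional example accelerator from XPU to YPU because ``xpu`` now refers to the real Intel integration registered by this commit. As an illustrative sketch that is not itself part of this diff, selecting that real accelerator through the registry would then look like the following, assuming ``lightning-xpu`` is installed:

# Hypothetical usage: once lightning-xpu has registered the "xpu" key,
# the Trainer can resolve it the same way as any built-in accelerator.
from lightning.pytorch import Trainer

trainer = Trainer(accelerator="xpu", devices=2)  # resolved via the accelerator registry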

docs/source-pytorch/glossary/index.rst

Lines changed: 8 additions & 0 deletions
@@ -18,6 +18,7 @@
     GPU <../accelerators/gpu>
     Half precision <../common/precision>
     HPU <../integrations/hpu/index>
+    XPU <../integrations/xpu/index>
     Inference <../deploy/production_intermediate>
     IPU <../accelerators/ipu>
     Lightning CLI <../cli/lightning_cli>
@@ -159,6 +160,13 @@ Glossary
    :button_link: ../integrations/hpu/index.html
    :height: 100
 
+.. displayitem::
+   :header: XPU
+   :description: Intel® Graphics Cards for faster training
+   :col_css: col-md-12
+   :button_link: ../integrations/xpu/index.html
+   :height: 100
+
 .. displayitem::
    :header: Inference
    :description: Making predictions by applying a trained model to unlabeled examples
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+.. _xpu:
+
+Accelerator: XPU training
+=========================
+
+.. raw:: html
+
+    <div class="display-card-container">
+        <div class="row">
+
+.. Add callout items below this line
+
+.. displayitem::
+   :header: Basic
+   :description: Learn the basics of single and multi-XPU core training.
+   :col_css: col-md-4
+   :button_link: basic.html
+   :height: 150
+   :tag: basic
+
+.. displayitem::
+   :header: Intermediate
+   :description: Enable state-of-the-art scaling with advanced mixed-precision settings.
+   :col_css: col-md-4
+   :button_link: intermediate.html
+   :height: 150
+   :tag: intermediate
+
+.. displayitem::
+   :header: Advanced
+   :description: Explore state-of-the-art scaling with additional advanced configurations.
+   :col_css: col-md-4
+   :button_link: advanced.html
+   :height: 150
+   :tag: advanced
+
+.. raw:: html
+
+        </div>
+    </div>
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+:orphan:
+
+######################
+Level 19: Explore XPUs
+######################
+
+Explore Intel® Graphics Cards (XPU) for model scaling.
+
+----
+
+.. raw:: html
+
+    <div class="display-card-container">
+        <div class="row">
+
+.. Add callout items below this line
+
+.. displayitem::
+   :header: Train models on XPUs
+   :description: Learn the basics of single and multi-XPU core training.
+   :col_css: col-md-6
+   :button_link: ../integrations/xpu/basic.html
+   :height: 150
+   :tag: basic
+
+.. displayitem::
+   :header: Optimize model training on XPUs
+   :description: Enable state-of-the-art scaling with advanced mixed-precision settings.
+   :col_css: col-md-6
+   :button_link: ../integrations/xpu/intermediate.html
+   :height: 150
+   :tag: intermediate
+
+.. raw:: html
+
+        </div>
+    </div>
Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
 # validation HPU connectors
 lightning-habana >=0.1.0
 lightning-graphcore >=0.1.0.rc4
+
+# validation XPU connectors
+lightning-xpu >=0.1.0

src/lightning/fabric/accelerators/__init__.py

Lines changed: 10 additions & 0 deletions
@@ -22,3 +22,13 @@
 
 ACCELERATOR_REGISTRY = _AcceleratorRegistry()
 _register_classes(ACCELERATOR_REGISTRY, "register_accelerators", sys.modules[__name__], Accelerator)
+
+from lightning.fabric.utilities.imports import _lightning_xpu_available
+
+_ACCELERATORS_BASE_MODULE = "lightning.fabric.accelerators"
+ACCELERATOR_REGISTRY = _AcceleratorRegistry()
+call_register_accelerators(ACCELERATOR_REGISTRY, _ACCELERATORS_BASE_MODULE)
+if _lightning_xpu_available() and "xpu" not in ACCELERATOR_REGISTRY:
+    from lightning_xpu.fabric import XPUAccelerator
+
+    XPUAccelerator.register_accelerators(ACCELERATOR_REGISTRY)
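The conditional registration above hinges on the ``_lightning_xpu_available()`` helper imported from ``lightning.fabric.utilities.imports``, whose body is not shown in this hunk. A minimal sketch of such a check, assuming it only needs to verify that the optional ``lightning_xpu`` package is importable, could look like this (the actual helper in the commit may differ):

# Hypothetical sketch of the availability check; not the verbatim implementation.
import importlib.util


def _lightning_xpu_available() -> bool:
    # True only when the optional ``lightning_xpu`` package can be imported.
    return importlib.util.find_spec("lightning_xpu") is not None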
