enable xpu in test_trainer #37774
base: main
Conversation
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
@@ -2993,6 +3008,9 @@ def _device_agnostic_dispatch(device: str, dispatch_table: dict[str, Callable],
BACKEND_EMPTY_CACHE["xpu"] = torch.xpu.empty_cache
BACKEND_MANUAL_SEED["xpu"] = torch.xpu.manual_seed
BACKEND_DEVICE_COUNT["xpu"] = torch.xpu.device_count
BACKEND_RESET_MAX_MEMORY_ALLOCATED["xpu"] = torch.xpu.reset_peak_memory_stats
XPU maps `reset_peak_memory_stats` to `RESET_MAX_MEMORY_ALLOCATED`, since the two have the same functionality; CUDA's implementation is the same. The difference is that CUDA keeps the `reset_max_memory_allocated` API for backward compatibility, while XPU does not expose this API.
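For context, here is a minimal sketch (not the exact transformers implementation) of how such a device-agnostic dispatch table can route the reset call to the right backend; the helper name `backend_reset_max_memory_allocated` is illustrative:

```python
# Minimal sketch of a device-agnostic dispatch table for resetting peak-memory
# stats. CUDA keeps reset_max_memory_allocated as a backward-compatible alias of
# reset_peak_memory_stats; XPU only exposes reset_peak_memory_stats.
import torch

BACKEND_RESET_MAX_MEMORY_ALLOCATED = {
    "cuda": torch.cuda.reset_max_memory_allocated,
    "cpu": None,  # nothing to reset on CPU
}
if hasattr(torch, "xpu"):
    # XPU has no reset_max_memory_allocated alias, so map the equivalent API.
    BACKEND_RESET_MAX_MEMORY_ALLOCATED["xpu"] = torch.xpu.reset_peak_memory_stats


def backend_reset_max_memory_allocated(device: str) -> None:
    """Reset the peak-memory counter on the given backend, if it supports one."""
    fn = BACKEND_RESET_MAX_MEMORY_ALLOCATED.get(device)
    if fn is not None:
        fn()
```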
@@ -1243,7 +1246,6 @@ def test_mixed_bf16(self):

# will add more specific tests once there are some bugs to fix

@require_non_xpu
@require_torch_gpu
`require_torch_gpu` is enough to skip the test on XPU.
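As a rough illustration (not the actual `testing_utils` code), a decorator like `require_torch_gpu` only runs the test when the active device is CUDA, so an extra `require_non_xpu` guard is redundant:

```python
# Illustrative sketch: a CUDA-only test decorator. The real transformers helper
# differs in details, but the idea is that anything other than "cuda"
# (including "xpu") is skipped, making a separate non-XPU guard unnecessary.
import unittest

import torch

# Pick the active test device the way a device-agnostic test suite might.
if torch.cuda.is_available():
    torch_device = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    torch_device = "xpu"
else:
    torch_device = "cpu"


def require_torch_gpu(test_case):
    """Skip the decorated test unless a CUDA GPU is the active device."""
    return unittest.skipUnless(torch_device == "cuda", "test requires CUDA")(test_case)
```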
@@ -3992,7 +3994,7 @@ def test_fp16_full_eval(self):
# perfect world: fp32_init/2 == fp16_eval
self.assertAlmostEqual(fp16_eval, fp32_init / 2, delta=5_000)

@require_non_xpu
@require_torch_gpu
This is a GPU-only test since it's related to nvfuser; the decorator change makes it reflect that.
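For reference, the assertion in the diff above relies on fp16 weights taking half the bytes of fp32 weights; a tiny self-contained check of that relationship (names are illustrative, not from the test suite):

```python
# Toy illustration of the fp32_init/2 == fp16_eval relationship checked above:
# an fp16 copy of a model's weights occupies half the bytes of the fp32 original.
import torch


def weight_bytes(model: torch.nn.Module) -> int:
    """Total bytes occupied by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


tiny_model = torch.nn.Linear(1024, 1024)      # parameters are fp32 by default
fp32_bytes = weight_bytes(tiny_model)
fp16_bytes = weight_bytes(tiny_model.half())  # fp16 "full eval" copy

# perfect world: fp32 / 2 == fp16 (the real test allows a small delta)
assert abs(fp16_bytes - fp32_bytes / 2) < 5_000
```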
@ydshieh, pls help review, thx.
128 PASSED, 27 FAILED due to the missing `optimizer_update_32bit` XPU op in BNB; its implementation is under way, and those tests will pass once the corresponding PR is merged into bnb.