vLLM FP8 quantized support for SFT/GRPO #3414
Conversation
unsloth/kernels/fp8.py
Outdated
    del W_deq
    return grad_X, None, None

@torch.compile
Can you check if torch.compile(fullgraph = True, dynamic = True) works better.
Also try using:
from unsloth_zoo.temporary_patches.common import torch_compile_options, torch_compile
@torch_compile
def ...
See if perf changes
I noticed no performance difference between the three options when trying them with Qwen3-8B.
if weight_fake_quantizer is not None:
    W = weight_fake_quantizer(W)

W_quant = next((x for x in [getattr(W, "quant_state", None), getattr(base_layer, "weight_scale_inv", None), getattr(base_layer, "weight_scale", None)] if x is not None), None)
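The fallback chain above can be sketched in isolation. The snippet below uses hypothetical stand-in objects (not the real Unsloth layer classes) to show how next() returns the first non-None candidate, trying the bitsandbytes-style quant_state before the FP8-style scale attributes:

```python
from types import SimpleNamespace

def first_quant_state(W, base_layer):
    # Return the first non-None candidate, mirroring the next(...) fallback chain.
    candidates = [
        getattr(W, "quant_state", None),
        getattr(base_layer, "weight_scale_inv", None),
        getattr(base_layer, "weight_scale", None),
    ]
    return next((x for x in candidates if x is not None), None)

# Hypothetical stand-ins: a bnb-style weight carrying quant_state,
# an FP8-style layer carrying weight_scale_inv, and bare objects.
bnb_weight   = SimpleNamespace(quant_state="bnb-quant-state")
fp8_layer    = SimpleNamespace(weight_scale_inv="fp8-block-scales")
plain_weight = SimpleNamespace()
plain_layer  = SimpleNamespace()

print(first_quant_state(bnb_weight, fp8_layer))      # bnb-quant-state
print(first_quant_state(plain_weight, fp8_layer))    # fp8-block-scales
print(first_quant_state(plain_weight, plain_layer))  # None
```

Because the generator filters with "x is not None", an attribute that exists but is falsy would still be picked up correctly.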
Very smart
Tbh it's best to make an if/elif chain to make it faster.
My only worry is someone mistakenly breaking it if I add an if..else, because "if tensor" would fail even when the tensor exists; one needs to explicitly write "if tensor is not None" or something like that. I thought this was a safer way to let people continue this and avoid that pitfall. But I can change it if you feel it's better that way.
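The pitfall described above can be demonstrated without torch. The ToyTensor class below is a stand-in that mimics torch.Tensor's behavior, where bool() on a tensor with more than one element raises a RuntimeError:

```python
class ToyTensor:
    # Stand-in mimicking torch.Tensor: truthiness of a multi-element
    # tensor is ambiguous and raises instead of returning True/False.
    def __init__(self, values):
        self.values = values

    def __bool__(self):
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one element is ambiguous"
            )
        return bool(self.values[0])

t = ToyTensor([1.0, 2.0])

# "if t:" fails even though t exists:
try:
    if t:
        pass
    truthiness_raised = False
except RuntimeError:
    truthiness_raised = True
print(truthiness_raised)  # True

# The explicit None check is always safe:
print(t is not None)  # True
```

This is why the next(...) form, which only ever compares candidates against None, is harder to break by accident than a chain of bare truthiness checks.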
Ok, it's fine.
Nice work
if weight_fake_quantizer is not None:
    W = weight_fake_quantizer(W)

W_quant = next((x for x in [getattr(W, "quant_state", None), getattr(base_layer, "weight_scale_inv", None), getattr(base_layer, "weight_scale", None)] if x is not None), None)
Ok, it's fine.
Some changes left
Depends on unslothai/unsloth-zoo#313