vLLM FP8 quantized support for SFT/GRPO #3414
Conversation
unsloth/kernels/fp8.py
Outdated
    del W_deq
    return grad_X, None, None

@torch.compile
Can you check if torch.compile(fullgraph = True, dynamic = True) works better.
Also try using:
from unsloth_zoo.temporary_patches.common import torch_compile_options, torch_compile
@torch_compile
def ...
See if perf changes
I noticed no performance difference between the three options when trying them with Qwen3-8B.
if weight_fake_quantizer is not None:
    W = weight_fake_quantizer(W)

W_quant = next((x for x in [getattr(W, "quant_state", None), getattr(base_layer, "weight_scale_inv", None), getattr(base_layer, "weight_scale", None)] if x is not None), None)
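The fallback chain above can be sketched in isolation. The snippet below uses hypothetical stand-in objects (not the real Unsloth layer classes) to show how next() returns the first non-None candidate, trying the bitsandbytes-style quant_state before the FP8-style scale attributes:

```python
from types import SimpleNamespace

def first_quant_state(W, base_layer):
    # Return the first non-None candidate, mirroring the next(...) fallback chain.
    candidates = [
        getattr(W, "quant_state", None),
        getattr(base_layer, "weight_scale_inv", None),
        getattr(base_layer, "weight_scale", None),
    ]
    return next((x for x in candidates if x is not None), None)

# Hypothetical stand-ins: a bnb-style weight carrying quant_state,
# an FP8-style layer carrying weight_scale_inv, and bare objects.
bnb_weight   = SimpleNamespace(quant_state="bnb-quant-state")
fp8_layer    = SimpleNamespace(weight_scale_inv="fp8-block-scales")
plain_weight = SimpleNamespace()
plain_layer  = SimpleNamespace()

print(first_quant_state(bnb_weight, fp8_layer))      # bnb-quant-state
print(first_quant_state(plain_weight, fp8_layer))    # fp8-block-scales
print(first_quant_state(plain_weight, plain_layer))  # None
```

Because the generator filters with "x is not None", an attribute that exists but is falsy would still be picked up correctly.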
Very smart
Tbh it's best to make an if/elif chain to make it faster.
My only worry is someone mistakenly breaking it if I add an if..else, because "if tensor" would fail even when the tensor exists; one needs to explicitly write "if tensor is not None" or something like that. I thought this was a safer way to let people continue this and avoid that pitfall. But I can change it if you feel it's better that way.
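The pitfall described above can be demonstrated without torch. The ToyTensor class below is a stand-in that mimics torch.Tensor's behavior, where bool() on a tensor with more than one element raises a RuntimeError:

```python
class ToyTensor:
    # Stand-in mimicking torch.Tensor: truthiness of a multi-element
    # tensor is ambiguous and raises instead of returning True/False.
    def __init__(self, values):
        self.values = values

    def __bool__(self):
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one element is ambiguous"
            )
        return bool(self.values[0])

t = ToyTensor([1.0, 2.0])

# "if t:" fails even though t exists:
try:
    if t:
        pass
    truthiness_raised = False
except RuntimeError:
    truthiness_raised = True
print(truthiness_raised)  # True

# The explicit None check is always safe:
print(t is not None)  # True
```

This is why the next(...) form, which only ever compares candidates against None, is harder to break by accident than a chain of bare truthiness checks.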
Ok, it's fine.
Nice work
if weight_fake_quantizer is not None:
    W = weight_fake_quantizer(W)

W_quant = next((x for x in [getattr(W, "quant_state", None), getattr(base_layer, "weight_scale_inv", None), getattr(base_layer, "weight_scale", None)] if x is not None), None)
Ok, it's fine.
Some changes left
Depends on unslothai/unsloth-zoo#313