[webgpu] Fix GatherBlockQuantized on Intel ADL/TGL platforms
#26526
The first commit is only an early draft to demonstrate the fix.
Thanks for the fix! LGTM
onnxruntime/contrib_ops/webgpu/quantization/gather_block_quantized.cc
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Tested on TGL with Qwen3-0.6B and int4 embeddings: works!
Not sure why CI failed. Probably worth kicking off a retry.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
/azp run web_Release / build_onnxruntime_web,web_Debug / build_onnxruntime_web,Test Linux TensorRT x64 Release,Test Linux CUDA x64 Release
No pipelines are associated with this pull request. |
Description
The `GatherBlockQuantized` operation was using incorrect `data_indices` during execution on Intel Alder Lake (ADL) and Tiger Lake (TGL) platforms. This change sets the proper `data_indices`, resolving correctness issues encountered with the Phi-4-mini model on these architectures.
Motivation and Context
See above.
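As a rough illustration of what "data indices" means for this operator, here is a simplified CPU reference sketch. It is not the WGSL shader change made in this PR. It assumes a 2-D data tensor whose 4-bit values are already unpacked into plain integers, gathering along axis 0 and quantizing in blocks along axis 1, with explicit per-block scales and zero points. The function name and parameters are hypothetical.

```python
from typing import List, Sequence


def gather_block_quantized_2d(
    data: Sequence[Sequence[int]],         # quantized values, already unpacked to ints (assumption)
    indices: Sequence[int],                # rows to gather; gather axis is 0 in this sketch
    scales: Sequence[Sequence[float]],     # one scale per (data row, block) along axis 1
    zero_points: Sequence[Sequence[int]],  # one zero point per (data row, block)
    block_size: int,
) -> List[List[float]]:
    """Hypothetical reference: gather rows of `data`, then dequantize block-wise along axis 1."""
    rows = len(data)
    out: List[List[float]] = []
    for idx in indices:
        # Negative indices count from the end, as in ONNX Gather.
        row = idx + rows if idx < 0 else idx
        if not 0 <= row < rows:
            raise IndexError(f"index {idx} out of range for {rows} rows")
        out_row: List[float] = []
        for col, q in enumerate(data[row]):
            # The data index is (gathered row, col), not (output row position, col);
            # scales and zero points are looked up with the same gathered row.
            block = col // block_size
            out_row.append((q - zero_points[row][block]) * scales[row][block])
        out.append(out_row)
    return out
```

For example, with `block_size=2`, gathering `[2, 0]` from a 3x4 tensor returns data row 2 followed by data row 0, each dequantized with that data row's own scales and zero points. Using the output position instead of the gathered row when forming the index would silently read the wrong values.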