[bugfix][torchair] fix wasted NPU memory buffer allocation for quantized deepseek with unquantized MTP layer #616
vllm_ascend_test_full.yaml
on: pull_request
changes (5s)
Matrix: multicard e2e test - full
Matrix: singlecard e2e test - full
Annotations
1 error
test-full
Canceling since a higher priority waiting request for test-full-refs/pull/3068/merge exists