[Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems #1282
vllm_ascend_test_pd.yaml
on: pull_request
Matrix: vLLM Ascend prefilling decoding disaggregation test
Waiting for pending jobs
Annotations
1 error
e2e test / pd-disaggregation
Canceling since a higher priority waiting request for static-8-01-cards exists