[Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems #8129
Artifacts
Produced during runtime
Name | Size | Digest
---|---|---
vllm-ascend-ubuntu-24.04-arm-py3.11-wheel | 526 KB | sha256:62909b2ad2ec6b9dea0320a566013e6651da7a5621b2101b3a6d56f09442cac6
vllm-ascend-ubuntu-24.04-py3.11-wheel | 535 KB | sha256:b158ccccbd0b721b133159b53e806c8755def82d2da27f6707e88600781fbba8