[Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems #8365
Artifacts
Produced during runtime
Name | Size | Digest
---|---|---
vllm-ascend-ubuntu-24.04-arm-py3.11-wheel | 573 KB | sha256:dd658bc03b9c584e09ecb9da3380c37670f1b26d6c33382335dfa4f9f29e01e2
vllm-ascend-ubuntu-24.04-py3.11-wheel | 582 KB | sha256:4b6fab25c7491ee96abf5850e1d4bbcdb9f7b9ca9d30b83a38dea11f82b0704b