[Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems #12888
| Job | Run time |
|---|---|
| | 5s |
| | 6m 36s |
| | 14m 43s |
| | 9m 10s |
| | 12m 2s |
| Total | 42m 36s |