Significant inference speed difference between vllm-ascend and MindIE #1621

@Fly-Pluche

Description

Your current environment

Environment: two machines, each with 8 × Ascend 910B (64 GB) NPUs.

MindIE image: mindie_2.0.T18.B010-800I-A2-py3.11-openeuler24.03-lts-aarch64

vllm-ascend image: 0.8.5rc1-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts

Model weights: DeepSeek-V3-0324-w8a8-modelers (https://modelers.cn/models/Modelers_Park/DeepSeek-V3-0324-w8a8/tree/main)

Describe the problem

We tested inference speed with the same model weights under both frameworks and observed roughly a 10× gap:

vllm-ascend: approximately 1.4 tokens/s
MindIE: approximately 14 tokens/s
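
For reference, the report does not include the benchmark script, so below is a minimal sketch of how single-request decode throughput could be measured with vLLM's offline API. The model path, prompt, max_tokens, and single-node tensor_parallel_size=8 launch are illustrative assumptions (the actual setup spans two nodes).

```python
# Minimal throughput sketch using vLLM's offline generate API.
# Assumptions not in the original report: local weight path, prompt,
# max_tokens, and a single-node tensor-parallel launch.
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-V3-0324-w8a8",  # assumed local weight directory
    tensor_parallel_size=8,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
prompt = "Explain the difference between tensor and pipeline parallelism."

start = time.perf_counter()
outputs = llm.generate([prompt], sampling_params)
elapsed = time.perf_counter() - start

# Count only the generated tokens, then report decode throughput.
generated_tokens = len(outputs[0].outputs[0].token_ids)
print(f"{generated_tokens} tokens in {elapsed:.2f}s "
      f"-> {generated_tokens / elapsed:.2f} tokens/s")
```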
