
[RFC]: Custom AscendC Kernel for 'Prepare Input' in Multi-Step Feature #807


Open
wonderful199082 opened this issue May 11, 2025 · 1 comment
Labels
RFC Request For Comments

Comments

@wonderful199082

Motivation.

In the current implementation of the vLLM Ascend V0 engine, the advance_step function in attention.py contains Python-based logic that updates input_tokens, seq_lens, input_positions, and slot_mapping.

This logic was marked with a clear TODO:

# TODO optimize these codes using ascendc just like flash attention backend using cuda

indicating an explicit need for optimization using custom operators.

Proposed Change.

This RFC proposes replacing the above Python logic with a custom operator implemented in AscendC that executes directly on the NPU, improving efficiency in multi-step decoding scenarios.

The logic covered by this operator includes the following (a reference sketch follows the list):

  • Updating model_input.input_tokens
  • Updating model_input.input_positions
  • Incrementing and updating seq_lens_tensor
  • Computing slot_mapping using block_tables
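
For clarity, here is a minimal PyTorch sketch of the semantics the fused AscendC operator would implement. The tensor shapes and the `block_size` slot-indexing convention are assumed from vLLM's paged KV-cache layout; the function name and signature are hypothetical and not part of this RFC.

```python
import torch

def advance_step_reference(
    sampled_token_ids: torch.Tensor,  # [num_seqs] token ids sampled in the last step
    input_tokens: torch.Tensor,       # [num_seqs] next-step input tokens (updated in place)
    input_positions: torch.Tensor,    # [num_seqs] next-step positions (updated in place)
    seq_lens_tensor: torch.Tensor,    # [num_seqs] current sequence lengths (updated in place)
    slot_mapping: torch.Tensor,       # [num_seqs] KV-cache slot per sequence (updated in place)
    block_tables: torch.Tensor,       # [num_seqs, max_blocks] physical block ids
    block_size: int,
) -> None:
    # The token sampled in the previous step becomes the next input token.
    input_tokens.copy_(sampled_token_ids)
    # The next token is generated at position seq_len (0-indexed), i.e. the
    # current length before it is incremented.
    input_positions.copy_(seq_lens_tensor)
    # Each running sequence grows by exactly one token per decoding step.
    seq_lens_tensor.add_(1)
    # Translate the new logical position into a physical KV-cache slot
    # through the per-sequence block table.
    block_idx = (input_positions // block_size).long()
    block_ids = block_tables.gather(1, block_idx.unsqueeze(1)).squeeze(1)
    slot_mapping.copy_(block_ids * block_size + input_positions % block_size)
```

On the NPU, the proposed AscendC kernel would fuse these element-wise updates and the block-table gather into a single launch, avoiding the per-step Python overhead and the several small device operations issued by the current implementation.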

Feedback Period.

This RFC will be open for feedback until 2025-05-18, which is one week from the initial submission date.

Please leave comments, questions, or suggestions before this date. The author will address all feedback and revise the proposal as needed.

CC List.

@Yikun @wangxiyuan

Any Other Things.

No response

@wonderful199082 wonderful199082 added the RFC Request For Comments label May 11, 2025
@wangxiyuan
Collaborator

Nice, contributions are welcome!
