
Conversation

linfeng-yuan
Collaborator

… vLLM-Ascend

What this PR does / why we need it?

Update guidance on implementing and registering models for users. [Only documentation]

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This patch does not involve code or functional changes.

@github-actions github-actions bot added the documentation label Jun 9, 2025
@wangxiyuan
Collaborator

mv this doc to developer guide

@linfeng-yuan linfeng-yuan force-pushed the model_docs branch 2 times, most recently from 20e817a to cc2016c on June 9, 2025 18:49
… vLLM-Ascend

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@Yikun
Collaborator

Yikun commented Jun 9, 2025

@shen-shanshan could help review.

This guide demonstrates how to integrate novel or customized models into vLLM-Ascend. For foundational concepts, it is highly recommended to refer to:
[Adding a New Model - vLLM Documentation](https://docs.vllm.ai/en/stable/contributing/model/)

### 1. Implementing Models Using PyTorch and Ascend Extension for PyTorch
Collaborator

Please fix the CI failure:

WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]

Use ## before ###.

@shen-shanshan shen-shanshan self-assigned this Jun 10, 2025
- `*Model` (main architecture)
- `*DecoderLayer` (transformer block)
- `*Attention` & `*MLP` (specific computation unit)
`*` denotes your model's unique identifier
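
To make this structure concrete, here is a minimal plain-PyTorch sketch of that hierarchy for a hypothetical `CustomModel`. Names, shapes, and signatures are illustrative only; a real implementation must follow the constructor and `forward` interfaces expected by your vLLM version, as covered in the vLLM model-adding guide linked above.

```python
# Illustrative skeleton only: a hypothetical "Custom" model showing the
# *Model / *DecoderLayer / *Attention / *MLP hierarchy described above.
import torch
import torch.nn.functional as F
from torch import nn


class CustomMLP(nn.Module):
    """Feed-forward block (`*MLP`): a specific computation unit."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act_fn(self.up_proj(x)))


class CustomAttention(nn.Module):
    """Self-attention block (`*Attention`): a specific computation unit."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        q, k, v = self.qkv_proj(hidden_states).chunk(3, dim=-1)
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        q, k, v = (t.view(bsz, seq_len, self.num_heads, -1).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)


class CustomDecoderLayer(nn.Module):
    """Transformer block (`*DecoderLayer`) combining attention and MLP."""

    def __init__(self, hidden_size: int, num_heads: int,
                 intermediate_size: int):
        super().__init__()
        self.self_attn = CustomAttention(hidden_size, num_heads)
        self.mlp = CustomMLP(hidden_size, intermediate_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = hidden_states + self.self_attn(hidden_states)
        return hidden_states + self.mlp(hidden_states)


class CustomModel(nn.Module):
    """Main architecture (`*Model`): embeddings plus stacked decoder layers."""

    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int,
                 num_heads: int, intermediate_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            CustomDecoderLayer(hidden_size, num_heads, intermediate_size)
            for _ in range(num_layers))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.embed_tokens(input_ids)
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states
```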
Collaborator

It's better to put the `*` identifier note into a NOTE block.



(4) **Attention Backend Integration**:
Importing attention via `from vllm.attention import Attention` automatically leverages vLLM-Ascend's attention backend routing (see `get_attn_backend_cls()` in [vllm_ascend/platform.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/platform.py)).
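
As a hedged illustration of point (4), an Ascend-aware variant of the `*Attention` module might delegate to vLLM's `Attention` layer as sketched below. The exact constructor and forward signatures vary between vLLM releases, so treat this as a sketch rather than the definitive API.

```python
# Sketch only: the constructor/forward arguments of vllm.attention.Attention
# differ across vLLM releases; check the version pinned by vLLM-Ascend first.
from torch import nn
from vllm.attention import Attention


class CustomAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int,
                 cache_config=None, quant_config=None, prefix: str = ""):
        super().__init__()
        head_dim = hidden_size // num_heads
        # Routing through vLLM's Attention layer lets vLLM-Ascend's platform
        # hook (get_attn_backend_cls) select the NPU attention backend
        # automatically, so no Ascend-specific attention code is needed here.
        self.attn = Attention(num_heads,
                              head_dim,
                              scale=head_dim**-0.5,
                              num_kv_heads=num_kv_heads,
                              cache_config=cache_config,
                              quant_config=quant_config,
                              prefix=f"{prefix}.attn")

    def forward(self, q, k, v):
        return self.attn(q, k, v)
```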
Collaborator

Please make sure to add a `.` at the end of each sentence.

Import attention via `from vllm.attention import Attention` can automatically leverage vLLM-Ascend's attention backend routing (see: `get_attn_backend_cls()` in [vllm_ascend/platform.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/platform.py))

(5) **Tensor Parallelism**:
Use vLLM's parallel layers (`ColumnParallelLinear`, `VocabParallelEmbedding`, etc.), but note that Ascend-specific customizations of some of these layers (RMSNorm, VocabParallelEmbedding, etc.) are implemented in the [vllm_ascend/ops/](https://github.com/vllm-project/vllm-ascend/tree/main/vllm_ascend/ops) directory.
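
As a hedged example of point (5), an MLP built from vLLM's tensor-parallel layers might look like the sketch below. The layer classes are real vLLM classes, but their argument lists may differ across versions, and the `CustomMLP` itself is hypothetical; the intent described in the guide is that model code imports the vLLM layers and the Ascend-specific customizations in `vllm_ascend/ops/` take effect on NPU without changes to the model code.

```python
# Sketch under the assumption that these vLLM linear-layer signatures match
# the vLLM version pinned by vLLM-Ascend; verify before use.
from torch import nn
from vllm.model_executor.layers.activation import SiluAndMul
from vllm.model_executor.layers.linear import (MergedColumnParallelLinear,
                                               RowParallelLinear)


class CustomMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int,
                 quant_config=None, prefix: str = ""):
        super().__init__()
        # Column-parallel: gate and up projections fused, sharded over TP ranks.
        self.gate_up_proj = MergedColumnParallelLinear(
            hidden_size, [intermediate_size] * 2, bias=False,
            quant_config=quant_config, prefix=f"{prefix}.gate_up_proj")
        # Row-parallel: down projection, all-reduced across TP ranks.
        self.down_proj = RowParallelLinear(
            intermediate_size, hidden_size, bias=False,
            quant_config=quant_config, prefix=f"{prefix}.down_proj")
        self.act_fn = SiluAndMul()

    def forward(self, x):
        gate_up, _ = self.gate_up_proj(x)  # vLLM linear layers return (out, bias)
        x = self.act_fn(gate_up)
        x, _ = self.down_proj(x)
        return x
```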
Collaborator

I think this is a little bit confusing; please make it clear when to use vLLM's parallel layers and when to use the Ascend-specific customizations.


For a complete implementation reference, see [vllm_ascend/models/deepseek_v2.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/models/deepseek_v2.py).

### 2. Registering Custom Models as Out-of-Tree Plugins in vLLM
Collaborator

Registering Custom Out-of-Tree Models using ModelRegistry Plugins in vLLM

Comment on lines +200 to +216
**Key Note**
The first argument of `vllm.ModelRegistry.register_model()` specifies the unique architecture identifier, which must match `architectures` in the model's `config.json`.

```json
{
  "architectures": [
    "CustomModelForCausalLM"
  ]
}
```

```json
{
  "architectures": [
    "DeepseekV2ForCausalLM"
  ]
}
```
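
For illustration, registration of the hypothetical `CustomModelForCausalLM` above might look like the following sketch. `vllm.ModelRegistry.register_model()` is the real vLLM API, while the module path and class name here are assumptions.

```python
# Out-of-tree registration sketch: the architecture string must match the
# "architectures" entry in the model's config.json, as shown above.
from vllm import ModelRegistry


def register_model():
    # The lazy "module.path:ClassName" string form avoids importing the model
    # implementation before the platform (and torch_npu) is initialized.
    ModelRegistry.register_model(
        "CustomModelForCausalLM",
        "my_plugin.models.custom_model:CustomModelForCausalLM")
```

In a plugin package this function would typically be exposed through the `vllm.general_plugins` entry point so that vLLM invokes it automatically; vLLM-Ascend registers its own model variants the same way (see `vllm_ascend/models/__init__.py`).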
Collaborator

It's better to put this key note into a NOTE block.

If you're registering a novel model architecture that is not present in vLLM (i.e., creating a completely new class), the current logs won't provide explicit confirmation by default. It's recommended to add the following logging statement at the end of the `register_model` method in [vllm/model_executor/models/registry.py](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/registry.py):

```python
logger.warning(f"model_arch: {model_arch} has been registered here!")
```
shen-shanshan commented Jun 10, 2025

Why use warning? I think using info may be better.

@shen-shanshan
Collaborator

shen-shanshan commented Jun 10, 2025

@linfeng-yuan I think it's better to add more content, such as:

  1. The mapping from torch ops to torch_npu ops; you can add the frequently used interfaces and offer a link to more details like this (see the sketch below for an example).
  2. After adding a new model, we need to do functional tests, benchmarks, and accuracy tests; you can offer links on how to do these things.
  3. After adding a new model, we should also update our model support doc.
  4. Offer a link on how to add a multi-modal model.
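
As a rough illustration of the kind of op mapping meant in point 1: the `torch_npu.npu_rms_norm` call below follows the pattern used in `vllm_ascend/ops/`, but its exact signature and return values are assumptions and should be verified against the torch_npu documentation for your CANN/torch_npu version.

```python
# Sketch: replacing a pure-PyTorch RMSNorm with the fused torch_npu kernel.
import torch


def rms_norm_native(x: torch.Tensor, weight: torch.Tensor,
                    eps: float = 1e-6) -> torch.Tensor:
    # Reference implementation using plain torch ops.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight


def rms_norm_npu(x: torch.Tensor, weight: torch.Tensor,
                 eps: float = 1e-6) -> torch.Tensor:
    # Fused Ascend kernel; npu_rms_norm is assumed to return a tuple whose
    # first element is the normalized output (check the torch_npu docs).
    import torch_npu  # imported lazily so the file still loads without an NPU
    return torch_npu.npu_rms_norm(x, weight, eps)[0]
```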

@wangxiyuan
Collaborator

developer_guide/index should be updated as well

@wangxiyuan wangxiyuan mentioned this pull request Jun 17, 2025
@wangxiyuan
Collaborator

@shen-shanshan it's good to create a new PR with co-author if the patch is not updated for a long time.

@shen-shanshan
Collaborator

@shen-shanshan it's good to create a new PR with co-author if the patch is not updated for a long time.

OK.

wangxiyuan pushed a commit that referenced this pull request Jun 27, 2025
### What this PR does / why we need it?
Add guidance on how to implement and register new models.

Modified based on PR #1126; thanks to @linfeng-yuan for the contribution.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
zhanghw0354 pushed a commit to zhanghw0354/vllm-ascend that referenced this pull request Jun 30, 2025
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jun 30, 2025