[docs] Update guidance on how to implement and register new models #1126
Conversation
mv this doc to developer guide
20e817a to cc2016c
… vLLM-Ascend Signed-off-by: linfeng-yuan <1102311262@qq.com>
@shen-shanshan could help review.
This guide demonstrates how to integrate novel or customized models into vLLM-Ascend. For foundational concepts, it is highly recommended to refer to:
[Adding a New Model - vLLM Documentation](https://docs.vllm.ai/en/stable/contributing/model/)

### 1. Implementing Models Using PyTorch and Ascend Extension for PyTorch
Please fix the CI failure:
`WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]`
Use `##` before `###`.
- `*Model` (main architecture)
- `*DecoderLayer` (transformer block)
- `*Attention` & `*MLP` (specific computation unit)

`*` denotes your model's unique identifier.
It's better to add this line into a NOTE block.
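For orientation only, here is a minimal skeleton of that decomposition using plain `torch.nn` modules. Every class name and hyperparameter below is a hypothetical placeholder, not part of this PR; a real vLLM model additionally receives `vllm_config`/`prefix` arguments and uses vLLM's own layers, as the following points describe.

```python
# Hypothetical skeleton showing only the *Model -> *DecoderLayer -> *Attention/*MLP layout.
import torch
from torch import nn


class CustomMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_up = nn.Linear(hidden_size, intermediate_size)
        self.down = nn.Linear(intermediate_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.gate_up(x)))


class CustomAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        # Placeholder attention; a vLLM model would use vllm.attention.Attention here.
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)
        return out


class CustomDecoderLayer(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, intermediate_size: int):
        super().__init__()
        self.self_attn = CustomAttention(hidden_size, num_heads)
        self.mlp = CustomMLP(hidden_size, intermediate_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.self_attn(x)
        return x + self.mlp(x)


class CustomModelForCausalLM(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int,
                 num_heads: int, intermediate_size: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList([
            CustomDecoderLayer(hidden_size, num_heads, intermediate_size)
            for _ in range(num_layers)
        ])
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed_tokens(input_ids)
        for layer in self.layers:
            x = layer(x)
        return self.lm_head(x)
```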
(4) **Attention Backend Integration**:
Importing attention via `from vllm.attention import Attention` automatically leverages vLLM-Ascend's attention backend routing (see `get_attn_backend_cls()` in [vllm_ascend/platform.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/platform.py)).
Please make sure to add a `.` at the end of each sentence.
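As a hedged sketch of that integration point, the placeholder `CustomAttention` above could be reworked roughly as follows. The exact constructor and forward signatures of `Attention` vary across vLLM versions, and the plain `nn.Linear` projections are a simplification of the parallel layers discussed in the next point, so treat this as an outline rather than copy-paste code.

```python
import torch
from torch import nn
from vllm.attention import Attention  # routed to an Ascend backend by get_attn_backend_cls()


class CustomAttention(nn.Module):
    """Hypothetical attention block; check the Attention class in your vLLM
    version for the exact signatures before relying on this sketch."""

    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int,
                 cache_config=None, quant_config=None, prefix: str = ""):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        self.q_size = num_heads * self.head_dim
        self.kv_size = num_kv_heads * self.head_dim
        # Plain Linear layers for brevity; real models typically use
        # QKVParallelLinear / RowParallelLinear instead.
        self.qkv_proj = nn.Linear(hidden_size, self.q_size + 2 * self.kv_size)
        self.o_proj = nn.Linear(self.q_size, hidden_size)
        # This layer is what vLLM-Ascend's backend routing picks up on NPU.
        self.attn = Attention(num_heads,
                              self.head_dim,
                              scale=self.head_dim ** -0.5,
                              num_kv_heads=num_kv_heads,
                              cache_config=cache_config,
                              quant_config=quant_config,
                              prefix=f"{prefix}.attn")

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        qkv = self.qkv_proj(hidden_states)
        q, k, v = qkv.split([self.q_size, self.kv_size, self.kv_size], dim=-1)
        attn_output = self.attn(q, k, v)
        return self.o_proj(attn_output)
```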
(5) **Tensor Parallelism**:
Use vLLM's parallel layers (`ColumnParallelLinear`, `VocabParallelEmbedding`, etc.), but note the Ascend-specific customizations implemented in the [vllm_ascend/ops/](https://github.com/vllm-project/vllm-ascend/tree/main/vllm_ascend/ops) directory (RMSNorm, VocabParallelEmbedding, etc.).
I think this is a little bit confusing; please make it clear when to use vLLM's parallel layers and when to use the Ascend-specific customizations.
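A sketch of how the placeholder `CustomMLP` might be rewritten against vLLM's parallel layers follows. The class and argument names come from vLLM's `model_executor.layers.linear` and `activation` modules; the comment about Ascend substitution reflects my reading of `vllm_ascend/ops/` (write the model against vLLM's layer classes and let vLLM-Ascend's op-level overrides kick in automatically on NPU), which is exactly the point the reviewer above asks the doc to spell out.

```python
from torch import nn
from vllm.model_executor.layers.activation import SiluAndMul
from vllm.model_executor.layers.linear import (MergedColumnParallelLinear,
                                               RowParallelLinear)


class CustomMLP(nn.Module):
    """Sketch of a TP-aware MLP built from vLLM's parallel layers."""

    def __init__(self, hidden_size: int, intermediate_size: int,
                 quant_config=None, prefix: str = ""):
        super().__init__()
        # Column-parallel: weights are split across TP ranks along the output dim.
        self.gate_up_proj = MergedColumnParallelLinear(
            hidden_size, [intermediate_size] * 2, bias=False,
            quant_config=quant_config, prefix=f"{prefix}.gate_up_proj")
        # Row-parallel: weights are split along the input dim; output is all-reduced.
        self.down_proj = RowParallelLinear(
            intermediate_size, hidden_size, bias=False,
            quant_config=quant_config, prefix=f"{prefix}.down_proj")
        self.act_fn = SiluAndMul()
        # No Ascend-specific code is needed here: on NPU, vLLM-Ascend swaps in its
        # customized implementations (vllm_ascend/ops/) for ops such as RMSNorm and
        # VocabParallelEmbedding used elsewhere in the model.

    def forward(self, x):
        gate_up, _ = self.gate_up_proj(x)  # parallel layers return (output, bias)
        x = self.act_fn(gate_up)
        x, _ = self.down_proj(x)
        return x
```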
For a complete implementation reference, see: [vllm_ascend/models/deepseek_v2.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/models/deepseek_v2.py).
### 2. Registering Custom Models as Out-of-Tree Plugins in vLLM
Registering Custom Out-of-Tree Models using ModelRegistry Plugins in vLLM
**Key Note**
The first argument of `vllm.ModelRegistry.register_model()` indicates the unique architecture identifier, which must match `architectures` in the model's `config.json`.
```json
{
    "architectures": [
        "CustomModelForCausalLM"
    ]
}
```
```json
{
    "architectures": [
        "DeepseekV2ForCausalLM"
    ]
}
```
It's better to add this line into a NOTE block.
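To make the registration mechanics concrete, a minimal out-of-tree plugin sketch is given below. The package and module names (`my_ascend_models`, `custom_model`) are placeholders, and the entry-point group name comes from vLLM's general plugin mechanism; vLLM-Ascend registers its own model variants in a similar way in `vllm_ascend/models/__init__.py`.

```python
# Hypothetical plugin module, e.g. my_ascend_models/__init__.py.
# Exposed to vLLM through the "vllm.general_plugins" entry-point group
# (see vLLM's plugin documentation for how to declare it in setup.py/pyproject.toml).
from vllm import ModelRegistry


def register():
    # The first argument must exactly match "architectures" in the model's config.json.
    # The lazy "module.path:ClassName" form defers the heavy model import until
    # the architecture is actually requested.
    ModelRegistry.register_model(
        "CustomModelForCausalLM",
        "my_ascend_models.custom_model:CustomModelForCausalLM",
    )
```

Once the plugin package is installed, loading a checkpoint whose `config.json` lists `CustomModelForCausalLM` should resolve to this class.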
If you're registering a novel model architecture not present in vLLM (creating a completely new class), current logs won't provide explicit confirmation by default. It's recommended to add the following logging statement at the end of the `register_model` method in [vllm/model_executor/models/registry.py](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/registry.py):
```python
logger.warning(f"model_arch: {model_arch} has been registered here!")
```
Why use `warning`? I think use `info` may be better.
@linfeng-yuan I think it's better to add more content like:
developer_guide/index should be updated as well.
@shen-shanshan it's good to create a new one with a co-author, if the patch is not updated for a long time.
OK.
### What this PR does / why we need it? Add guidance on how to implement and register new models. Modified based on PR #1126, thanks to @linfeng-yuan for the contribution. --------- Signed-off-by: shen-shanshan <467638484@qq.com>
… vLLM-Ascend
What this PR does / why we need it?
Update guidance on implementing and registering models for users. [Only documentation]
Does this PR introduce any user-facing change?
No.
How was this patch tested?
No code or functional changes are involved; documentation only.