Overview
In our roadmap, we plan to support guided decoding in 2025 Q1 as shown here (#71).
Currently:
I have tested vllm/examples/offline_inference/structured_outputs.py directly on an NPU device, and the experimental results show that guided decoding is natively supported on NPU with the outlines backend.
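For reference, the experiment above boils down to a minimal sketch like the one below, using vLLM's GuidedDecodingParams API. The model name and JSON schema are placeholders, and the exact arguments may differ across vLLM versions:

```python
# Minimal sketch of offline guided decoding with the outlines backend.
# The model name and schema are placeholders; arguments may vary by vLLM version.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=128,
    guided_decoding=GuidedDecodingParams(json=json_schema),
)

# Select the grammar engine; outlines is the backend verified on NPU above.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",
          guided_decoding_backend="outlines")

outputs = llm.generate(
    ["Generate a JSON object describing a person with a name and an age."],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```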
In addition, I have analysed the code in vLLM and found that the tensors involved in guide logits computation all reside on the npu device, which further demonstrates that guided decoding is natively supported on NPU (see the sketch below).
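At its core, guided decoding applies a mask that sets the logits of grammar-disallowed tokens to -inf before sampling. The following is a generic illustration of that idea, not the actual vLLM/outlines implementation; it simply shows why building the mask on logits.device keeps the whole computation on the NPU:

```python
# Generic illustration of guided-decoding logits masking (not vLLM's actual code).
import torch


def apply_guided_mask(logits: torch.Tensor, allowed_token_ids: torch.Tensor) -> torch.Tensor:
    # logits: (vocab_size,) tensor produced by the model, e.g. on an "npu" device.
    # allowed_token_ids: token ids the grammar currently permits.
    # The mask is created with full_like, so it lives on logits.device and the
    # masking step runs entirely on the NPU without host <-> device copies.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask


# Toy usage: in practice the logits come from the model's forward pass on the NPU.
vocab_size = 32000
logits = torch.randn(vocab_size)        # would live on the NPU in a real run
allowed = torch.tensor([11, 42, 7000])  # ids allowed by the current grammar state
masked = apply_guided_mask(logits, allowed)
```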
However, there are still some problems that need to be fixed, such as incomplete JSON output and slow inference speed.
Feel free to report any issues you encounter when using guided decoding with vllm-ascend, and we will do our best to fix them.
Usage
Coming soon ...
Roadmap
Community news
- [RFC][core][V1] generalize structured output manager and backends vllm#17503
- [V1] Add `structural_tag` support using xgrammar vllm#17085
- [V0][V1][Core] Add outlines integration for V1, and update V0 integration. vllm#15975
- [V1][Experimental] Jump-forward decoding vllm#15490
- [V1][Feature] Enable Speculative Decoding with Structured Outputs vllm#14702
- Add support for xgrammar backend on aarch64:
- Add support for reasoning model (DeepSeek-R1):
Adaptation for vllm-ascend
- [5/N] Refactor for structured output module:
- [V1][Structured Output] Minor modification to `_validate_structured_output()` vllm#16748
- [V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` vllm#16578
- [V1][Platform] Remove `supports_structured_output()` in platform #531
- [V1][Platform] Add `supports_structured_output()` method to Platform #475
- [V1][Structured Output] Add `supports_structured_output()` method to Platform vllm#16148
- [2/N] Bugfix for xgrammar backend:
- [1/N] Bugfix for guidance backend: